
feat(skills): add cost-aware-llm-pipeline skill #219

Open
shimo4228 wants to merge 1 commit into affaan-m:main from shimo4228:feat/skills/cost-aware-llm-pipeline

Conversation

@shimo4228

@shimo4228 shimo4228 commented Feb 14, 2026

Description

Adds a new skill for cost-optimized LLM API usage. This skill covers four composable patterns:

  • Model Routing: Automatically select cheaper models (Haiku) for simple tasks, reserving expensive models (Sonnet/Opus) for complex ones
  • Immutable Cost Tracking: Track cumulative API spend with frozen dataclasses
  • Narrow Retry Logic: Retry only on transient errors (network, rate limit), fail fast on permanent errors
  • Prompt Caching: Cache long system prompts to reduce token costs
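As a sketch of the first pattern, model routing can be a simple complexity gate in front of the API call. The model IDs and the heuristic below are illustrative assumptions, not the skill's actual code:

```python
# Minimal model-routing sketch. MODEL_HAIKU / MODEL_SONNET and the
# complexity heuristic are hypothetical, for illustration only.
MODEL_HAIKU = "claude-haiku-4-5"
MODEL_SONNET = "claude-sonnet-4-5"

def route_model(task: str, *, complex_threshold: int = 500) -> str:
    """Pick a cheap model for short/simple tasks, an expensive one otherwise."""
    looks_complex = len(task) > complex_threshold or "analyze" in task.lower()
    return MODEL_SONNET if looks_complex else MODEL_HAIKU
```

In practice the routing heuristic would be domain-specific (task length, tool use, reasoning depth); the point is that the decision happens before the request, so the cheap path is the default.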

Type of Change

  • feat: New feature

Motivation

There is currently no skill in ECC that addresses LLM API cost optimization. As LLM-powered applications scale, cost control becomes critical. This skill provides battle-tested patterns extracted from production use with the Anthropic API.

Checklist

  • Tests pass locally (node tests/run-all.js)
  • Validation scripts pass
  • Follows conventional commits format
  • Updated relevant documentation
  • Focused on one domain/technology
  • Includes practical code examples
  • Under 500 lines
  • Tested with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added a new skill guide for cost-aware LLM pipeline patterns and best practices, covering model routing by task complexity, cost tracking mechanisms, retry strategies with exponential backoff, and prompt caching to minimize API costs. Includes pricing references and anti-patterns to avoid.

Cost optimization patterns for LLM API usage combining model routing,
budget tracking, retry logic, and prompt caching.
@coderabbitai

coderabbitai bot commented Feb 14, 2026

📝 Walkthrough

Introduces a cost-aware LLM pipeline skill document that provides guidance patterns for controlling API costs through model routing, immutable cost tracking, controlled retries with exponential backoff, and prompt caching strategies. Includes thresholds, function definitions, workflow composition, and best practices.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Cost-Aware LLM Pipeline Guidance**<br>`skills/cost-aware-llm-pipeline/SKILL.md` | New skill documentation defining patterns for cost optimization in LLM pipelines, including model routing by task complexity, immutable cost tracking mechanisms, retry logic with exponential backoff, and prompt caching. Covers activation criteria, control flow, error handling, pricing reference, best practices, and anti-patterns. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Suggested reviewers

  • affaan-m

Poem

🐰 A pipeline so wise, with budgets in mind,
Routes tasks with care, cost patterns refined,
Cache prompts once more, don't repeat the call,
Track every expense, immutably—all!
Our fluffy friend cheers for economies small! ✨

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(skills): add cost-aware-llm-pipeline skill' directly and clearly summarizes the main change: adding a new cost-aware LLM pipeline skill to the codebase. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into main. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@skills/cost-aware-llm-pipeline/SKILL.md`:
- Around line 153-160: Update the "Pricing Reference (2025-2026)" table rows for
Haiku 4.5, Sonnet 4.5, and Opus 4.5 in SKILL.md: change Haiku 4.5 rates to Input
$1.00 / Output $5.00; change Sonnet 4.5 to Input $3.00 / Output $15.00 for ≤200K
context (and note the alternate $6.00 / $22.50 for >200K context) and update its
Relative Cost to ≈3x vs Haiku; change Opus 4.5 rates to Input $5.00 / Output
$25.00 and set its Relative Cost to ≈5x vs Haiku; keep the table header and
formatting intact and ensure the Sonnet large-context footnote or parenthetical
is clearly indicated next to the Sonnet 4.5 row.
🧹 Nitpick comments (3)
skills/cost-aware-llm-pipeline/SKILL.md (3)

74-75: Consider clarifying budget boundary behavior.

The `over_budget` property uses `>`, which allows spending exactly the budget limit. If this is intentional, consider adding a docstring or comment to clarify that meeting the budget exactly is acceptable. If you want to prevent reaching the limit, use `>=` instead.
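To illustrate the boundary difference concretely (a hypothetical minimal tracker, not the skill's actual class):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Budget:
    """Illustrative tracker: with `>`, spending exactly the limit is allowed."""
    limit_usd: float
    spent_usd: float

    @property
    def over_budget(self) -> bool:
        # Swap to `>=` if reaching the limit should already count as over budget.
        return self.spent_usd > self.limit_usd

at_limit = Budget(limit_usd=10.0, spent_usd=10.0)   # over_budget is False with `>`
past_limit = Budget(limit_usd=10.0, spent_usd=10.01)  # over_budget is True
```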


147-147: Consider showing the cost calculation.

While the ellipsis placeholders are acceptable for documentation, it would be helpful to show or reference how to calculate `cost_usd` from `input_tokens` and `output_tokens` using the pricing table. This would make the example more complete and actionable.

Example cost calculation:

```python
# Example cost calculation based on pricing table
def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Calculate cost in USD based on model and token counts."""
    pricing = {
        MODEL_HAIKU: (0.80 / 1_000_000, 4.00 / 1_000_000),   # (input, output) rate per token
        MODEL_SONNET: (3.00 / 1_000_000, 15.00 / 1_000_000),
    }
    input_rate, output_rate = pricing.get(model, (0, 0))
    return (input_tokens * input_rate) + (output_tokens * output_rate)

# Then in line 147:
record = CostRecord(
    model=model,
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens,
    cost_usd=calculate_cost(model, response.usage.input_tokens, response.usage.output_tokens),
)
```

92-102: Move the comment about non-retryable errors into the docstring.

The code after line 100 is unreachable because the function always returns, raises, or propagates an exception within the loop. The comment on line 101 documents important behavior but should be placed in the docstring instead to avoid unreachable code.

The Anthropic SDK error classes (`APIConnectionError`, `RateLimitError`, `InternalServerError`) are correct and properly imported.

♻️ Suggested improvement

```diff
 def call_with_retry(func, *, max_retries: int = _MAX_RETRIES):
-    """Retry only on transient errors, fail fast on others."""
+    """Retry only on transient errors, fail fast on others.
+
+    Non-retryable errors (AuthenticationError, BadRequestError, etc.)
+    will raise immediately.
+    """
     for attempt in range(max_retries):
         try:
             return func()
         except _RETRYABLE_ERRORS:
             if attempt == max_retries - 1:
                 raise
             time.sleep(2 ** attempt)  # Exponential backoff
```

Comment on lines +153 to +160
## Pricing Reference (2025-2026)

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative Cost |
|-------|---------------------|----------------------|---------------|
| Haiku 4.5 | $0.80 | $4.00 | 1x |
| Sonnet 4.5 | $3.00 | $15.00 | ~4x |
| Opus 4.5 | $15.00 | $75.00 | ~19x |



⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Anthropic Claude Haiku 4.5 Sonnet 4.5 Opus 4.5 pricing 2025 2026 per million tokens

💡 Result:

Anthropic Claude 4.5 API pricing (USD) — per 1M tokens (2025–2026)

| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| Claude Haiku 4.5 | $1 | $5 [1][2] |
| Claude Sonnet 4.5 | $3 (≤200K prompt) / $6 (>200K prompt) | $15 (≤200K prompt) / $22.50 (>200K prompt) [1][3] |
| Claude Opus 4.5 | $5 | $25 [1] |

Sources: Anthropic/Claude pricing page and model pages. [1][2][3]


Update pricing table with 2025-2026 rates.

The pricing information is outdated. Current 2025-2026 rates for Anthropic Claude models are:

  • Haiku 4.5: $1.00 / $5.00 (not $0.80 / $4.00)
  • Sonnet 4.5: $3.00 / $15.00 for ≤200K context (or $6.00 / $22.50 for >200K context)
  • Opus 4.5: $5.00 / $25.00 (not $15.00 / $75.00)

Update the table and recalculate relative costs (Sonnet ≈3x, Opus ≈5x vs. Haiku, not 4x and 19x).
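The suggested relative-cost figures follow directly from the corrected per-million-token rates (assuming the ≤200K-context tier for Sonnet, and averaging the input and output ratios):

```python
# Verify the suggested relative costs from the corrected rates
# (input $/1M, output $/1M); Sonnet uses the ≤200K-context tier.
rates = {"haiku": (1.00, 5.00), "sonnet": (3.00, 15.00), "opus": (5.00, 25.00)}

def relative_cost(model: str, base: str = "haiku") -> float:
    """Average of the input and output price ratios vs. the base model."""
    (mi, mo), (bi, bo) = rates[model], rates[base]
    return ((mi / bi) + (mo / bo)) / 2

# relative_cost("sonnet") → 3.0, relative_cost("opus") → 5.0
```

Both the input and output ratios happen to agree here (3x and 5x respectively), so the averaging is just a sanity check.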


@affaan-m
Owner

[openclaw-bot:pr-review]

Automated Review - CI checks are passing! ✅

Hi @shimo4228, thanks for this contribution! All CI checks have passed successfully.

A maintainer will review this PR shortly. In the meantime, please ensure:

  • The PR description explains the changes
  • Tests cover the new/modified functionality
  • No breaking changes (or they're documented)

This is an automated review from OpenClaw.

Owner

@affaan-m affaan-m left a comment


Automated review: doc-only changes look good. Approving.
