feat(skills): add cost-aware-llm-pipeline skill #219
shimo4228 wants to merge 1 commit into affaan-m:main from
Conversation
Cost optimization patterns for LLM API usage combining model routing, budget tracking, retry logic, and prompt caching.
📝 Walkthrough

Introduces a cost-aware LLM pipeline skill document that provides guidance patterns for controlling API costs through model routing, immutable cost tracking, controlled retries with exponential backoff, and prompt caching strategies. Includes thresholds, function definitions, workflow composition, and best practices.
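As a rough illustration of the routing pattern named in the walkthrough, a minimal sketch might look like the following. The model IDs, threshold value, and `route_model` helper are assumptions for illustration, not taken from the skill document itself:

```python
# Hypothetical threshold-based model routing (names and values are illustrative).
MODEL_HAIKU = "claude-haiku-4-5"    # assumed ID for the cheap default model
MODEL_SONNET = "claude-sonnet-4-5"  # assumed ID for the pricier model
COMPLEXITY_THRESHOLD = 0.7          # assumed cutoff; the skill defines its own thresholds

def route_model(task_complexity: float) -> str:
    """Route to the cheapest model expected to handle the task."""
    return MODEL_SONNET if task_complexity >= COMPLEXITY_THRESHOLD else MODEL_HAIKU
```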
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@skills/cost-aware-llm-pipeline/SKILL.md`:
- Around line 153-160: Update the "Pricing Reference (2025-2026)" table rows for
Haiku 4.5, Sonnet 4.5, and Opus 4.5 in SKILL.md: change Haiku 4.5 rates to Input
$1.00 / Output $5.00; change Sonnet 4.5 to Input $3.00 / Output $15.00 for ≤200K
context (and note the alternate $6.00 / $22.50 for >200K context) and update its
Relative Cost to ≈3x vs Haiku; change Opus 4.5 rates to Input $5.00 / Output
$25.00 and set its Relative Cost to ≈5x vs Haiku; keep the table header and
formatting intact and ensure the Sonnet large-context footnote or parenthetical
is clearly indicated next to the Sonnet 4.5 row.
🧹 Nitpick comments (3)
skills/cost-aware-llm-pipeline/SKILL.md (3)
74-75: Consider clarifying budget boundary behavior.

The `over_budget` property uses `>`, which allows spending exactly the budget limit. If this is intentional, consider adding a docstring or comment to clarify that meeting the budget exactly is acceptable. If you want to prevent reaching the limit, use `>=` instead.
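A hedged sketch of the distinction, assuming a tracker with `spent_usd` and `budget_usd` fields; these names are illustrative and may differ from the skill's actual class:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BudgetTracker:
    """Illustrative tracker; field names are assumptions, not the skill's API."""
    spent_usd: float
    budget_usd: float

    @property
    def over_budget(self) -> bool:
        # `>` treats hitting the limit exactly as still within budget;
        # switch to `>=` to stop before the limit is reached.
        return self.spent_usd > self.budget_usd
```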
147-147: Consider showing the cost calculation.

While the ellipsis placeholders are acceptable for documentation, it would be helpful to show or reference how to calculate `cost_usd` from `input_tokens` and `output_tokens` using the pricing table. This would make the example more complete and actionable.

Example cost calculation:

```python
# Example cost calculation based on pricing table
def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Calculate cost in USD based on model and token counts."""
    pricing = {
        MODEL_HAIKU: (0.80 / 1_000_000, 4.00 / 1_000_000),   # input, output per token
        MODEL_SONNET: (3.00 / 1_000_000, 15.00 / 1_000_000),
    }
    input_rate, output_rate = pricing.get(model, (0, 0))
    return (input_tokens * input_rate) + (output_tokens * output_rate)

# Then in line 147:
record = CostRecord(
    model=model,
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens,
    cost_usd=calculate_cost(model, response.usage.input_tokens, response.usage.output_tokens),
)
```
92-102: Move the comment about non-retryable errors into the docstring.

The code after line 100 is unreachable because the function always returns, raises, or propagates an exception within the loop. The comment on line 101 documents important behavior but should be placed in the docstring instead to avoid unreachable code.

The Anthropic SDK error classes (`APIConnectionError`, `RateLimitError`, `InternalServerError`) are correct and properly imported.

♻️ Suggested improvement:

```diff
 def call_with_retry(func, *, max_retries: int = _MAX_RETRIES):
-    """Retry only on transient errors, fail fast on others."""
+    """Retry only on transient errors, fail fast on others.
+
+    Non-retryable errors (AuthenticationError, BadRequestError, etc.)
+    will raise immediately.
+    """
     for attempt in range(max_retries):
         try:
             return func()
         except _RETRYABLE_ERRORS:
             if attempt == max_retries - 1:
                 raise
             time.sleep(2 ** attempt)  # Exponential backoff
```
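For context, a hedged usage sketch of the helper above; the client setup and message parameters are illustrative assumptions, not lifted from the skill document:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Wrap the API call in a zero-argument callable so call_with_retry controls retries.
response = call_with_retry(
    lambda: client.messages.create(
        model=MODEL_HAIKU,  # routed model constant assumed from the skill document
        max_tokens=1024,
        messages=[{"role": "user", "content": "Summarize this ticket."}],
    )
)
```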
## Pricing Reference (2025-2026)

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative Cost |
|-------|---------------------|----------------------|---------------|
| Haiku 4.5 | $0.80 | $4.00 | 1x |
| Sonnet 4.5 | $3.00 | $15.00 | ~4x |
| Opus 4.5 | $15.00 | $75.00 | ~19x |
🧩 Analysis chain
🌐 Web query:
Anthropic Claude Haiku 4.5 Sonnet 4.5 Opus 4.5 pricing 2025 2026 per million tokens
💡 Result:
Anthropic Claude 4.5 API pricing (USD) — per 1M tokens (2025–2026)
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| Claude Haiku 4.5 | $1 | $5 [1][2] |
| Claude Sonnet 4.5 | $3 (≤200K prompt) / $6 (>200K prompt) | $15 (≤200K prompt) / $22.50 (>200K prompt) [1][3] |
| Claude Opus 4.5 | $5 | $25 [1] |
Sources: Anthropic/Claude pricing page and model pages. [1][2][3]
Update pricing table with 2025-2026 rates.
The pricing information is outdated. Current 2025-2026 rates for Anthropic Claude models are:
- Haiku 4.5: $1.00 / $5.00 (not $0.80 / $4.00)
- Sonnet 4.5: $3.00 / $15.00 for ≤200K context (or $6.00 / $22.50 for >200K context)
- Opus 4.5: $5.00 / $25.00 (not $15.00 / $75.00)
Update the table and recalculate relative costs (Sonnet ≈3x, Opus ≈5x vs. Haiku, not 4x and 19x).
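Applying those rates, the corrected rows might look roughly like this (relative costs rounded; the >200K-context Sonnet pricing shown parenthetically):

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative Cost |
|-------|---------------------|----------------------|---------------|
| Haiku 4.5 | $1.00 | $5.00 | 1x |
| Sonnet 4.5 | $3.00 ($6.00 for >200K context) | $15.00 ($22.50 for >200K context) | ~3x |
| Opus 4.5 | $5.00 | $25.00 | ~5x |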
[openclaw-bot:pr-review] Automated Review - CI checks are passing! ✅ Hi @shimo4228, thanks for this contribution! All CI checks have passed successfully. A maintainer will review this PR shortly. In the meantime, please ensure:
This is an automated review from OpenClaw.
affaan-m left a comment
Automated review: doc-only changes look good. Approving.
Description
Adds a new skill for cost-optimized LLM API usage. This skill covers four composable patterns (see the sketch below for how they fit together):

- Model routing
- Immutable cost/budget tracking
- Controlled retries with exponential backoff
- Prompt caching
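A hedged sketch of how the four patterns might compose into one call path. Every name here (`route_model`, `call_with_retry`, `calculate_cost`, `BudgetTracker`, `SYSTEM_PROMPT`) follows the illustrative examples in this review and is an assumption, not the skill's verbatim API:

```python
from dataclasses import replace

def run_task(prompt: str, complexity: float, tracker: BudgetTracker) -> tuple[str, BudgetTracker]:
    """Route, guard the budget, retry transient failures, and record the cost."""
    if tracker.over_budget:
        raise RuntimeError("Budget exhausted; refusing to spend more.")

    model = route_model(complexity)          # 1) routing: cheapest capable model

    response = call_with_retry(              # 2) retries: transient errors only
        lambda: client.messages.create(
            model=model,
            max_tokens=1024,
            system=[{                        # 3) prompt caching on the shared system prompt
                "type": "text",
                "text": SYSTEM_PROMPT,       # assumed shared constant
                "cache_control": {"type": "ephemeral"},
            }],
            messages=[{"role": "user", "content": prompt}],
        )
    )

    # 4) cost tracking: return an updated (immutable) tracker alongside the result
    cost = calculate_cost(model, response.usage.input_tokens, response.usage.output_tokens)
    return response.content[0].text, replace(tracker, spent_usd=tracker.spent_usd + cost)
```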
Type of Change
feat: New feature

Motivation
There is currently no skill in ECC that addresses LLM API cost optimization. As LLM-powered applications scale, cost control becomes critical. This skill provides battle-tested patterns extracted from production use with the Anthropic API.
Checklist
- Tests pass (`node tests/run-all.js`)