
Token usage step summary does not break down usage by model when inline sub-agents use model overrides #31501

@ryckmansm

Description

Context

When a workflow defines inline sub-agents with per-agent model: overrides (e.g. model: claude-haiku-4.5), the "Token Usage" step summary only reports usage against the root/default model. Sub-agent invocations using a different model are silently aggregated into the same row or omitted, making it impossible to audit multi-model workflows accurately.

Analysis

I investigated this with an agent. Here are the findings:

Affected artifact: The "Parse token usage for step summary" step uses the parse_token_usage.cjs action, which reads from agent_usage.json.

Root cause: agent_usage.json appears to contain only aggregated totals without a model dimension. The per-request model attribution is available in firewall-audit-logs/api-proxy-logs/token-usage.jsonl, where each line records the model used for that API call. The parse_token_usage.cjs action does not appear to read this file.

Reproducer: Define a workflow with an inline sub-agent using model: claude-haiku-4.5 (or any model other than the workflow default). After a run, the step summary shows only the root model, with no separate row for the sub-agent's model.
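For illustration, a minimal reproducer might look like the fragment below. The workflow schema here is hypothetical — only the per-agent `model:` override (e.g. `model: claude-haiku-4.5`) is taken from the report.

```yaml
# Hypothetical workflow fragment; the surrounding schema is an assumption.
# Only the per-agent `model:` override is from the report above.
agents:
  summarizer:
    model: claude-haiku-4.5   # differs from the workflow's default model
```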

Expected Behavior

Per the Effective Tokens Specification §5.1, each invocation MAY use a different model. Per §8.1, all LLM calls MUST be included in usage accounting. The token summary table should display one row per distinct model used across the workflow run, not just the root model.

Proposed Implementation Plan

I'm not part of the core team — I'm providing this plan for a core team member to implement with an agent.

  1. Investigate parse_token_usage.cjs (likely in .github/workflows/ or an actions bundle):

    • Confirm it reads agent_usage.json only
    • Confirm token-usage.jsonl contains a model field per entry
  2. Update parse_token_usage.cjs (or the action that calls it):

    • After reading agent_usage.json for the primary totals, additionally parse firewall-audit-logs/api-proxy-logs/token-usage.jsonl
    • Group entries by model, summing input_tokens, cache_creation_tokens, output_tokens, and reasoning_tokens per group
    • If a model appears only in token-usage.jsonl (not in agent_usage.json), include it as an additional row
  3. Update the step summary output:

    • Render a breakdown table with one row per model (e.g., gpt-4o, claude-haiku-4.5)
    • Keep the existing "Total" row aggregating all models for backward compatibility
  4. Add tests if a test harness exists for this action:

    • Single model: output should match existing behavior
    • Two models: output should include two separate rows plus a total
  5. Update documentation if the token usage summary is documented anywhere (e.g., docs/):

    • Note that multi-model workflows produce a per-model breakdown

Question for the Team

Is this a known limitation, or is there an existing tracking issue? I wasn't sure whether to label this as a bug (the summary is incorrect/incomplete) or an enhancement (the feature was never designed for multi-model runs). Happy to clarify or refine the plan based on feedback.
