## Context
When a workflow defines inline sub-agents with per-agent `model:` overrides (e.g. `model: claude-haiku-4.5`), the "Token Usage" step summary only reports usage against the root/default model. Sub-agent invocations using a different model are silently aggregated into the same row or omitted, making it impossible to audit multi-model workflows accurately.
## Analysis
I investigated this with an agent. Here are the findings:
- **Affected artifact:** The "Parse token usage for step summary" step uses the `parse_token_usage.cjs` action, which reads from `agent_usage.json`.
- **Root cause:** `agent_usage.json` appears to contain only aggregated totals without a model dimension. The per-request model attribution is available in `firewall-audit-logs/api-proxy-logs/token-usage.jsonl`, where each line records the model used for that API call. The `parse_token_usage.cjs` action does not appear to read this file (a hypothetical sample line is shown after these findings).
- **Reproducer:** Define a workflow with an inline sub-agent using `model: claude-haiku-4.5` (or any model other than the workflow default). After a run, the step summary shows only the root model, with no separate row for the sub-agent's model.
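For illustration, I'd expect a `token-usage.jsonl` line to look roughly like the following. The field names are inferred from the fields summed in the plan below; they are an assumption, not confirmed against an actual run:

```json
{"model": "claude-haiku-4.5", "input_tokens": 1523, "cache_creation_tokens": 0, "output_tokens": 412, "reasoning_tokens": 96}
```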
## Expected Behavior
Per the Effective Tokens Specification §5.1, each invocation MAY use a different model. Per §8.1, all LLM calls MUST be included in usage accounting. The token summary table should display one row per distinct model used across the workflow run, not just the root model.
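Concretely, for a run that used two models, I'd expect the summary to look something like this (illustrative numbers only):

| Model | Input | Cache creation | Output | Reasoning |
| --- | ---: | ---: | ---: | ---: |
| gpt-4o | 12,034 | 0 | 3,210 | 0 |
| claude-haiku-4.5 | 1,523 | 0 | 412 | 96 |
| **Total** | 13,557 | 0 | 3,622 | 96 |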
## Proposed Implementation Plan
I'm not part of the core team — I'm providing this plan for a core team member to implement with an agent.
- Investigate `parse_token_usage.cjs` (likely in `.github/workflows/` or an actions bundle):
  - Confirm it reads `agent_usage.json` only
  - Confirm `token-usage.jsonl` contains a `model` field per entry
- Update `parse_token_usage.cjs` (or the action that calls it); a sketch of the grouping logic follows this list:
  - After reading `agent_usage.json` for the primary totals, additionally parse `firewall-audit-logs/api-proxy-logs/token-usage.jsonl`
  - Group entries by `model`, summing `input_tokens`, `cache_creation_tokens`, `output_tokens`, and `reasoning_tokens` per group
  - If a model appears only in `token-usage.jsonl` (not in `agent_usage.json`), include it as an additional row
- Update the step summary output; a rendering sketch also follows this list:
  - Render a breakdown table with one row per model (e.g., `gpt-4o`, `claude-haiku-4.5`)
  - Keep the existing "Total" row aggregating all models for backward compatibility
- Add tests if a test harness exists for this action (see the test sketch below):
  - Single model: output should match existing behavior
  - Two models: output should include two separate rows plus a total
- Update documentation if the token usage summary is documented anywhere (e.g., `docs/`):
  - Note that multi-model workflows produce a per-model breakdown
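For the `parse_token_usage.cjs` change, here is a minimal sketch of the grouping logic in CommonJS. The file path and the four token fields come from the findings above; the function name, the JSONL schema, and the error handling are my assumptions, to be checked against the real action:

```js
// Sketch only: assumes each token-usage.jsonl line carries `model` plus the
// four token counters named in the plan; not verified against the action.
const fs = require("fs");

const FIELDS = ["input_tokens", "cache_creation_tokens", "output_tokens", "reasoning_tokens"];

// Hypothetical helper: aggregate per-model totals from the proxy log.
function sumByModel(jsonlPath) {
  const perModel = new Map();
  const lines = fs.readFileSync(jsonlPath, "utf8").split("\n").filter(Boolean);
  for (const line of lines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // tolerate a truncated or malformed trailing line
    }
    const model = entry.model || "unknown";
    const totals = perModel.get(model) ?? Object.fromEntries(FIELDS.map((f) => [f, 0]));
    for (const f of FIELDS) totals[f] += entry[f] || 0;
    perModel.set(model, totals);
  }
  return perModel;
}
```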
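For the step summary output, the per-model rows plus the existing "Total" row could be rendered along these lines. This reuses `FIELDS` and the `perModel` map from the sketch above; `renderSummary` is a hypothetical name, and the trailing comment shows the standard `GITHUB_STEP_SUMMARY` mechanism as one plausible way the action writes its output:

```js
// Sketch: one table row per model plus the existing aggregate "Total" row.
function renderSummary(perModel) {
  const header = "| Model | Input | Cache creation | Output | Reasoning |";
  const divider = "| --- | ---: | ---: | ---: | ---: |";
  const rows = [];
  const total = Object.fromEntries(FIELDS.map((f) => [f, 0]));
  for (const [model, t] of perModel) {
    rows.push(`| ${model} | ${FIELDS.map((f) => t[f]).join(" | ")} |`);
    for (const f of FIELDS) total[f] += t[f];
  }
  rows.push(`| **Total** | ${FIELDS.map((f) => total[f]).join(" | ")} |`);
  return [header, divider, ...rows].join("\n") + "\n";
}

// Usage sketch, via the standard step summary mechanism:
// require("fs").appendFileSync(process.env.GITHUB_STEP_SUMMARY, renderSummary(perModel));
```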
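For the tests, if no harness exists, a plain `node:test` check of the two-model case could serve as a starting point. It reuses the hypothetical `sumByModel` from above, and the fixture values are invented:

```js
// Sketch: two-model case, reusing the hypothetical sumByModel from the sketch above.
const test = require("node:test");
const assert = require("node:assert");
const fsTest = require("fs");
const os = require("os");
const path = require("path");

test("two models are grouped into two separate rows", () => {
  const fixture = path.join(os.tmpdir(), "token-usage-test.jsonl");
  fsTest.writeFileSync(fixture, [
    '{"model":"gpt-4o","input_tokens":10,"cache_creation_tokens":0,"output_tokens":5,"reasoning_tokens":0}',
    '{"model":"claude-haiku-4.5","input_tokens":3,"cache_creation_tokens":0,"output_tokens":2,"reasoning_tokens":1}',
    '{"model":"gpt-4o","input_tokens":7,"cache_creation_tokens":0,"output_tokens":4,"reasoning_tokens":0}',
  ].join("\n"));
  const perModel = sumByModel(fixture);
  assert.strictEqual(perModel.size, 2);                        // one group per model
  assert.strictEqual(perModel.get("gpt-4o").input_tokens, 17); // 10 + 7
  assert.strictEqual(perModel.get("claude-haiku-4.5").reasoning_tokens, 1);
});
```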
## Question for the Team
Is this a known limitation or is there an existing tracking issue? I wasn't sure whether to label this as a `bug` (the summary is incorrect/incomplete) or `enhancement` (the feature was never designed for multi-model runs). Happy to clarify or refine the plan based on feedback.