
Token usage step summary does not break down usage by model when inline sub-agents use model overrides #31501

@ryckmansm

Description

Context

When a workflow defines inline sub-agents with per-agent model: overrides (e.g. model: claude-haiku-4.5), the "Token Usage" step summary only reports usage against the root/default model. Sub-agent invocations using a different model are silently aggregated into the same row or omitted, making it impossible to audit multi-model workflows accurately.

Analysis

I investigated this with an agent. Here are the findings:

Affected artifact: The "Parse token usage for step summary" step uses the parse_token_usage.cjs action, which reads from agent_usage.json.

Root cause: agent_usage.json appears to contain only aggregated totals without a model dimension. The per-request model attribution is available in firewall-audit-logs/api-proxy-logs/token-usage.jsonl, where each line records the model used for that API call. The parse_token_usage.cjs action does not appear to read this file.

Reproducer: Define a workflow with an inline sub-agent using model: claude-haiku-4.5 (or any model other than the workflow default). After a run, the step summary shows only the root model, with no separate row for the sub-agent's model.
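For illustration, a minimal reproducer might look like the fragment below. The workflow schema here is hypothetical — only the per-agent `model:` override (e.g. `model: claude-haiku-4.5`) is taken from the report.

```yaml
# Hypothetical workflow fragment; the surrounding schema is an assumption.
# Only the per-agent `model:` override is from the report above.
agents:
  summarizer:
    model: claude-haiku-4.5   # differs from the workflow's default model
```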

Expected Behavior

Per the Effective Tokens Specification §5.1, each invocation MAY use a different model. Per §8.1, all LLM calls MUST be included in usage accounting. The token summary table should display one row per distinct model used across the workflow run, not just the root model.

Proposed Implementation Plan

I'm not part of the core team — I'm providing this plan for a core team member to implement with an agent.

  1. Investigate parse_token_usage.cjs (likely in .github/workflows/ or an actions bundle):

    • Confirm it reads agent_usage.json only
    • Confirm token-usage.jsonl contains a model field per entry
  2. Update parse_token_usage.cjs (or the action that calls it):

    • After reading agent_usage.json for the primary totals, additionally parse firewall-audit-logs/api-proxy-logs/token-usage.jsonl
    • Group entries by model, summing input_tokens, cache_creation_tokens, output_tokens, and reasoning_tokens per group
    • If a model appears only in token-usage.jsonl (not in agent_usage.json), include it as an additional row
  3. Update the step summary output:

    • Render a breakdown table with one row per model (e.g., gpt-4o, claude-haiku-4.5)
    • Keep the existing "Total" row aggregating all models for backward compatibility
  4. Add tests if a test harness exists for this action:

    • Single model: output should match existing behavior
    • Two models: output should include two separate rows plus a total
  5. Update documentation if the token usage summary is documented anywhere (e.g., docs/):

    • Note that multi-model workflows produce a per-model breakdown

Question for the Team

Is this a known limitation, or is there an existing tracking issue? I wasn't sure whether to label this as a bug (the summary is incorrect/incomplete) or an enhancement (the feature was never designed for multi-model runs). Happy to clarify or refine the plan based on feedback.
