Skip to content

fix(chat): merge consecutive reasoning steps into one pill#724

Open
blove wants to merge 2 commits into
mainfrom
fix/merge-reasoning-pills
Open

fix(chat): merge consecutive reasoning steps into one pill#724
blove wants to merge 2 commits into
mainfrom
fix/merge-reasoning-pills

Conversation

@blove

@blove blove commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

What

A multi-step agent turn (reason → tool → reason → tool → answer) rendered a stack of separate "Thought for 1s" pills — one per assistant reasoning message, with the hidden tool messages between them taking no visible space. The wall of near-identical chips was noisy and pushed the actual answer down.

This collapses a reasoning run — a maximal sequence of consecutive assistant reasoning steps separated only by hidden tool messages — into a single pill rendered once at the run's first step:

Thought for 5s · 3 steps

Expanding it shows the joined reasoning from every step. Single-step turns are unchanged: the [label] falls back to the existing Thought for {duration} text when N == 1.

How

  • chat.component.ts: template renders the reasoning pill only at a run's start (reasoningRunStart(i)), fed by an aggregated reasoningRun(i) (joins content, sums reasoningDurationMs, counts steps, derives the streaming flag + merged label).
  • Reuses the existing chat-reasoning [label] input and its resolvedLabel() fallback — no primitive changes.

Verification

  • nx test chat green; nx build chat green.
  • Live-verified against the examples/ag-ui research-subagent flow (reason → subagent → reason → answer): three "Thought for 1s" chips collapsed into one "Thought for 5s · 3 steps" pill.

🤖 Generated with Claude Code

A multi-step agent turn (reason → tool → reason → tool → answer) produced
a stack of separate "Thought for 1s" pills — one per assistant reasoning
message, with the hidden tool messages between them. This collapses a
reasoning RUN (a maximal sequence of consecutive assistant reasoning
steps separated only by hidden tool messages) into a single pill rendered
at the run's first step: "Thought for {total} · {N} steps", expandable to
the joined reasoning. Single-step turns keep the normal
"Thought for {duration}" pill (label falls back when N == 1).

Adds reasoningRunStart()/reasoningRun() helpers on ChatComponent; uses the
existing chat-reasoning [label] input + resolvedLabel() fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@vercel

vercel Bot commented Jun 20, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
threadplane Ignored Ignored Preview Jun 20, 2026 5:25am

Request Review

@blove blove enabled auto-merge (squash) June 20, 2026 05:18

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval: this PR received an intelligent (AI) code review. See the review comments on this PR.

@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Claude finished @blove's task in 3m 8s —— View job


Code Review

  • Read CLAUDE.md for repo guidelines
  • Read the PR diff
  • Review the changed file in detail
  • Post findings

Summary

The approach is sound. reasoningRunStart / reasoningRun correctly identify run boundaries (skipping hidden tool messages, breaking on any non-tool non-reasoning message), the streaming flag is wired to the last step in the run, and the chat-reasoning [label] override cleanly handles the merged-pill text without touching the primitive.

Two issues flagged inline:

1. Missing unit tests (moderate) — reasoningRunStart and reasoningRun have non-trivial branching (tool-message gaps, streaming detection, duration summation, N==1 vs N>1 label) but no coverage in chat.component.spec.ts. Given this is purely class-method logic, tests don't require template compilation.

2. formatDuration(0) fallback in multi-step label (low) — when all steps have reasoningDurationMs: undefined, the label reads "Thought for <1s · N steps" because durationMs ?? 0 passes 0 to formatDuration. <1s implies speed rather than absent data. Suggested fix in the inline comment.

Everything else checks out:

  • prevAssistant correctly handles human/non-assistant interruptions between runs.
  • reasoningRun collected from the run-start index prevents double-rendering on non-start steps.
  • isReasoningStreaming reuse is correct: during active streaming, the last step in the run is the tail message.
  • No Angular subscription leaks, no unsafe casts, no public API regressions.

? `Thought for ${formatDuration(durationMs ?? 0)} · ${steps.length} steps`
: undefined;
return { content, durationMs, streaming, label };
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coveragereasoningRunStart and reasoningRun are the core of this feature but have no unit tests. The logic has several non-trivial branches worth covering:

  • Single reasoning step (no merge) → label is undefined, streaming comes from isReasoningStreaming.
  • Two steps separated by a tool message → pill merges both.
  • Run ends when a non-reasoning, non-tool message follows.
  • All reasoningDurationMs values are undefineddurationMs stays undefined, label falls back to formatDuration(0) = "<1s".
  • First step is the tail (streaming) vs last step is the tail.

These are good candidates for straight unit tests on the class methods (no template compile needed).

Fix this →

const label =
steps.length > 1
? `Thought for ${formatDuration(durationMs ?? 0)} · ${steps.length} steps`
: undefined;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When every step has reasoningDurationMs: undefined, durationMs is undefined here, so durationMs ?? 0 passes 0 to formatDuration, producing "Thought for <1s · N steps"<1s implies it was fast, but the real meaning is "no timing data". Consider omitting the duration portion when unknown:

Suggested change
: undefined;
? steps.length > 1 && durationMs !== undefined
? `Thought for ${formatDuration(durationMs)} · ${steps.length} steps`
: steps.length > 1
? `${steps.length} steps`
: undefined
: undefined;

Or at minimum document the fallback intent with a comment. (Low severity — timing data is almost always present in practice.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant