feat(streaming): add reasoning events for OpenAI and Anthropic#174

Merged
JackChen-me merged 4 commits into open-multi-agent:main from SiMinus:feat/stream-reasoning-content on Apr 28, 2026

Conversation

SiMinus (Contributor) commented Apr 25, 2026

Implemented native reasoning-streaming support with a minimal, explicit extension of the streaming contract.

What changed

| Area | Change |
| --- | --- |
| Streaming contract | Added a new `reasoning` event to `StreamEvent`, so reasoning deltas are no longer conflated with normal text. |
| Response model | Added a `ReasoningBlock` content type and retained reasoning in `LLMResponse.content` instead of exposing it only during streaming. |
| OpenAI streaming | Updated `OpenAIAdapter.stream()` to map `delta.reasoning_content` to `reasoning` events and preserve the full reasoning trace in the final `done` response. |
| OpenAI non-streaming | Updated OpenAI completion parsing so non-streaming responses also retain `reasoning_content`. |
| Anthropic mapping | Mapped Anthropic `thinking` / `thinking_delta` into the same unified `reasoning` event/block shape. |
| Cross-provider serialization | Updated message serialization paths so retained reasoning can safely round-trip through provider adapters. |
| Runner output | Updated the runner text extraction path to render retained reasoning as `<think>...</think>` instead of dropping it. |
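The extended contract can be sketched roughly as follows. Names follow the PR description (`StreamEvent`, `ReasoningBlock`, the `reasoning` event); the exact field names in the repo may differ, and `accumulate` is an illustrative helper, not repo code:

```typescript
interface TextBlock { type: 'text'; text: string }
interface ReasoningBlock { type: 'reasoning'; text: string }
type ContentBlock = TextBlock | ReasoningBlock

type StreamEvent =
  | { type: 'text'; delta: string }
  | { type: 'reasoning'; delta: string }      // new: reasoning deltas, separate from text
  | { type: 'done'; content: ContentBlock[] } // final response retains reasoning as its own block

// Illustrative: fold a stream into the final content blocks.
function accumulate(events: StreamEvent[]): ContentBlock[] {
  let text = ''
  let reasoning = ''
  for (const ev of events) {
    if (ev.type === 'text') text += ev.delta
    else if (ev.type === 'reasoning') reasoning += ev.delta
  }
  const content: ContentBlock[] = []
  if (reasoning) content.push({ type: 'reasoning', text: reasoning })
  if (text) content.push({ type: 'text', text })
  return content
}

const blocks = accumulate([
  { type: 'reasoning', delta: 'Let me think… ' },
  { type: 'reasoning', delta: 'done.' },
  { type: 'text', delta: 'Answer.' },
])
console.log(blocks.map(b => b.type).join(',')) // reasoning,text
```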

Tests

  • npm run lint
  • npm test -- tests/openai-adapter.test.ts tests/anthropic-adapter.test.ts
  • npm test (48 test files, 688 tests)

My solutions to open design questions

  • LLMResponse.content: I retained reasoning in the final response as a separate ReasoningBlock, rather than surfacing it only during streaming.
  • Anthropic thinking: I mapped Anthropic thinking / thinking_delta and OpenAI-compatible reasoning_content into one unified reasoning event/block, not two provider-specific event types.
  • maxTokenBudget: In this implementation, retained reasoning is carried forward as part of the conversation context, so it effectively counts toward later token budget usage; this is an implementation consequence, not a separately designed budgeting rule.

JackChen-me (Member) left a comment


Two regressions before this can merge.

  1. runner.ts:195 extractText prepends <think>...</think> to the final text. With outputSchema, that flows into extractJSON's case-3 fallback (indexOf('{') to lastIndexOf('}')), which grabs the wrong span whenever the reasoning contains {. Either skip reasoning blocks in extractText or use a text-only extractor for the structured path. No test covers the combo.

  2. fromAnthropicContentBlock and the content_block_delta switch use (block as { ... }).field casts. Unnecessary: SDK 0.52 already has ThinkingBlock and ThinkingDelta in the relevant unions (messages.d.ts:138, 326), so case 'thinking' / case 'thinking_delta' on the original typed switches narrows correctly. The default branch also loses exhaustiveness as written.
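Item 1's failure mode can be reproduced with a minimal stand-in for the case-3 fallback (the naive `indexOf('{')`-to-`lastIndexOf('}')` slice described above; the real `extractJSON` in runner.ts may differ in detail):

```typescript
// Stand-in for extractJSON's last-resort path, as described in the review.
function case3Fallback(text: string): string {
  return text.slice(text.indexOf('{'), text.lastIndexOf('}') + 1)
}

// Final text as extractText produced it on this revision: reasoning inlined.
const withReasoning = '<think>note the { stray brace</think>{"answer": 42}'

let parsed: unknown
try {
  parsed = JSON.parse(case3Fallback(withReasoning))
} catch {
  parsed = undefined // the slice started at the '{' inside the reasoning
}
console.log(parsed === undefined) // true: the wrong span was grabbed

// With reasoning skipped, the same fallback finds the intended object:
console.log(case3Fallback('{"answer": 42}')) // {"answer": 42}
```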

Side note: ThinkingBlock.signature is dropped on the way in, which breaks extended-thinking continuation. OK to handle as a follow-up.
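For item 2, the cast-free switch can be sketched with local stand-ins for the SDK unions (in @anthropic-ai/sdk the relevant members are `TextBlock` and `ThinkingBlock`; the framework-side block name is illustrative):

```typescript
type AnthropicBlock =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string; signature: string }

type FrameworkBlock = { type: 'text' | 'reasoning'; text: string }

function fromContentBlock(block: AnthropicBlock): FrameworkBlock {
  switch (block.type) {
    case 'text':
      return { type: 'text', text: block.text } // narrowed by the discriminant, no cast
    case 'thinking':
      return { type: 'reasoning', text: block.thinking } // narrowed, no cast
    default: {
      // Exhaustiveness: a new union member turns this into a compile error.
      const unreachable: never = block
      return unreachable
    }
  }
}

console.log(fromContentBlock({ type: 'thinking', thinking: 'hm', signature: 's' }).type)
// reasoning
```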

JackChen-me added a commit that referenced this pull request Apr 25, 2026
- Restructure flow, sync README_zh, document quantized sampling.
- Refresh positioning, branding, and section layout (logo, banners, github social card).
- Polish Quick Start and consolidate provider tables.
- Drop forward reference to PR #174 in features bullet.
- Rework structure, factual fixes, add Featured Partner doc.
- Fix brand asset rendering: light banner no longer leaks the dark body background; dark banner and github-social regain previously-clipped content (terminal-log arrows, "★ open source" badge). Root cause: overflow:hidden on stage ancestors clipping the forced 1280px target during extraction.
- Replace fabricated token figures (12,847t / 8,402t / 3,914t) on the banner right column with action labels (design / impl / parallel / review). The numbers conflicted with how totalTokenUsage actually aggregates (input+output, includes coordinator decompose+synthesize, merges same-name agents).
JackChen-me (Member) commented:

@codex

SiMinus (Contributor, Author) commented Apr 27, 2026

Thanks for the review. I addressed both blocking regressions.

Changes made:

| Review item | Fix |
| --- | --- |
| `extractText` prepended `<think>...</think>` into final output | Updated `AgentRunner` text extraction to ignore `ReasoningBlock` and only join `TextBlock`s, so structured-output parsing no longer sees reasoning content. |
| Missing structured-output coverage | Added a regression test where reasoning contains a misleading `{` and the final text contains valid JSON; the structured-output path now validates the JSON correctly. |
| Anthropic casts bypassed SDK narrowing | Removed the unnecessary `(block as ...)` / `(delta as ...)` casts and switched directly on `block.type` / `delta.type`, restoring SDK union narrowing and exhaustiveness checking. |
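The text-extraction fix amounts to joining only text blocks; a minimal sketch (`extractText` here is a stand-in for the `AgentRunner` method, not repo code):

```typescript
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string }

// Reasoning blocks are skipped, so structured-output parsing only ever
// sees the model's actual text output.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((b): b is Extract<ContentBlock, { type: 'text' }> => b.type === 'text')
    .map(b => b.text)
    .join('')
}

console.log(extractText([
  { type: 'reasoning', text: 'trace with a misleading {' },
  { type: 'text', text: '{"ok":true}' },
])) // {"ok":true}
```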

I left ThinkingBlock.signature out of this PR since it was called out as acceptable follow-up work.

Validation:

  • npm test -- tests/structured-output.test.ts tests/anthropic-adapter.test.ts tests/openai-adapter.test.ts
  • npm run lint
  • npm test (48 test files, 689 tests)

chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 102740913e


Comment thread src/llm/openai-common.ts
Comment on lines +225 to +228
```ts
const reasoningText = getOpenAIReasoningText(message)
if (reasoningText.length > 0) {
  const reasoningBlock: ReasoningBlock = { type: 'reasoning', text: reasoningText }
  content.push(reasoningBlock)
```

P1 Badge Count reasoning blocks in token-budget estimation

This change persists reasoning text in LLMResponse.content, but token estimation still ignores reasoning blocks (it only counts text, tool_result, tool_use, and image). In runs that use context compaction/truncation based on estimateTokens, long reasoning traces will be undercounted, so compaction can be skipped until the next provider call fails on context length. Please include reasoning text in token estimation so retained reasoning actually participates in budget control.
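The suggested fix is a one-case addition to the estimator; a sketch, where `estimateTokens` and the chars/4 heuristic are stand-ins for the repo's actual implementation:

```typescript
type Block =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string }
  | { type: 'tool_use'; name: string; input: string }

function estimateTokens(blocks: Block[]): number {
  let chars = 0
  for (const b of blocks) {
    switch (b.type) {
      case 'text':
      case 'reasoning': // previously omitted: long traces were undercounted
        chars += b.text.length
        break
      case 'tool_use':
        chars += b.name.length + b.input.length
        break
    }
  }
  return Math.ceil(chars / 4)
}

// A 4000-char reasoning trace now contributes ~1000 tokens to the estimate,
// so compaction thresholds trip before the provider rejects the context.
console.log(estimateTokens([{ type: 'reasoning', text: 'x'.repeat(4000) }])) // 1000
```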


SiMinus (Contributor, Author) commented Apr 27, 2026

Implemented the review follow-ups for native reasoning support.

This update keeps the original reasoning streaming design, while fixing the merge-blocking regressions and token-estimation gap:

| Area | Update |
| --- | --- |
| Structured output | `AgentRunner` now excludes `ReasoningBlock` from final text extraction, so reasoning traces cannot interfere with `outputSchema` JSON parsing. |
| Regression coverage | Added a structured-output test where reasoning contains a misleading `{` and the final text contains valid JSON. |
| Anthropic adapter | Removed unnecessary casts and restored SDK union narrowing for `thinking` / `thinking_delta`. |
| Token estimation | `estimateTokens()` now counts retained reasoning text, so context compaction and summarization thresholds account for reasoning traces. |
| Token test coverage | Added a focused test for reasoning token estimation. |

Validation:

  • npm run lint
  • npm test -- tests/structured-output.test.ts tests/anthropic-adapter.test.ts tests/openai-adapter.test.ts
  • npm test -- tests/tokens.test.ts tests/context-strategy.test.ts tests/structured-output.test.ts
  • npm test (49 test files, 690 tests)

JackChen-me (Member) commented:

Thanks, the latest round addresses the structured-output and SDK-narrowing regressions cleanly. One outbound-serialization concern before merge.

anthropic.ts:67, openai-common.ts:143/179, and gemini.ts:88 wrap reasoning text as <think>...</think> and push it back into provider history as ordinary assistant/user text. The content was real, but the format is non-roundtrip: what the model emitted (e.g. an Anthropic ThinkingBlock plus TextBlock) gets flattened into a single text block on the next turn, and the model sees a different shape than what it actually produced.

Concrete costs:

  • Context pollution: <think>...</think> becomes part of assistant text on subsequent turns, where the model has no contract that says "ignore this region."
  • Token waste: reasoning traces are typically the largest part of a turn, and they now ride along on every subsequent request.
  • Blocks the future Anthropic native path: extended-thinking continuation across tool-use turns expects ThinkingBlockParam (with signature) in history, not text. Once we want that, the text-wrap codepath has to come out anyway.

Simplest fix: skip reasoning blocks when serializing outbound. Stream events still fire, LLMResponse.content still retains reasoning for consumers, history just stops carrying it.

```diff
// anthropic.ts toAnthropicContentBlockParam
-    case 'reasoning': {
-      const param: TextBlockParam = { type: 'text', text: `<think>${block.text}</think>` }
-      return param
-    }
```

Same delete in toOpenAIUserMessage, toOpenAIAssistantMessage, and toGeminiContents (and the toThinkTag helper goes with them). Anthropic native thinking continuation (with signature) is a separate follow-up. The main branch currently degrades thinking blocks to a fallback text block anyway, so this PR doesn't introduce a regression on that front.

One caveat worth flagging: local <think>-tag models (DeepSeek R1, QwQ) emit reasoning as part of their normal text output, not as a separate channel, so they're unaffected by the outbound rule above; their tags will keep appearing in text blocks the way they do today. If we want a different policy for those later, it can be a separate issue.
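The outbound rule above amounts to a filter before provider serialization; a sketch with illustrative names (not the repo's exact API):

```typescript
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string }

// Outbound only: history stops carrying reasoning. Stream events and
// LLMResponse.content retention on the inbound side are untouched.
function toOutboundBlocks(content: ContentBlock[]): { type: 'text'; text: string }[] {
  return content
    .filter((b): b is { type: 'text'; text: string } => b.type !== 'reasoning')
    .map(b => ({ type: 'text' as const, text: b.text }))
}

console.log(toOutboundBlocks([
  { type: 'reasoning', text: 'long trace' },
  { type: 'text', text: 'final answer' },
]).length) // 1
```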


Process note for next time, not blocking this PR: #169 explicitly says "not a green light to implement until the StreamEvent extension is designed," and lists three open design questions. This PR ended up answering all three (retain reasoning in LLMResponse.content, unify Anthropic + OpenAI into one event/block, count toward token budget). The answers themselves are reasonable, but the agreement should land in the issue before the implementation PR opens, so reviewers can push back on the design separately from the code. I'll split those into their own issues so they're tracked, not buried in this thread:

  • Retain reasoning in LLMResponse.content vs. stream-only
  • Unify or split Anthropic thinking vs. OpenAI reasoning_content block types
  • Token budget treatment for reasoning text

Drop those four cases and this is good to go.

SiMinus (Contributor, Author) commented Apr 28, 2026

Thanks, addressed the outbound serialization concern.

What changed:

| Area | Update |
| --- | --- |
| Anthropic outbound history | `ReasoningBlock` is now filtered out before converting framework messages to Anthropic message content. |
| OpenAI outbound history | Removed the `<think>...</think>` wrapping path from both user and assistant message serialization. |
| Gemini outbound history | `ReasoningBlock` is now filtered out before converting framework messages to Gemini parts. |
| Preserved behavior | `reasoning` stream events still fire, `LLMResponse.content` still retains `ReasoningBlock`, and token estimation still counts retained reasoning. |

This means provider history no longer gets polluted with reasoning text serialized as ordinary assistant/user text. Local models that emit literal <think> tags inside normal text blocks are unchanged, as noted.

Validation:

  • npm run lint
  • npm test -- tests/openai-adapter.test.ts tests/anthropic-adapter.test.ts tests/gemini-adapter-contract.test.ts tests/tokens.test.ts tests/structured-output.test.ts
  • npm test (49 test files, 690 tests)

JackChen-me (Member) commented:

LGTM, merging.

JackChen-me merged commit dccd4eb into open-multi-agent:main on Apr 28, 2026
4 checks passed