feat(streaming): add reasoning events for OpenAI and Anthropic#174

Merged
JackChen-me merged 4 commits into open-multi-agent:main from SiMinus:feat/stream-reasoning-content on Apr 28, 2026

Conversation

SiMinus (Contributor) commented Apr 25, 2026

Implemented native reasoning-streaming support with a minimal, explicit extension of the streaming contract.

What changed

| Area | Change |
| --- | --- |
| Streaming contract | Added a new `reasoning` event to `StreamEvent`, so reasoning deltas are no longer conflated with normal text. |
| Response model | Added a `ReasoningBlock` content type and retained reasoning in `LLMResponse.content` instead of exposing it only during streaming. |
| OpenAI streaming | Updated `OpenAIAdapter.stream()` to map `delta.reasoning_content` to `reasoning` events and preserve the full reasoning trace in the final `done` response. |
| OpenAI non-streaming | Updated OpenAI completion parsing so non-streaming responses also retain `reasoning_content`. |
| Anthropic mapping | Mapped Anthropic `thinking` / `thinking_delta` into the same unified `reasoning` event/block shape. |
| Cross-provider serialization | Updated message serialization paths so retained reasoning can safely round-trip through provider adapters. |
| Runner output | Updated the runner text extraction path to render retained reasoning as `<think>...</think>` instead of dropping it. |
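The extended contract can be sketched roughly as follows. Names follow the PR description (`StreamEvent`, `ReasoningBlock`, the `reasoning` event); the exact field names in the repo may differ, and `accumulate` is an illustrative helper, not repo code:

```typescript
interface TextBlock { type: 'text'; text: string }
interface ReasoningBlock { type: 'reasoning'; text: string }
type ContentBlock = TextBlock | ReasoningBlock

type StreamEvent =
  | { type: 'text'; delta: string }
  | { type: 'reasoning'; delta: string }      // new: reasoning deltas, separate from text
  | { type: 'done'; content: ContentBlock[] } // final response retains reasoning as its own block

// Illustrative: fold a stream into the final content blocks.
function accumulate(events: StreamEvent[]): ContentBlock[] {
  let text = ''
  let reasoning = ''
  for (const ev of events) {
    if (ev.type === 'text') text += ev.delta
    else if (ev.type === 'reasoning') reasoning += ev.delta
  }
  const content: ContentBlock[] = []
  if (reasoning) content.push({ type: 'reasoning', text: reasoning })
  if (text) content.push({ type: 'text', text })
  return content
}

const blocks = accumulate([
  { type: 'reasoning', delta: 'Let me think… ' },
  { type: 'reasoning', delta: 'done.' },
  { type: 'text', delta: 'Answer.' },
])
console.log(blocks.map(b => b.type).join(',')) // reasoning,text
```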

Tests

  • npm run lint
  • npm test -- tests/openai-adapter.test.ts tests/anthropic-adapter.test.ts
  • npm test (48 test files, 688 tests)

My solutions to open design questions

  • LLMResponse.content: I retained reasoning in the final response as a separate ReasoningBlock, rather than surfacing it only during streaming.
  • Anthropic thinking: I mapped Anthropic thinking / thinking_delta and OpenAI-compatible reasoning_content into one unified reasoning event/block, not two provider-specific event types.
  • maxTokenBudget: In this implementation, retained reasoning is carried forward as part of the conversation context, so it effectively counts toward later token budget usage; this is an implementation consequence, not a separately designed budgeting rule.

JackChen-me (Member) left a comment


Two regressions before this can merge.

  1. runner.ts:195 extractText prepends <think>...</think> to the final text. With outputSchema, that flows into extractJSON's case-3 fallback (indexOf('{') to lastIndexOf('}')), which grabs the wrong span whenever the reasoning contains {. Either skip reasoning blocks in extractText or use a text-only extractor for the structured path. No test covers the combo.

  2. fromAnthropicContentBlock and the content_block_delta switch use (block as { ... }).field casts. Unnecessary: SDK 0.52 already has ThinkingBlock and ThinkingDelta in the relevant unions (messages.d.ts:138, 326), so case 'thinking' / case 'thinking_delta' on the original typed switches narrows correctly. The default branch also loses exhaustiveness as written.
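Item 1's failure mode can be reproduced with a minimal stand-in for the case-3 fallback (the naive `indexOf('{')`-to-`lastIndexOf('}')` slice described above; the real `extractJSON` in runner.ts may differ in detail):

```typescript
// Stand-in for extractJSON's last-resort path, as described in the review.
function case3Fallback(text: string): string {
  return text.slice(text.indexOf('{'), text.lastIndexOf('}') + 1)
}

// Final text as extractText produced it on this revision: reasoning inlined.
const withReasoning = '<think>note the { stray brace</think>{"answer": 42}'

let parsed: unknown
try {
  parsed = JSON.parse(case3Fallback(withReasoning))
} catch {
  parsed = undefined // the slice started at the '{' inside the reasoning
}
console.log(parsed === undefined) // true: the wrong span was grabbed

// With reasoning skipped, the same fallback finds the intended object:
console.log(case3Fallback('{"answer": 42}')) // {"answer": 42}
```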

Side note: ThinkingBlock.signature is dropped on the way in, which breaks extended-thinking continuation. OK to handle as a follow-up.
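For item 2, the cast-free switch can be sketched with local stand-ins for the SDK unions (in @anthropic-ai/sdk the relevant members are `TextBlock` and `ThinkingBlock`; the framework-side block name is illustrative):

```typescript
type AnthropicBlock =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string; signature: string }

type FrameworkBlock = { type: 'text' | 'reasoning'; text: string }

function fromContentBlock(block: AnthropicBlock): FrameworkBlock {
  switch (block.type) {
    case 'text':
      return { type: 'text', text: block.text } // narrowed by the discriminant, no cast
    case 'thinking':
      return { type: 'reasoning', text: block.thinking } // narrowed, no cast
    default: {
      // Exhaustiveness: a new union member turns this into a compile error.
      const unreachable: never = block
      return unreachable
    }
  }
}

console.log(fromContentBlock({ type: 'thinking', thinking: 'hm', signature: 's' }).type)
// reasoning
```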

JackChen-me added a commit that referenced this pull request Apr 25, 2026
- Restructure flow, sync README_zh, document quantized sampling.
- Refresh positioning, branding, and section layout (logo, banners, github social card).
- Polish Quick Start and consolidate provider tables.
- Drop forward reference to PR #174 in features bullet.
- Rework structure, factual fixes, add Featured Partner doc.
- Fix brand asset rendering: light banner no longer leaks the dark body background; dark banner and github-social regain previously-clipped content (terminal-log arrows, "★ open source" badge). Root cause: overflow:hidden on stage ancestors clipping the forced 1280px target during extraction.
- Replace fabricated token figures (12,847t / 8,402t / 3,914t) on the banner right column with action labels (design / impl / parallel / review). The numbers conflicted with how totalTokenUsage actually aggregates (input+output, includes coordinator decompose+synthesize, merges same-name agents).
JackChen-me (Member) commented:

@codex

SiMinus (Contributor, Author) commented Apr 27, 2026

Thanks for the review. I addressed both blocking regressions.

Changes made:

| Review item | Fix |
| --- | --- |
| `extractText` prepended `<think>...</think>` into final output | Updated `AgentRunner` text extraction to ignore `ReasoningBlock` and only join `TextBlock`s, so structured-output parsing no longer sees reasoning content. |
| Missing structured-output coverage | Added a regression test where reasoning contains a misleading `{` and the final text contains valid JSON; the structured-output path now validates the JSON correctly. |
| Anthropic casts bypassed SDK narrowing | Removed the unnecessary `(block as ...)` / `(delta as ...)` casts and switched directly on `block.type` / `delta.type`, restoring SDK union narrowing and exhaustiveness checking. |
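The text-extraction fix amounts to joining only text blocks; a minimal sketch (`extractText` here is a stand-in for the `AgentRunner` method, not repo code):

```typescript
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string }

// Reasoning blocks are skipped, so structured-output parsing only ever
// sees the model's actual text output.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((b): b is Extract<ContentBlock, { type: 'text' }> => b.type === 'text')
    .map(b => b.text)
    .join('')
}

console.log(extractText([
  { type: 'reasoning', text: 'trace with a misleading {' },
  { type: 'text', text: '{"ok":true}' },
])) // {"ok":true}
```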

I left ThinkingBlock.signature out of this PR since it was called out as acceptable follow-up work.

Validation:

  • npm test -- tests/structured-output.test.ts tests/anthropic-adapter.test.ts tests/openai-adapter.test.ts
  • npm run lint
  • npm test (48 test files, 689 tests)

chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 102740913e


Comment thread src/llm/openai-common.ts
Comment on lines +225 to +228
```ts
const reasoningText = getOpenAIReasoningText(message)
if (reasoningText.length > 0) {
  const reasoningBlock: ReasoningBlock = { type: 'reasoning', text: reasoningText }
  content.push(reasoningBlock)
```

P1 Badge Count reasoning blocks in token-budget estimation

This change persists reasoning text in LLMResponse.content, but token estimation still ignores reasoning blocks (it only counts text, tool_result, tool_use, and image). In runs that use context compaction/truncation based on estimateTokens, long reasoning traces will be undercounted, so compaction can be skipped until the next provider call fails on context length. Please include reasoning text in token estimation so retained reasoning actually participates in budget control.
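The suggested fix is a one-case addition to the estimator; a sketch, where `estimateTokens` and the chars/4 heuristic are stand-ins for the repo's actual implementation:

```typescript
type Block =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string }
  | { type: 'tool_use'; name: string; input: string }

function estimateTokens(blocks: Block[]): number {
  let chars = 0
  for (const b of blocks) {
    switch (b.type) {
      case 'text':
      case 'reasoning': // previously omitted: long traces were undercounted
        chars += b.text.length
        break
      case 'tool_use':
        chars += b.name.length + b.input.length
        break
    }
  }
  return Math.ceil(chars / 4)
}

// A 4000-char reasoning trace now contributes ~1000 tokens to the estimate,
// so compaction thresholds trip before the provider rejects the context.
console.log(estimateTokens([{ type: 'reasoning', text: 'x'.repeat(4000) }])) // 1000
```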


SiMinus (Contributor, Author) commented Apr 27, 2026

Implemented the review follow-ups for native reasoning support.

This update keeps the original reasoning streaming design, while fixing the merge-blocking regressions and token-estimation gap:

| Area | Update |
| --- | --- |
| Structured output | `AgentRunner` now excludes `ReasoningBlock` from final text extraction, so reasoning traces cannot interfere with `outputSchema` JSON parsing. |
| Regression coverage | Added a structured-output test where reasoning contains a misleading `{` and the final text contains valid JSON. |
| Anthropic adapter | Removed unnecessary casts and restored SDK union narrowing for `thinking` / `thinking_delta`. |
| Token estimation | `estimateTokens()` now counts retained reasoning text, so context compaction and summarization thresholds account for reasoning traces. |
| Token test coverage | Added a focused test for reasoning token estimation. |

Validation:

  • npm run lint
  • npm test -- tests/structured-output.test.ts tests/anthropic-adapter.test.ts tests/openai-adapter.test.ts
  • npm test -- tests/tokens.test.ts tests/context-strategy.test.ts tests/structured-output.test.ts
  • npm test (49 test files, 690 tests)

JackChen-me (Member) commented:

Thanks, the latest round addresses the structured-output and SDK-narrowing regressions cleanly. One outbound-serialization concern before merge.

anthropic.ts:67, openai-common.ts:143/179, and gemini.ts:88 wrap reasoning text as <think>...</think> and push it back into provider history as ordinary assistant/user text. The content was real, but the format is non-roundtrip: what the model emitted (e.g. an Anthropic ThinkingBlock plus TextBlock) gets flattened into a single text block on the next turn, and the model sees a different shape than what it actually produced.

Concrete costs:

  • Context pollution: <think>...</think> becomes part of assistant text on subsequent turns, where the model has no contract that says "ignore this region."
  • Token waste: reasoning traces are typically the largest part of a turn, and they now ride along on every subsequent request.
  • Blocks the future Anthropic native path: extended-thinking continuation across tool-use turns expects ThinkingBlockParam (with signature) in history, not text. Once we want that, the text-wrap codepath has to come out anyway.

Simplest fix: skip reasoning blocks when serializing outbound. Stream events still fire, LLMResponse.content still retains reasoning for consumers, history just stops carrying it.

```diff
// anthropic.ts toAnthropicContentBlockParam
-    case 'reasoning': {
-      const param: TextBlockParam = { type: 'text', text: `<think>${block.text}</think>` }
-      return param
-    }
```

Same delete in toOpenAIUserMessage, toOpenAIAssistantMessage, and toGeminiContents (and the toThinkTag helper goes with them). Anthropic native thinking continuation (with signature) is a separate follow-up. The main branch currently degrades thinking blocks to a fallback text block anyway, so this PR doesn't introduce a regression on that front.

One caveat worth flagging: local <think>-tag models (DeepSeek R1, QwQ) emit reasoning as part of their normal text output, not as a separate channel, so they're unaffected by the outbound rule above; their tags will keep appearing in text blocks the way they do today. If we want a different policy for those later, it can be a separate issue.
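The outbound rule above amounts to a filter before provider serialization; a sketch with illustrative names (not the repo's exact API):

```typescript
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string }

// Outbound only: history stops carrying reasoning. Stream events and
// LLMResponse.content retention on the inbound side are untouched.
function toOutboundBlocks(content: ContentBlock[]): { type: 'text'; text: string }[] {
  return content
    .filter((b): b is { type: 'text'; text: string } => b.type !== 'reasoning')
    .map(b => ({ type: 'text' as const, text: b.text }))
}

console.log(toOutboundBlocks([
  { type: 'reasoning', text: 'long trace' },
  { type: 'text', text: 'final answer' },
]).length) // 1
```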


Process note for next time, not blocking this PR: #169 explicitly says "not a green light to implement until the StreamEvent extension is designed," and lists three open design questions. This PR ended up answering all three (retain reasoning in LLMResponse.content, unify Anthropic + OpenAI into one event/block, count toward token budget). The answers themselves are reasonable, but the agreement should land in the issue before the implementation PR opens, so reviewers can push back on the design separately from the code. I'll split those into their own issues so they're tracked, not buried in this thread:

  • Retain reasoning in LLMResponse.content vs. stream-only
  • Unify or split Anthropic thinking vs. OpenAI reasoning_content block types
  • Token budget treatment for reasoning text

Drop those four cases and this is good to go.

SiMinus (Contributor, Author) commented Apr 28, 2026

Thanks, addressed the outbound serialization concern.

What changed:

| Area | Update |
| --- | --- |
| Anthropic outbound history | `ReasoningBlock` is now filtered out before converting framework messages to Anthropic message content. |
| OpenAI outbound history | Removed the `<think>...</think>` wrapping path from both user and assistant message serialization. |
| Gemini outbound history | `ReasoningBlock` is now filtered out before converting framework messages to Gemini parts. |
| Preserved behavior | `reasoning` stream events still fire, `LLMResponse.content` still retains `ReasoningBlock`, and token estimation still counts retained reasoning. |

This means provider history no longer gets polluted with reasoning text serialized as ordinary assistant/user text. Local models that emit literal <think> tags inside normal text blocks are unchanged, as noted.

Validation:

  • npm run lint
  • npm test -- tests/openai-adapter.test.ts tests/anthropic-adapter.test.ts tests/gemini-adapter-contract.test.ts tests/tokens.test.ts tests/structured-output.test.ts
  • npm test (49 test files, 690 tests)

JackChen-me (Member) commented:

LGTM, merging.

JackChen-me merged commit dccd4eb into open-multi-agent:main on Apr 28, 2026
4 checks passed