feat(streaming): add reasoning events for OpenAI and Anthropic (#174)
Conversation
JackChen-me left a comment
Two regressions before this can merge.
- `runner.ts:195` — `extractText` prepends `<think>...</think>` to the final text. With `outputSchema`, that flows into `extractJSON`'s case-3 fallback (`indexOf('{')` to `lastIndexOf('}')`), which grabs the wrong span whenever the reasoning contains `{`. Either skip reasoning blocks in `extractText` or use a text-only extractor for the structured path. No test covers the combo.
- `fromAnthropicContentBlock` and the `content_block_delta` switch use `(block as { ... }).field` casts. Unnecessary: SDK 0.52 already has `ThinkingBlock` and `ThinkingDelta` in the relevant unions (`messages.d.ts:138, 326`), so `case 'thinking'` / `case 'thinking_delta'` on the original typed switches narrows correctly. The default branch also loses exhaustiveness as written.
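The cast-free narrowing the second point describes can be sketched as follows (simplified stand-in union members, not the SDK's actual types):

```typescript
// Sketch: with ThinkingBlock in the content-block union, a typed switch
// narrows without `(block as ...)` casts, and a `never` check in the default
// branch keeps the switch exhaustive. Union members here are simplified.
type TextBlock = { type: "text"; text: string };
type ThinkingBlock = { type: "thinking"; thinking: string; signature: string };
type ContentBlock = TextBlock | ThinkingBlock;

function fromContentBlock(block: ContentBlock): string {
  switch (block.type) {
    case "text":
      return block.text; // narrowed to TextBlock, no cast
    case "thinking":
      return block.thinking; // narrowed to ThinkingBlock, no cast
    default: {
      // compile error here if a union member is left unhandled
      const _exhaustive: never = block;
      return _exhaustive;
    }
  }
}
```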
Side note: `ThinkingBlock.signature` is dropped on the way in, which breaks extended-thinking continuation. OK to handle as a follow-up.
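The first regression can be reproduced with a minimal sketch (assumed shapes, not the repo's actual code):

```typescript
// Sketch of the failure mode: reasoning text containing `{` lands inside the
// indexOf('{')..lastIndexOf('}') span and corrupts the structured parse.
const final = "<think>consider {a: 1} first</think>" + '{"answer": 42}';

// Case-3-style fallback, as described in the review:
const naive = final.slice(final.indexOf("{"), final.lastIndexOf("}") + 1);
// naive === '{a: 1} first</think>{"answer": 42}' -> JSON.parse would throw

// Text-only extraction for the structured path: strip reasoning blocks first.
const textOnly = final.replace(/<think>[\s\S]*?<\/think>/g, "");
const fixed = textOnly.slice(textOnly.indexOf("{"), textOnly.lastIndexOf("}") + 1);
// JSON.parse(fixed).answer === 42
```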
- Restructure flow, sync README_zh, document quantized sampling.
- Refresh positioning, branding, and section layout (logo, banners, GitHub social card).
- Polish Quick Start and consolidate provider tables.
- Drop forward reference to PR #174 in features bullet.
- Rework structure, factual fixes, add Featured Partner doc.
- Fix brand asset rendering: the light banner no longer leaks the dark body background; the dark banner and github-social regain previously-clipped content (terminal-log arrows, "★ open source" badge). Root cause: `overflow:hidden` on stage ancestors clipping the forced 1280px target during extraction.
- Replace fabricated token figures (12,847t / 8,402t / 3,914t) on the banner's right column with action labels (design / impl / parallel / review). The numbers conflicted with how `totalTokenUsage` actually aggregates (input+output, includes coordinator decompose+synthesize, merges same-name agents).
Thanks for the review. I addressed both blocking regressions. Changes made:
Validation:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 102740913e
```ts
const reasoningText = getOpenAIReasoningText(message)
if (reasoningText.length > 0) {
  const reasoningBlock: ReasoningBlock = { type: 'reasoning', text: reasoningText }
  content.push(reasoningBlock)
}
```
Count reasoning blocks in token-budget estimation
This change persists reasoning text in `LLMResponse.content`, but token estimation still ignores reasoning blocks (it only counts `text`, `tool_result`, `tool_use`, and `image`). In runs that use context compaction/truncation based on `estimateTokens`, long reasoning traces will be undercounted, so compaction can be skipped until the next provider call fails on context length. Please include reasoning text in token estimation so retained reasoning actually participates in budget control.
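A minimal sketch of the suggested fix, assuming a rough chars/4 heuristic and a simplified block union (the real `estimateTokens` signature and counting rules may differ):

```typescript
// Sketch (assumed shapes): count reasoning blocks toward the compaction
// budget alongside the block types estimation already covers.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string } // previously ignored by estimation
  | { type: "tool_use"; name: string; input: unknown }
  | { type: "tool_result"; content: string };

function estimateTokens(blocks: ContentBlock[]): number {
  let chars = 0;
  for (const block of blocks) {
    switch (block.type) {
      case "text":
      case "reasoning": // the fix: long traces are no longer undercounted
        chars += block.text.length;
        break;
      case "tool_use":
        chars += block.name.length + JSON.stringify(block.input).length;
        break;
      case "tool_result":
        chars += block.content.length;
        break;
    }
  }
  return Math.ceil(chars / 4); // crude ~4 chars-per-token heuristic
}
```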
Implemented the review follow-ups for native reasoning support. This update keeps the original reasoning streaming design, while fixing the merge-blocking regressions and token-estimation gap:
Validation:
Thanks, the latest round addresses the structured-output and SDK-narrowing regressions cleanly. One outbound-serialization concern before merge.
Concrete costs:
Simplest fix: skip reasoning blocks when serializing outbound. Stream events still fire.

```diff
// anthropic.ts toAnthropicContentBlockParam
- case 'reasoning': {
-   const param: TextBlockParam = { type: 'text', text: `<think>${block.text}</think>` }
-   return param
- }
```

Same delete in the other serialization paths. One caveat worth flagging: local models that emit literal `<think>` markers in their text output.

Process note for next time, not blocking this PR: #169 explicitly says "not a green light to implement until the StreamEvent extension is designed," and lists three open design questions. This PR ended up answering all three (retain reasoning in `LLMResponse.content`, one unified `reasoning` event, and how retained reasoning counts toward the token budget).
Drop those four cases and this is good to go.
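The outbound skip could look like this sketch (function and type names are assumed, not the repo's actual serializer):

```typescript
// Sketch: drop reasoning blocks when converting internal content back into
// provider params, instead of wrapping them in <think> text that would
// pollute provider history. Names here are illustrative assumptions.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string };

type TextBlockParam = { type: "text"; text: string };

function toProviderBlocks(blocks: ContentBlock[]): TextBlockParam[] {
  const out: TextBlockParam[] = [];
  for (const block of blocks) {
    if (block.type === "reasoning") continue; // never serialize reasoning outbound
    out.push({ type: "text", text: block.text });
  }
  return out;
}
```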
Thanks, addressed the outbound serialization concern. What changed:
This means provider history no longer gets polluted with reasoning text serialized as ordinary assistant/user text. Local models that emit literal `<think>` tags still get them parsed into reasoning blocks on the way in.

Validation:
LGTM, merging.
Implemented native reasoning-streaming support with a minimal, explicit extension of the streaming contract.
What changed
- Added a `reasoning` event to `StreamEvent`, so reasoning deltas are no longer conflated with normal text.
- Added a `ReasoningBlock` content type and retained reasoning in `LLMResponse.content` instead of exposing it only during streaming.
- Updated `OpenAIAdapter.stream()` to map `delta.reasoning_content` to `reasoning` events and preserve the full reasoning trace in the final `done` response.
- Handled non-streaming responses that carry `reasoning_content`.
- Mapped Anthropic `thinking`/`thinking_delta` into the same unified `reasoning` event/block shape.
- Parsed literal `<think>...</think>` in text instead of dropping it.

Tests
- `npm run lint`
- `npm test -- tests/openai-adapter.test.ts tests/anthropic-adapter.test.ts`
- `npm test` (48 test files, 688 tests)

My solutions to open design questions
- `LLMResponse.content`: I retained reasoning in the final response as a separate `ReasoningBlock`, rather than surfacing it only during streaming.
- `thinking`: I mapped Anthropic `thinking`/`thinking_delta` and OpenAI-compatible `reasoning_content` into one unified `reasoning` event/block, not two provider-specific event types.
- `maxTokenBudget`: In this implementation, retained reasoning is carried forward as part of the conversation context, so it effectively counts toward later token-budget usage; this is an implementation consequence, not a separately designed budgeting rule.
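Taken together, the unified contract described above might look like this sketch (type and function names beyond `reasoning`/`ReasoningBlock` are assumptions, not the repo's actual code):

```typescript
// Sketch of the unified reasoning contract: one stream-event shape fed by
// both providers, plus a retained content block. Helper names are assumed.
interface ReasoningBlock {
  type: "reasoning";
  text: string; // full trace retained in LLMResponse.content after streaming
}

type ContentBlock = { type: "text"; text: string } | ReasoningBlock;

interface LLMResponse {
  content: ContentBlock[];
}

type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "reasoning"; delta: string } // reasoning deltas, separate from text
  | { type: "done"; response: LLMResponse };

// Anthropic: content_block_delta carrying a thinking_delta payload.
function fromAnthropicDelta(d: { type: string; thinking?: string }): StreamEvent | null {
  return d.type === "thinking_delta" && d.thinking !== undefined
    ? { type: "reasoning", delta: d.thinking }
    : null;
}

// OpenAI-compatible: a choice delta carrying reasoning_content.
function fromOpenAIDelta(d: { reasoning_content?: string }): StreamEvent | null {
  return d.reasoning_content ? { type: "reasoning", delta: d.reasoning_content } : null;
}
```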