feat(openai-track): forward reasoning_effort + backfill sampling-param parity in copilot/azure (#200)#209
Conversation
…-agent#200)
Phase-1 follow-up to open-multi-agent#205. Owner explicitly carved this out as separate work in their RFC open-multi-agent#200 reply: "OpenAI reasoning_effort is worth adding, but keep it separate from the phase-1 protocol fix." The IR field `AgentConfig.thinking.effort` was added in open-multi-agent#205 but only wired up for Anthropic / Gemini. This PR adds the OpenAI side.
Changes:
- openai.ts chat() and stream(): one-line `reasoning_effort: options.thinking?.effort` added before `...options.extraBody` so extraBody can override (consistent with how the other sampling params are ordered).
- types.ts AgentConfig.thinking JSDoc: refreshed to document the per-provider mapping now that OpenAI is wired up.
Design notes:
- The `'minimal'` effort value (gpt-5) isn't yet in the SDK 4.104 type union (`'low' | 'medium' | 'high' | null`), but the existing `as ChatCompletionCreateParamsNonStreaming` cast at the bottom of the params object already covers this, consistent with how `top_k` / `min_p` are forwarded. A test verifies the pass-through.
- `thinking.enabled` and `thinking.budgetTokens` are intentionally ignored by the OpenAI adapter — OpenAI's reasoning surface is qualitative (`effort`) only, and the explicit-effort design was the owner's open-question resolution in open-multi-agent#200.
- Scope is deliberately tight: openai.ts only. The other OpenAI-compatible adapters (copilot, azure-openai, deepseek, grok, minimax, qiniu) are unchanged — each routes to a provider with its own reasoning semantics, so a separate PR per adapter is the right unit.
+6 tests covering: chat forwarding, stream forwarding, gpt-5 'minimal' pass-through, omission when thinking is absent, omission when effort is unset (with budgetTokens set, to verify enabled/budgetTokens are correctly ignored), and extraBody override of effort.
770/770 passing.
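The one-line change and its ordering relative to `...extraBody` can be sketched as follows. This is a minimal sketch with hypothetical names (`buildParams`, `ChatOptions`), not the actual adapter source:

```typescript
type Effort = 'minimal' | 'low' | 'medium' | 'high';

// Hypothetical stand-ins for the IR shapes in types.ts.
interface ThinkingConfig {
  enabled?: boolean;      // intentionally ignored by the OpenAI adapter
  budgetTokens?: number;  // intentionally ignored by the OpenAI adapter
  effort?: Effort;
}

interface ChatOptions {
  thinking?: ThinkingConfig;
  extraBody?: Record<string, unknown>;
}

// reasoning_effort sits with the sampling params, BEFORE the extraBody
// spread; object-spread "later keys win" semantics let extraBody override it.
function buildParams(model: string, options: ChatOptions) {
  return {
    reasoning_effort: options.thinking?.effort, // sampling: overridable
    ...options.extraBody,
    model, // structural fields come after the spread, so they cannot be clobbered
  };
}
```

With this ordering, `extraBody: { reasoning_effort: 'low' }` wins over a configured `effort: 'high'`, matching how the other sampling params are ordered.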
Both adapters implement LLMAdapter directly (rather than extending
OpenAIAdapter like grok/minimax/deepseek/qiniu do), so they didn't pick
up the reasoning_effort forwarding from the openai.ts change in this PR.
Mirror the same pattern to close the gap:
- Add `reasoning_effort: options.thinking?.effort` to chat() and stream()
on both adapters.
- Add `as ChatCompletionCreateParams{NonStreaming,Streaming}` casts on
the params objects (didn't exist before because neither adapter
forwarded any param outside the SDK's declared union — the cast is
needed to pass the IR's `'minimal'` value through, same as openai.ts).
+8 tests across copilot-adapter and azure-openai-adapter suites: chat
forwarding, stream forwarding, gpt-5 'minimal' pass-through, omission
when thinking absent / effort unset.
778/778 passing.
…ch openai
The Copilot and Azure-OpenAI adapters were quietly dropping seven
AgentConfig fields that openai.ts has always forwarded:
frequency_penalty, presence_penalty, top_p, top_k, min_p,
parallel_tool_calls, extraBody (spread)
A user setting `topP: 0.9` or `extraBody: { logit_bias: {...} }` on a
Copilot/Azure agent would see no effect on the wire, with no warning —
the AgentConfig accepted the field, the adapter silently ignored it.
Mirror the openai.ts param object exactly: same field set, same ordering
(sampling first, ...extraBody, then structural fields so extraBody can
override sampling defaults but cannot clobber model/messages/tools/stream).
This was discovered while tracing the reasoning_effort change — it's not
strictly part of open-multi-agent#200, but the Copilot/Azure adapters were already in
this PR's surface area for the reasoning_effort fix and the silent-drop
behaviour is the same bug class as the budget-vs-maxTokens issue you
caught on open-multi-agent#205. Folding the parity fix in while the diff is open.
+6 tests (3 per adapter):
- forwards full sampling-param set + extraBody
- extraBody overrides sampling params (field-ordering contract)
- extraBody cannot override structural fields (model/stream)
784/784 passing.
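The field-ordering contract described above can be sketched like this (hypothetical names and a reduced field set, not the adapter source):

```typescript
// Three-tier ordering: sampling params, then the extraBody spread, then
// structural fields. Spread semantics ("later keys win") mean extraBody can
// override sampling defaults but cannot clobber model/messages/stream.
interface Opts {
  topP?: number;
  frequencyPenalty?: number;
  extraBody?: Record<string, unknown>;
}

function buildParams(model: string, messages: unknown[], opts: Opts) {
  return {
    // 1. sampling: overridable via extraBody
    top_p: opts.topP,
    frequency_penalty: opts.frequencyPenalty,
    // 2. user escape hatch
    ...opts.extraBody,
    // 3. structural: always wins, declared after the spread
    model,
    messages,
    stream: false,
  };
}
```

So `extraBody: { top_p: 0.5, model: 'x' }` changes `top_p` on the wire but leaves `model` untouched, which is exactly what the two ordering-contract tests pin down.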
…ted by these endpoints
Mid-PR correction. The previous "mirror openai.ts exactly" reasoning was sloppy: openai.ts forwards `top_k` and `min_p` deliberately for the local-server case (`provider: 'openai', baseURL: 'http://localhost:8000'` pointing at vLLM), but Copilot and Azure OpenAI both have hard-coded cloud endpoints with no equivalent escape hatch.
Verified against real docs:
- Azure OpenAI Chat Completions reference lists `frequency_penalty` / `presence_penalty` / `top_p` / `parallel_tool_calls` / `logit_bias`, but explicitly does NOT include `top_k` or `min_p`.
- vLLM docs confirm `top_k` / `min_p` are vLLM-only sampling extensions, not part of the OpenAI Chat Completions spec.
- Copilot uses `api.githubcopilot.com` — also a fixed cloud proxy that normalises to standard OpenAI Chat Completions.
So forwarding these as top-level fields on Copilot / Azure would be dead weight at best and a 400 at worst. Users who genuinely need them on either adapter can still pass them via `extraBody` (e.g. for a local proxy that re-routes to vLLM).
Removed from copilot.ts and azure-openai.ts param objects in both chat() and stream(); updated cast comments to drop the `top_k` / `min_p` mention; replaced the "forwards full set" tests with two narrower tests each: (a) forwards the OpenAI-cloud-compatible subset, (b) does NOT forward `top_k` / `min_p` (regression guard).
786/786 passing.
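The `extraBody` escape hatch mentioned in that commit might look like this in user config. This is a hypothetical sketch; the `AgentConfig` field shapes are assumed from the PR text, not taken from the repo:

```typescript
// Hypothetical AgentConfig shape. extraBody carries vendor extensions
// (here the vLLM-only top_k / min_p) straight into the request body
// without any top-level adapter support for those fields.
interface AgentConfig {
  provider: string;
  baseURL?: string;
  extraBody?: Record<string, unknown>;
}

const localVllmAgent: AgentConfig = {
  provider: 'openai',
  baseURL: 'http://localhost:8000', // local vLLM server / proxy
  extraBody: {
    top_k: 40,   // vLLM sampling extension, not in the OpenAI spec
    min_p: 0.05, // vLLM sampling extension, not in the OpenAI spec
  },
};
```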
…official
After verifying my assumptions per-field (per @MyPrototypeWhat's review push), confirmed:
- Azure OpenAI: docs explicitly list frequency_penalty / presence_penalty / top_p / parallel_tool_calls (verbatim quotes from learn.microsoft.com).
- Copilot: GitHub doesn't publish a public chat/completions API reference. The available references are community-reverse-engineered (e.g. Alorse/copilot-to-api), and the documented request examples only show messages/model/max_tokens/temperature/stream/tools/tool_choice — the four sampling fields aren't called out.
Forwarding them anyway because:
1. Copilot's design promise is OpenAI Chat Completions compatibility.
2. The four fields are standard OpenAI fields, so the proxy promise should cover them.
3. Per-model behaviour (some routed reasoning models may ignore them) is out of scope for the adapter — the same situation as openai.ts forwarding topP to OpenAI's o-series, which the model itself ignores.
4. Removing them would re-introduce the silent-drop bug for users whose models DO honour the fields.
Added a JSDoc caveat in the chat() params block so future maintainers see the verification gap and don't mistake the forwarding for a verified contract.
… sampling fields
Per @MyPrototypeWhat's review: "既然没有公开的证据证明,那就保持原状呗?" ("If there's no public evidence, just keep the original state, right?")
Agreed — Copilot's API isn't publicly documented by GitHub, and the reverse-engineered references only show model / messages / max_tokens / temperature / tools / stream / tool_choice. Forwarding additional sampling fields (frequency_penalty, presence_penalty, top_p, parallel_tool_calls) and extraBody on the assumption that "Copilot is OpenAI-compatible" was a guess, not evidence.
Reverting the Copilot side of the sampling-param backfill from this PR. The Copilot adapter now stays at its pre-PR field surface PLUS the new `reasoning_effort` (which is the actual open-multi-agent#200 work, verified against the SDK's typed union).
- Replaced the "forwards full set" + "does NOT forward top_k/min_p" tests with one consolidated "does NOT forward unverified fields" regression test, plus a positive test that the documented subset (temperature / max_tokens / tools) still works.
- Azure-OpenAI side is unchanged — Microsoft Learn explicitly documents frequency_penalty / presence_penalty / top_p / parallel_tool_calls / logit_bias on the Azure OpenAI Chat Completions endpoint, so that backfill stays.
784/784 passing.
…-level cast on reasoning_effort
The object-level `as ChatCompletionCreateParams{NonStreaming,Streaming}`
cast was originally added to cover the IR's `'minimal'` reasoning effort
value (gpt-5), which isn't in SDK 4.104's type union. After last commit's
revert, that's the ONLY remaining type mismatch in either adapter — every
other field they forward is already in the SDK union.
A blanket object-level cast is overkill for a single field. Replace with
a narrow field-level `as 'low' | 'medium' | 'high' | null | undefined` on
the `reasoning_effort` line itself. Side benefit: TypeScript still
type-checks the rest of the params object, so a future typo / wrong-type
field would now be caught at compile time instead of slipping past the
broad cast.
(Note: openai.ts keeps its object-level cast because it forwards `top_k`
and `min_p` — those are field NAMES not in the SDK type at all, not
just unsupported values, so a field-level cast doesn't help.)
Removed the now-unused `ChatCompletionCreateParamsNonStreaming` and
`ChatCompletionCreateParamsStreaming` imports from both files.
784/784 passing. Lint clean.
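The narrow field-level cast could look roughly like this. Type names are assumed, and the SDK union's `null` member is omitted from the stand-in type so the standalone cast compiles under strict TypeScript comparability rules:

```typescript
// IR-side union (pre-narrowing) includes 'minimal', which SDK 4.104's
// reasoning_effort type ('low' | 'medium' | 'high' | null) does not.
type IrEffort = 'minimal' | 'low' | 'medium' | 'high';

// Stand-in for the SDK's declared field type (null omitted for brevity).
type SdkEffort = 'low' | 'medium' | 'high' | undefined;

function buildParams(effort?: IrEffort) {
  return {
    // Field-level cast: only this one line opts out of type checking. The
    // rest of a real params object stays fully type-checked, unlike under
    // a blanket object-level `as ChatCompletionCreateParamsNonStreaming`.
    reasoning_effort: effort as SdkEffort,
  };
}
```

The cast is erased at runtime, so `'minimal'` still reaches the wire unchanged; the only effect is silencing the compile-time union mismatch on that single field.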
…erification process
The previous comment block leaked PR-review process into source code:
- "Mirrors the field ordering contract in openai.ts" — decorative pointer.
- "(Confirmed against the Azure OpenAI Chat Completions API reference, which lists frequency_penalty/presence_penalty/top_p/parallel_tool_calls but not top_k or min_p.)" — verification work; belongs in the commit/PR description, not in long-lived source.
Kept the two genuinely load-bearing parts:
- Field ordering rationale (sampling first → extraBody → structural) — without this, future contributors will reorder.
- A single-line note on why top_k/min_p are absent — defends against future "you forgot these" PRs.
@JackChen-me — one design question before this leaves draft. The IR field
Three paths:
I'd lean (2) for consistency with the existing extraBody pattern, but (1) is genuinely cleaner if you're open to bumping the SDK. (3) is fine to ship as-is too. What's your call?
…ous")
After @MyPrototypeWhat's audit ("是否必须的?" — "Is it necessary?"), trimmed three categories:
1. Copilot chat() narrow-surface comment: 9 lines → 3 lines. The explicit list of forwarded/not-forwarded fields was redundant with the source code immediately below it. Kept the load-bearing WHY ("Copilot's API isn't publicly documented").
2. Three test rationale comments removed entirely (4-6 lines each). The test names already convey both WHAT and WHY:
   - "forwards the OpenAI-cloud-compatible sampling params and extraBody"
   - "does NOT forward vLLM-only top_k / min_p (Azure runs MS-hosted OpenAI models)"
   - "does NOT forward sampling params or extraBody that lack public Copilot spec coverage"
The verification process and bug history live in PR open-multi-agent#209's description and the per-commit messages, not in long-lived test source.
Net -21 lines. 784/784 still passing. Lint clean.
@MyPrototypeWhat Agree, let's take option 2. Remove
…JackChen-me)
Per owner's call on open-multi-agent#209: option 2. Narrow `ThinkingConfig.effort` to the SDK 4.104-declared values (`'low' | 'medium' | 'high'`) and document that newer API values (gpt-5 `'minimal'`, GPT-5.5 `'none'`) travel via `extraBody: { reasoning_effort: '<value>' }` — the same escape hatch already used for vLLM-only `top_k` / `min_p`.
Cleanup:
- src/types.ts: ThinkingConfig.effort union narrowed; JSDoc points users to extraBody for newer SDK-untyped values.
- src/llm/openai.ts: object-level cast comment trimmed (no longer needs to mention 'minimal'; cast still required for top_k/min_p/extraBody).
- src/llm/copilot.ts + src/llm/azure-openai.ts: field-level cast on `reasoning_effort` removed entirely. The IR type now satisfies the SDK type union, so no boundary cast is needed.
- tests/openai-adapter.test.ts + tests/azure-openai-adapter.test.ts: rewrote the 'minimal' tests to verify the extraBody route (the now-supported path) instead of the old IR-direct route.
- tests/copilot-adapter.test.ts: deleted the 'minimal' test outright. Copilot doesn't forward extraBody (narrow surface, no public spec), so neither route exists for Copilot users wanting 'minimal'. They use 'low' instead, or wait for an SDK bump.
783/783 passing. Lint clean. Net -7 lines. No behaviour change for users on `'low' | 'medium' | 'high'` (the entire SDK-typed surface).
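Option 2's end state might look like this. Types are sketched from the PR text and the `ThinkingConfig` / `AgentConfig` shapes are assumed, not copied from src/types.ts:

```typescript
// IR union narrowed to exactly the SDK 4.104-declared values.
interface ThinkingConfig {
  /**
   * Maps to OpenAI `reasoning_effort`. For newer SDK-untyped API values
   * (gpt-5 'minimal', GPT-5.5 'none'), use extraBody instead.
   */
  effort?: 'low' | 'medium' | 'high';
}

interface AgentConfig {
  thinking?: ThinkingConfig;
  extraBody?: Record<string, unknown>;
}

// Typed surface: no boundary cast needed, the IR satisfies the SDK union.
const typedAgent: AgentConfig = {
  thinking: { effort: 'high' },
};

// Escape hatch for SDK-untyped values, the same route as vLLM-only top_k/min_p.
const gpt5Agent: AgentConfig = {
  extraBody: { reasoning_effort: 'minimal' },
};
```

The trade-off is visible in the types: `effort: 'minimal'` now fails to compile, which is the point, while the extraBody route stays open on adapters that forward it.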
Done in
LGTM.
Summary
Phase-1 follow-up to #205 (RFC #200). PR is draft while we iterate on scope.
What changed
1. Forward `thinking.effort` as `reasoning_effort` across the OpenAI track — the original #200 carve-out. Wires up the IR field added in #205.
2. Backfill missing sampling params on Azure-OpenAI only — discovered while tracing #1. Same silent-drop bug class as the `budget-vs-maxTokens` issue caught on #205, but only applied to Azure, where I have docs to verify each field.

Copilot is intentionally NOT broadened beyond `reasoning_effort` — see the verification note below.

Per-field verification (the part that took several iterations)
Fields examined: `temperature`, `frequency_penalty`, `presence_penalty`, `top_p`, `parallel_tool_calls`, `top_k`, `min_p`, `reasoning_effort`, `extraBody` (spread).

Azure verification: fetched `learn.microsoft.com/.../azure/ai-services/openai/reference` directly — quoted definitions for the 4 sampling params + `parallel_tool_calls` + `logit_bias`; `top_k` / `min_p` explicitly not listed.

Copilot verification: GitHub does not publish a public Copilot Chat Completions API reference. Reverse-engineered community sources (Alorse/copilot-to-api etc.) show only `model` / `messages` / `max_tokens` / `temperature` / `stream` / `tools` / `tool_choice`. Forwarding additional fields would have been a guess based on "Copilot is OpenAI-compatible by design", which isn't evidence. Keep the Copilot field surface at pre-PR baseline + `reasoning_effort` only.

Coverage after this PR
`reasoning_effort` is now forwarded by `openai`, by `grok` / `minimax` / `deepseek` / `qiniu` (via `extends OpenAIAdapter`), and by `copilot` and `azure-openai`.

Tests (+18 across 3 files)
- `tests/openai-adapter.test.ts` (+6): `reasoning_effort` chat/stream forwarding, gpt-5 `'minimal'`, absent omission, effort-unset omission, `extraBody` override
- `tests/copilot-adapter.test.ts` (+6): four `reasoning_effort` tests + two conservative-surface guards (does NOT forward unverified sampling fields; still forwards documented temperature/max_tokens/tools)
- `tests/azure-openai-adapter.test.ts` (+8): four `reasoning_effort` tests + four sampling-parity tests (cloud subset forwarded, `top_k` / `min_p` NOT forwarded, extraBody overrides sampling, extraBody cannot override structural)

784/784 passing. `npm run lint` clean.

Test plan
- `npm test` — 784/784
- `npm run lint` — clean
- `'minimal'` reasoning effort pass-through

Commits in this PR
- `4fcfb96` — feat(openai): add `reasoning_effort` to openai.ts
- `c87c683` — feat(copilot, azure-openai): add `reasoning_effort` to copilot + azure
- `0490b8e` — fix(copilot, azure-openai): backfill sampling params (initial overshoot — included `top_k` / `min_p`)
- `1285c4b` — fix(copilot, azure-openai): drop `top_k` / `min_p` after verifying neither endpoint accepts them
- `d95d8c4` — docs(copilot): disclose that the API spec is reverse-engineered (later superseded by `b7e674a`)
- `b7e674a` — revert(copilot): keep Copilot param surface narrow — no public spec for the 4 sampling fields

Happy to squash on merge or keep granular history — your call.