
feat(openai-track): forward reasoning_effort + backfill sampling-param parity in copilot/azure (#200)#209

Merged
JackChen-me merged 10 commits into
open-multi-agent:main from
MyPrototypeWhat:feat/openai-reasoning-effort
May 6, 2026

Conversation


@MyPrototypeWhat MyPrototypeWhat commented May 6, 2026

Summary

Phase-1 follow-up to #205 (RFC #200). PR is draft while we iterate on scope.

What changed

1. Forward thinking.effort as reasoning_effort across the OpenAI track — the original #200 carve-out. Wires up the IR field added in #205.

2. Backfill missing sampling params on Azure-OpenAI only — discovered while tracing #1. Same silent-drop bug class as the budget-vs-maxTokens issue caught on #205, but only applied to Azure where I have docs to verify each field.

Copilot is intentionally NOT broadened beyond reasoning_effort — see the verification note below.

Per-field verification (the part that took several iterations)

| Field | OpenAI | Azure OpenAI | Copilot | Verification source |
| --- | --- | --- | --- | --- |
| temperature | ✅ | ✅ | ✅ | Pre-PR baseline; in Copilot reverse-eng docs |
| frequency_penalty | ✅ | ✅ | ❌ NOT forwarded | Azure: docs verbatim. Copilot: no public spec — keep narrow |
| presence_penalty | ✅ | ✅ | ❌ NOT forwarded | Azure: docs verbatim. Copilot: no public spec — keep narrow |
| top_p | ✅ | ✅ | ❌ NOT forwarded | Azure: docs verbatim. Copilot: no public spec — keep narrow |
| parallel_tool_calls | ✅ | ✅ | ❌ NOT forwarded | Azure: docs verbatim. Copilot: no public spec — keep narrow |
| top_k | ✅ vLLM-only escape hatch | ❌ explicitly absent from Azure docs | ❌ vLLM-only, no local escape | Azure docs + vLLM docs |
| min_p | ✅ vLLM-only escape hatch | ❌ explicitly absent from Azure docs | ❌ vLLM-only, no local escape | Azure docs + vLLM docs |
| reasoning_effort | ✅ | ✅ | ✅ | SDK 4.104 type union |
| extraBody (spread) | ✅ pre-existing | ✅ added in this PR | ❌ NOT added | Same Copilot reasoning |

Azure verification: fetched learn.microsoft.com/.../azure/ai-services/openai/reference directly — quoted definitions for the 4 sampling params + parallel_tool_calls + logit_bias; top_k/min_p explicitly not listed.

Copilot verification: GitHub does not publish a public Copilot Chat Completions API reference. Reverse-engineered community sources (Alorse/copilot-to-api etc.) show only model/messages/max_tokens/temperature/stream/tools/tool_choice. Forwarding additional fields would have been a guess based on "Copilot is OpenAI-compatible by design", which isn't evidence. Keep the Copilot field surface at pre-PR baseline + reasoning_effort only.

Coverage after this PR

| Adapter | reasoning_effort | Cloud sampling parity |
| --- | --- | --- |
| openai | ✅ this PR | ✅ unchanged |
| grok / minimax / deepseek / qiniu | ✅ inherited via extends OpenAIAdapter | ✅ inherited |
| copilot | ✅ this PR | — kept narrow (no public spec) |
| azure-openai | ✅ this PR | ✅ backfilled (verified) |

Tests (+18 across 3 files)

  • tests/openai-adapter.test.ts (+6): reasoning_effort chat/stream forwarding, gpt-5 'minimal', absent omission, effort-unset omission, extraBody override
  • tests/copilot-adapter.test.ts (+6): four reasoning_effort tests + two conservative-surface guards (does NOT forward unverified sampling fields; still forwards documented temperature/max_tokens/tools)
  • tests/azure-openai-adapter.test.ts (+8): four reasoning_effort tests + four sampling-parity tests (cloud subset forwarded, top_k/min_p NOT forwarded, extraBody overrides sampling, extraBody cannot override structural)

784/784 passing. npm run lint clean.

Test plan

  • npm test — 784/784
  • npm run lint — clean
  • Per-field verification per the table above
  • Object-level cast confirmed working for 'minimal' reasoning effort pass-through

Commits in this PR

  1. 4fcfb96 feat(openai): add reasoning_effort to openai.ts
  2. c87c683 feat(copilot, azure-openai): add reasoning_effort to copilot + azure
  3. 0490b8e fix(copilot, azure-openai): backfill sampling params (initial overshoot — included top_k/min_p)
  4. 1285c4b fix(copilot, azure-openai): drop top_k/min_p after verifying neither endpoint accepts them
  5. d95d8c4 docs(copilot): disclose that the API spec is reverse-engineered (later superseded by b7e674a)
  6. b7e674a revert(copilot): keep Copilot param surface narrow — no public spec for the 4 sampling fields

Happy to squash on merge or keep granular history — your call.

…-agent#200)

Phase-1 follow-up to open-multi-agent#205. Owner explicitly carved this out as separate
work in their RFC open-multi-agent#200 reply: "OpenAI reasoning_effort is worth adding,
but keep it separate from the phase-1 protocol fix."

The IR field `AgentConfig.thinking.effort` was added in open-multi-agent#205 but only
wired up for Anthropic / Gemini. This PR adds the OpenAI side.

Changes:
- openai.ts chat() and stream(): one-line `reasoning_effort: options.thinking?.effort`
  added before `...options.extraBody` so extraBody can override (consistent
  with how the other sampling params are ordered).
- types.ts AgentConfig.thinking JSDoc: refreshed to document per-provider
  mapping now that OpenAI is wired up.
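The ordering contract above can be sketched as follows. This is illustrative only: the `ChatOptions` shape and `buildParams` helper are invented for the sketch, and only the field ordering mirrors what the PR describes.

```typescript
// Hypothetical sketch of the params ordering described above. Option names
// follow the PR text; the surrounding adapter code is assumed.
type ChatOptions = {
  model: string;
  thinking?: { effort?: 'low' | 'medium' | 'high' };
  extraBody?: Record<string, unknown>;
};

function buildParams(options: ChatOptions) {
  return {
    // sampling-style fields first, so extraBody can override them...
    reasoning_effort: options.thinking?.effort,
    ...options.extraBody,
    // ...structural fields last, so extraBody cannot clobber them
    model: options.model,
  };
}
```

With this ordering, `extraBody: { reasoning_effort: 'high' }` wins over `thinking.effort`, while an `extraBody` attempt to overwrite `model` is ignored.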

Design notes:
- The `'minimal'` effort value (gpt-5) isn't yet in the SDK 4.104 type
  union (`'low' | 'medium' | 'high' | null`), but the existing `as
  ChatCompletionCreateParamsNonStreaming` cast at the bottom of the params
  object already covers this, consistent with how `top_k` / `min_p` are
  forwarded. Test verifies pass-through.
- `thinking.enabled` and `thinking.budgetTokens` are intentionally ignored
  by the OpenAI adapter — OpenAI's reasoning surface is qualitative
  (`effort`) only, and the explicit-effort design was the owner's
  open-question resolution in open-multi-agent#200.
- Scope deliberately tight: openai.ts only. Other OpenAI-compatible
  adapters (copilot, azure-openai, deepseek, grok, minimax, qiniu) are
  unchanged — they each route to providers with their own reasoning
  semantics, so a separate PR per adapter is the right unit.

+6 tests covering: chat forwarding, stream forwarding, gpt-5 'minimal'
pass-through, omission when thinking absent, omission when effort unset
(with budgetTokens set, to verify enabled/budgetTokens are correctly
ignored), extraBody override of effort.

770/770 passing.
Both adapters implement LLMAdapter directly (rather than extending
OpenAIAdapter like grok/minimax/deepseek/qiniu do), so they didn't pick
up the reasoning_effort forwarding from the openai.ts change in this PR.
Mirror the same pattern to close the gap:

- Add `reasoning_effort: options.thinking?.effort` to chat() and stream()
  on both adapters.
- Add `as ChatCompletionCreateParams{NonStreaming,Streaming}` casts on
  the params objects (didn't exist before because neither adapter
  forwarded any param outside the SDK's declared union — the cast is
  needed to pass the IR's `'minimal'` value through, same as openai.ts).

+8 tests across copilot-adapter and azure-openai-adapter suites: chat
forwarding, stream forwarding, gpt-5 'minimal' pass-through, omission
when thinking absent / effort unset.

778/778 passing.
@MyPrototypeWhat MyPrototypeWhat marked this pull request as draft May 6, 2026 07:56
…ch openai

The Copilot and Azure-OpenAI adapters were quietly dropping seven
AgentConfig fields that openai.ts has always forwarded:

  frequency_penalty, presence_penalty, top_p, top_k, min_p,
  parallel_tool_calls, extraBody (spread)

A user setting `topP: 0.9` or `extraBody: { logit_bias: {...} }` on a
Copilot/Azure agent would see no effect on the wire, with no warning —
the AgentConfig accepted the field, the adapter silently ignored it.

Mirror the openai.ts param object exactly: same field set, same ordering
(sampling first, ...extraBody, then structural fields so extraBody can
override sampling defaults but cannot clobber model/messages/tools/stream).
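In miniature, the silent drop looks like this. The helper names and the config shape are invented for illustration; only the `top_p` behaviour mirrors the commit.

```typescript
// Hypothetical before/after of the Copilot/Azure param objects.
type Cfg = { topP?: number; extraBody?: Record<string, unknown> };

// Pre-fix: the param object never mentioned top_p, so a user-set topP
// vanished before reaching the wire. No warning, no error.
function preFixParams(_cfg: Cfg) {
  return { model: 'gpt-4o' };
}

// Post-fix: mirror openai.ts ordering (sampling first, extraBody spread,
// then structural fields).
function postFixParams(cfg: Cfg) {
  return {
    top_p: cfg.topP,
    ...cfg.extraBody,
    model: 'gpt-4o',
  };
}
```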

This was discovered while tracing the reasoning_effort change — it's not
strictly part of open-multi-agent#200, but the Copilot/Azure adapters were already in
this PR's surface area for the reasoning_effort fix and the silent-drop
behaviour is the same bug class as the budget-vs-maxTokens issue you
caught on open-multi-agent#205. Folding the parity fix in while the diff is open.

+6 tests (3 per adapter):
- forwards full sampling-param set + extraBody
- extraBody overrides sampling params (field-ordering contract)
- extraBody cannot override structural fields (model/stream)

784/784 passing.
@MyPrototypeWhat MyPrototypeWhat changed the title from "feat(openai): forward thinking.effort as reasoning_effort (#200)" to "feat(openai-track): forward reasoning_effort + backfill sampling-param parity in copilot/azure (#200)" May 6, 2026
…ted by these endpoints

Mid-PR correction. The previous "mirror openai.ts exactly" reasoning was
sloppy: openai.ts forwards `top_k` and `min_p` deliberately for the
local-server case (`provider: 'openai', baseURL: 'http://localhost:8000'`
pointing at vLLM), but Copilot and Azure OpenAI both have hard-coded cloud
endpoints with no equivalent escape hatch.

Verified against real docs:
- Azure OpenAI Chat Completions reference lists `frequency_penalty` /
  `presence_penalty` / `top_p` / `parallel_tool_calls` / `logit_bias`,
  but explicitly does NOT include `top_k` or `min_p`.
- vLLM docs confirm `top_k` / `min_p` are vLLM-only sampling extensions
  not part of the OpenAI Chat Completions spec.
- Copilot uses `api.githubcopilot.com` — also a fixed cloud proxy that
  normalises to standard OpenAI Chat Completions.

So forwarding these as top-level fields on Copilot / Azure would be dead
weight at best and a 400 at worst. Users who genuinely need them on
either adapter can still pass via `extraBody` (e.g. for a local proxy
that re-routes to vLLM).
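A hedged sketch of that escape hatch (the agent config literal is assumed from the PR's AgentConfig description, not copied from the repo):

```typescript
// Hypothetical agent config: vLLM-only sampling fields travel through
// extraBody rather than as typed top-level params.
const agentConfig = {
  provider: 'azure-openai',
  extraBody: {
    top_k: 40,   // not in the Azure Chat Completions spec; only meaningful
    min_p: 0.05, // if a proxy in front of the endpoint understands them
  },
};
```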

Removed from copilot.ts and azure-openai.ts param objects in both chat()
and stream(); updated cast comments to drop the `top_k` / `min_p`
mention; replaced the "forwards full set" tests with two narrower tests
each: (a) forwards the OpenAI-cloud-compatible subset, (b) does NOT
forward `top_k` / `min_p` (regression guard).

786/786 passing.
…official

After verifying my assumptions per-field (per @MyPrototypeWhat's review
push), confirmed:

- Azure OpenAI: docs explicitly list frequency_penalty / presence_penalty
  / top_p / parallel_tool_calls (verbatim quotes from learn.microsoft.com).
- Copilot: GitHub doesn't publish a public chat/completions API
  reference. The available references are community-reverse-engineered
  (e.g. Alorse/copilot-to-api), and the documented request examples only
  show messages/model/max_tokens/temperature/stream/tools/tool_choice —
  the four sampling fields aren't called out.

Forwarding them anyway because:
1. Copilot's design promise is OpenAI Chat Completions compatibility.
2. The four fields are standard OpenAI fields, so the proxy promise
   should cover them.
3. Per-model behaviour (some routed reasoning models may ignore them) is
   out of scope for the adapter — same situation as openai.ts forwarding
   topP to OpenAI's o-series, which the model itself ignores.
4. Removing them would re-introduce the silent-drop bug for users whose
   models DO honour the fields.

Added a JSDoc caveat in the chat() params block so future maintainers
see the verification gap and don't mistake the forwarding for a verified
contract.
… sampling fields

Per @MyPrototypeWhat's review: "既然没有公开的证据证明,那就保持原状呗?"
("If there's no public evidence, just keep the original state, right?")

Agreed — Copilot's API isn't publicly documented by GitHub, and the
reverse-engineered references only show model / messages / max_tokens /
temperature / tools / stream / tool_choice. Forwarding additional
sampling fields (frequency_penalty, presence_penalty, top_p,
parallel_tool_calls) and extraBody on the assumption that "Copilot is
OpenAI-compatible" was a guess, not evidence.

Reverting the Copilot side of the sampling-param backfill from this PR.
The Copilot adapter now stays at its pre-PR field surface PLUS the new
`reasoning_effort` (which is the actual open-multi-agent#200 work and verified against
the SDK's typed union).

Replaced the "forwards full set" + "does NOT forward top_k/min_p" tests
with one consolidated "does NOT forward unverified fields" regression
test, plus a positive test that the documented subset (temperature /
max_tokens / tools) still works.

Azure-OpenAI side is unchanged — Microsoft Learn explicitly documents
frequency_penalty / presence_penalty / top_p / parallel_tool_calls /
logit_bias on the Azure OpenAI Chat Completions endpoint, so that
backfill stays.

784/784 passing.
…-level cast on reasoning_effort

The object-level `as ChatCompletionCreateParams{NonStreaming,Streaming}`
cast was originally added to cover the IR's `'minimal'` reasoning effort
value (gpt-5), which isn't in SDK 4.104's type union. After last commit's
revert, that's the ONLY remaining type mismatch in either adapter — every
other field they forward is already in the SDK union.

A blanket object-level cast is overkill for a single field. Replace with
a narrow field-level `as 'low' | 'medium' | 'high' | null | undefined` on
the `reasoning_effort` line itself. Side benefit: TypeScript still
type-checks the rest of the params object, so a future typo / wrong-type
field would now be caught at compile time instead of slipping past the
broad cast.

(Note: openai.ts keeps its object-level cast because it forwards `top_k`
and `min_p` — those are field NAMES not in the SDK type at all, not
just unsupported values, so a field-level cast doesn't help.)

Removed the now-unused `ChatCompletionCreateParamsNonStreaming` and
`ChatCompletionCreateParamsStreaming` imports from both files.

784/784 passing. Lint clean.
…erification process

The previous comment block leaked PR-review process into source code:
- "Mirrors the field ordering contract in openai.ts" — decorative pointer
- "(Confirmed against the Azure OpenAI Chat Completions API reference,
  which lists frequency_penalty/presence_penalty/top_p/parallel_tool_calls
  but not top_k or min_p.)" — verification work, belongs in commit/PR
  description, not in long-lived source

Kept the two genuinely load-bearing parts:
- Field ordering rationale (sampling first → extraBody → structural)
  — without this, future contributors will reorder
- Single-line note on why top_k/min_p are absent — defends against
  future "you forgot these" PRs
@MyPrototypeWhat
Contributor Author

@JackChen-me — one design question before this leaves draft.

The IR field AgentConfig.thinking.effort was declared in #205 as 'minimal' | 'low' | 'medium' | 'high'. SDK 4.104's ReasoningEffort union is 'low' | 'medium' | 'high' | null — no 'minimal'. Result: every place that hands the value to the SDK needs a cast (currently a narrow field-level as 'low' | 'medium' | 'high' | null | undefined in copilot.ts and azure-openai.ts).

'minimal' first appeared in [email protected] (the GPT-5 release in Aug 2025). The OMA pin is 4.104.0. Worth noting a later openai release swapped 'minimal' → 'none' for GPT-5.5, so the union keeps moving.

Three paths:

  1. Bump SDK to ≥5.12.1 (or latest 6.x). The cast disappears. But it's a major-version jump — out of scope for this PR (#209); it would want its own PR with a regression sweep.
  2. Drop 'minimal' from the IR. All casts go. gpt-5 users who want 'minimal' use extraBody: { reasoning_effort: 'minimal' } — same escape hatch we already point at for vLLM-only top_k / min_p. This is consistent with how the rest of OMA handles "API value newer than SDK type".
  3. Keep current state. Cast is real but narrow (one line per call site, doesn't disable type checking on other fields).

I'd lean (2) for consistency with the existing extraBody pattern, but (1) is genuinely cleaner if you're open to bumping the SDK. (3) is fine to ship as-is too. What's your call?
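For concreteness, option (2)'s user-side escape hatch would look something like this (config shape assumed from the PR, not a committed API):

```typescript
// Hypothetical agent config under option (2): the IR only types
// 'low' | 'medium' | 'high'; newer API values ride in extraBody.
const gpt5Agent = {
  provider: 'openai',
  model: 'gpt-5',
  extraBody: { reasoning_effort: 'minimal' },
};
```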

…ous")

After @MyPrototypeWhat's audit ("是否必须的?" — "Is this necessary?"), trimmed three categories:

1. Copilot chat() narrow-surface comment: 9 lines → 3 lines.
   The explicit list of forwarded/not-forwarded fields was redundant with
   the source code immediately below it. Kept the load-bearing WHY
   ("Copilot's API isn't publicly documented").

2. Three test rationale comments removed entirely (4-6 lines each).
   The test names already convey both WHAT and WHY:
   - "forwards the OpenAI-cloud-compatible sampling params and extraBody"
   - "does NOT forward vLLM-only top_k / min_p (Azure runs MS-hosted OpenAI models)"
   - "does NOT forward sampling params or extraBody that lack public Copilot spec coverage"
   The verification process and bug history live in PR open-multi-agent#209's description
   and the per-commit messages, not in long-lived test source.

Net -21 lines. 784/784 still passing. Lint clean.
@MyPrototypeWhat MyPrototypeWhat marked this pull request as ready for review May 6, 2026 10:55
@JackChen-me
Member

@MyPrototypeWhat Agree, let's take option 2.

Remove minimal from the IR for this PR and clean up the casts/tests that only exist for it. No SDK bump here.

…JackChen-me)

Per owner's call on open-multi-agent#209: option 2. Narrow `ThinkingConfig.effort` to the
SDK 4.104-declared values (`'low' | 'medium' | 'high'`) and document
that newer API values (gpt-5 `'minimal'`, GPT-5.5 `'none'`) travel via
`extraBody: { reasoning_effort: '<value>' }` — the same escape hatch
already used for vLLM-only `top_k` / `min_p`.

Cleanup:
- src/types.ts: ThinkingConfig.effort union narrowed; JSDoc points users
  to extraBody for newer SDK-untyped values.
- src/llm/openai.ts: object-level cast comment trimmed (no longer needs
  to mention 'minimal'; cast still required for top_k/min_p/extraBody).
- src/llm/copilot.ts + src/llm/azure-openai.ts: field-level cast on
  `reasoning_effort` removed entirely. The IR type now satisfies the SDK
  type union, so no boundary cast is needed.
- tests/openai-adapter.test.ts + tests/azure-openai-adapter.test.ts:
  rewrote the 'minimal' tests to verify the extraBody route (the now-
  supported path) instead of the old IR-direct route.
- tests/copilot-adapter.test.ts: deleted the 'minimal' test outright.
  Copilot doesn't forward extraBody (narrow surface, no public spec),
  so neither route exists for Copilot users wanting 'minimal'. They
  use 'low' instead, or wait for SDK bump.

783/783 passing. Lint clean. Net -7 lines.

No behaviour change for users on `'low' | 'medium' | 'high'` (the entire
SDK-typed surface).
@MyPrototypeWhat
Contributor Author

Done in 97c3799:

  • ThinkingConfig.effort narrowed to 'low' | 'medium' | 'high'
  • All reasoning_effort casts removed from copilot.ts and azure-openai.ts
  • openai.ts cast comment updated (cast still needed for top_k/min_p/extraBody, just not for 'minimal' anymore)
  • 'minimal' tests on openai + azure-openai converted to verify the extraBody route
  • 'minimal' test on copilot deleted (Copilot has neither IR route nor extraBody route — 'low' works for users; if someone really needs 'minimal' on a gpt-5-routed Copilot model, separate PR can add extraBody to copilot.ts)
  • types.ts JSDoc on ThinkingConfig.effort points users at extraBody for newer SDK-untyped values

783/783 passing. Flipping to ready-for-review now.

@JackChen-me
Member

LGTM.

@JackChen-me JackChen-me merged commit 450e8da into open-multi-agent:main May 6, 2026
4 checks passed