
feat(openai-track): forward reasoning_effort + backfill sampling-param parity in copilot/azure (#200)#209

Merged
JackChen-me merged 10 commits into
open-multi-agent:main from
MyPrototypeWhat:feat/openai-reasoning-effort
May 6, 2026

Conversation


@MyPrototypeWhat MyPrototypeWhat commented May 6, 2026

Summary

Phase-1 follow-up to #205 (RFC #200). PR is draft while we iterate on scope.

What changed

1. Forward thinking.effort as reasoning_effort across the OpenAI track — the original #200 carve-out. Wires up the IR field added in #205.

2. Backfill missing sampling params on Azure-OpenAI only — discovered while tracing #1. Same silent-drop bug class as the budget-vs-maxTokens issue caught on #205, but only applied to Azure where I have docs to verify each field.

Copilot is intentionally NOT broadened beyond reasoning_effort — see the verification note below.

Per-field verification (the part that took several iterations)

| Field | OpenAI | Azure OpenAI | Copilot | Verification source |
| --- | --- | --- | --- | --- |
| temperature | ✅ | ✅ | ✅ | Pre-PR baseline; in Copilot reverse-eng docs |
| frequency_penalty | ✅ | ✅ | ❌ NOT forwarded | Azure: docs verbatim. Copilot: no public spec — keep narrow |
| presence_penalty | ✅ | ✅ | ❌ NOT forwarded | Azure: docs verbatim. Copilot: no public spec — keep narrow |
| top_p | ✅ | ✅ | ❌ NOT forwarded | Azure: docs verbatim. Copilot: no public spec — keep narrow |
| parallel_tool_calls | ✅ | ✅ | ❌ NOT forwarded | Azure: docs verbatim. Copilot: no public spec — keep narrow |
| top_k | ✅ vLLM-only escape hatch | ❌ explicitly absent from Azure docs | ❌ vLLM-only, no local escape | Azure docs + vLLM docs |
| min_p | ✅ vLLM-only escape hatch | ❌ explicitly absent from Azure docs | ❌ vLLM-only, no local escape | Azure docs + vLLM docs |
| reasoning_effort | ✅ | ✅ | ✅ | SDK 4.104 type union |
| extraBody (spread) | ✅ pre-existing | ✅ added in this PR | ❌ NOT added | Same Copilot reasoning |

Azure verification: fetched learn.microsoft.com/.../azure/ai-services/openai/reference directly — quoted definitions for the 4 sampling params + parallel_tool_calls + logit_bias; top_k/min_p explicitly not listed.

Copilot verification: GitHub does not publish a public Copilot Chat Completions API reference. Reverse-engineered community sources (Alorse/copilot-to-api etc.) show only model/messages/max_tokens/temperature/stream/tools/tool_choice. Forwarding additional fields would have been a guess based on "Copilot is OpenAI-compatible by design", which isn't evidence. Keep the Copilot field surface at pre-PR baseline + reasoning_effort only.

Coverage after this PR

| Adapter | reasoning_effort | Cloud sampling parity |
| --- | --- | --- |
| openai | ✅ this PR | ✅ unchanged |
| grok / minimax / deepseek / qiniu | ✅ inherited via extends OpenAIAdapter | ✅ inherited |
| copilot | ✅ this PR | — kept narrow (no public spec) |
| azure-openai | ✅ this PR | ✅ backfilled (verified) |

Tests (+18 across 3 files)

  • tests/openai-adapter.test.ts (+6): reasoning_effort chat/stream forwarding, gpt-5 'minimal', absent omission, effort-unset omission, extraBody override
  • tests/copilot-adapter.test.ts (+6): four reasoning_effort tests + two conservative-surface guards (does NOT forward unverified sampling fields; still forwards documented temperature/max_tokens/tools)
  • tests/azure-openai-adapter.test.ts (+8): four reasoning_effort tests + four sampling-parity tests (cloud subset forwarded, top_k/min_p NOT forwarded, extraBody overrides sampling, extraBody cannot override structural)

784/784 passing. npm run lint clean.

Test plan

  • npm test — 784/784
  • npm run lint — clean
  • Per-field verification per the table above
  • Object-level cast confirmed working for 'minimal' reasoning effort pass-through

Commits in this PR

  1. 4fcfb96 feat(openai): add reasoning_effort to openai.ts
  2. c87c683 feat(copilot, azure-openai): add reasoning_effort to copilot + azure
  3. 0490b8e fix(copilot, azure-openai): backfill sampling params (initial overshoot — included top_k/min_p)
  4. 1285c4b fix(copilot, azure-openai): drop top_k/min_p after verifying neither endpoint accepts them
  5. d95d8c4 docs(copilot): disclose that the API spec is reverse-engineered (later superseded by b7e674a)
  6. b7e674a revert(copilot): keep Copilot param surface narrow — no public spec for the 4 sampling fields

Happy to squash on merge or keep granular history — your call.

…-agent#200)

Phase-1 follow-up to open-multi-agent#205. Owner explicitly carved this out as separate
work in their RFC open-multi-agent#200 reply: "OpenAI reasoning_effort is worth adding,
but keep it separate from the phase-1 protocol fix."

The IR field `AgentConfig.thinking.effort` was added in open-multi-agent#205 but only
wired up for Anthropic / Gemini. This PR adds the OpenAI side.

Changes:
- openai.ts chat() and stream(): one-line `reasoning_effort: options.thinking?.effort`
  added before `...options.extraBody` so extraBody can override (consistent
  with how the other sampling params are ordered).
- types.ts AgentConfig.thinking JSDoc: refreshed to document per-provider
  mapping now that OpenAI is wired up.
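The ordering contract above can be sketched as follows. This is illustrative only: the `ChatOptions` shape and `buildParams` helper are invented for the sketch, and only the field ordering mirrors what the PR describes.

```typescript
// Hypothetical sketch of the params ordering described above. Option names
// follow the PR text; the surrounding adapter code is assumed.
type ChatOptions = {
  model: string;
  thinking?: { effort?: 'low' | 'medium' | 'high' };
  extraBody?: Record<string, unknown>;
};

function buildParams(options: ChatOptions) {
  return {
    // sampling-style fields first, so extraBody can override them...
    reasoning_effort: options.thinking?.effort,
    ...options.extraBody,
    // ...structural fields last, so extraBody cannot clobber them
    model: options.model,
  };
}
```

With this ordering, `extraBody: { reasoning_effort: 'high' }` wins over `thinking.effort`, while an `extraBody` attempt to overwrite `model` is ignored.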

Design notes:
- The `'minimal'` effort value (gpt-5) isn't yet in the SDK 4.104 type
  union (`'low' | 'medium' | 'high' | null`), but the existing `as
  ChatCompletionCreateParamsNonStreaming` cast at the bottom of the params
  object already covers this, consistent with how `top_k` / `min_p` are
  forwarded. Test verifies pass-through.
- `thinking.enabled` and `thinking.budgetTokens` are intentionally ignored
  by the OpenAI adapter — OpenAI's reasoning surface is qualitative
  (`effort`) only, and the explicit-effort design was the owner's
  open-question resolution in open-multi-agent#200.
- Scope deliberately tight: openai.ts only. Other OpenAI-compatible
  adapters (copilot, azure-openai, deepseek, grok, minimax, qiniu) are
  unchanged — they each route to providers with their own reasoning
  semantics, so a separate PR per adapter is the right unit.

+6 tests covering: chat forwarding, stream forwarding, gpt-5 'minimal'
pass-through, omission when thinking absent, omission when effort unset
(with budgetTokens set, to verify enabled/budgetTokens are correctly
ignored), extraBody override of effort.

770/770 passing.
Both adapters implement LLMAdapter directly (rather than extending
OpenAIAdapter like grok/minimax/deepseek/qiniu do), so they didn't pick
up the reasoning_effort forwarding from the openai.ts change in this PR.
Mirror the same pattern to close the gap:

- Add `reasoning_effort: options.thinking?.effort` to chat() and stream()
  on both adapters.
- Add `as ChatCompletionCreateParams{NonStreaming,Streaming}` casts on
  the params objects (didn't exist before because neither adapter
  forwarded any param outside the SDK's declared union — the cast is
  needed to pass the IR's `'minimal'` value through, same as openai.ts).

+8 tests across copilot-adapter and azure-openai-adapter suites: chat
forwarding, stream forwarding, gpt-5 'minimal' pass-through, omission
when thinking absent / effort unset.

778/778 passing.
@MyPrototypeWhat MyPrototypeWhat marked this pull request as draft May 6, 2026 07:56
…ch openai

The Copilot and Azure-OpenAI adapters were quietly dropping seven
AgentConfig fields that openai.ts has always forwarded:

  frequency_penalty, presence_penalty, top_p, top_k, min_p,
  parallel_tool_calls, extraBody (spread)

A user setting `topP: 0.9` or `extraBody: { logit_bias: {...} }` on a
Copilot/Azure agent would see no effect on the wire, with no warning —
the AgentConfig accepted the field, the adapter silently ignored it.

Mirror the openai.ts param object exactly: same field set, same ordering
(sampling first, ...extraBody, then structural fields so extraBody can
override sampling defaults but cannot clobber model/messages/tools/stream).
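In miniature, the silent drop looks like this. The helper names and the config shape are invented for illustration; only the `top_p` behaviour mirrors the commit.

```typescript
// Hypothetical before/after of the Copilot/Azure param objects.
type Cfg = { topP?: number; extraBody?: Record<string, unknown> };

// Pre-fix: the param object never mentioned top_p, so a user-set topP
// vanished before reaching the wire. No warning, no error.
function preFixParams(_cfg: Cfg) {
  return { model: 'gpt-4o' };
}

// Post-fix: mirror openai.ts ordering (sampling first, extraBody spread,
// then structural fields).
function postFixParams(cfg: Cfg) {
  return {
    top_p: cfg.topP,
    ...cfg.extraBody,
    model: 'gpt-4o',
  };
}
```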

This was discovered while tracing the reasoning_effort change — it's not
strictly part of open-multi-agent#200, but the Copilot/Azure adapters were already in
this PR's surface area for the reasoning_effort fix and the silent-drop
behaviour is the same bug class as the budget-vs-maxTokens issue you
caught on open-multi-agent#205. Folding the parity fix in while the diff is open.

+6 tests (3 per adapter):
- forwards full sampling-param set + extraBody
- extraBody overrides sampling params (field-ordering contract)
- extraBody cannot override structural fields (model/stream)

784/784 passing.
@MyPrototypeWhat MyPrototypeWhat changed the title from "feat(openai): forward thinking.effort as reasoning_effort (#200)" to "feat(openai-track): forward reasoning_effort + backfill sampling-param parity in copilot/azure (#200)" May 6, 2026
…ted by these endpoints

Mid-PR correction. The previous "mirror openai.ts exactly" reasoning was
sloppy: openai.ts forwards `top_k` and `min_p` deliberately for the
local-server case (`provider: 'openai', baseURL: 'http://localhost:8000'`
pointing at vLLM), but Copilot and Azure OpenAI both have hard-coded cloud
endpoints with no equivalent escape hatch.

Verified against real docs:
- Azure OpenAI Chat Completions reference lists `frequency_penalty` /
  `presence_penalty` / `top_p` / `parallel_tool_calls` / `logit_bias`,
  but explicitly does NOT include `top_k` or `min_p`.
- vLLM docs confirm `top_k` / `min_p` are vLLM-only sampling extensions
  not part of the OpenAI Chat Completions spec.
- Copilot uses `api.githubcopilot.com` — also a fixed cloud proxy that
  normalises to standard OpenAI Chat Completions.

So forwarding these as top-level fields on Copilot / Azure would be dead
weight at best and a 400 at worst. Users who genuinely need them on
either adapter can still pass via `extraBody` (e.g. for a local proxy
that re-routes to vLLM).
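A hedged sketch of that escape hatch (the agent config literal is assumed from the PR's AgentConfig description, not copied from the repo):

```typescript
// Hypothetical agent config: vLLM-only sampling fields travel through
// extraBody rather than as typed top-level params.
const agentConfig = {
  provider: 'azure-openai',
  extraBody: {
    top_k: 40,   // not in the Azure Chat Completions spec; only meaningful
    min_p: 0.05, // if a proxy in front of the endpoint understands them
  },
};
```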

Removed from copilot.ts and azure-openai.ts param objects in both chat()
and stream(); updated cast comments to drop the `top_k` / `min_p`
mention; replaced the "forwards full set" tests with two narrower tests
each: (a) forwards the OpenAI-cloud-compatible subset, (b) does NOT
forward `top_k` / `min_p` (regression guard).

786/786 passing.
…official

After verifying my assumptions per-field (per @MyPrototypeWhat's review
push), confirmed:

- Azure OpenAI: docs explicitly list frequency_penalty / presence_penalty
  / top_p / parallel_tool_calls (verbatim quotes from learn.microsoft.com).
- Copilot: GitHub doesn't publish a public chat/completions API
  reference. The available references are community-reverse-engineered
  (e.g. Alorse/copilot-to-api), and the documented request examples only
  show messages/model/max_tokens/temperature/stream/tools/tool_choice —
  the four sampling fields aren't called out.

Forwarding them anyway because:
1. Copilot's design promise is OpenAI Chat Completions compatibility.
2. The four fields are standard OpenAI fields, so the proxy promise
   should cover them.
3. Per-model behaviour (some routed reasoning models may ignore them) is
   out of scope for the adapter — same situation as openai.ts forwarding
   topP to OpenAI's o-series, which the model itself ignores.
4. Removing them would re-introduce the silent-drop bug for users whose
   models DO honour the fields.

Added a JSDoc caveat in the chat() params block so future maintainers
see the verification gap and don't mistake the forwarding for a verified
contract.
… sampling fields

Per @MyPrototypeWhat's review: "既然没有公开的证据证明,那就保持原状呗?"
("If there's no public evidence, just keep the original state, right?")

Agreed — Copilot's API isn't publicly documented by GitHub, and the
reverse-engineered references only show model / messages / max_tokens /
temperature / tools / stream / tool_choice. Forwarding additional
sampling fields (frequency_penalty, presence_penalty, top_p,
parallel_tool_calls) and extraBody on the assumption that "Copilot is
OpenAI-compatible" was a guess, not evidence.

Reverting the Copilot side of the sampling-param backfill from this PR.
The Copilot adapter now stays at its pre-PR field surface PLUS the new
`reasoning_effort` (which is the actual open-multi-agent#200 work and verified against
the SDK's typed union).

Replaced the "forwards full set" + "does NOT forward top_k/min_p" tests
with one consolidated "does NOT forward unverified fields" regression
test, plus a positive test that the documented subset (temperature /
max_tokens / tools) still works.

Azure-OpenAI side is unchanged — Microsoft Learn explicitly documents
frequency_penalty / presence_penalty / top_p / parallel_tool_calls /
logit_bias on the Azure OpenAI Chat Completions endpoint, so that
backfill stays.

784/784 passing.
…-level cast on reasoning_effort

The object-level `as ChatCompletionCreateParams{NonStreaming,Streaming}`
cast was originally added to cover the IR's `'minimal'` reasoning effort
value (gpt-5), which isn't in SDK 4.104's type union. After last commit's
revert, that's the ONLY remaining type mismatch in either adapter — every
other field they forward is already in the SDK union.

A blanket object-level cast is overkill for a single field. Replace with
a narrow field-level `as 'low' | 'medium' | 'high' | null | undefined` on
the `reasoning_effort` line itself. Side benefit: TypeScript still
type-checks the rest of the params object, so a future typo / wrong-type
field would now be caught at compile time instead of slipping past the
broad cast.

(Note: openai.ts keeps its object-level cast because it forwards `top_k`
and `min_p` — those are field NAMES not in the SDK type at all, not
just unsupported values, so a field-level cast doesn't help.)

Removed the now-unused `ChatCompletionCreateParamsNonStreaming` and
`ChatCompletionCreateParamsStreaming` imports from both files.

784/784 passing. Lint clean.
…erification process

The previous comment block leaked PR-review process into source code:
- "Mirrors the field ordering contract in openai.ts" — decorative pointer
- "(Confirmed against the Azure OpenAI Chat Completions API reference,
  which lists frequency_penalty/presence_penalty/top_p/parallel_tool_calls
  but not top_k or min_p.)" — verification work, belongs in commit/PR
  description, not in long-lived source

Kept the two genuinely load-bearing parts:
- Field ordering rationale (sampling first → extraBody → structural)
  — without this, future contributors will reorder
- Single-line note on why top_k/min_p are absent — defends against
  future "you forgot these" PRs
@MyPrototypeWhat
Contributor Author

@JackChen-me — one design question before this leaves draft.

The IR field AgentConfig.thinking.effort was declared in #205 as 'minimal' | 'low' | 'medium' | 'high'. SDK 4.104's ReasoningEffort union is 'low' | 'medium' | 'high' | null — no 'minimal'. Result: every place that hands the value to the SDK needs a cast (currently a narrow field-level as 'low' | 'medium' | 'high' | null | undefined in copilot.ts and azure-openai.ts).

'minimal' first appeared in [email protected] (the GPT-5 release in Aug 2025). The OMA pin is 4.104.0. Worth noting a later openai release swapped 'minimal' → 'none' for GPT-5.5, so the union keeps moving.

Three paths:

  1. Bump SDK to ≥5.12.1 (or latest 6.x). The cast disappears. But it's a major-version jump — out of scope for this PR (#209); it would want its own PR with a regression sweep.
  2. Drop 'minimal' from the IR. All casts go. gpt-5 users who want 'minimal' use extraBody: { reasoning_effort: 'minimal' } — same escape hatch we already point at for vLLM-only top_k / min_p. This is consistent with how the rest of OMA handles "API value newer than SDK type".
  3. Keep current state. Cast is real but narrow (one line per call site, doesn't disable type checking on other fields).

I'd lean (2) for consistency with the existing extraBody pattern, but (1) is genuinely cleaner if you're open to bumping the SDK. (3) is fine to ship as-is too. What's your call?
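For concreteness, option (2)'s user-side escape hatch would look something like this (config shape assumed from the PR, not a committed API):

```typescript
// Hypothetical agent config under option (2): the IR only types
// 'low' | 'medium' | 'high'; newer API values ride in extraBody.
const gpt5Agent = {
  provider: 'openai',
  model: 'gpt-5',
  extraBody: { reasoning_effort: 'minimal' },
};
```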

…ous")

After @MyPrototypeWhat's audit ("是否必须的?" — "Is this necessary?"), trimmed three categories:

1. Copilot chat() narrow-surface comment: 9 lines → 3 lines.
   The explicit list of forwarded/not-forwarded fields was redundant with
   the source code immediately below it. Kept the load-bearing WHY
   ("Copilot's API isn't publicly documented").

2. Three test rationale comments removed entirely (4-6 lines each).
   The test names already convey both WHAT and WHY:
   - "forwards the OpenAI-cloud-compatible sampling params and extraBody"
   - "does NOT forward vLLM-only top_k / min_p (Azure runs MS-hosted OpenAI models)"
   - "does NOT forward sampling params or extraBody that lack public Copilot spec coverage"
   The verification process and bug history live in PR open-multi-agent#209's description
   and the per-commit messages, not in long-lived test source.

Net -21 lines. 784/784 still passing. Lint clean.
@MyPrototypeWhat MyPrototypeWhat marked this pull request as ready for review May 6, 2026 10:55
@JackChen-me
Member

@MyPrototypeWhat Agree, let's take option 2.

Remove minimal from the IR for this PR and clean up the casts/tests that only exist for it. No SDK bump here.

…JackChen-me)

Per owner's call on open-multi-agent#209: option 2. Narrow `ThinkingConfig.effort` to the
SDK 4.104-declared values (`'low' | 'medium' | 'high'`) and document
that newer API values (gpt-5 `'minimal'`, GPT-5.5 `'none'`) travel via
`extraBody: { reasoning_effort: '<value>' }` — the same escape hatch
already used for vLLM-only `top_k` / `min_p`.

Cleanup:
- src/types.ts: ThinkingConfig.effort union narrowed; JSDoc points users
  to extraBody for newer SDK-untyped values.
- src/llm/openai.ts: object-level cast comment trimmed (no longer needs
  to mention 'minimal'; cast still required for top_k/min_p/extraBody).
- src/llm/copilot.ts + src/llm/azure-openai.ts: field-level cast on
  `reasoning_effort` removed entirely. The IR type now satisfies the SDK
  type union, so no boundary cast is needed.
- tests/openai-adapter.test.ts + tests/azure-openai-adapter.test.ts:
  rewrote the 'minimal' tests to verify the extraBody route (the now-
  supported path) instead of the old IR-direct route.
- tests/copilot-adapter.test.ts: deleted the 'minimal' test outright.
  Copilot doesn't forward extraBody (narrow surface, no public spec),
  so neither route exists for Copilot users wanting 'minimal'. They
  use 'low' instead, or wait for SDK bump.

783/783 passing. Lint clean. Net -7 lines.

No behaviour change for users on `'low' | 'medium' | 'high'` (the entire
SDK-typed surface).
@MyPrototypeWhat
Contributor Author

Done in 97c3799:

  • ThinkingConfig.effort narrowed to 'low' | 'medium' | 'high'
  • All reasoning_effort casts removed from copilot.ts and azure-openai.ts
  • openai.ts cast comment updated (cast still needed for top_k/min_p/extraBody, just not for 'minimal' anymore)
  • 'minimal' tests on openai + azure-openai converted to verify the extraBody route
  • 'minimal' test on copilot deleted (Copilot has neither IR route nor extraBody route — 'low' works for users; if someone really needs 'minimal' on a gpt-5-routed Copilot model, separate PR can add extraBody to copilot.ts)
  • types.ts JSDoc on ThinkingConfig.effort points users at extraBody for newer SDK-untyped values

783/783 passing. Flipping to ready-for-review now.

@JackChen-me
Member

LGTM.

@JackChen-me JackChen-me merged commit 450e8da into open-multi-agent:main May 6, 2026
4 checks passed