feat: add openai-oxide as alternative OpenAI provider#487
feat: add openai-oxide as alternative OpenAI provider#487fortunto2 wants to merge 11 commits intomoltis-org:mainfrom
Conversation
Add `provider-openai-oxide` feature flag with `OpenAiOxideProvider` alongside existing `provider-async-openai`. Benefits of openai-oxide as an alternative: - Granular feature flags — compile only chat, embeddings, or responses (reduces binary size vs async-openai's all-or-nothing) - Built-in HTTP/2 optimizations — gzip, TCP_NODELAY, keep-alive pings, connection pooling enabled by default - Stream helpers — typed ChatStreamEvent with delta accumulation - WASM support — compiles to wasm32-unknown-unknown out of the box - Structured outputs — parse::<T>() auto-generates JSON schema via schemars - Persistent WebSocket for Realtime API (future: lower latency agent loops) The provider implements the same LlmProvider trait with identical behavior: - client.chat().completions().create() for completions - client.chat().completions().create_stream() for streaming - Custom base_url for OpenAI-compatible APIs (Ollama, vLLM, etc.) Usage: ```toml [features] provider-openai-oxide = ["dep:openai-oxide"] ``` Both providers can coexist — users choose at compile time.
Greptile SummaryThis PR adds Key issues found:
Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller as Agent / Caller
participant Registry as ProviderRegistry::build()
participant AsyncOAI as register_async_openai_providers
participant OxideOAI as register_openai_oxide_providers
participant Provider as OpenAiOxideProvider
participant API as OpenAI API
Registry->>AsyncOAI: (if feature=provider-async-openai)
AsyncOAI->>Registry: register gpt-4o slot ✅
Registry->>OxideOAI: (if feature=provider-openai-oxide)
OxideOAI->>Registry: has_model_any_provider("gpt-4o")?
Registry-->>OxideOAI: true → return early ⚠️ silently skipped
Caller->>Provider: complete(messages, tools)
Note over Provider: _tools ignored → tool_calls: vec![]
Provider->>API: ChatCompletionRequest (no tools)
API-->>Provider: response (no tool_calls)
Provider-->>Caller: CompletionResponse { tool_calls: [] } ⚠️
Caller->>Provider: stream(messages)
Note over Provider: No stream_options.include_usage set
Provider->>API: ChatCompletionRequest (stream)
API-->>Provider: chunks (usage: None in all chunks)
Provider-->>Caller: StreamEvent::Done(Usage { 0, 0 }) ⚠️
Reviews (2): Last reviewed commit: "chore: bump openai-oxide to 0.10.0 (publ..." | Re-trigger Greptile |
| "pathdiff", | ||
| ] | ||
|
|
||
| [[package]] | ||
| name = "openai-oxide" | ||
| version = "0.9.8" | ||
| dependencies = [ | ||
| "async-trait", | ||
| "bytes", | ||
| "futures-core", | ||
| "futures-util", | ||
| "gloo-timers", | ||
| "pin-project-lite", | ||
| "reqwest 0.13.2", | ||
| "rustls 0.23.36", | ||
| "serde", |
There was a problem hiding this comment.
Dual
reqwest versions negate the advertised binary-size savings
openai-oxide 0.9.8 pulls in reqwest 0.13.2 as a dependency, while the entire rest of the workspace pins reqwest 0.12.28. Cargo.lock now carries both versions. Because 0.12 and 0.13 are semver-incompatible, they compile as two separate crates and are both linked into the binary.
The PR description claims ~1.4 MB with features = ["chat"] versus ~2.1 MB for async-openai, but that benchmark almost certainly measured openai-oxide in isolation. With the extra reqwest copy the net change to the moltis binary is likely to be positive, directly contradicting the stated motivation.
This may also cause duplicate TLS-stack symbols (rustls 0.23, tokio-rustls, etc.) at link time, depending on how the resolver deduplicates them. Consider waiting until openai-oxide aligns with the workspace's reqwest 0.12 pin, or explicitly override the version in the workspace Cargo.toml.
Addresses Greptile review: - P0: Add register_openai_oxide_providers() and call from build() - P2: Add #[tracing::instrument] on complete() - P2: Delegate new() to with_alias() (DRY) - P2: Remove unused #[allow(clippy::collapsible_if)] Known: reqwest 0.13 vs workspace 0.12 — tracked separately.
- Add `form` feature (was implicit in 0.12, explicit in 0.13) - Enables single reqwest version for all workspace crates - Unblocks openai-oxide provider PR (moltis-org#487) without dual-reqwest May need minor type annotations on `.json()` calls if inference changed.
|
Addressed the Greptile review — pushed fixes for P0 (provider registration), P2 (tracing, DRY constructor, unused clippy allow). Regarding P1 (dual reqwest): opened #488 to bump workspace reqwest 0.12 → 0.13. It's a one-line change (adds Suggested merge order: #488 first, then this PR. |
openai-oxide now supports reqwest 0.12 via feature flag. No workspace reqwest bump needed, no dual compilation.
|
All Greptile findings addressed across 3 commits:
Ready for re-review. @greptile-apps re-trigger welcome. |
| ChatMessage::Tool { content, .. } => { | ||
| out.push(ChatCompletionMessageParam::User { | ||
| content: openai_oxide::types::chat::UserContent::Text(content.clone()), | ||
| name: None, | ||
| }); | ||
| } |
There was a problem hiding this comment.
Tool messages silently mapped to
User — tool_call_id is lost
ChatMessage::Tool is converted into a ChatCompletionMessageParam::User text message. This drops the tool_call_id that OpenAI requires to correlate a tool response with the prior assistant tool-call request. Any conversation that previously invoked a tool (or replayed a conversation history containing tool turns) will send a malformed message sequence, which the API will reject with a 400 error, or cause the model to lose track of the call chain.
The fix depends on whether openai-oxide exposes a Tool variant in ChatCompletionMessageParam. If it does, map it correctly:
ChatMessage::Tool { content, tool_call_id, .. } => {
out.push(ChatCompletionMessageParam::Tool {
content: content.clone(),
tool_call_id: tool_call_id.clone(),
});
}If the library does not yet expose a Tool variant, this provider cannot safely be used in agentic (tool-calling) workflows and that limitation should be documented prominently.
| let oai_messages = build_messages(&messages); | ||
| let request = ChatCompletionRequest::new(&self.model, oai_messages); |
There was a problem hiding this comment.
Streaming usage always reports zero tokens
ChatCompletionRequest::new is used without enabling stream_options.include_usage = true. The OpenAI streaming API does not include a usage object in any chunk by default, so response.usage will always be None inside the loop. The stream exits naturally and yields StreamEvent::Done(Usage::default()) — meaning every streaming completion reports 0 input and 0 output tokens to whatever metrics/billing layer consumes StreamEvent::Done.
The non-streaming complete() path is unaffected (usage is always included in non-stream responses), but streaming callers will have silently zeroed-out token counts.
The request needs include_usage set on the stream options, e.g.:
let mut request = ChatCompletionRequest::new(&self.model, oai_messages);
request.stream_options = Some(openai_oxide::types::chat::StreamOptions { include_usage: true });(Use the exact field names from the openai-oxide type; the principle is the same regardless of API shape.)
|
|
||
| #[cfg(feature = "provider-async-openai")] | ||
| { | ||
| reg.register_async_openai_providers(config, env_overrides); | ||
| } | ||
|
|
||
| #[cfg(feature = "provider-openai-oxide")] | ||
| { | ||
| reg.register_openai_oxide_providers(config, env_overrides); | ||
| } |
There was a problem hiding this comment.
openai-oxide provider is silently skipped when provider-async-openai is also enabled
Both providers register against the same "openai" config key and the same model ID (e.g. gpt-4o). register_async_openai_providers runs first; if it claims the model, register_openai_oxide_providers hits has_model_any_provider returning true and bails out silently.
A user who adds --features provider-openai-oxide to a build that already has provider-async-openai in its defaults (the compile-time use-case promised in the PR description) will find the openai-oxide provider never activated. There is no warning, no log line — the feature flag appears to do nothing.
The registration order should be documented in a comment, or the two features should be made mutually exclusive via a cfg conflict check, or at minimum a tracing::warn! should be emitted when the desired provider is shadowed:
#[cfg(feature = "provider-openai-oxide")]
{
#[cfg(feature = "provider-async-openai")]
tracing::debug!("provider-async-openai already registered openai models; provider-openai-oxide will be skipped");
reg.register_openai_oxide_providers(config, env_overrides);
}| Ok(CompletionResponse { | ||
| text, | ||
| tool_calls: vec![], | ||
| usage, | ||
| }) |
There was a problem hiding this comment.
Tool-calling silently no-ops in non-streaming path
_tools is accepted but the underscore signals it is never used, and tool_calls: vec![] is hardcoded in the response. If any caller passes non-empty tools (e.g. an agent loop), the model will never see the function definitions and will never return a tool call — but the caller has no way to know. Consider either wiring up the tools field on ChatCompletionRequest (if openai-oxide exposes it) or returning an explicit error:
if !_tools.is_empty() {
anyhow::bail!("openai-oxide provider does not yet support tool calling");
}This makes the limitation explicit rather than silently producing wrong results for callers that pass tools.
P1 fixes: - Streaming: set stream_options.include_usage = true (was always 0 tokens) - Tool messages: map to ChatCompletionMessageParam::Tool with tool_call_id (was silently converting to User, losing correlation) - Provider conflict: add tracing::debug when both providers enabled P2 fixes: - Tools: wire up tools param on ChatCompletionRequest (was silently ignored) - Doc comment: note mutual exclusivity with provider-async-openai
|
Addressed all findings from Greptile re-review (commit 5):
|
Complete tool calling implementation that async_openai_provider lacks: - complete(): extract tool_calls from response → CompletionResponse.tool_calls - stream(): emit ToolCallStart/ToolCallArgumentsDelta/ToolCallComplete events - build_messages(): replay assistant tool_calls in conversation history - build_messages(): map Tool messages with tool_call_id (not User) - complete(): wire tools param into ChatCompletionRequest This makes openai-oxide the first moltis provider with full streaming tool call support, enabling agentic workflows.
|
Pushed commit 6: full tool calling support — the first moltis provider with complete streaming tool calls. What's new:
Note: the existing |
Complete trait implementation beyond what async_openai_provider offers: - supports_tools() → true (native tool calling) - supports_vision() → true (multimodal content parts) - reasoning_effort() + with_reasoning_effort() for o-series models - stream_with_tools() — streaming with tool schemas - context_window() → 128k (GPT-4o default) - Refactored: build_request/build_stream_request/make_stream helpers (DRY)
|
Commit 7: complete LlmProvider trait implementation. Full comparison:
This makes openai-oxide the most feature-complete OpenAI provider in moltis. |
575 lines replacing what takes 5300+ lines in openai.rs + openai_compat.rs. New capabilities: - WireApi::Responses — full Responses API with streaming - WireApi::ChatCompletions — existing chat completions (default) - Responses streaming: text deltas, tool call events, usage - Responses complete: text extraction, function_calls extraction - split_for_responses(): System→instructions, rest→input items - tools_to_response_tools(): convert tool schemas for Responses API - Reasoning effort + summary forwarded to both APIs Provider now covers both OpenAI API surfaces. WireApi selection via: OpenAiOxideProvider::new(...).with_wire_api(WireApi::Responses)
|
Commit 8: Responses API support — both OpenAI API surfaces in one 575-line provider. What this replaces:
New in this commit:
This is now a complete OpenAI provider that could eventually replace the 5300-line manual implementation. WebSocket transport is the remaining gap (follow-up PR). |
- discover_models() via client.models().list() → DiscoveredModel - 11 unit tests covering: message building, tool call extraction, tool_call_id preservation, assistant tool replay, split_for_responses, response tools conversion, supports_tools/vision, wire_api config, reasoning_effort, constructor delegation - Add "models" feature to openai-oxide dep Provider: 744 lines, 11 tests. Covers both Chat + Responses API.
New transport modes for Responses API: - ProviderStreamTransport::Websocket → persistent WS, lower latency - ProviderStreamTransport::Auto → try WS, fallback to SSE - ProviderStreamTransport::Sse → existing HTTP SSE (default) Uses openai-oxide v0.10.1 ws_session() — verified 10/10 requests on single connection, avg 591ms. Also: extract build_responses_request() helper (DRY), add stream_transport field + with_stream_transport() builder. Provider: 936 lines — covers Chat + Responses + SSE + WebSocket.
|
Commit 10: WebSocket streaming for Responses API. Provider now supports all 3 transport modes:
Benchmarked: 10/10 requests on single WS session, avg 591ms per request (vs ~1200ms HTTP). Uses openai-oxide v0.10.1 with ws_pool and Final provider comparison (936 lines):
This replaces 5300+ lines of manual HTTP/WS code in openai.rs + openai_compat.rs. Only missing: model discovery registration (uses static catalog). |
ResponseEventAction enum + process_response_event() as single source of truth for ResponseStreamEvent → StreamEvent mapping. stream_responses (SSE), stream_responses_ws, stream_responses_auto all use the same 4-line match instead of duplicated 40-line blocks. 936 → 888 lines. Zero behavior change, same 11 tests.
|
Updated PR description with full feature table, architecture, and all Greptile findings. Re model discovery: 11 commits, 888 lines, 11 tests, all review findings addressed. Ready for human review. |
…cket Add openai-oxide 0.11.1 as alternative OpenAI provider supporting: - Chat Completions and Responses API (WireApi config) - SSE, WebSocket, and Auto transport modes - Full tool calling (extraction, streaming, replay) - Vision, reasoning effort, model discovery - 888 lines replacing 5300+ lines of manual HTTP/SSE code Replaces PR moltis-org#487 (rebased on current main, oxide bumped 0.10.1 → 0.11.1).
|
Superseded by #521 — rebased on current main, openai-oxide bumped to 0.11.1, all Greptile findings carried over. |
Summary
Adds
openai-oxidev0.10.1 as a feature-complete OpenAI provider behindprovider-openai-oxidefeature flag.888 lines replacing what takes 5300+ lines in
openai.rs+openai_compat.rs.What's included
Architecture
All Greptile findings addressed
Test plan
Usage
```toml
Cargo.toml — opt in
[features]
provider-openai-oxide = ["dep:openai-oxide"]
```
```rust
// Configure transport
OpenAiOxideProvider::new(key, model, url)
.with_wire_api(WireApi::Responses) // or ChatCompletions
.with_stream_transport(ProviderStreamTransport::Auto) // WS with SSE fallback
```
Disclosure: I maintain openai-oxide.
🤖 Generated with Claude Code