[FEATURE] Support previous_response_id for stateful multi-turn conversations in OpenAIResponsesModel #1957

@sagimedina1

Description

Problem Statement

The OpenAIResponsesModel currently resends the full conversation history on every turn, even though the Responses API natively
supports server-side conversation state via previous_response_id. The docstring at the top of openai_responses.py acknowledges
this:

"The Responses API can maintain conversation state server-side through 'previous_response_id'... Note: This implementation currently only implements the stateless approach."

For agentic applications with multi-turn conversations (10+ turns with tool calls), this means:

  • Token costs grow with conversation length — turn 10 resends all nine previous turns, so per-turn input grows linearly and cumulative input cost grows roughly quadratically
  • Latency increases as the input payload grows with every turn
  • Context window pressure — long conversations hit limits faster, not because of new content, but because of repeated history

This affects both OpenAI and xAI Responses API users, since both support previous_response_id.

Proposed Solution

  1. After each successful response in stream(), capture and store response.id from the completed response event.
  2. On subsequent calls, if a previous_response_id is available, pass it in the request instead of the full message history. Only
    send the new user message(s) and tool results in input.
  3. Fall back to the current stateless approach (full history) if:
    - No previous response ID exists (first turn)
    - The stored response has expired (30-day server retention)
    - The API returns an error indicating the previous response is invalid
  4. Expose a configuration option to enable/disable this behavior (e.g., stateful=True in config), defaulting to disabled for
    backward compatibility.
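Steps 2-4 above could be sketched roughly as follows. This is not the real `OpenAIResponsesModel` API — the class, the injected `send` callable, and the use of `ValueError` as a stand-in for an "invalid previous response" API error are all assumptions for illustration:

```python
from typing import Callable


class ResponsesSession:
    """Sketch of the stateful/stateless switch with fallback (hypothetical)."""

    def __init__(self, send: Callable[[dict], dict], stateful: bool = False):
        self.send = send                 # callable performing the API request
        self.stateful = stateful         # config flag from step 4, off by default
        self.prev_id: str | None = None  # last response id, per step 1
        self.history: list[dict] = []    # full history, kept for fallback

    def turn(self, new_messages: list[dict]) -> dict:
        self.history.extend(new_messages)
        if self.stateful and self.prev_id is not None:
            try:
                # Stateful path: only new input plus the stored response id.
                resp = self.send({"input": new_messages,
                                  "previous_response_id": self.prev_id})
            except ValueError:
                # Fallback (step 3): expired or invalid id, resend everything.
                resp = self.send({"input": self.history})
        else:
            # First turn, or feature disabled: current stateless behavior.
            resp = self.send({"input": self.history})
        self.prev_id = resp.get("id")    # remember for the next turn
        return resp
```

The fallback keeps the full history client-side anyway, so expiry of the server-side state degrades gracefully to today's behavior rather than failing the turn.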

The key changes would be in:

  • stream() — capture response.id from response.completed event
  • _format_request() — conditionally pass previous_response_id instead of full history
  • State management — store the last response ID (could be returned as metadata alongside usage stats)

Use Case

We run a property management AI agent on Bedrock AgentCore using Strands. Each session is a multi-turn conversation where the PM
asks for analysis, creates action plans, drafts emails, and executes tasks. A typical session is 10-20 turns with heavy tool
use (each turn may involve 2-5 tool calls).

Today, turn 15 of a conversation resends all 14 previous turns plus their tool call/result pairs. With previous_response_id,
turn 15 would only send the new user message — the server already has the rest.

This would help with:

  • Cost reduction — estimated 40-60% input token savings for typical multi-turn sessions
  • Faster responses — less data to transmit and process per turn
  • Longer conversations — more room in the context window for actual content instead of repeated history

Alternative Solutions

Application-level caching or summarization of history, which can get messy and lossy compared to letting the server hold the conversation state.

Additional Context

No response

Labels

enhancement (New feature or request)
