Prompt Caching
Maximizing Cache Hits
Set x-grok-conv-id (Chat Completions API)
The x-grok-conv-id HTTP header routes requests with the same conversation ID to the same server. Since cache entries are stored per-server, this maximizes your cache hit rate.
```bash
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.20-reasoning",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"}
    ]
  }'
```
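The same request can be built from Python. Below is a minimal sketch using only the standard library; the endpoint, header, and payload come from the curl example above, while the `build_request` helper name is ours:

```python
import json
import os
import urllib.request

API_URL = "https://api.x.ai/v1/chat/completions"

def build_request(conv_id, messages):
    """Build the headers and JSON body for a cache-friendly request.

    Reusing the same conv_id across turns routes the conversation to
    the same server, so its cached prompt prefix can be reused.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('XAI_API_KEY', '')}",
        "x-grok-conv-id": conv_id,
    }
    body = {"model": "grok-4.20-reasoning", "messages": messages}
    return headers, body

headers, body = build_request(
    "conv_abc123",
    [
        {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
        {"role": "user", "content": "What is prompt caching?"},
    ],
)

# To actually send the request (requires a valid XAI_API_KEY):
# req = urllib.request.Request(API_URL, data=json.dumps(body).encode(), headers=headers)
# print(urllib.request.urlopen(req).read().decode())
```

Keep the conversation ID stable for the lifetime of a conversation; generating a fresh ID per request defeats the sticky routing.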
Set prompt_cache_key (Responses API)
For the Responses API, use the prompt_cache_key field directly in the request body. It functions identically to setting x-grok-conv-id — it routes requests to the same server for cache reuse.
```bash
curl https://api.x.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -d '{
    "model": "grok-4.20-reasoning",
    "input": "What is prompt caching?",
    "prompt_cache_key": "b79ad29b-b3f9-463c-bca6-041d5058d366"
  }'
```
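The prompt_cache_key can be any stable string. One convenient pattern (our suggestion, not an API requirement) is to derive a deterministic UUID from your own conversation identifier, so every request in a conversation sends the same key without storing extra state:

```python
import uuid

def cache_key_for(conversation_id: str) -> str:
    """Map an internal conversation ID to a stable prompt_cache_key.

    uuid5 is deterministic: the same conversation_id always yields the
    same UUID, so repeated requests carry the same cache key.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, conversation_id))

print(cache_key_for("conv_abc123"))
print(cache_key_for("conv_abc123") == cache_key_for("conv_abc123"))  # True
```

Any scheme works as long as the key is identical across requests that should share a cache; the deterministic UUID merely avoids having to persist a mapping.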
Set x-grok-conv-id metadata (gRPC API)
For the gRPC API using the xAI SDK, pass x-grok-conv-id as gRPC metadata to enable sticky routing for cache reuse.
Python

```python
from xai_sdk import Client
from xai_sdk.chat import system, user

client = Client(
    api_key="YOUR_API_KEY",
    metadata=(("x-grok-conv-id", "conv_abc123"),),
)

chat = client.chat.create(model="grok-4.20-reasoning")
chat.append(system("You are Grok, a helpful and truthful AI assistant built by xAI."))
chat.append(user("What is prompt caching?"))

response = chat.sample()
print(f"Response: {response.content}")
print(f"Cached tokens: {response.usage.cached_prompt_text_tokens}")
```
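Usage fields such as cached_prompt_text_tokens make it easy to monitor whether sticky routing is paying off. A small sketch of a hit-rate helper (the total prompt-token count must come from your SDK's usage object; only cached_prompt_text_tokens appears in the example above):

```python
def cache_hit_rate(cached_tokens: int, prompt_tokens: int) -> float:
    """Fraction of the prompt that was served from cache."""
    if prompt_tokens <= 0:
        return 0.0
    return cached_tokens / prompt_tokens

# e.g. cached_prompt_text_tokens=1200 out of a 1500-token prompt:
print(f"{cache_hit_rate(1200, 1500):.0%}")  # 80%
```

A hit rate that stays near zero across turns usually means the conversation ID is changing between requests, or the prompt prefix itself is not stable.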