AI Gateway for Multi-Provider LLMs
Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding.
📡 All agents connect via localhost:20128/v1 or cloud.omniroute.online/v1 — one config, unlimited models and quota
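Because the gateway speaks the OpenAI wire format, any HTTP client can talk to it. A minimal sketch of building a Chat Completions request against the local endpoint from above — the model name `gpt-4o` and the placeholder key `sk-local` are illustrative, not required values:

```python
import json
from urllib import request

# Local gateway endpoint from the docs; swap in cloud.omniroute.online/v1
# for the hosted variant.
BASE_URL = "http://localhost:20128/v1"

def build_request(messages, model="gpt-4o", api_key="sk-local"):
    """Assemble an OpenAI-compatible Chat Completions request (sketch)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return f"{BASE_URL}/chat/completions", headers, body

def chat(messages, **kwargs):
    """Send the request and return the parsed JSON response."""
    url, headers, body = build_request(messages, **kwargs)
    with request.urlopen(request.Request(url, data=body, headers=headers)) as resp:
        return json.load(resp)
```

Point any OpenAI SDK or IDE integration at the same `BASE_URL` and it routes through the gateway unchanged.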
160+ providers, Memory System, Skills Framework, Vision Bridge, Responses API, 29 MCP tools, and much more.
Persistent conversational memory across sessions. Extraction, injection, retrieval, and summarization modules keep your AI context-aware between conversations.
Extensible skill registry with built-in and custom skills. Sandboxed execution, interception/injection pipeline, and 4 MCP skill tools for agent control.
Cross-provider image and vision support. Automatically translates image inputs between OpenAI, Anthropic, and Gemini formats; validated across 51 test scenarios.
Full OpenAI Responses API support. Requests are internally translated to Chat Completions format, dispatched, and response-streamed back in Responses API events.
Use your ChatGPT Plus, Grok, Perplexity Pro, Blackbox, and Meta AI subscriptions as API providers. Just paste your session cookie — no API key needed.
Perplexity, Serper, Brave, Exa, Tavily, Google PSE, Linkup, SearchAPI, You.com, and SearXNG. Ground AI responses with real-time web data.
22 core tools + 3 memory tools + 4 skill tools. Budget guards, route simulation, session snapshots, DB health, pricing sync, and agent-assisted memory/skill management.
Cross-session context relay strategy. Transfer conversation context between providers seamlessly. 13 routing strategies including context-relay and context-optimized.
Install once, connect providers, and code non-stop with automatic 4-tier fallback routing.
Run one command to install OmniRoute globally on your system.
Open Dashboard and add your API keys or OAuth connections. Free providers available!
Configure Claude Code, Cursor, Cline, or any OpenAI-compatible tool.
Run OmniRoute as a container with persistent data volume.
See how OmniRoute compares to alternatives.
| Feature | OmniRoute | LiteLLM |
|---|---|---|
| Providers Supported | 160+ | 200+ |
| Free Tier Routing | ✓ | ✗ |
| Dashboard UI | ✓ | ✗ |
| Semantic Cache | ✓ | ✗ |
| Circuit Breaker | ✓ | ✓ |
| 13 Routing Strategies | ✓ | ✗ |
| LLM Evaluations | ✓ | ✗ |
| Translator Playground | ✓ | ✗ |
| CLI Tools Manager | ✓ | ✗ |
| Custom Combos | ✓ | ✗ |
| MCP Server (29 tools) | ✓ | ✗ |
| A2A Protocol (Agent-to-Agent) | ✓ | ✗ |
| Desktop App | ✓ | ✗ |
| Usage Analytics | ✓ | ✓ |
| Cost Management | ✓ | ✓ |
| Docker Deploy | ✓ | ✓ |
| Media Playground (Image/Video/Audio/TTS) | ✓ | ✗ |
| Registered Keys API | ✓ | ✗ |
| Auto-Combo Engine (Self-Healing) | ✓ | ✗ |
| Per-Model Combo Routing | ✓ | ✗ |
| 160+ Provider Icons (SVG) | ✓ | ✗ |
| Memory System (Persistent) | ✓ | ✗ |
| Skills Framework (Extensible) | ✓ | ✗ |
| Vision Bridge (Cross-Provider) | ✓ | ✓ |
| Responses API | ✓ | ✓ |
| Web/Cookie Providers (5) | ✓ | ✗ |
| Search Providers (10) | ✓ | ✗ |
| Context Handoff / Relay | ✓ | ✗ |
| Self-hosted & Free | ✓ | ✓ |
Connect via OAuth, API Key, Web Cookie, or use completely free providers.
iFlow AI
Qwen Code
Kiro AI
Gemini CLI
Claude Code
OpenAI
Anthropic
Google AI
Antigravity
OpenClaw
Groq
DeepSeek
xAI (Grok)
Mistral
Together AI
Fireworks
Perplexity
Cerebras
Cohere
OpenRouter
GLM (ZhipuAI)
MiniMax
Moonshot
Nebius
NVIDIA
Sambanova
Novita AI
Chutes AI
Kluster AI
InfiniAI
Targon
AI21 Labs
Lambda
Lepton AI
Deepgram
Alibaba DashScope
LongCat AI
Pollinations
AI/ML API
Kimi Coding
Alibaba Coding
Ollama Cloud
Amazon Q
GitLab Duo
Cline (OAuth)
ChatGPT Web
Grok Web
Perplexity Web
Blackbox Web
CrofAI
Azure OpenAI
Amazon Bedrock
DeepInfra
Meta Llama API
Databricks
Snowflake Cortex
Venice.ai
Poe
Heroku AI
IBM watsonx
Runway
Brave Search
Exa Search
Tavily Search
Serper
Everything you need to route, monitor, and optimize your AI usage.
Subscription → API Key → Cheap → Free. Automatic switching when quota runs out, zero downtime.
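The 4-tier order above can be sketched as a simple priority scan. The availability flags here are a stand-in; the real gateway tracks quotas and errors per provider:

```python
# Tier order from the docs: Subscription → API Key → Cheap → Free.
TIERS = ["subscription", "api_key", "cheap", "free"]

def pick_tier(available):
    """Return the first tier that still has quota, in priority order.

    `available` maps tier name -> bool (True if the tier can serve traffic).
    """
    for tier in TIERS:
        if available.get(tier):
            return tier
    raise RuntimeError("all tiers exhausted")
```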
When a model is unavailable, automatically falls back to sibling models in the same family before returning an error.
Priority, weighted, round-robin, context-relay, fill-first, P2C, random, least-used, cost-optimized, strict-random, auto-combo, LKGP, and context-optimized. Per-combo or global.
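As one concrete example of these strategies, a weighted pick selects providers in proportion to their configured weight. Provider names and weights below are illustrative; the injected `rng` just makes the sketch deterministic to test:

```python
import random

def weighted_pick(providers, rng=random.random):
    """Weighted routing strategy (sketch).

    `providers` is a list of (name, weight) pairs; heavier entries are
    chosen proportionally more often.
    """
    total = sum(weight for _, weight in providers)
    r = rng() * total
    upto = 0.0
    for name, weight in providers:
        upto += weight
        if r <= upto:
            return name
    return providers[-1][0]  # guard against float rounding at the edge
```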
Per-provider circuit breakers open and close automatically with configurable cooldowns. Self-healing after failures.
Mutex + automatic rate-limiting for API key providers. Prevents quota exhaustion spikes.
5-second dedup window for duplicate requests. Saves tokens and prevents double-sends.
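The dedup window boils down to fingerprinting each request and comparing timestamps. A minimal sketch with an injected clock (the real gateway's keying is more involved):

```python
import hashlib
import json

class Deduper:
    """5-second dedup window: identical payloads seen again inside the
    window are flagged as duplicates."""

    def __init__(self, window=5.0):
        self.window = window
        self.seen = {}  # fingerprint -> last-seen timestamp

    def is_duplicate(self, payload, now):
        # Canonical JSON so key order doesn't change the fingerprint.
        key = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        last = self.seen.get(key)
        self.seen[key] = now
        return last is not None and now - last < self.window
```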
Two-tier cache (exact + semantic similarity) reduces cost and latency for repeated queries.
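The two tiers work as a fallthrough: exact lookup first, then a similarity scan. This sketch uses token-overlap (Jaccard) as a cheap stand-in for the embedding similarity a semantic cache would actually use:

```python
class TwoTierCache:
    """Tier 1 is an exact dict lookup; tier 2 matches near-identical
    prompts by token overlap (a stand-in for embedding similarity)."""

    def __init__(self, threshold=0.8):
        self.store = {}
        self.threshold = threshold

    def get(self, prompt):
        if prompt in self.store:                  # tier 1: exact hit
            return self.store[prompt]
        want = set(prompt.lower().split())        # tier 2: similarity
        for cached, answer in self.store.items():
            have = set(cached.lower().split())
            union = want | have
            if union and len(want & have) / len(union) >= self.threshold:
                return answer
        return None

    def put(self, prompt, answer):
        self.store[prompt] = answer
```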
Seamless OpenAI ↔ Claude ↔ Gemini format translation. Use any model with any client.
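One visible difference between the formats: OpenAI carries the system prompt as a message, while Anthropic's Messages API takes it as a top-level field. An illustrative slice of that translation (the gateway's real translator covers far more, including tools and vision content):

```python
def openai_to_anthropic(messages):
    """Translate OpenAI-style chat messages toward the Anthropic shape:
    system messages are lifted into a top-level `system` field."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    body = {"messages": turns}
    if system_parts:
        body["system"] = "\n".join(system_parts)
    return body
```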
Automatically parse and handle `<think>` tags from reasoning models like DeepSeek R1.
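At its core this is splitting the tagged reasoning from the visible answer, which a regex captures. A simplified take on the idea:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Separate <think>...</think> reasoning from the visible answer."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    visible = THINK_RE.sub("", text).strip()
    return reasoning, visible
```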
Built-in protection against prompt injection attacks on your AI endpoints.
Automatically selects the best model based on content type — coding, analysis, vision, summarization. 7 task types.
Live token consumption, reset countdown, and cost estimation per provider.
Full dashboard with tokens, costs, trends over time. Filter by provider, model, or period.
Track spending with editable per-model pricing. Set budget alerts and limits.
Dashboard with healthcheck per provider, token validation, and auto-refresh status.
Golden set testing with 4 match strategies: exact, contains, regex, custom JS function.
Built-in Chat Tester and Test Bench. Test any model in real-time from the dashboard.
Configure Claude Code, Codex, OpenClaw, Kilo, Droid, and Cline directly from the dashboard.
Create unlimited model combinations with 6 balancing strategies. Fine-tune routing per combo.
Add multiple accounts per provider. Round-robin load balancing and automatic failover.
Full media generation: Image (NanoBanana, SD WebUI, ComfyUI), Video, Music, Audio Transcription (2GB, Deepgram, AssemblyAI), and Text-to-Speech (ElevenLabs, Cartesia, PlayHT).
Sync config across devices via Cloudflare Workers. 300+ global edge locations.
Create scoped API keys with model restrictions, time-based access schedules, and enable/disable toggles.
Organize provider connections by environment (dev/prod). Accordion view with smart auto-switch.
Model Context Protocol server with 29 agent-control tools (22 core + 3 memory + 4 skills). 3 transports: stdio, SSE, Streamable HTTP. 10 scoped auth levels.
Agent-to-Agent orchestration with JSON-RPC 2.0, task streaming, SSE heartbeat, and smart-routing skill.
Native Electron app for Windows, macOS, and Linux. System tray, auto-update, offline support, single-instance lock.
3-tier pricing resolution synced from LiteLLM. User overrides → synced → defaults. Opt-in via settings.
Persistent memory across sessions with extraction, injection, retrieval, and summarization. 3 MCP memory tools for agent-controlled recall.
Extensible skill registry with built-in and custom skills. Sandboxed executor, interception/injection pipeline, A2A integration, and 4 MCP skill tools.
Cross-provider image support. Translates vision inputs between OpenAI, Anthropic, and Gemini formats automatically; validated across 51 test scenarios.
Full OpenAI Responses API compatibility. TransformStream converts Chat Completions SSE chunks into Responses API event format.
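The gist of that transform, sketched over already-parsed streaming chunks. The event names (`response.output_text.delta`, `response.output_text.done`) follow OpenAI's Responses API streaming events; the gateway's actual SSE handling is more involved:

```python
def to_responses_events(chunks):
    """Convert Chat Completions streaming chunks (parsed from SSE) into
    Responses-API-style events (sketch)."""
    text = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            text.append(delta)
            yield {"type": "response.output_text.delta", "delta": delta}
    # Final event carries the assembled text, mirroring the done event.
    yield {"type": "response.output_text.done", "text": "".join(text)}
```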
Use ChatGPT Plus, Grok, Perplexity Pro, Blackbox, and Meta AI subscriptions as API providers via session cookies.
Provider audit module for compliance enforcement. Optional MITM proxy with cert management, DNS handling, and target routing.
Monitor everything in real-time. Manage providers, combos, analytics, and more.
Run locally, in a container, on a VM, or at the edge.
Install globally for local development
Container with persistent data volume
Deploy on Akamai, AWS, DigitalOcean
Edge deployment with D1 database