AI Agent Evaluations
Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.
View on GitHub
Last run date: March 17, 2026
Agent Performance Results
| Model | Agent | Total Evals | Success Rate | Success Rate with AGENTS.md* |
|---|---|---|---|---|
| GPT 5.3 Codex (xhigh) | Codex | 21 | 86% | 100% |
| GPT 5.4 (xhigh) | Codex | 21 | 86% | 95% |
| Cursor Composer 2.0 | Cursor | 21 | 76% | 95% |
| Gemini 3.1 Pro Preview | Gemini CLI | 21 | 76% | 100% |
| Claude Opus 4.6 | Claude Code | 21 | 71% | 100% |
| Claude Sonnet 4.6 | Claude Code | 21 | 67% | 100% |
| Gemini 3.0 Pro Preview | Gemini CLI | 21 | 67% | 90% |
| Cursor Composer 1.5 | Cursor | 21 | 62% | 90% |
| Claude Sonnet 4.5 | Claude Code | 21 | 57% | 86% |
| GPT 5.2 Codex (xhigh) | Codex | 21 | 52% | 86% |
| Kimi K2.5 | OpenCode | 21 | 19% | 52% |
* AGENTS.md provides bundled Next.js documentation for AI coding agents. This column shows the success rate across the same evals when agents had access to that documentation.
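For readers unfamiliar with the convention: an AGENTS.md file is a markdown file placed at the project root that coding agents read for project-specific instructions. A minimal sketch of what such a file might look like is below; the contents here are illustrative only and are not the actual bundled Next.js documentation used in these evals.

```markdown
# AGENTS.md (illustrative example)

## Project conventions
- This is a Next.js App Router project; put routes under `app/`.
- Prefer Server Components; add `"use client"` only when needed.

## Commands
- Build: `next build`
- Dev server: `next dev`
```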