AI Agent Evaluations
Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.
View on GitHub
Last run date: March 17, 2026
Agent Performance Results
| Model | Agent | Total Evals | Success Rate | Success Rate with AGENTS.md* |
|---|---|---|---|---|
| GPT 5.3 Codex (xhigh) | Codex | 21 | 86% | 100% |
| GPT 5.4 (xhigh) | Codex | 21 | 86% | 95% |
| Cursor Composer 2.0 | Cursor | 21 | 76% | 95% |
| Gemini 3.1 Pro Preview | Gemini CLI | 21 | 76% | 100% |
| Claude Opus 4.6 | Claude Code | 21 | 71% | 100% |
| Claude Sonnet 4.6 | Claude Code | 21 | 67% | 100% |
| Gemini 3.0 Pro Preview | Gemini CLI | 21 | 67% | 90% |
| Cursor Composer 1.5 | Cursor | 21 | 62% | 90% |
| Claude Sonnet 4.5 | Claude Code | 21 | 57% | 86% |
| GPT 5.2 Codex (xhigh) | Codex | 21 | 52% | 86% |
| Kimi K2.5 | OpenCode | 21 | 19% | 52% |
* AGENTS.md provides bundled Next.js documentation for AI coding agents. This column shows the success rate across the same evals when agents had access to that documentation.
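For readers unfamiliar with the convention: an AGENTS.md file is a markdown file placed at the project root that coding agents read for project-specific instructions. A minimal sketch of what such a file might look like is below; the contents here are illustrative only and are not the actual bundled Next.js documentation used in these evals.

```markdown
# AGENTS.md (illustrative example)

## Project conventions
- This is a Next.js App Router project; put routes under `app/`.
- Prefer Server Components; add `"use client"` only when needed.

## Commands
- Build: `next build`
- Dev server: `next dev`
```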