AI Agent Evaluations

Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.

Last run date: March 17, 2026

Agent Performance Results

| Model | Agent | Total Evals | Success Rate | Success Rate with AGENTS.md* |
| --- | --- | --- | --- | --- |
| GPT 5.3 Codex (xhigh) | Codex | 21 | 86% | 100% |
| GPT 5.4 (xhigh) | Codex | 21 | 86% | 95% |
| Cursor Composer 2.0 | Cursor | 21 | 76% | 95% |
| Gemini 3.1 Pro Preview | Gemini CLI | 21 | 76% | 100% |
| Claude Opus 4.6 | Claude Code | 21 | 71% | 100% |
| Claude Sonnet 4.6 | Claude Code | 21 | 67% | 100% |
| Gemini 3.0 Pro Preview | Gemini CLI | 21 | 67% | 90% |
| Cursor Composer 1.5 | Cursor | 21 | 62% | 90% |
| Claude Sonnet 4.5 | Claude Code | 21 | 57% | 86% |
| GPT 5.2 Codex (xhigh) | Codex | 21 | 52% | 86% |
| Kimi K2.5 | OpenCode | 21 | 19% | 52% |

* AGENTS.md provides bundled Next.js documentation for AI coding agents. The column shows the success rate when agents had access to this documentation; the increase over the base success rate reflects the additional evals that passed.
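
The percentages in the table can be translated back into eval counts. The sketch below assumes each published rate is simply passed evals divided by the 21 total evals, rounded to the nearest whole percent; that rounding rule is an assumption, not something the page states. The helper names (`passes_from_rate`, `additional_passes`) are hypothetical and exist only for illustration.

```python
TOTAL_EVALS = 21  # "Total Evals" column in the table


def passes_from_rate(rate_percent: int, total: int = TOTAL_EVALS) -> int:
    """Recover the pass count whose rounded percentage equals rate_percent.

    Assumes the published rate is round(100 * passes / total).
    """
    candidates = [k for k in range(total + 1)
                  if round(100 * k / total) == rate_percent]
    if len(candidates) != 1:
        raise ValueError(f"{rate_percent}% does not map to one count out of {total}")
    return candidates[0]


def additional_passes(base_percent: int, with_docs_percent: int,
                      total: int = TOTAL_EVALS) -> int:
    """Number of extra evals passed once AGENTS.md was available."""
    return (passes_from_rate(with_docs_percent, total)
            - passes_from_rate(base_percent, total))


# A few rows copied from the table: (base rate, rate with AGENTS.md)
rows = {
    "GPT 5.3 Codex (xhigh)": (86, 100),
    "Claude Sonnet 4.6": (67, 100),
    "Kimi K2.5": (19, 52),
}

for model, (base, with_docs) in rows.items():
    print(model, passes_from_rate(base), "->", passes_from_rate(with_docs),
          f"(+{additional_passes(base, with_docs)})")
```

Under that assumption, GPT 5.3 Codex (xhigh) goes from 18 to 21 passing evals (+3), while Claude Sonnet 4.6 (14 to 21) and Kimi K2.5 (4 to 11) each gain 7.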