Releases: fastxyz/skill-optimizer
v1.1.0
What's new in v1.1.0
Added
- Prompt surface — benchmark and optimize prompt templates, Claude Code skills, and agent instructions. Discovers phases and capabilities from markdown, evaluates output quality with content-based criteria (required sections, format patterns, forbidden keywords, code blocks).
- Codex auth — direct OpenAI model runs can use browser-login tokens or a static `OPENAI_API_KEY` stored by Codex (`~/.codex/auth.json`) instead of requiring an env var. Set `benchmark.authMode: "codex"` and `"format": "openai"` with `openai/<model>` IDs.
- SKILL folder — bundled AI-agent guidance (`SKILL/SKILL.md`) so agents can use skill-optimizer reliably without extra setup.
- Stable task IDs — IDs are now a SHA-1 hash of action names (SDK/CLI/MCP) or prompt text (prompt surface), so `--task <id>` filters work across regenerations (fixes #17).
- Optimizer loop diagram — README includes a visual workflow diagram.
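A minimal TypeScript (Node.js) sketch of why content-hashed task IDs stay stable across regenerations. The `Task` shapes, the `taskId` helper, and joining action names with commas in their listed order are assumptions for illustration, not skill-optimizer's actual implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical task shapes; skill-optimizer's real types may differ.
type ActionTask = { kind: "action"; actionNames: string[] };
type PromptTask = { kind: "prompt"; prompt: string };
type Task = ActionTask | PromptTask;

// Derive the ID from task content rather than list position, so the
// same task keeps its ID when tasks are regenerated or reordered.
function taskId(task: Task): string {
  // Assumption: action names are joined with "," in their listed order,
  // so reordering the actions inside one task changes its ID.
  const source =
    task.kind === "action" ? task.actionNames.join(",") : task.prompt;
  // Full 40-character hex digest; the tool may shorten or prefix it.
  return createHash("sha1").update(source, "utf8").digest("hex");
}
```

Because the ID depends only on content, a `--task <id>` filter keeps matching after tasks are regenerated as long as the task's actions or prompt text are unchanged; editing a prompt gives that task a new ID.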
Fixed
- Anthropic tool names — dotted tool names (e.g. `auth.status`) are now sanitized to `auth_status` before sending to the Anthropic API and mapped back in responses. Fixes hard failures on tool-calling benchmarks against `anthropic/` models.
- Prompt eval on model error — the prompt evaluator no longer runs when the model call itself failed; `toolPrecision` is now correctly set to `1.0` for prompt tasks (no tool calls = vacuously perfect precision).
- Config path — running without `--config` now looks for `.skill-optimizer/skill-optimizer.json`, matching what `init` scaffolds.
- Format/prefix validation — `validate` now errors when `benchmark.format: "openai"` is paired with non-`openai/` model IDs, and vice versa for `anthropic/`.
- Codex static key routing — a plain `OPENAI_API_KEY` in `~/.codex/auth.json` now correctly routes to the direct OpenAI transport instead of the JWT-only Codex transport. A malformed `access_token` (non-JWT) no longer shadows a valid static key fallback.
- Model IDs — OpenRouter slugs preserve dots (`openrouter/anthropic/claude-sonnet-4.6`); the dot→hyphen rewrite applies only to `anthropic/` direct-API IDs; `openai/` slugs (e.g. `gpt-5.4`) are exempt.
- Provider prefix is stripped before sending model IDs to the `anthropic/` and `openai/` direct APIs.
- Prompt-surface benchmarks no longer hard-fail on coverage violations; coverage is informational.
- Prompt tasks are scored against their specific `capabilityId`, not always the first discovered capability.
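The Codex credential routing in the fixes above can be sketched as follows. This assumes `~/.codex/auth.json` exposes a top-level `OPENAI_API_KEY` and a `tokens.access_token`, and that a well-formed JWT takes precedence over the static key; `CodexAuthFile`, `Transport`, `looksLikeJwt`, and `pickTransport` are hypothetical names.

```typescript
// Assumed shape of ~/.codex/auth.json; only the fields used here are typed.
interface CodexAuthFile {
  OPENAI_API_KEY?: string | null;
  tokens?: { access_token?: string | null } | null;
}

// Hypothetical transport descriptors for the two routes.
type Transport =
  | { kind: "codex"; accessToken: string }
  | { kind: "openai"; apiKey: string };

// Shape check only: three non-empty base64url segments joined by dots.
// This does not verify the signature or decode the claims.
function looksLikeJwt(token: string): boolean {
  const parts = token.split(".");
  return parts.length === 3 && parts.every((p) => /^[A-Za-z0-9_-]+$/.test(p));
}

function pickTransport(auth: CodexAuthFile): Transport {
  const token = auth.tokens?.access_token;
  // Assumption: a well-formed JWT (browser login) takes precedence.
  // A malformed token is skipped rather than blocking the fallback.
  if (token && looksLikeJwt(token)) {
    return { kind: "codex", accessToken: token };
  }
  // A plain static key goes to the direct OpenAI transport.
  const key = auth.OPENAI_API_KEY;
  if (key) {
    return { kind: "openai", apiKey: key };
  }
  throw new Error("no usable credentials found in auth.json");
}
```

Checking the token's shape before choosing the Codex transport is what keeps a malformed `access_token` from masking a usable static key.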
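A sketch of the dotted-tool-name fix: the Anthropic Messages API accepts only letters, digits, underscores, and hyphens in tool names, so names are rewritten on the way out and mapped back on the way in. The helper names are hypothetical, and the collision check is an extra safeguard added in this sketch, not something these notes describe.

```typescript
// Hypothetical helpers; not skill-optimizer's actual code.

// Rewrite dots, which the Anthropic API rejects in tool names, as underscores.
function sanitizeToolName(name: string): string {
  return name.replace(/\./g, "_");
}

// Map each sanitized name back to its original dotted form.
function buildToolNameMap(names: string[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const original of names) {
    const safe = sanitizeToolName(original);
    const existing = map.get(safe);
    // Extra safeguard: "a.b" and "a_b" would both become "a_b".
    if (existing !== undefined && existing !== original) {
      throw new Error(`tool names collide after sanitizing: ${existing}, ${original}`);
    }
    map.set(safe, original);
  }
  return map;
}

// Translate a tool name returned by the API back to the benchmark's name.
function restoreToolName(map: Map<string, string>, returned: string): string {
  return map.get(returned) ?? returned;
}
```

Keeping the reverse map per request means scoring still compares against the original dotted action names.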
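The model ID rules above, plus the new `validate` prefix check, sketched as two TypeScript functions. The function names and error text are hypothetical, and dropping the `openrouter/` prefix is an assumption these notes do not state.

```typescript
// Hypothetical helpers; skill-optimizer's real function names may differ.

// Translate a provider-prefixed model ID into the ID sent over the wire.
function toApiModelId(id: string): string {
  if (id.startsWith("anthropic/")) {
    // Direct Anthropic API: strip the prefix and rewrite dots to hyphens.
    return id.slice("anthropic/".length).replace(/\./g, "-");
  }
  if (id.startsWith("openai/")) {
    // Direct OpenAI API: strip the prefix but keep dots (e.g. gpt-5.4).
    return id.slice("openai/".length);
  }
  if (id.startsWith("openrouter/")) {
    // Assumption: the openrouter/ prefix is dropped and the remaining
    // vendor/model slug, dots included, is passed through unchanged.
    return id.slice("openrouter/".length);
  }
  return id;
}

// Mirror of the validate check: a direct-API format must be paired
// with model IDs carrying the matching provider prefix.
function formatPrefixErrors(format: string, modelIds: string[]): string[] {
  if (format !== "openai" && format !== "anthropic") return [];
  const prefix = `${format}/`;
  return modelIds
    .filter((id) => !id.startsWith(prefix))
    .map((id) => `format "${format}" expects ${prefix} model IDs, got "${id}"`);
}
```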
Breaking changes
- `CodeModeConfig` → `SdkSurfaceConfig`
- `McpModeConfig` → `McpSurfaceConfig`
- `ExpectedTool` → `ExpectedAction`
- `ToolMatch` → `ActionMatch`
- `LEGACY_PROJECT_CONFIG_NAME` → hard-code `".skill-optimizer/skill-optimizer.json"`
- `toLegacyOptimizeManifest` → removed
- `SurfaceSnapshotArg` → removed
`TaskResult` fields renamed: `toolMatches` → `actionMatches`, `hallucinatedCalls` → `hallucinatedActions`, `unnecessaryCalls` → `unnecessaryActions`. Re-run the benchmark to regenerate report files.
`tasks.json` files that use `expected_tools`, or a `method` key on action entries, will now error on load; rename them to `expected_actions` and `name` respectively.
The config file `skill-benchmark.json` is no longer auto-detected; rename it to `skill-optimizer.json` and move it into `.skill-optimizer/`.
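A hypothetical helper for migrating a single task object to the new `tasks.json` keys. It assumes tasks are plain JSON objects whose `expected_tools` value is an array of action entries; the real file layout may nest tasks differently, so treat `migrateTask` as a starting point rather than an official migration script.

```typescript
// Recursive JSON value type used for parsed tasks.json content.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Rename legacy keys on one task object; returns a shallow copy.
function migrateTask(task: { [key: string]: Json }): { [key: string]: Json } {
  const out: { [key: string]: Json } = { ...task };
  if ("expected_tools" in out) {
    if ("expected_actions" in out) {
      throw new Error("task has both expected_tools and expected_actions");
    }
    out.expected_actions = out.expected_tools;
    delete out.expected_tools;
  }
  const actions = out.expected_actions;
  if (Array.isArray(actions)) {
    out.expected_actions = actions.map((entry) => {
      // Leave anything that is not a plain object untouched.
      if (entry === null || typeof entry !== "object" || Array.isArray(entry)) {
        return entry;
      }
      if (!("method" in entry)) return entry;
      // Move the legacy `method` value to the new `name` key.
      const { method, ...rest } = entry;
      return { ...rest, name: method };
    });
  }
  return out;
}
```

Apply it to each task after `JSON.parse`, then write the file back; tasks that already use `expected_actions` and `name` pass through unchanged.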
Full Changelog: 1.0.0...1.1.0