Releases · fastxyz/skill-optimizer

What's new in v1.1.0

Added

Prompt surface — benchmark and optimize prompt templates, Claude Code skills, and agent instructions. Discovers phases and capabilities from markdown, evaluates output quality with content-based criteria (required sections, format patterns, forbidden keywords, code blocks).
Codex auth — direct OpenAI model runs can use browser-login tokens or a static OPENAI_API_KEY stored by Codex (~/.codex/auth.json) instead of requiring an environment variable. Set benchmark.authMode: "codex" and benchmark.format: "openai", and use openai/<model> IDs.
SKILL folder — bundled AI-agent guidance (SKILL/SKILL.md) so agents can use skill-optimizer reliably without extra setup.
Stable task IDs — IDs are now a SHA-1 hash of action names (SDK/CLI/MCP) or prompt text (prompt surface), so --task <id> filters work across regenerations (fixes #17).
Optimizer loop diagram — README includes a visual workflow diagram.
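
The stable task ID idea can be illustrated with a minimal Python sketch. These notes do not say how skill-optimizer serializes its input before hashing, so the newline join below is an assumption, and stable_task_id is a hypothetical helper rather than part of the tool. Only auth.status appears in these notes; auth.login is a made-up action name.

```python
import hashlib


def stable_task_id(parts: list[str]) -> str:
    """Hypothetical helper: derive a deterministic ID from action names or prompt text."""
    # Assumption: the parts are joined with a newline before hashing. The real
    # serialization used by skill-optimizer is not documented in these notes.
    payload = "\n".join(parts).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


# The same inputs always produce the same ID, which is why a --task <id>
# filter can keep working after tasks are regenerated.
first = stable_task_id(["auth.status", "auth.login"])
second = stable_task_id(["auth.status", "auth.login"])
print(first == second)  # True
```

Because the ID depends only on the hashed content, any change to the action names or prompt text would yield a different ID under this scheme.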

Fixed

Anthropic tool names — dotted tool names (e.g. auth.status) are now sanitized to auth_status before sending to the Anthropic API and mapped back in responses. Fixes hard failures on tool-calling benchmarks against anthropic/ models.
Prompt eval on model error — prompt evaluator no longer runs when the model call itself failed; toolPrecision is now correctly set to 1.0 for prompt tasks (no tool calls = vacuously perfect precision).
Config path — running without --config now looks for .skill-optimizer/skill-optimizer.json, matching what init scaffolds.
Format/prefix validation — validate now errors when benchmark.format: "openai" is paired with non-openai/ model IDs, and vice versa for anthropic/.
Codex static key routing — a plain OPENAI_API_KEY in ~/.codex/auth.json now correctly routes to the direct OpenAI transport instead of the JWT-only Codex transport. A malformed access_token (non-JWT) no longer shadows a valid static key fallback.
Model IDs — OpenRouter slugs preserve dots (openrouter/anthropic/claude-sonnet-4.6); dot→hyphen rewrite applies only to anthropic/ direct-API IDs; openai/ slugs (e.g. gpt-5.4) are exempt.
Provider prefix is stripped before sending model IDs to anthropic/ and openai/ direct APIs.
Prompt-surface benchmarks no longer hard-fail on coverage violations; coverage is informational.
Prompt tasks are scored against their specific capabilityId, not always the first discovered capability.
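
Two of the fixes above, tool-name sanitization and model-ID normalization, can be sketched in Python. This is an illustration of the described behavior under stated assumptions, not the project's code: all three function names are hypothetical, the collision check is an added safeguard not mentioned in these notes, and leaving openrouter/ IDs completely untouched is an assumption (the notes only say their dots are preserved).

```python
def sanitize_tool_names(names: list[str]) -> tuple[list[str], dict[str, str]]:
    """Replace dots with underscores and remember how to map names back."""
    reverse: dict[str, str] = {}
    sanitized: list[str] = []
    for original in names:
        safe = original.replace(".", "_")
        # Added safeguard (assumption): refuse two tools that collapse to one name.
        if safe in reverse and reverse[safe] != original:
            raise ValueError(f"name collision: {original!r} vs {reverse[safe]!r}")
        reverse[safe] = original
        sanitized.append(safe)
    return sanitized, reverse


def restore_tool_name(safe: str, reverse: dict[str, str]) -> str:
    """Map a sanitized name found in a response back to the original name."""
    return reverse.get(safe, safe)


def to_api_model_id(model_id: str) -> str:
    """Apply the model-ID rules listed above to a prefixed ID."""
    if model_id.startswith("openrouter/"):
        # Assumption: OpenRouter slugs pass through unchanged, dots included.
        return model_id
    provider, _, bare = model_id.partition("/")
    if provider == "anthropic":
        # Direct Anthropic API: strip the prefix and rewrite dots to hyphens.
        return bare.replace(".", "-")
    if provider == "openai":
        # Direct OpenAI API: strip the prefix only; dots are kept.
        return bare
    return model_id


safe_names, reverse_map = sanitize_tool_names(["auth.status"])
print(safe_names)                                      # ['auth_status']
print(restore_tool_name("auth_status", reverse_map))   # auth.status
print(to_api_model_id("anthropic/claude-sonnet-4.6"))  # claude-sonnet-4-6
print(to_api_model_id("openai/gpt-5.4"))               # gpt-5.4
```

Keeping an explicit reverse map, rather than turning underscores back into dots, avoids corrupting tool names that legitimately contain underscores.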

Breaking changes

Renamed or removed identifiers (old → replacement):
CodeModeConfig → SdkSurfaceConfig
McpModeConfig → McpSurfaceConfig
ExpectedTool → ExpectedAction
ToolMatch → ActionMatch
LEGACY_PROJECT_CONFIG_NAME → removed; hard-code ".skill-optimizer/skill-optimizer.json" instead
toLegacyOptimizeManifest → removed
SurfaceSnapshotArg → removed

TaskResult fields were renamed: toolMatches → actionMatches, hallucinatedCalls → hallucinatedActions, unnecessaryCalls → unnecessaryActions. Re-run the benchmark to regenerate report files.

tasks.json files that use expected_tools, or method on action entries, now fail on load; rename these keys to expected_actions and name, respectively.
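
A one-off migration of existing task entries could look like the Python sketch below. The layout of tasks.json is not described in these notes, so the assumption that expected_tools holds a list of objects is a guess, and migrate_task is a hypothetical helper rather than something the tool provides.

```python
import copy


def migrate_task(task: dict) -> dict:
    """Hypothetical helper: rename pre-1.1.0 keys on a single task entry."""
    migrated = copy.deepcopy(task)  # leave the caller's data untouched
    if "expected_tools" in migrated:
        migrated["expected_actions"] = migrated.pop("expected_tools")
    # Assumption: expected_actions is a list of objects, one per action.
    for action in migrated.get("expected_actions", []):
        if isinstance(action, dict) and "method" in action:
            action["name"] = action.pop("method")
    return migrated


old_task = {"id": "t1", "expected_tools": [{"method": "auth.status"}]}
print(migrate_task(old_task))
# {'id': 't1', 'expected_actions': [{'name': 'auth.status'}]}
```

Entries that already use the new keys pass through unchanged, so running the helper twice is harmless.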

The config file skill-benchmark.json is no longer auto-detected — rename to skill-optimizer.json and move it into .skill-optimizer/.

Full Changelog: 1.0.0...1.1.0
