[aw-failures] [aw] Failure Report 2026-05-10 (6h): Go Logger toolchain mismatch + Schema lock-file drift (recurs #31178, #31221) #31314

@github-actions

Executive summary

Two workflow runs failed in the 2026-05-10 ~01:35–07:36 UTC (6h) window; both already have auto-generated tracking issues. Each is a recurrence of an open parent failure cluster — no new P0 emerges, but the Go Logger Enhancement run revealed a new, actionable Go toolchain mismatch (go.mod requires Go 1.25.8, runner has 1.24.13 with GOTOOLCHAIN=local) that is worth tracking on top of the existing #31178 max-turns cluster.

Failure clusters

| # | Workflow | Run | Engine | Conclusion | Cluster | Existing tracking |
|---|---|---|---|---|---|---|
| 1 | Schema Consistency Checker | §25621677191 | claude | failure (activation, 36s) | Lock-file drift | #31307 (auto), parent #31221 |
| 2 | Go Logger Enhancement | §25619824735 | claude (Opus 4.7) | failure (agent timeout, 17.4m) | Bash permission denials + Go toolchain mismatch | #31292 (auto), parent #31178 |

Evidence

Cluster 1 — Schema Consistency Checker (lock-file drift)

audit of run 25621677191:

  • activation job failed in 13s; agent was skipped.
  • Single error from the activation step:
##[error]ERR_CONFIG: Lock file '.github/workflows/schema-consistency-checker.lock.yml' is outdated! The workflow file '.github/workflows/schema-consistency-checker.md' frontmatter has changed. Run 'gh aw compile' to regenerate the lock file.

audit-diff vs the most recent successful baseline (§25593663016, 2026-05-09 06:05 UTC):

  • 0 new domains, 0 status changes; the only delta is removal of api.anthropic.com:443 traffic (because the agent never started).
  • No firewall, MCP, or quota anomalies — purely a stale-lock failure that prevented the agent from booting.

This is a recurrence of the #31221 "agentic workflows out of sync" pattern. The auto-generated #31307 already provides the exact remediation (gh aw compile).
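
A pre-merge guard for this class of failure could compare a workflow's frontmatter against a fingerprint recorded when its lock file was last generated. The sketch below is illustrative only: `frontmatter_of`, `fingerprint`, and `is_lock_stale` are hypothetical helpers, and the SHA-256 comparison is an assumption, not a description of how gh aw itself detects drift.

```python
import hashlib


def frontmatter_of(markdown_text: str) -> str:
    """Return the text between the leading '---' fences, or '' if there is none."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # unterminated frontmatter is treated as absent


def fingerprint(markdown_text: str) -> str:
    """Hash only the frontmatter, so edits to the prompt body do not count as drift."""
    return hashlib.sha256(frontmatter_of(markdown_text).encode("utf-8")).hexdigest()


def is_lock_stale(markdown_text: str, recorded_fingerprint: str) -> bool:
    """True when the frontmatter no longer matches the recorded fingerprint (assumed scheme)."""
    return fingerprint(markdown_text) != recorded_fingerprint
```

A check like this would only flag the drift earlier; the remediation remains running gh aw compile and committing the regenerated lock file.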

Cluster 2 — Go Logger Enhancement (agent timeout, mixed root cause)

audit of run 25619824735:

  • activation succeeded (15s); agent ran 16.8m and was terminated as failure (no result event emitted before kill).
  • 247 turns across 56 tool types with tool_breadth=broad and resource_profile=heavy.
  • 0 firewall blocks (143 requests, all allowed).
  • mcp_server_health clean (1 server mcpscripts, 6 calls, 0 errors).
  • 11 anomalous event patterns flagged by template clustering.

Burned-turn pattern from agent-stdio.log (matches #31178):

This Bash command contains multiple operations. The following parts require approval: ls -la /tmp/gh-aw/cache-memory/go-logger/, cat /tmp/gh-aw/cache-memory/go-logger/processed-files.json, ...
ls in '/tmp/gh-aw/cache-memory' was blocked. For security, Claude Code may only list files in the allowed working directories: '/home/runner/work/gh-aw/gh-aw'.
mkdir in '/tmp/gh-aw/cache-memory/go-logger' was blocked.
Output redirection to '/tmp/gh-aw/agent/all_go.txt' was blocked.
find with '-exec' executes commands or modifies files — cannot be auto-allowed by a Bash(find:*) prefix rule
This Bash command contains multiple operations. The following part requires approval: go version
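
To quantify how many turns each denial type consumed, the log lines could be bucketed by signature. This is a minimal sketch: the regular expressions are derived only from the excerpts above, and the bucket labels and the `classify_denials` helper are hypothetical names, not part of any existing audit tool.

```python
import re
from collections import Counter

# Patterns derived from the excerpts in this report; labels are hypothetical.
DENIAL_PATTERNS = {
    "multi_op_approval": re.compile(r"contains multiple operations"),
    "path_blocked": re.compile(r"^(ls|mkdir) in '[^']+' was blocked"),
    "redirect_blocked": re.compile(r"^Output redirection to '[^']+' was blocked"),
    "find_exec": re.compile(r"^find with '-exec'"),
}


def classify_denials(log_lines):
    """Count denial lines per bucket; each line is counted at most once."""
    counts = Counter()
    for line in log_lines:
        for label, pattern in DENIAL_PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
    return counts
```

Run over the full agent-stdio.log, the per-bucket counts would show which allow-list gap (path, redirection, compound command, or find -exec) burns the most turns.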

New finding — Go toolchain mismatch (not present in #31178):

go: go.mod requires go >= 1.25.8 (running go 1.24.13; GOTOOLCHAIN=local)
make: *** [Makefile:23: build] Error 1

The runner has Go 1.24.13 and GOTOOLCHAIN=local is set, suppressing the auto-upgrade path that go.mod's 1.25.8 directive would normally trigger. The agent eventually pivoted to the mcpscripts go MCP tool (which has the correct toolchain), but by then most of its time/turn budget was gone — and mcpscripts then hit a hard 60s subprocess timeout for go test ./pkg/workflow/:

Error [-32603]: calling "tools/call": Command failed: /home/runner/work/_temp/gh-aw/mcp-scripts/go.sh (signal: SIGTERM)
stdout:
go test ./pkg/workflow/ -count=1 -timeout 300s

The agent passed -timeout 300s to go test, but the mcpscripts go wrapper enforces a 60s wall-clock kill, so any test run longer than 60s was always going to be SIGTERM'd regardless of the flag. The job then hit its wall-clock limit before the agent could recover.
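
The interaction between the two timeouts can be stated as a small rule: whichever limit is shorter fires first. The sketch below assumes the wrapper's cap is a plain wall-clock limit, as the SIGTERM above suggests; `effective_timeout` and `killed_by_wrapper` are hypothetical names used for illustration.

```python
def effective_timeout(requested_s: float, wrapper_cap_s: float) -> float:
    """The limit that actually applies is the shorter of the two."""
    return min(requested_s, wrapper_cap_s)


def killed_by_wrapper(runtime_s: float, requested_s: float, wrapper_cap_s: float) -> bool:
    """True when the wrapper's kill fires before the command finishes
    and before the command's own timeout would have fired."""
    return wrapper_cap_s < min(runtime_s, requested_s)
```

With a 60s cap, a requested 300s timeout has an effective value of 60s, so a test run that needs longer than a minute is killed by the wrapper rather than by go test itself.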

Existing issue correlation

| Existing issue | State | Today's evidence | Recommendation |
|---|---|---|---|
| #31178 (Claude max-turns / Bash denials, Step Name + Design Decision Gate) | open | Go Logger Enhancement is a fresh 3rd recurrence with the same /tmp/gh-aw/cache-memory/* denial signature | Keep open; Go Logger Enhancement should be added to its "affected workflows" table |
| #31221 (workflows out of sync) | open | Schema Consistency Checker hit it again on 2026-05-10 06:19 UTC | Keep open until lock files are regenerated and committed; auto-issue #31307 already linked |
| #31292 (Go Logger Enhancement failed, auto) | open, expires 2026-05-10 16:47 UTC | Same evidence as Cluster 2 above | Link as sub-issue of #30961; let auto-expiry handle close |
| #31307 (Schema Consistency Checker failed, auto) | open, expires 2026-05-10 18:20 UTC | Same evidence as Cluster 1 above | Link as sub-issue of #30961; let auto-expiry handle close |
| #31285 / #31286 (printf allow-list) | closed | Today's failures unrelated; fix landed | No action |
| #31309 (GitHub Remote MCP Auth, "no safe outputs") | open | The referenced run §25621704254 reports success in the audit; this is a known false-positive class | No action; auto-expires |
| #31297 (Step Name Alignment, cache_memory_miss) | open | Run §25620382907 reports success; missing-data on cache key | Distinct from #31178; leave open for the cache-key bug |
| #31287 (Daily Firewall Logs Collector, "no safe outputs") | open | Same pattern as #31309 | No action; auto-expires |

Proposed fix roadmap

P0

None. Both clusters have existing trackers and neither caused a service-wide outage.

P1 (this 6h window introduces new evidence)

  • Add Go toolchain handling as a follow-up to #31178 ([aw-failures] Claude workflows hit max-turns burning every turn on Bash permission denials (Step Name Alignment, Design Decision Gate)). Either (a) bump the runner's Go to ≥ 1.25.8 in the Go Logger Enhancement workflow's runtimes.go block, or (b) drop GOTOOLCHAIN=local so that the go.mod directive can trigger a toolchain auto-download. Without this fix, every Go-touching agent workflow will keep failing on make build until the agent finds the mcpscripts go wrapper. (success criteria: a fresh Go Logger Enhancement run completes make build on the host without falling back to mcpscripts)
  • Raise the mcpscripts go.sh subprocess timeout above 60s for go test invocations, or document the cap in the workflow prompt so the agent doesn't pass -timeout 300s it can never use. (success criteria: a go test ./pkg/workflow/ -timeout 300s invocation via mcpscripts completes without SIGTERM)
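
The first P1 item could be backed by a pre-flight check that predicts the make build outcome before the agent starts. The following is a simplified sketch, not the go command's actual resolution logic: it models only GOTOOLCHAIN=local versus any other setting, handles only plain numeric versions such as 1.25.8, and `parse_version` and `build_outcome` are hypothetical helpers.

```python
def parse_version(v: str) -> tuple:
    """Turn '1.25.8' or 'go1.25.8' into (1, 25, 8) for numeric comparison."""
    if v.startswith("go"):
        v = v[2:]
    return tuple(int(part) for part in v.split("."))


def build_outcome(required: str, running: str, gotoolchain: str = "auto") -> str:
    """Predict what happens when go.mod requires `required` on a runner with `running`.

    Returns 'ok' if the local toolchain suffices, 'fail' if it is too old and
    GOTOOLCHAIN=local forbids switching, and 'switch' if a newer toolchain
    would be fetched (this assumes the runner can download it).
    """
    if parse_version(running) >= parse_version(required):
        return "ok"
    if gotoolchain == "local":
        return "fail"
    return "switch"
```

For the values in this report, `build_outcome("1.25.8", "1.24.13", "local")` evaluates to `"fail"`, matching the make build error; either proposed fix, (a) or (b), moves the result to `"ok"` or `"switch"` respectively.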

Generated by [aw] Failure Investigator (6h) · ● 15.4M

  • expires on May 17, 2026, 7:45 AM UTC
