Compilation Process
This guide documents the internal compilation process that transforms markdown workflow files into executable GitHub Actions YAML. Understanding this process helps when debugging workflows, optimizing performance, or contributing to the project.
Overview
The gh aw compile command transforms a markdown workflow file into a complete GitHub Actions .lock.yml by embedding frontmatter and setting up runtime loading of the markdown body. The process runs five compilation phases (parsing and validation, job construction, dependency resolution, action pinning, and YAML generation) described below.
When the workflow runs, the markdown body is loaded at runtime — you can edit instructions without recompilation. See Editing Workflows for details.
Compilation Phases
Phase 1: Parsing and Validation
The compiler extracts the YAML frontmatter, validates it against the workflow schema, validates expression safety (only allow-listed GitHub Actions expressions), and resolves imports.
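The expression safety check can be pictured as an allow-list filter over `${{ ... }}` expressions. The following is a minimal sketch only: the `ALLOWED` set and the function name are hypothetical, and the compiler's real allow-list and matching rules are not shown in this guide.

```python
import re

# Matches single-line ${{ ... }} expressions and captures the inner text.
EXPRESSION = re.compile(r"\$\{\{\s*(.*?)\s*\}\}")

# Hypothetical allow-list for illustration; not the compiler's actual list.
ALLOWED = {"github.repository", "github.event.issue.number"}


def disallowed_expressions(text, allowed=ALLOWED):
    """Return the expressions found in text that are not on the allow-list."""
    return [expr for expr in EXPRESSION.findall(text) if expr not in allowed]


sample = "Repo: ${{ github.repository }}, token: ${{ secrets.TOKEN }}"
print(disallowed_expressions(sample))
```

An empty result means every expression in the text is on the allow-list; anything else would be reported as a validation error.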
Import Resolution
Imports are resolved with a deterministic breadth-first traversal: starting from imports: in the main workflow, each file is loaded, its configurations are extracted, and any nested imports are appended to the queue. Visited files are tracked to detect cycles. Extracted configurations are merged into the main workflow according to the field:
| Field | Merge strategy |
|---|---|
| Tools | Deep merge; arrays concatenated and deduplicated |
| MCP servers | Imported servers override main-workflow servers with the same name |
| Network | Union of allowed domains, deduplicated and sorted |
| Permissions | Validation only — main must satisfy imported requirements |
| Safe outputs | Main workflow overrides imported configurations per type |
| Runtimes | Main workflow versions override imported versions |
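Three of the merge strategies in the table can be sketched as follows. This is an illustration under stated assumptions: configurations are modeled as plain dictionaries and lists, the function names are hypothetical, and the handling of conflicting scalar values in the tools merge is a guess, since the table does not specify it.

```python
def merge_network(main_domains, imported_domains):
    """Network: union of allowed domains, deduplicated and sorted."""
    return sorted(set(main_domains) | set(imported_domains))


def merge_runtimes(main_runtimes, imported_runtimes):
    """Runtimes: main workflow versions override imported versions."""
    merged = dict(imported_runtimes)
    merged.update(main_runtimes)  # main wins on conflicting keys
    return merged


def merge_tools(main_tools, imported_tools):
    """Tools: deep merge; lists are concatenated and deduplicated."""
    merged = dict(main_tools)
    for key, value in imported_tools.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_tools(merged[key], value)
        elif isinstance(merged[key], list) and isinstance(value, list):
            # dict.fromkeys keeps first occurrences in order (items must be hashable)
            merged[key] = list(dict.fromkeys(merged[key] + value))
        # Assumption: on a scalar conflict this sketch keeps the main value.
    return merged


print(merge_network(["b.example", "a.example"], ["a.example", "c.example"]))
print(merge_tools({"github": {"allowed": ["a", "b"]}},
                  {"github": {"allowed": ["b", "c"]}, "bash": ["ls"]}))
```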
Processing order follows BFS:
Main Workflow
├── import-a.md → Processed 1st
│   ├── nested-1.md → Processed 3rd (after import-b)
│   └── nested-2.md → Processed 4th
└── import-b.md → Processed 2nd
    └── nested-3.md → Processed 5th

See Imports Reference for complete merge semantics.
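The breadth-first processing order can be reproduced with a queue and a visited set. This is a minimal sketch, not the compiler's implementation: `imports_of` is a hypothetical stand-in for loading each file and reading its imports, and the sketch silently skips a file it has already seen, whereas the compiler may report cycles differently.

```python
from collections import deque


def resolve_import_order(main, imports_of):
    """Return imported files in breadth-first processing order."""
    order = []
    visited = {main}
    queue = deque(imports_of.get(main, []))
    while queue:
        path = queue.popleft()
        if path in visited:
            continue  # already processed, or a cycle back to an earlier file
        visited.add(path)
        order.append(path)
        queue.extend(imports_of.get(path, []))  # nested imports join the back
    return order


graph = {
    "main.md": ["import-a.md", "import-b.md"],
    "import-a.md": ["nested-1.md", "nested-2.md"],
    "import-b.md": ["nested-3.md"],
}
print(resolve_import_order("main.md", graph))
# ['import-a.md', 'import-b.md', 'nested-1.md', 'nested-2.md', 'nested-3.md']
```

Because nested imports are appended to the back of the queue, both top-level imports are processed before any nested file, which matches the ordering in the tree.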
Phases 2–5: Building the Workflow
| Phase | Steps |
|---|---|
| 2 Job Construction | Builds specialized jobs: pre-activation (if needed), activation, agent, safe outputs, safe-jobs, and custom jobs |
| 3 Dependency Resolution | Validates job dependencies, detects circular references, computes topological order, generates Mermaid graph |
| 4 Action Pinning | Pins all actions to SHAs: check cache → GitHub API → embedded pins → add version comment (e.g., actions/checkout@sha # v6) |
| 5 YAML Generation | Assembles final .lock.yml: header with metadata, Mermaid dependency graph, alphabetical jobs, embedded original prompt |
Job Types
The compilation process generates specialized jobs based on workflow configuration:
| Job | Trigger | Purpose | Key Dependencies |
|---|---|---|---|
| pre_activation | Role checks, stop-after deadlines, skip-if-match, or command triggers | Validates permissions, deadlines, and conditions before AI execution | None (runs first) |
| activation | Always | Prepares workflow context, sanitizes event text, validates lock file freshness | pre_activation (if exists) |
| agent | Always | Core job that executes AI agent with configured engine, tools, and Model Context Protocol (MCP) servers | activation |
| detection | safe-outputs.threat-detection: configured | Scans agent output for security threats before processing | agent |
| Safe output jobs | Corresponding safe-outputs.*: configured | Process agent output to perform GitHub API operations (create issues/PRs, add comments, upload assets, etc.) | agent, detection (if exists) |
| conclusion | Always (if safe outputs exist) | Aggregates results and generates workflow summary | All safe output jobs |
Agent Job Steps
The agent job runs: repository checkout and runtime setup (Node.js, Python, Go) → cache restoration → MCP container initialization → prompt generation from the markdown body → engine execution (Copilot, Claude, or Codex) → output upload as a GitHub Actions artifact → cache persistence. Key environment variables: GH_AW_PROMPT (prompt file), GH_AW_SAFE_OUTPUTS (output JSON), GITHUB_TOKEN.
Safe Output Jobs
Every safe output job follows the same pattern: download the agent artifact, parse its JSON, execute the corresponding GitHub API operation with the right permissions, and link to related items. Available types include create_issue, create_discussion, add_comment, create_pull_request, create_pr_review_comment, create_code_scanning_alert, add_labels, assign_milestone, update_issue, update_release, push_to_pr_branch, upload_assets, update_project, missing_tool, and noop.
Custom Jobs
Use safe-outputs.jobs: for custom jobs with full GitHub Actions syntax, or jobs: for additional workflow jobs with user-defined dependencies. See Deterministic & Agentic Patterns for examples of multi-stage workflows combining deterministic computation with AI reasoning.
Job Dependency Graphs
Jobs execute in topological order based on dependencies. Here’s a comprehensive example:
graph LR
  pre_activation["pre_activation"]
  activation["activation"]
  agent["agent"]
  detection["detection"]
  create_issue["create_issue"]
  add_comment["add_comment"]
  conclusion["conclusion"]
  pre_activation --> activation
  activation --> agent
  agent --> detection
  agent --> create_issue
  agent --> add_comment
  detection --> create_issue
  detection --> add_comment
  create_issue --> add_comment
  create_issue --> conclusion
  add_comment --> conclusion
Execution flow: Pre-activation validates permissions → Activation prepares context → Agent executes AI → Detection scans output → Safe outputs run in parallel → Add comment waits for created items → Conclusion summarizes results. Safe output jobs without cross-dependencies run concurrently; when threat detection is enabled, safe outputs depend on both agent and detection jobs.
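Topological ordering and circular-reference detection for this example graph can be sketched with Python's standard `graphlib` module (Python 3.9+). This is an illustration rather than the compiler's own algorithm; the `needs` dictionary simply transcribes the edges of the example graph, and `execution_order` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter

# Each job maps to the jobs it depends on (its needs: list).
needs = {
    "pre_activation": [],
    "activation": ["pre_activation"],
    "agent": ["activation"],
    "detection": ["agent"],
    "create_issue": ["agent", "detection"],
    "add_comment": ["agent", "detection", "create_issue"],
    "conclusion": ["create_issue", "add_comment"],
}


def execution_order(needs):
    """Return jobs in a valid execution order; raise on circular needs."""
    try:
        return list(TopologicalSorter(needs).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the detected cycle
        raise ValueError(f"circular job dependency: {err.args[1]}") from err


order = execution_order(needs)
print(" -> ".join(order))
```

In this particular graph every job transitively depends on the one before it, so only one valid order exists. Graphs with independent safe output jobs admit several valid orders, and those jobs can run concurrently.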
Why Detection, Safe Outputs, and Conclusion Are Separate Jobs
A typical compiled workflow contains these post-agent jobs:
flowchart TD
activation["activation<br/>ubuntu-slim<br/>contents: read"] --> agent["agent<br/>ubuntu-latest<br/>READ-ONLY permissions<br/>concurrency group"]
agent --> detection["detection<br/>ubuntu-latest<br/>contents: read<br/>concurrency group<br/>RUNS AI ENGINE"]
agent --> conclusion["conclusion<br/>ubuntu-slim<br/>issues: write<br/>pr: write"]
detection --> safe_outputs["safe_outputs<br/>ubuntu-slim<br/>contents: write<br/>issues: write<br/>pr: write"]
detection --> conclusion
safe_outputs --> conclusion
detection --> update_cache_memory["update_cache_memory<br/>ubuntu-latest<br/>contents: read"]
update_cache_memory --> conclusion
activation --> safe_outputs
activation --> conclusion
These three jobs form a sequential security pipeline rooted in Plan-Level Trust — AI reasoning (read-only) is separated from write operations. They cannot be merged because GitHub Actions permissions are per-job and immutable for the duration of a job:
| Job | Key Permissions | Rationale |
|---|---|---|
| detection | contents: read | Runs AI analysis — must not have write access |
| safe_outputs | contents: write, issues: write, pull-requests: write | Executes GitHub API write operations |
| conclusion | issues: write, pull-requests: write, discussions: write | Updates comments, handles failures |
A combined job would hold write permissions while running threat detection, defeating least privilege and letting a compromised agent bypass the gate. Job-level isolation also enables:
- Hard gating. The safe_outputs job condition needs.detection.outputs.success == 'true' prevents the runner from starting at all if detection fails. Step-level if checks within one job are weaker.
- always() semantics for conclusion. It inspects upstream results via needs.agent.result to log errors and report missing tools even when writes fail.
- Right-sized runners. Detection needs ubuntu-latest for AI execution; safe_outputs and conclusion use the lightweight ubuntu-slim.
- Concurrency isolation. Detection shares a concurrency group with the agent job to serialize AI execution; safe_outputs intentionally does not, so it can run alongside other workflows’ detection phases.
- Artifact-based handoff. The agent writes agent_output.json; detection emits success; safe_outputs only downloads the artifact if approved. A shared filesystem in a single job would allow output tampering between phases.
Action Pinning
All GitHub Actions are pinned to commit SHAs (e.g., actions/checkout@sha # v6) to defend against supply chain attacks: tags can be moved, SHAs cannot. Resolution order is cache (.github/aw/actions-lock.json) → GitHub API → embedded pins.
The actions-lock.json Cache
.github/aw/actions-lock.json caches resolved action@version → SHA mappings so compilation produces consistent results regardless of the available token. Resolving a tag to a SHA requires GitHub API access, which fails under restricted tokens, notably the GitHub Copilot Coding Agent (CCA) token. With the cache, CCA and similar restricted environments reuse SHAs from a prior compile run with a broader-scope token.
Commit actions-lock.json to version control so every contributor and automated tool uses the same immutable pins. Refresh with gh aw update-actions, or delete and recompile with a permissive token to force full re-resolution.
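The resolution order (cache, then GitHub API, then embedded pins) amounts to a fallback chain. The sketch below assumes the cache and embedded pins are plain dictionaries keyed by action@version; `fetch_sha` is a hypothetical callable standing in for the API lookup, and the on-disk layout of actions-lock.json is not modeled here.

```python
def resolve_pin(action, version, cache, fetch_sha, embedded):
    """Try the cache, then the API lookup, then embedded pins.

    Returns (sha, source). fetch_sha is a hypothetical stand-in for the
    GitHub API call; it may raise PermissionError under a restricted token.
    """
    key = f"{action}@{version}"
    if key in cache:
        return cache[key], "cache"
    try:
        sha = fetch_sha(action, version)
    except PermissionError:
        sha = None  # restricted token: fall through to embedded pins
    if sha:
        cache[key] = sha  # remember the result for later compiles
        return sha, "api"
    if key in embedded:
        return embedded[key], "embedded"
    raise LookupError(f"no pin available for {key}")


def format_uses(action, version, sha):
    """Render a pinned reference with the version kept as a comment."""
    return f"{action}@{sha} # {version}"


def restricted(action, version):
    raise PermissionError("token cannot resolve tags")


cache = {"actions/checkout@v6": "0" * 40}  # placeholder SHA, not a real commit
sha, source = resolve_pin("actions/checkout", "v6", cache, restricted, {})
print(source, format_uses("actions/checkout", "v6", sha))
```

The usage lines show why the committed cache matters: even when the API lookup is denied, the previously resolved SHA is still returned.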
The gh-aw-actions Repository
github/gh-aw-actions contains the reusable actions that power compiled workflows. Every action step in a generated .lock.yml references it (usually by commit SHA, occasionally by a stable tag like v0 when SHA resolution is unavailable):
uses: github/gh-aw-actions/setup@abc1234...
Never edit these references by hand. Run gh aw compile or gh aw update-actions to regenerate them instead. Use --actions-repo (with --action-mode action) to compile against a fork or specific tag during development; see Compilation Commands.
Dependabot and gh-aw-actions
Dependabot may open PRs to bump github/gh-aw-actions to a newer SHA. Do not merge them: pin updates must come from gh aw compile, which coordinates pins across all compiled workflows from a single release. gh aw compile automatically inserts an ignore rule when a github-actions update block exists in .github/dependabot.yml. When enabling Dependabot from scratch, use:
updates:
  - package-ecosystem: github-actions
    directory: "/.github/workflows"
    ignore:
      - dependency-name: "github/gh-aw-actions/**"
        # Managed by gh aw compile. Version-locked to the gh-aw compiler; do not bump.
Artifacts Created
Workflows generate several artifacts during execution:
| Artifact | Location | Purpose | Lifecycle |
|---|---|---|---|
| agent_output.json | /tmp/gh-aw/safeoutputs/ | AI agent output with structured safe output data (create_issue, add_comment, etc.) | Uploaded by agent job, downloaded by safe output jobs, auto-deleted after 90 days |
| agent_usage.json | /tmp/gh-aw/ | Aggregated token counts: {"input_tokens":…,"output_tokens":…,"cache_read_tokens":…,"cache_write_tokens":…} | Bundled in the unified agent artifact when the firewall is enabled; accessible to third-party tools without parsing step summaries |
| prompt.txt | /tmp/gh-aw/aw-prompts/ | Generated prompt sent to AI agent (includes markdown instructions, imports, context variables) | Retained for debugging and reproduction |
| firewall-audit-logs | See structure below | Dedicated artifact for AWF audit/observability logs (token usage, network policy, audit trail) | Uploaded by all firewall-enabled workflows; analyzed by gh aw logs --artifacts firewall |
| firewall-logs/ | /tmp/gh-aw/sandbox/firewall/logs/ | Network access logs in Squid format (when network.firewall: enabled) | Analyzed by gh aw logs command |
| cache-memory/ | /tmp/gh-aw/cache-memory/ | Persistent agent memory across runs (when tools.cache-memory: configured) | Restored at start, saved at end via GitHub Actions cache |
| patches/, sarif/, metadata/ | Various | Safe output data (git patches, SARIF files, metadata JSON) | Temporary, cleaned after processing |
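A third-party tool could read agent_usage.json along these lines. This is a sketch that assumes only the four keys listed in the table; the `total_tokens` field is a plain sum added for illustration and is not part of the artifact.

```python
import json

USAGE_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_read_tokens",
    "cache_write_tokens",
)


def summarize_usage(raw_json):
    """Parse agent_usage.json text and add a simple total across all counts."""
    usage = json.loads(raw_json)
    counts = {key: int(usage.get(key, 0)) for key in USAGE_KEYS}
    counts["total_tokens"] = sum(counts.values())  # illustrative, not in the file
    return counts


sample = '{"input_tokens": 1200, "output_tokens": 300, "cache_read_tokens": 50, "cache_write_tokens": 0}'
print(summarize_usage(sample)["total_tokens"])  # 1550
```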
firewall-audit-logs Artifact Structure
The firewall-audit-logs artifact is a dedicated multi-file artifact uploaded by all firewall-enabled workflows. It is separate from the unified agent artifact. Downstream workflows that need token usage data or firewall audit logs must download this artifact specifically.
firewall-audit-logs/
├── api-proxy-logs/
│   └── token-usage.jsonl    ← Token usage data per request
├── squid-logs/
│   └── access.log           ← Network policy log (allow/deny)
├── audit.jsonl              ← Firewall audit trail
└── policy-manifest.json     ← Policy configuration snapshot

Tip: Use gh aw logs <run-id> --artifacts firewall to download and analyze firewall data instead of gh run download directly. The CLI handles artifact naming and backward compatibility automatically. See the Artifacts reference for the complete artifact naming guide.
MCP Server Integration
Model Context Protocol (MCP) servers provide tools to AI agents. Compilation emits mcp-config.json from the workflow’s tool configuration. Local servers run in Docker containers with auto-generated Dockerfiles and connect via stdio; HTTP servers connect directly with configured headers and authentication. allowed: restricts which tools the agent sees, and secrets inject through Dockerfile env vars (local) or config references (HTTP). At runtime, MCP containers start after runtime setup, the engine executes with tool access, then containers stop.
Pre-Activation Job
Pre-activation runs gating checks sequentially before any AI execution. Any failure sets activated=false, skipping downstream jobs and saving costs:
- Role checks (roles:). The actor has admin/maintainer/write permission.
- Stop-after (on.stop-after:). The workflow has not passed its deadline (e.g., +30d, 2024-12-31).
- Skip-if-match (skip-if-match:). No existing item matches the dedup criteria.
- Command position (on.slash_command:). The slash command appears in the first 3 lines.
Compilation Commands
| Command | Description |
|---|---|
| gh aw compile | Compile all workflows in .github/workflows/ |
| gh aw compile my-workflow | Compile specific workflow |
| gh aw compile --verbose | Enable verbose output |
| gh aw compile --strict | Enhanced security validation |
| gh aw compile --no-emit | Validate without generating files |
| gh aw compile --actionlint --zizmor --poutine | Run security scanners |
| gh aw compile --purge | Remove orphaned .lock.yml files |
| gh aw compile --output /path/to/output | Custom output directory |
| gh aw compile --action-mode action --actions-repo owner/repo | Compile using a custom actions repository (requires --action-mode action) |
| gh aw compile --action-mode action --actions-repo owner/repo --action-tag branch-or-sha | Compile against a specific branch or SHA in a fork |
| gh aw compile --action-tag v1.2.3 | Pin action references to a specific tag or SHA (implies release mode) |
| gh aw validate | Validate all workflows (compile + all linters, no file output) |
| gh aw validate my-workflow | Validate a specific workflow |
| gh aw validate --json | Validate and output results in JSON format |
| gh aw validate --strict | Validate with strict mode enforced |
Debugging Compilation
Run DEBUG=workflow:* gh aw compile my-workflow --verbose to trace job creation, action pin resolution, tool configuration, and MCP setup. Inspect generated .lock.yml files for header comments, the Mermaid dependency graph, job structure, SHA pins, and MCP config. Common fixes: circular dependencies → review needs: clauses; missing action pin → add to action_pins.json or enable dynamic resolution; invalid MCP config → verify command, args, env.
Performance
Simple workflows compile in ~100ms; workflows with imports in ~500ms; workflows that resolve action SHAs dynamically in ~2s. To keep compilation fast, commit .github/aw/actions-lock.json and minimize import depth. At runtime, safe output jobs without cross-dependencies run in parallel; enable cache: and cache-memory: for further speedups.
Advanced Topics
- Custom engines: implement an engine that returns GitHub Actions steps and tool access, then register it with the framework.
- Schema extension: add frontmatter fields by updating the workflow schema, rebuilding (make build), and wiring up parser handling.
- Workflow manifest: imported files are tracked in lock file headers for update detection and audit trails.
Related Documentation
- Editing Workflows - When to recompile vs edit directly
- Frontmatter Reference - All configuration options
- Tools Reference - Tool configuration guide
- Safe Outputs Reference - Output processing
- Engines Reference - AI engine configuration
- Network Reference - Network permissions