Commit 6cd308e

ci-analysis: add Step 0 context gathering, structured output, verify-before-claiming
- Add Step 0: Gather Context section with PR type classification table (code, flow, backport, merge, dependency update) that determines interpretation framework
- Add Step 3: Verify before claiming - systematic checklist before labeling failures as infrastructure/transient/PR-related
- Add structured output format (summary verdict, failure details, recommended actions)
- Replace 'main branch' with 'target branch' throughout - backports and release-branch PRs need comparison against their actual base, not main
- Remove redundant tip (covered by Step 0)
1 parent c0fc5fe commit 6cd308e

File tree

1 file changed (+71, -23 lines)


.github/skills/ci-analysis/SKILL.md

Lines changed: 71 additions & 23 deletions
@@ -9,7 +9,7 @@ Analyze CI build status and test failures in Azure DevOps and Helix for dotnet r

 > 🚨 **NEVER** use `gh pr review --approve` or `--request-changes`. Only `--comment` is allowed. Approval and blocking are human-only actions.

-**Workflow**: Run the script → read the human-readable output + `[CI_ANALYSIS_SUMMARY]` JSON → synthesize recommendations yourself. The script collects data; you generate the advice.
+**Workflow**: Gather PR context (Step 0) → run the script → read the human-readable output + `[CI_ANALYSIS_SUMMARY]` JSON → synthesize recommendations yourself. The script collects data; you generate the advice.

 ## When to Use This Skill
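The Step 0 context gathering added to the workflow line classifies PRs by type before interpreting any failure. A minimal sketch of such a triage, with detection signals taken from the classification table this commit adds; the check order, the exact title substrings, and the `release/` branch prefix are assumptions, not part of the doc:

```python
def classify_pr(author: str, title: str, base_branch: str) -> str:
    """Rough PR-type triage per the Step 0 table; order of checks is a judgment call."""
    t = title.lower()
    if author == "dotnet-maestro[bot]" or "update dependencies" in t:
        return "flow"      # codeflow PR: missing packages may be behavioral, not infra
    if "backport" in t or base_branch.startswith("release/"):
        return "backport"  # failures may be branch-specific; compare against the base
    if t.startswith("merge "):
        return "merge"     # e.g. "Merge release/9.0 to main": merge artifacts, not changes
    return "code"          # default: failures likely relate to the changes


print(classify_pr("dotnet-maestro[bot]", "Update dependencies from dotnet/arcade", "main"))  # flow
```

The classifier only shifts the interpretation framework; it does not replace reading the PR description and comments.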
@@ -114,7 +114,7 @@ The script operates in three distinct modes depending on what information you ha

 **Local test failures**: Some repos (e.g., dotnet/sdk) run tests directly on build agents. These can also match known issues - search for the test name with the "Known Build Error" label.

-> ⚠️ **Be cautious labeling failures as "infrastructure."** If Build Analysis didn't flag a failure as a known issue, treat it as potentially real — even if it looks like a device failure, Docker issue, or network timeout. Only conclude "infrastructure" when you have strong evidence (e.g., identical failure on main branch, Build Analysis match, or confirmed outage). Dismissing failures as transient without evidence delays real bug discovery.
+> ⚠️ **Be cautious labeling failures as "infrastructure."** If Build Analysis didn't flag a failure as a known issue, treat it as potentially real — even if it looks like a device failure, Docker issue, or network timeout. Only conclude "infrastructure" when you have strong evidence (e.g., identical failure on the target branch, Build Analysis match, or confirmed outage). Dismissing failures as transient without evidence delays real bug discovery.

 > **Don't confuse "environment-related" with "infrastructure."** A test that fails because a required framework isn't installed (e.g., .NET 2.2) is a **test defect** — the test has wrong assumptions about what's available. Infrastructure failures are *transient*: network timeouts, Docker pull failures, agent crashes, disk space. If the failure would reproduce 100% of the time on any machine with the same setup, it's a code/test issue, not infra. The word "environment" in the error doesn't make it an infrastructure problem.
@@ -133,7 +133,7 @@ Read `recommendationHint` as a starting point, then layer in context:

 | `BUILD_SUCCESSFUL` | No failures. Confirm CI is green. |
 | `KNOWN_ISSUES_DETECTED` | Known tracked issues found. Recommend retry if failures match known issues. Link the issues. |
 | `LIKELY_PR_RELATED` | Failures correlate with PR changes. Lead with "fix these before retrying" and list `correlatedFiles`. |
-| `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking main branch, searching for issues, or retrying. |
+| `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking the target branch, searching for issues, or retrying. |
 | `REVIEW_REQUIRED` | Could not auto-determine cause. Review failures manually. |

 Then layer in nuance the heuristic can't capture:
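The `recommendationHint` values above arrive inside the `[CI_ANALYSIS_SUMMARY]` JSON block that the script prints. A hedged sketch of extracting that block and mapping the hint to starting-point advice; the marker and the field names (`recommendationHint`, `correlatedFiles`) come from the doc, but the surrounding payload shape and sample output are assumptions:

```python
import json
import re

# Hypothetical script output for illustration; real output will differ.
raw_output = """
Some human-readable failure details...
[CI_ANALYSIS_SUMMARY]
{"recommendationHint": "LIKELY_PR_RELATED", "correlatedFiles": ["src/Foo.cs"]}
"""


def extract_summary(text: str) -> dict:
    """Pull the JSON object that follows the [CI_ANALYSIS_SUMMARY] marker."""
    match = re.search(r"\[CI_ANALYSIS_SUMMARY\]\s*(\{.*\})", text, re.DOTALL)
    if not match:
        raise ValueError("no [CI_ANALYSIS_SUMMARY] block found")
    return json.loads(match.group(1))


# Starting-point advice per hint, mirroring the table above. The hint is a
# heuristic; the final recommendation still needs context layered on top.
HINT_ADVICE = {
    "BUILD_SUCCESSFUL": "Confirm CI is green.",
    "KNOWN_ISSUES_DETECTED": "Recommend retry; link the known issues.",
    "LIKELY_PR_RELATED": "Fix correlated files before retrying.",
    "POSSIBLY_TRANSIENT": "Check the target branch, search issues, or retry.",
    "REVIEW_REQUIRED": "Review failures manually.",
}

summary = extract_summary(raw_output)
print(HINT_ADVICE[summary["recommendationHint"]], summary.get("correlatedFiles", []))
```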
@@ -143,7 +143,7 @@ Then layer in nuance the heuristic can't capture:

 - **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear.
 - **Multiple builds**: If `builds` has >1 entry, `lastBuildJobSummary` reflects only the last build — use `totalFailedJobs` for the aggregate count.
 - **BuildId mode**: `knownIssues` and `prCorrelation` will be empty (those require a PR number). Don't say "no known issues" — say "Build Analysis not available in BuildId mode."
-- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on main. See the anti-patterns in "Interpreting Results" above.
+- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on the target branch. See the anti-patterns in "Interpreting Results" above.

 ### How to Retry
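The multiple-builds caveat above is easy to get wrong in a summary. A small sketch, assuming a summary shape built from the documented field names (`builds`, `lastBuildJobSummary`, `totalFailedJobs`); the build ids and counts are invented for illustration:

```python
# Hypothetical summary; only the field names come from the skill doc.
summary = {
    "builds": [{"id": 1111}, {"id": 2222}],
    "lastBuildJobSummary": {"failed": 1, "pending": 2},
    "totalFailedJobs": 5,
}


def failed_job_count(s: dict) -> int:
    """lastBuildJobSummary covers only the final build; with multiple builds,
    totalFailedJobs is the aggregate to report."""
    if len(s.get("builds", [])) > 1:
        return s["totalFailedJobs"]
    return s["lastBuildJobSummary"]["failed"]


print(failed_job_count(summary))  # 5, not 1: two builds are present
if summary["lastBuildJobSummary"]["pending"] > 0:
    print("build still in progress; more failures may appear")
```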
@@ -157,27 +157,76 @@ Be direct. Lead with the most important finding. Use 2-4 bullet points, not long

 ## Analysis Workflow

-1. **Read PR context first** - Check title, description, comments
-2. **Run the script** with `-ShowLogs` for detailed failure info
-3. **Check Build Analysis** - Known issues are safe to retry
-4. **Correlate with PR changes** - Same files failing = likely PR-related
-5. **Compare with baseline** - If a test passes on main but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md); **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work.
-6. **Interpret patterns** (but don't jump to conclusions):
+### Step 0: Gather Context (before running anything)
+
+Before running the script, read the PR to understand what you're analyzing. Context changes how you interpret every failure.
+
+1. **Read PR metadata** — title, description, author, labels, linked issues
+2. **Classify the PR type** — this determines your interpretation framework:
+
+| PR Type | How to detect | Interpretation shift |
+|---------|--------------|---------------------|
+| **Code PR** | Human author, code changes | Failures likely relate to the changes |
+| **Flow/Codeflow PR** | Author is `dotnet-maestro[bot]`, title mentions "Update dependencies" | Missing packages may be behavioral, not infrastructure (see anti-pattern below) |
+| **Backport** | Title mentions "backport", targets a release branch | Failures may be branch-specific; check if test exists on target branch |
+| **Merge PR** | Merging between branches (e.g., release → main) | Conflicts and merge artifacts cause failures, not the individual changes |
+| **Dependency update** | Bumps package versions, global.json changes | Build failures often trace to the dependency, not the PR's own code |
+
+3. **Check existing comments** — has someone already diagnosed the failures? Is there a retry pending?
+4. **Note the changed files** — you'll use these to evaluate correlation after the script runs
+
+> **Don't skip Step 0.** Running the script without PR context leads to misdiagnosis — especially for flow PRs where "package not found" looks like infrastructure but is actually a code issue.
+
+### Step 1: Run the script
+
+Run with `-ShowLogs` for detailed failure info.
+
+### Step 2: Analyze results
+
+1. **Check Build Analysis** — Known issues are safe to retry
+2. **Correlate with PR changes** — Same files failing = likely PR-related
+3. **Compare with baseline** — If a test passes on the target branch but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md); **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work.
+4. **Interpret patterns** (but don't jump to conclusions):
    - Same error across many jobs → Real code issue
    - Build Analysis flags a known issue → Safe to retry
    - Failure is **not** in Build Analysis → Investigate further before assuming transient
-   - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against main branch first
+   - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against the target branch first
    - Test timeout but tests passed → Executor issue, not test failure

+### Step 3: Verify before claiming
+
+Before stating a failure's cause, verify your claim:
+
+- **"Infrastructure failure"** → Did Build Analysis flag it? Does the same test pass on the target branch? If neither, don't call it infrastructure.
+- **"Transient/flaky"** → Has it failed before? Is there a known issue? A single non-reproducing failure isn't enough to call it flaky.
+- **"PR-related"** → Do the changed files actually relate to the failing test? Correlation in the script output is heuristic, not proof.
+- **"Safe to retry"** → Are ALL failures accounted for (known issues or infrastructure), or are you ignoring some?
+- **"Not related to this PR"** → Have you checked if the test passes on the base branch? Don't assume — verify.
+
 ## Presenting Results

-The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. Use both:
+The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. Use both to produce a structured response.
+
+### Output structure
+
+Use this format — adapt sections based on what you find:
+
+**1. Summary verdict** (1-2 sentences)
+Lead with the most important finding. Is CI green? Are failures PR-related? Known issues?
+
+**2. Failure details** (2-4 bullets)
+For each distinct failure category, state: what failed, why (known/correlated/unknown), and evidence.
+
+**3. Recommended actions** (numbered list)
+Specific next steps: retry, fix specific files, investigate further. Include `/azp run` commands if retrying.
+
+### How to synthesize

 1. Read the JSON summary for structured facts (failed jobs, known issues, PR correlation, recommendation hint)
 2. Read the human-readable output for failure details, console logs, and error messages
-3. Reason over both to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer
-4. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds)
-5. Consider the PR context (what files changed, what the PR is trying to do)
+3. Layer in Step 0 context — PR type, author intent, changed files
+4. Reason over all three to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer
+5. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds)
 6. Present findings with appropriate caveats — state what is known vs. uncertain

 ## References
@@ -205,11 +254,10 @@ Canceled jobs (typically from timeouts) often still have useful artifacts. The H

 ## Tips

-1. Read PR description and comments first for context
-2. Check if same test fails on main branch before assuming transient
-3. Look for `[ActiveIssue]` attributes for known skipped tests
-4. Use `-SearchMihuBot` for semantic search of related issues
-5. Binlogs in artifacts help diagnose MSB4018 task failures
-6. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties
-7. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly
-8. When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls
+1. Check if same test fails on the target branch before assuming transient
+2. Look for `[ActiveIssue]` attributes for known skipped tests
+3. Use `-SearchMihuBot` for semantic search of related issues
+4. Binlogs in artifacts help diagnose MSB4018 task failures
+5. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties
+6. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly
+7. When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls
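The `gh pr checks --json` tip in practice: `state` carries `SUCCESS`/`FAILURE` directly, and there is no `conclusion` field to look for. A sketch over a hypothetical payload; the check names, buckets, and values below are invented for illustration, only the field names come from the tip:

```python
import json

# Stand-in for: gh pr checks <number> --json name,bucket,state
sample = json.loads("""
[
  {"name": "dotnet-sdk (Build Windows_NT)", "bucket": "fail", "state": "FAILURE"},
  {"name": "Build Analysis", "bucket": "pass", "state": "SUCCESS"}
]
""")

# Filter on state, not a (nonexistent) conclusion field.
failing = [check["name"] for check in sample if check["state"] == "FAILURE"]
print(failing)
```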
