- Add Step 0: Gather Context section with PR type classification table
(code, flow, backport, merge, dependency update) that determines
interpretation framework
- Add Step 3: Verify before claiming - systematic checklist before
labeling failures as infrastructure/transient/PR-related
- Add structured output format (summary verdict, failure details,
recommended actions)
- Replace 'main branch' with 'target branch' throughout - backports
and release-branch PRs need comparison against their actual base,
not main
- Remove redundant tip (covered by Step 0)
`.github/skills/ci-analysis/SKILL.md` (+71 −23)
@@ -9,7 +9,7 @@ Analyze CI build status and test failures in Azure DevOps and Helix for dotnet r
 > 🚨 **NEVER** use `gh pr review --approve` or `--request-changes`. Only `--comment` is allowed. Approval and blocking are human-only actions.
 
-**Workflow**: Run the script → read the human-readable output + `[CI_ANALYSIS_SUMMARY]` JSON → synthesize recommendations yourself. The script collects data; you generate the advice.
+**Workflow**: Gather PR context (Step 0) → run the script → read the human-readable output + `[CI_ANALYSIS_SUMMARY]` JSON → synthesize recommendations yourself. The script collects data; you generate the advice.
 
 ## When to Use This Skill
@@ -114,7 +114,7 @@ The script operates in three distinct modes depending on what information you ha
 **Local test failures**: Some repos (e.g., dotnet/sdk) run tests directly on build agents. These can also match known issues - search for the test name with the "Known Build Error" label.
 
-> ⚠️ **Be cautious labeling failures as "infrastructure."** If Build Analysis didn't flag a failure as a known issue, treat it as potentially real — even if it looks like a device failure, Docker issue, or network timeout. Only conclude "infrastructure" when you have strong evidence (e.g., identical failure on main branch, Build Analysis match, or confirmed outage). Dismissing failures as transient without evidence delays real bug discovery.
+> ⚠️ **Be cautious labeling failures as "infrastructure."** If Build Analysis didn't flag a failure as a known issue, treat it as potentially real — even if it looks like a device failure, Docker issue, or network timeout. Only conclude "infrastructure" when you have strong evidence (e.g., identical failure on the target branch, Build Analysis match, or confirmed outage). Dismissing failures as transient without evidence delays real bug discovery.
 
 > ❌ **Don't confuse "environment-related" with "infrastructure."** A test that fails because a required framework isn't installed (e.g., .NET 2.2) is a **test defect** — the test has wrong assumptions about what's available. Infrastructure failures are *transient*: network timeouts, Docker pull failures, agent crashes, disk space. If the failure would reproduce 100% of the time on any machine with the same setup, it's a code/test issue, not infra. The word "environment" in the error doesn't make it an infrastructure problem.
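The "Known Build Error" search mentioned in the context line above can be done from the CLI. A minimal sketch, assuming `gh` is authenticated; the repo and test name below are hypothetical placeholders, and the command is printed rather than executed so the construction is visible:

```shell
# Hypothetical failing test name; the "Known Build Error" label is the one
# the text above says to search under.
repo="dotnet/runtime"
test_name="System.Foo.Tests.BarTest"

# The search you would run against the repo's issue tracker:
cmd="gh issue list --repo $repo --label \"Known Build Error\" --search \"$test_name\" --state all"
echo "$cmd"
```

Running the printed command lists any tracked issues matching the test name, including closed ones (`--state all`), which matters for recently fixed flaky tests.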
@@ -133,7 +133,7 @@ Read `recommendationHint` as a starting point, then layer in context:
 | `BUILD_SUCCESSFUL` | No failures. Confirm CI is green. |
 | `KNOWN_ISSUES_DETECTED` | Known tracked issues found. Recommend retry if failures match known issues. Link the issues. |
 | `LIKELY_PR_RELATED` | Failures correlate with PR changes. Lead with "fix these before retrying" and list `correlatedFiles`. |
-| `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking main branch, searching for issues, or retrying. |
+| `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking the target branch, searching for issues, or retrying. |
 | `REVIEW_REQUIRED` | Could not auto-determine cause. Review failures manually. |
 
 Then layer in nuance the heuristic can't capture:
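The hint table above can be turned into starting-point advice mechanically. A sketch, assuming the script's stdout contains a `[CI_ANALYSIS_SUMMARY]` marker line followed by one JSON line (the sample output and its exact layout are assumptions; `recommendationHint` is the field named in the table):

```shell
# Hypothetical sample of the script's stdout, written locally so the
# snippet is self-contained.
cat > ci-output.txt <<'EOF'
Job 'Build Linux x64' failed: 2 test failures
[CI_ANALYSIS_SUMMARY]
{"recommendationHint":"POSSIBLY_TRANSIENT","totalFailedJobs":2}
EOF

# Grab the line after the marker, then read the hint with jq.
hint=$(awk 'seen{print; exit} /\[CI_ANALYSIS_SUMMARY\]/{seen=1}' ci-output.txt | jq -r '.recommendationHint')

case "$hint" in
  BUILD_SUCCESSFUL)      advice="CI is green." ;;
  KNOWN_ISSUES_DETECTED) advice="Known issues matched; link them and suggest a retry." ;;
  LIKELY_PR_RELATED)     advice="Fix correlated files before retrying." ;;
  POSSIBLY_TRANSIENT)    advice="Check the target branch before calling it transient." ;;
  *)                     advice="Review failures manually." ;;
esac
echo "$advice"
```

Per the table, the hint is only a starting point: the `case` arms are where you layer in the PR-specific nuance.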
@@ -143,7 +143,7 @@ Then layer in nuance the heuristic can't capture:
 - **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear.
 - **Multiple builds**: If `builds` has >1 entry, `lastBuildJobSummary` reflects only the last build — use `totalFailedJobs` for the aggregate count.
 - **BuildId mode**: `knownIssues` and `prCorrelation` will be empty (those require a PR number). Don't say "no known issues" — say "Build Analysis not available in BuildId mode."
-- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on main. See the anti-patterns in "Interpreting Results" above.
+- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on the target branch. See the anti-patterns in "Interpreting Results" above.
 
 ### How to Retry
@@ -157,27 +157,76 @@ Be direct. Lead with the most important finding. Use 2-4 bullet points, not long
-2. **Run the script** with `-ShowLogs` for detailed failure info
-3. **Check Build Analysis** - Known issues are safe to retry
-4. **Correlate with PR changes** - Same files failing = likely PR-related
-5. **Compare with baseline** - If a test passes on main but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md) — **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work.
-6. **Interpret patterns** (but don't jump to conclusions):
+2. **Classify the PR type** — this determines your interpretation framework:
+
+   | PR Type | How to detect | Interpretation shift |
+   |---------|--------------|---------------------|
+   | **Code PR** | Human author, code changes | Failures likely relate to the changes |
+   | **Flow/Codeflow PR** | Author is `dotnet-maestro[bot]`, title mentions "Update dependencies" | Missing packages may be behavioral, not infrastructure (see anti-pattern below) |
+   | **Backport** | Title mentions "backport", targets a release branch | Failures may be branch-specific; check if test exists on target branch |
+   | **Merge PR** | Merging between branches (e.g., release → main) | Conflicts and merge artifacts cause failures, not the individual changes |
+   | **Dependency update** | Bumps package versions, global.json changes | Build failures often trace to the dependency, not the PR's own code |
+
+3. **Check existing comments** — has someone already diagnosed the failures? Is there a retry pending?
+4. **Note the changed files** — you'll use these to evaluate correlation after the script runs
+
+> ❌ **Don't skip Step 0.** Running the script without PR context leads to misdiagnosis — especially for flow PRs where "package not found" looks like infrastructure but is actually a code issue.
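The detection column in the Step 0 table can be sketched as a small classifier. This is illustrative only: in practice the inputs would come from `gh pr view <number> --json author,title,baseRefName`, and merge PRs and dependency updates also need file-level inspection (e.g., `global.json` changes) that a title heuristic can't see:

```shell
classify_pr() {
  # Heuristics from the Step 0 table above; all three inputs are strings
  # you'd pull from `gh pr view`. Branch and title patterns are assumptions.
  local author=$1 title=$2 base=$3
  case "$author" in
    "dotnet-maestro[bot]") echo "flow"; return ;;   # codeflow bot author
  esac
  case "$title" in
    *"Update dependencies"*) echo "flow"; return ;;
    *[Bb]ackport*)           echo "backport"; return ;;
  esac
  case "$base" in
    release/*) echo "backport"; return ;;           # targets a release branch
  esac
  echo "code"
}

classify_pr "dotnet-maestro[bot]" "Update dependencies from dotnet/arcade" "main"
classify_pr "alice" "[release/8.0] Backport of fix" "release/8.0"
classify_pr "bob" "Fix null ref in sockets" "main"
```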
+### Step 1: Run the script
+
+Run with `-ShowLogs` for detailed failure info.
+
+### Step 2: Analyze results
+
+1. **Check Build Analysis** — Known issues are safe to retry
+2. **Correlate with PR changes** — Same files failing = likely PR-related
+3. **Compare with baseline** — If a test passes on the target branch but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md) — **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work.
+4. **Interpret patterns** (but don't jump to conclusions):
    - Same error across many jobs → Real code issue
    - Build Analysis flags a known issue → Safe to retry
    - Failure is **not** in Build Analysis → Investigate further before assuming transient
-   - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against main branch first
+   - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against the target branch first
    - Test timeout but tests passed → Executor issue, not test failure
+
+### Step 3: Verify before claiming
+
+Before stating a failure's cause, verify your claim:
+
+- **"Infrastructure failure"** → Did Build Analysis flag it? Does the same test pass on the target branch? If neither, don't call it infrastructure.
+- **"Transient/flaky"** → Has it failed before? Is there a known issue? A single non-reproducing failure isn't enough to call it flaky.
+- **"PR-related"** → Do the changed files actually relate to the failing test? Correlation in the script output is heuristic, not proof.
+- **"Safe to retry"** → Are ALL failures accounted for (known issues or infrastructure), or are you ignoring some?
+- **"Not related to this PR"** → Have you checked if the test passes on the base branch? Don't assume — verify.
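The first checklist item amounts to an evidence gate. A sketch of it as a tiny decision rule, where each input is a yes/no you fill in only after actually checking (the function name and argument names are illustrative):

```shell
infra_verdict() {
  # Evidence-gated labeling for the "Infrastructure failure" check above.
  # $1: did Build Analysis flag it?  $2: does the evidence otherwise hold up
  # (e.g., the failure reproduces on the target branch, or a confirmed outage)?
  local build_analysis_flagged=$1 other_evidence=$2
  if [ "$build_analysis_flagged" = yes ] || [ "$other_evidence" = yes ]; then
    echo "infrastructure (evidence-backed)"
  else
    echo "unverified: investigate as potentially real"
  fi
}

infra_verdict yes no
infra_verdict no no
```

The point of writing it this way: "looks like a Docker issue" is never one of the inputs. Only verified evidence flips the verdict.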
 ## Presenting Results
 
-The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. Use both:
+The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. Use both to produce a structured response.
+
+### Output structure
+
+Use this format — adapt sections based on what you find:
+
+**1. Summary verdict** (1-2 sentences)
+Lead with the most important finding. Is CI green? Are failures PR-related? Known issues?
+
+**2. Failure details** (2-4 bullets)
+For each distinct failure category, state: what failed, why (known/correlated/unknown), and evidence.
+
+**3. Recommended actions** (numbered list)
+Specific next steps: retry, fix specific files, investigate further. Include `/azp run` commands if retrying.
+
 ### How to synthesize
 1. Read the JSON summary for structured facts (failed jobs, known issues, PR correlation, recommendation hint)
 2. Read the human-readable output for failure details, console logs, and error messages
-3. Reason over both to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer
-4. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds)
-5. Consider the PR context (what files changed, what the PR is trying to do)
+3. Consider the PR context (what files changed, what the PR is trying to do)
+4. Reason over all three to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer
+5. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds)
 6. Present findings with appropriate caveats — state what is known vs. uncertain
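Pulling the structured facts out of the JSON summary is a one-liner per field. A sketch, assuming a summary shape like the one below; `totalFailedJobs` and `prCorrelation` are fields the text above names, while the exact nesting of `correlatedFiles` and the sample values are assumptions:

```shell
# Hypothetical [CI_ANALYSIS_SUMMARY] payload, inlined so this runs standalone.
summary='{"totalFailedJobs":3,"knownIssues":[],"prCorrelation":{"correlatedFiles":["src/Foo.cs"]},"recommendationHint":"LIKELY_PR_RELATED"}'

failed=$(printf '%s' "$summary" | jq -r '.totalFailedJobs')
files=$(printf '%s' "$summary" | jq -r '.prCorrelation.correlatedFiles | join(", ")')

# Feeds the "Summary verdict" line of the structured response.
echo "Verdict: $failed job(s) failed; correlated files: $files"
```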
## References
@@ -205,11 +254,10 @@ Canceled jobs (typically from timeouts) often still have useful artifacts. The H
 ## Tips
 
-1. Read PR description and comments first for context
-2. Check if same test fails on main branch before assuming transient
-3. Look for `[ActiveIssue]` attributes for known skipped tests
-4. Use `-SearchMihuBot` for semantic search of related issues
-5. Binlogs in artifacts help diagnose MSB4018 task failures
-6. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties
-7. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly
-8. When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls
+1. Check if same test fails on the target branch before assuming transient
+2. Look for `[ActiveIssue]` attributes for known skipped tests
+3. Use `-SearchMihuBot` for semantic search of related issues
+4. Binlogs in artifacts help diagnose MSB4018 task failures
+5. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties
+6. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly
+7. When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls
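The `gh pr checks --json` tip can be exercised against simulated data. The JSON below stands in for what `gh pr checks <number> --json name,state,link` would emit (the check names and URLs are made up); the `jq` filter is the part you would pass via `--jq` to surface only failing checks, and it relies on `state` carrying `SUCCESS`/`FAILURE` directly, per the tip:

```shell
# Simulated `gh pr checks --json name,state,link` output.
checks='[{"name":"Build Linux","state":"FAILURE","link":"https://example.test/1"},
         {"name":"Build Windows","state":"SUCCESS","link":"https://example.test/2"}]'

# List failing checks only; note there is no .conclusion field to select on.
failing=$(printf '%s' "$checks" | jq -r '.[] | select(.state == "FAILURE") | "\(.name): \(.link)"')
echo "$failing"
```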