Summary
When the Copilot CLI hits transient AI model server errors mid-stream and exits 1 with output, the driver retries with --continue. Attempt 2 then fails in ~1s with "No authentication information found", even though the env vars (COPILOT_GITHUB_TOKEN / GITHUB_TOKEN / GH_TOKEN) are unchanged from attempt 1 and were valid for the entire 3+ minute attempt 1. PR #26146 made this fail-fast (it no longer burns 3 retries), but the agent is now non-recoverable: a single transient model interruption mid-response fails the whole job, with no retry possible.

This is similar to (and likely the same root cause as) #26001, which was closed via #26146 by classifying the failure as non-retryable rather than fixing the underlying behavior.
Reproduction
Workflow run: https://github.com/microsoft/vscode-engineering/actions/runs/25003940174 (errors-regression-scan, gh-aw v0.68.6, engine copilot, model claude-opus-4.6, with COPILOT_GITHUB_TOKEN as a static repo secret PAT).

Sequence in agent step logs:

1. Attempt 1 ends with:
   ● Response was interrupted due to a server error. Retrying...
   Execution failed: Error: Failed to get response from the AI model; retried 5 times
   [copilot-driver] attempt 1: process closed exitCode=1 duration=3m 47s stdout=6801B stderr=407B hasOutput=true
2. The driver retries with --continue.
3. Attempt 2 fails almost immediately:
   Error: No authentication information found.
   [copilot-driver] attempt 2: process closed exitCode=1 duration=1s stdout=0B stderr=404B hasOutput=true
4. The driver classifies this as an auth error and fails fast. The job exits 1.

Both attempts ran in the same step, with identical env. Nothing changed COPILOT_GITHUB_TOKEN between them.

Hypothesis
The Copilot CLI's --continue mode reads auth from a saved-session credential path (likely under ~/.config/github-copilot/) rather than re-reading env vars. When attempt 1 exits mid-stream from a server error, that on-disk session state is incomplete or invalid. On --continue, the CLI doesn't fall back to env vars, so it reports "No authentication information found".

Proposed fixes (any of these would help)
1. Fall back to env vars on --continue. If the session-cached credentials are missing or invalid, the Copilot CLI should redo the env-var auth handshake instead of failing. This is upstream of gh-aw (Copilot CLI repo), but gh-aw could file or sponsor it.
2. Driver-side recovery: retry fresh after a --continue auth failure. When the driver detects NO_AUTH_INFO_PATTERN on a --continue attempt (not on attempt 1), respawn once without --continue instead of bailing. The model loses mid-stream context, but at least the job has a chance to recover. This could be gated behind an opt-in feature flag.
3. Driver-side cleanup: wipe the Copilot session dir before the --continue retry. This probably defeats --continue, so it is equivalent to option 2.

Option 2 seems lowest-risk and gh-aw-internal; happy to send a PR if the team agrees.
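To make option 2 concrete, here is a minimal sketch of the retry decision, written in Python for brevity and not taken from the driver's actual code. AttemptResult, next_action and the action strings are hypothetical names. Only NO_AUTH_INFO_PATTERN is a name from the driver, and the regex used for it here is an assumption based on the error message quoted in this issue. Retry budgets and the feature flag are left out.

```python
import re
from dataclasses import dataclass

# Assumption: the driver's real NO_AUTH_INFO_PATTERN may differ; this regex
# only matches the error message quoted in this issue.
NO_AUTH_INFO_PATTERN = re.compile(r"No authentication information found", re.IGNORECASE)


@dataclass
class AttemptResult:
    """Hypothetical summary of one CLI attempt."""
    exit_code: int
    output: str          # combined stdout + stderr
    used_continue: bool  # True if the attempt was spawned with --continue


def next_action(result: AttemptResult, fresh_retry_used: bool) -> str:
    """Return "done", "retry_continue", "retry_fresh" or "fail"."""
    if result.exit_code == 0:
        return "done"
    if NO_AUTH_INFO_PATTERN.search(result.output):
        # An auth failure on a --continue attempt is suspicious: the same env
        # authenticated attempt 1. Respawn once without --continue.
        if result.used_continue and not fresh_retry_used:
            return "retry_fresh"
        # Auth failure on a fresh attempt, or the fresh retry is already
        # spent: keep the fail-fast behavior.
        return "fail"
    # Any other failure: simplified here to "retry with --continue".
    return "retry_continue"


if __name__ == "__main__":
    attempt2 = AttemptResult(1, "Error: No authentication information found.", True)
    print(next_action(attempt2, fresh_retry_used=False))
```

The fresh_retry_used flag limits the recovery to a single respawn, so a token that really is invalid still fails fast on the next attempt.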
Environment
- gh-aw: v0.68.6
- Engine: copilot
- Model: claude-opus-4.6
- COPILOT_GITHUB_TOKEN: static fine-grained PAT in repo secrets
- gh run rerun --failed (both attempts hit the same failure mode)

Related