copilot-driver: --continue retry fails with 'No authentication information found' after mid-stream server error, no recovery path #28774

@bryanchen-d

Summary

When the Copilot CLI hits transient AI model server errors mid-stream and exits with code 1 after producing output, the driver retries with --continue. Attempt 2 then fails in ~1s with No authentication information found, even though the env vars (COPILOT_GITHUB_TOKEN / GITHUB_TOKEN / GH_TOKEN) are unchanged from attempt 1 and were valid for all of attempt 1, which ran for more than 3 minutes. PR #26146 made this case fail fast (it no longer burns 3 retries), but the agent is now non-recoverable: a single transient model interruption mid-response fails the whole job, with no retry possible.

This is similar to (and likely the same root cause as) #26001, which was closed via #26146 by classifying the failure as non-retryable rather than fixing the underlying behavior.

Reproduction

Workflow run: https://github.com/microsoft/vscode-engineering/actions/runs/25003940174 (errors-regression-scan, gh-aw v0.68.6, engine copilot, model claude-opus-4.6, with COPILOT_GITHUB_TOKEN as a static repo secret PAT).

Sequence in agent step logs:

  1. Agent does ~3m of work, all MCP calls succeed, finishes the analysis.
  2. While the model emits its final response, the CLI logs the same interruption 5 times in a row: ● Response was interrupted due to a server error. Retrying...
  3. Execution failed: Error: Failed to get response from the AI model; retried 5 times
  4. [copilot-driver] attempt 1: process closed exitCode=1 duration=3m 47s stdout=6801B stderr=407B hasOutput=true
  5. Driver sleeps 5s, respawns with --continue.
  6. Error: No authentication information found.
  7. [copilot-driver] attempt 2: process closed exitCode=1 duration=1s stdout=0B stderr=404B hasOutput=true
  8. Driver classifies as auth error, fails fast. Job exits 1.

Both attempts ran in the same step, with identical env. Nothing changed COPILOT_GITHUB_TOKEN between them.
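
For triaging other runs, the two driver summary lines quoted above can be checked mechanically. This is only a rough sketch: the regex mirrors the line shape seen in this one run, and `parse_attempt` and `looks_like_continue_auth_failure` are hypothetical helper names, not part of gh-aw.

```python
import re
from typing import Optional

# Shape of the driver summary line as quoted in the log excerpt above.
# Assumption: other gh-aw versions may format this line differently.
ATTEMPT_LINE = re.compile(
    r"\[copilot-driver\] attempt (?P<attempt>\d+): process closed "
    r"exitCode=(?P<exit_code>\d+) duration=(?P<duration>.+?) "
    r"stdout=(?P<stdout>\d+)B stderr=(?P<stderr>\d+)B "
    r"hasOutput=(?P<has_output>true|false)"
)


def parse_attempt(line: str) -> Optional[dict]:
    """Parse one driver summary line; return None if the line does not match."""
    m = ATTEMPT_LINE.search(line)
    if m is None:
        return None
    return {
        "attempt": int(m["attempt"]),
        "exit_code": int(m["exit_code"]),
        "duration": m["duration"],
        "stdout_bytes": int(m["stdout"]),
        "stderr_bytes": int(m["stderr"]),
        "has_output": m["has_output"] == "true",
    }


def looks_like_continue_auth_failure(attempt: dict) -> bool:
    """Heuristic (assumption): a retry that exits non-zero with empty stdout."""
    return (
        attempt["attempt"] > 1
        and attempt["exit_code"] != 0
        and attempt["stdout_bytes"] == 0
    )
```

The heuristic only flags candidates; confirming the failure still requires finding the No authentication information found message in stderr.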

Hypothesis

The Copilot CLI's --continue mode reads auth from a saved-session credential path (likely under ~/.config/github-copilot/) rather than re-reading env vars. When attempt 1 exits mid-stream from a server error, that on-disk session state is incomplete or invalid. On --continue, the CLI doesn't fall back to env vars, so it reports "No authentication information found".
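
If this hypothesis holds, the missing behavior is a fallback from session-cached credentials to the environment. A minimal sketch of that lookup order follows. It is not the Copilot CLI's actual code: `resolve_token` is a hypothetical name, and the precedence among the three env vars is assumed from the order they are listed in this report.

```python
import os
from typing import Mapping, Optional

# Env var names come from this report; the precedence order is an assumption.
TOKEN_ENV_VARS = ("COPILOT_GITHUB_TOKEN", "GITHUB_TOKEN", "GH_TOKEN")


def resolve_token(
    session_token: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer the session-cached token, then fall back to env vars.

    Returns None only when neither source yields a non-empty token.
    """
    if session_token:
        return session_token
    for name in TOKEN_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

With this ordering, a missing or empty session credential after an interrupted attempt 1 would still resolve to the unchanged COPILOT_GITHUB_TOKEN on the --continue attempt.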

Proposed fixes (any of these would help)

  1. Fall back to env vars on --continue. If the session-cached credentials are missing/invalid, the Copilot CLI should re-do the env-var auth handshake instead of failing. This is upstream of gh-aw (Copilot CLI repo), but gh-aw could file or sponsor it.
  2. Driver-side recovery: retry-fresh after --continue auth failure. When the driver detects NO_AUTH_INFO_PATTERN on a --continue attempt (not on attempt 1), respawn once without --continue instead of bailing. The model loses mid-stream context but at least the job has a chance to recover. Could be gated behind a feature flag for opt-in.
  3. Driver-side cleanup: wipe Copilot session dir before --continue retry. Probably defeats --continue, so equivalent to option 2.
  4. Better diagnostics. When the driver bails on auth-on-retry, log the auth-token availability check that PR #26146 ("fix(copilot-driver): handle auth failures in --continue attempts") added (it currently only logs at startup) so users can confirm env vars are still present.
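
For option 4, the retry-time diagnostic could report which token variables are present without ever printing their values. The sketch below is illustrative only: `auth_env_report` and `format_auth_env_report` are hypothetical names, and the output wording is invented here rather than taken from the driver's existing startup check.

```python
from typing import Dict, Mapping

# Env var names come from this report.
TOKEN_ENV_VARS = ("COPILOT_GITHUB_TOKEN", "GITHUB_TOKEN", "GH_TOKEN")


def auth_env_report(env: Mapping[str, str]) -> Dict[str, str]:
    """Map each token env var to 'set', 'empty', or 'unset' (never the value)."""
    report = {}
    for name in TOKEN_ENV_VARS:
        value = env.get(name)
        if value is None:
            report[name] = "unset"
        elif not value.strip():
            report[name] = "empty"
        else:
            report[name] = "set"
    return report


def format_auth_env_report(env: Mapping[str, str]) -> str:
    """Render the report as one log-friendly line (hypothetical wording)."""
    parts = [f"{name}={status}" for name, status in auth_env_report(env).items()]
    return "auth env at retry: " + ", ".join(parts)
```

Logging this line on the failing --continue attempt would let a user tell "env vars vanished" apart from "env vars present but ignored", which is the distinction this issue hinges on.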

Option 2 seems lowest-risk and is internal to gh-aw. Happy to send a PR if the team agrees.
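
To make option 2 concrete, here is a rough sketch of the decision the driver would make after each failed attempt. It is not the driver's implementation. NO_AUTH_INFO_PATTERN is the name used above, but the regex assigned to it here is an assumption based on the error text in the logs. `Action`, `next_action`, the `max_attempts` default, and the handling of non-auth failures are all hypothetical simplifications.

```python
import re
from enum import Enum

# Name taken from this report; the regex itself is an assumption.
NO_AUTH_INFO_PATTERN = re.compile(r"No authentication information found", re.IGNORECASE)


class Action(Enum):
    RETRY_CONTINUE = "retry with --continue"
    RETRY_FRESH = "retry without --continue"
    FAIL = "fail"


def next_action(
    attempt: int,
    used_continue: bool,
    has_output: bool,
    stderr: str,
    fresh_retry_used: bool,
    max_attempts: int = 3,  # assumption, not the driver's real limit
) -> Action:
    """Decide what to do after a failed attempt (1-based attempt number)."""
    if attempt >= max_attempts:
        return Action.FAIL
    if NO_AUTH_INFO_PATTERN.search(stderr):
        # Auth failure on attempt 1 means credentials are really missing.
        # On a --continue attempt it may be the stale-session case, so allow
        # exactly one fresh respawn before giving up.
        if used_continue and not fresh_retry_used:
            return Action.RETRY_FRESH
        return Action.FAIL
    # Simplification: other failures with output retry with --continue.
    return Action.RETRY_CONTINUE if has_output else Action.FAIL
```

Applied to the run above, attempt 1 (server error with output) yields RETRY_CONTINUE and attempt 2 (auth error under --continue) yields RETRY_FRESH instead of a fast failure. A genuine auth failure on attempt 1, or a second auth failure after the fresh respawn, still fails fast, which keeps the behavior introduced in #26146.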

Environment

  • gh-aw v0.68.6
  • Engine: copilot
  • Model: claude-opus-4.6
  • COPILOT_GITHUB_TOKEN: static fine-grained PAT in repo secrets
  • Run: workflow_dispatch, then gh run rerun --failed (both attempts hit the same failure mode)
