Skip to content

feat: task-level retry with exponential backoff#37

Merged
JackChen-me merged 4 commits into
mainfrom
feat/task-retry
Apr 3, 2026
Merged

feat: task-level retry with exponential backoff#37
JackChen-me merged 4 commits into
mainfrom
feat/task-retry

Conversation

@JackChen-me
Copy link
Copy Markdown
Member

Summary

Closes #30

  • Add maxRetries, retryDelayMs, retryBackoff to task config
  • On task failure, retry with exponential backoff (delay * backoff^attempt)
  • Fresh agent conversation per retry — no corrupted state carry-over
  • Emit task_retry event via onProgress with attempt number, error, and next delay
  • Cascade failure to dependents only after all retries exhausted
  • Works with both runTeam() and runTasks()
  • Zero behavior change when retry fields are not set (default: no retry)

Usage

const tasks = [
  {
    title: 'Generate report',
    assignee: 'analyst',
    description: '...',
    maxRetries: 2,        // retry up to 2 times
    retryDelayMs: 1000,   // 1s before first retry
    retryBackoff: 2,      // exponential: 1s, 2s
  },
]

const result = await orchestrator.runTasks(team, tasks)

Files changed

File Change
src/types.ts Add retry fields to Task, add task_retry to OrchestratorEvent
src/task/task.ts createTask() accepts and passes through retry config
src/orchestrator/orchestrator.ts Retry loop in executeQueue(), sleep() helper, plumbing in runTasks()/loadSpecsIntoQueue()
tests/task-retry.test.ts NEW — 7 tests covering retry success, exhaustion, backoff calculation, no-retry default

Test plan

  • npm run build — compiles clean
  • npm run lint — type check passes
  • npm test — 93/93 tests pass (7 new)

Add `maxRetries`, `retryDelayMs`, and `retryBackoff` to task config.
When a task fails and retries remain, the orchestrator waits with
exponential backoff and re-runs the task with a fresh agent conversation.
A `task_retry` event is emitted via `onProgress` for observability.
Cascade failure only occurs after all retries are exhausted.

Closes #30
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7714e6189

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/orchestrator/orchestrator.ts Outdated
const sharedMem = team.getSharedMemoryInstance()
if (sharedMem) {
await sharedMem.write(assignee, `task:${task.id}:result`, result.output)
const maxAttempts = (task.maxRetries ?? 0) + 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Validate retry count before deriving attempt loop

The new retry loop computes maxAttempts as (task.maxRetries ?? 0) + 1 without bounding input, so any negative maxRetries (from runTasks input or parsed specs) yields maxAttempts <= 0 and the for loop never executes. Because the task was already set to in_progress, it is never completed or failed, and executeQueue can exit with a silently unexecuted task. Please clamp/validate maxRetries to a non-negative integer (or fail fast) before entering the loop.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #38 (99b028d) — maxRetries is now clamped with Math.max(0, ...) before entering the loop.

Comment on lines +812 to +814
maxRetries: spec.maxRetries,
retryDelayMs: spec.retryDelayMs,
retryBackoff: spec.retryBackoff,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse retry fields from coordinator task specs

The runTeam path now threads spec.maxRetries, spec.retryDelayMs, and spec.retryBackoff into created tasks, but parseTaskSpecs() still only copies title/description/assignee/dependsOn, so these retry fields remain undefined for coordinator-generated tasks. In practice, retry settings emitted by the coordinator are dropped, so task-level retries only work via runTasks and not via runTeam as intended.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By design — retry config is a user-facing option for runTasks(). The coordinator in runTeam() is not expected to generate retry fields, so parseTaskSpecs() intentionally omits them.

- Extract `executeWithRetry()` as a testable exported function
- Add `computeRetryDelay()` with 30s max cap (prevents runaway backoff)
- Remove retry fields from `ParsedTaskSpec` (dead code for runTeam path)
- Deduplicate retry event emission (single code path for both error types)
- Injectable delay function for test determinism
- Rewrite tests to call the real `executeWithRetry`, not a copy
- 15 tests covering: success, retry+success, retry+failure, backoff
  calculation, delay cap, delay function injection, no-retry default
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 972f9f0c74

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +139 to +142
const result = await run()
if (result.success) {
return result
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Aggregate token usage across retry attempts

When a task succeeds after one or more failed attempts, executeWithRetry returns only the final attempt result here, so token usage (and other execution metadata) from earlier attempts is discarded. Since executeQueue stores this single result in ctx.agentResults and buildTeamRunResult() sums result.tokenUsage, runs with retries underreport actual model consumption, which can skew billing/quota tracking and observability.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d9b20c0executeWithRetry now accumulates totalUsage across all attempts and returns it in the final result.

- maxRetries clamped to >= 0 (negative values treated as no retry)
- retryBackoff clamped to >= 1 (prevents zero/negative delay oscillation)
- retryDelayMs clamped to >= 0
- Add tests for negative maxRetries and negative backoff

Addresses Codex review P1 on #37
Previously only the final attempt's tokenUsage was returned, causing
under-reporting of actual model consumption when retries occurred.
Now all attempts' token counts are summed in the returned result.

Addresses Codex review P2 (token usage) on #37
@JackChen-me JackChen-me merged commit 4d7564b into main Apr 3, 2026
3 checks passed
@JackChen-me JackChen-me deleted the feat/task-retry branch April 3, 2026 06:08
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63e1f7068a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

onRetry?: (data: { attempt: number; maxAttempts: number; error: string; nextDelayMs: number }) => void,
delayFn: (ms: number) => Promise<void> = sleep,
): Promise<AgentRunResult> {
const maxAttempts = Math.max(0, task.maxRetries ?? 0) + 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard maxRetries against non-finite values

executeWithRetry derives maxAttempts directly from task.maxRetries, but Math.max(0, task.maxRetries ?? 0) does not sanitize Infinity/NaN. If a caller passes maxRetries: Infinity (possible via runTasks since this is plain JS input), the for (attempt <= maxAttempts) loop becomes effectively unbounded and the task never exits retrying, stalling executeQueue; NaN similarly produces a broken attempt bound. Please coerce to a finite non-negative integer before entering the loop.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d9b20c0 — all retry fields are now guarded with Number.isFinite() before use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P0] Task Retry — maxRetries with exponential backoff

1 participant