Feature: Per-Session Health Polling & Live Status Indicators #57933

@apoapostolov

Problem

When a session is processing a request (especially waiting for a slow model or a long tool call), the Control UI shows no indication of whether the session is alive, thinking, stalled, or crashed. Users experience "dead air" — minutes of silence with no feedback — which is indistinguishable from a crash or stall.

This is especially painful with:

  • Slower models (long time-to-first-token)
  • Multi-step agentic loops with tool calls
  • Background/long-running operations
  • Sessions with a history of instability

Proposed Solution

Session Heartbeat Emitter

Each active session emits a lightweight heartbeat signal at a regular interval (e.g., every 5 seconds) containing:

```typescript
interface SessionHeartbeat {
  sessionKey: string;
  state:
    | "idle"           // no active turn
    | "awaiting_model" // sent request, waiting for first token
    | "streaming"      // receiving tokens from model
    | "tool_exec"      // executing a tool call
    | "tool_wait"      // waiting for tool result
    | "complete";      // turn finished, waiting for next input
  turnId?: string;      // current turn identifier
  model?: string;       // active model
  toolName?: string;    // currently executing tool
  lastTokenAt?: number; // timestamp of last received token
  startedAt: number;    // when current turn started
}
```
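A minimal sketch of the emitter side, assuming a per-session object that the agent loop calls on every state change. `HeartbeatEmitter`, `HeartbeatSink`, `TurnDetails`, `transition()` and `recordToken()` are hypothetical names, not existing OpenClaw code; the 5 s default is the example interval above, and timestamps are assumed to be epoch milliseconds.

```typescript
// Hypothetical sketch: HeartbeatEmitter, HeartbeatSink and TurnDetails are
// illustrative names, not an existing OpenClaw API. Timestamps assumed epoch ms.
type SessionState =
  | "idle"
  | "awaiting_model"
  | "streaming"
  | "tool_exec"
  | "tool_wait"
  | "complete";

interface SessionHeartbeat {
  sessionKey: string;
  state: SessionState;
  turnId?: string;
  model?: string;
  toolName?: string;
  lastTokenAt?: number;
  startedAt: number;
}

type HeartbeatSink = (hb: SessionHeartbeat) => void;
type TurnDetails = Pick<SessionHeartbeat, "turnId" | "model" | "toolName">;

class HeartbeatEmitter {
  private current: SessionHeartbeat;
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly sink: HeartbeatSink;
  private readonly intervalMs: number;
  private readonly now: () => number;

  constructor(
    sessionKey: string,
    sink: HeartbeatSink,
    intervalMs = 5_000,
    now: () => number = Date.now,
  ) {
    this.sink = sink;
    this.intervalMs = intervalMs;
    this.now = now;
    this.current = { sessionKey, state: "idle", startedAt: now() };
  }

  // Periodic emission: re-send the current snapshot every intervalMs so a
  // session stuck in one state keeps reporting and the stall stays visible.
  start(): void {
    this.stop();
    this.timer = setInterval(() => this.emit(), this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }

  // Transition-based emission: update the snapshot and emit immediately.
  transition(state: SessionState, details: TurnDetails = {}): void {
    const prev = this.current.state;
    const beginsTurn =
      (prev === "idle" || prev === "complete") && state !== "idle" && state !== "complete";
    const next: SessionHeartbeat = { ...this.current, ...details, state };
    if (beginsTurn) {
      next.startedAt = this.now(); // new turn: restart the elapsed-time clock
      delete next.lastTokenAt;     // token timestamps belong to the previous turn
    }
    // toolName only makes sense while a tool is running or being awaited.
    if (state !== "tool_exec" && state !== "tool_wait") delete next.toolName;
    this.current = next;
    this.emit();
  }

  // Token arrival updates lastTokenAt without emitting; the next tick carries it.
  recordToken(): void {
    this.current = { ...this.current, lastTokenAt: this.now() };
  }

  private emit(): void {
    this.sink({ ...this.current }); // send a copy so consumers cannot mutate internal state
  }
}
```

Usage would look roughly like `new HeartbeatEmitter(key, publish).start()`, then `transition("awaiting_model", { turnId, model })` when a request goes out; `publish` stands in for whichever transport (push or poll) is eventually chosen.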

Key Behaviors

  1. Transition-based and periodic emission: a heartbeat fires on every state transition (idle → awaiting_model → streaming → tool_exec, etc.) AND on a periodic timer (5s), so a session stuck in one state keeps reporting and the stall becomes visible
  2. No extra model calls: this is pure session lifecycle metadata, not LLM traffic
  3. Stall detection: if the heartbeat shows awaiting_model or tool_exec for longer than a configurable threshold (e.g., 30s), the session is flagged as "potentially stalled"
  4. Crash detection: if no heartbeat is received for more than 15s (three missed 5s ticks), the UI shows the session as "unresponsive"
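The stall and crash rules above can be expressed as a small client-side classifier. This sketch assumes the UI timestamps heartbeats on receipt and measures "time in state" from when it first observed the current state, because the proposed interface carries the turn's `startedAt` but not the time of the latest transition. `SessionHealthTracker` and the `Health` labels are hypothetical; the 30 s and 15 s defaults are the example values from the text.

```typescript
// Hypothetical sketch: SessionHealthTracker and the Health labels are
// illustrative, not an existing OpenClaw API.
type SessionState =
  | "idle" | "awaiting_model" | "streaming" | "tool_exec" | "tool_wait" | "complete";

// Only the heartbeat fields this sketch reads.
interface HeartbeatLike {
  sessionKey: string;
  state: SessionState;
}

type Health = "idle" | "working" | "stalled" | "unresponsive";

interface Thresholds {
  stallMs: number;        // max time in a stall-prone state (example: 30 s)
  unresponsiveMs: number; // max silence between heartbeats (example: 15 s)
}

const DEFAULT_THRESHOLDS: Thresholds = { stallMs: 30_000, unresponsiveMs: 15_000 };

// The proposal names awaiting_model and tool_exec as the states that can stall.
const STALL_PRONE: ReadonlySet<SessionState> = new Set<SessionState>(["awaiting_model", "tool_exec"]);

interface Observation {
  state: SessionState;
  stateSince: number; // when this client first saw the current state
  lastBeatAt: number; // when this client last received any heartbeat
}

class SessionHealthTracker {
  private readonly seen = new Map<string, Observation>();
  private readonly thresholds: Thresholds;

  constructor(thresholds: Thresholds = DEFAULT_THRESHOLDS) {
    this.thresholds = thresholds;
  }

  ingest(hb: HeartbeatLike, receivedAt: number): void {
    const prev = this.seen.get(hb.sessionKey);
    // Keep the original timestamp while the state is unchanged; reset it on a transition.
    const stateSince = prev !== undefined && prev.state === hb.state ? prev.stateSince : receivedAt;
    this.seen.set(hb.sessionKey, { state: hb.state, stateSince, lastBeatAt: receivedAt });
  }

  health(sessionKey: string, now: number): Health | undefined {
    const obs = this.seen.get(sessionKey);
    if (obs === undefined) return undefined; // never heard from this session
    // Silence is checked first: a session that stopped reporting cannot be assumed to be working.
    if (now - obs.lastBeatAt > this.thresholds.unresponsiveMs) return "unresponsive";
    if (STALL_PRONE.has(obs.state) && now - obs.stateSince > this.thresholds.stallMs) return "stalled";
    return obs.state === "idle" || obs.state === "complete" ? "idle" : "working";
  }
}
```

One consequence of measuring on the client is that a UI which connects mid-stall starts its stall clock late; a per-state timestamp in the heartbeat itself (a hypothetical `stateEnteredAt` field) would close that gap.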

Control UI Integration

  • Per-session status indicator (colored dot or badge): 🟢 active, 🟡 waiting, 🔴 stalled/dead
  • Tooltip showing: state, current model, current tool, elapsed time
  • Optional "last activity" timestamp per session
  • The session list should update and re-sort in real time as heartbeats arrive
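For the Control UI, the indicator, tooltip and live ordering reduce to a few pure functions. The mapping of states to the three colours is an assumption (streaming and tool_exec as 🟢, other non-failed states as 🟡, stalled or unresponsive as 🔴), as is ordering problem sessions first and then by most recent activity; all names here are hypothetical.

```typescript
// Hypothetical Control UI helpers: SessionRow, statusDot, tooltipText and
// sortForDisplay are illustrative names; the colour mapping is an assumption.
type SessionState =
  | "idle" | "awaiting_model" | "streaming" | "tool_exec" | "tool_wait" | "complete";
type Health = "idle" | "working" | "stalled" | "unresponsive";

interface SessionRow {
  sessionKey: string;
  state: SessionState;
  health: Health;
  model?: string;
  toolName?: string;
  startedAt: number;      // turn start, from the heartbeat (epoch ms)
  lastActivityAt: number; // e.g. receipt time of the most recent heartbeat
}

function statusDot(row: SessionRow): string {
  if (row.health === "stalled" || row.health === "unresponsive") return "🔴";
  // Visible progress (tokens arriving or a tool running) counts as active.
  if (row.state === "streaming" || row.state === "tool_exec") return "🟢";
  return "🟡"; // idle, complete, awaiting_model, tool_wait
}

function formatElapsed(ms: number): string {
  const totalSec = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSec / 60);
  const seconds = totalSec % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

function tooltipText(row: SessionRow, now: number): string {
  const parts = [`state: ${row.state}`];
  if (row.model) parts.push(`model: ${row.model}`);
  if (row.toolName) parts.push(`tool: ${row.toolName}`);
  parts.push(`elapsed: ${formatElapsed(now - row.startedAt)}`);
  return parts.join(", ");
}

// Lower number sorts first: problem sessions float to the top.
const SEVERITY: Record<Health, number> = { unresponsive: 0, stalled: 1, working: 2, idle: 3 };

// Returns a new array; ties on severity are broken by most recent activity.
function sortForDisplay(rows: readonly SessionRow[]): SessionRow[] {
  return [...rows].sort(
    (a, b) => SEVERITY[a.health] - SEVERITY[b.health] || b.lastActivityAt - a.lastActivityAt,
  );
}
```

Re-running `sortForDisplay` whenever a heartbeat changes a row's health gives the real-time ordering without any extra server calls.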

API Surface

GET /api/sessions/:key/heartbeat   → current heartbeat state
GET /api/sessions/heartbeat        → all session heartbeats (batch)

Alternatively, expose the same data through the existing session list endpoint as an extended status field.
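The two endpoints can be served by one framework-agnostic handler over an in-memory map of each session's latest heartbeat. `handleHeartbeatGet`, `ApiResponse` and the 400/404 error bodies are assumptions for illustration; only the two URL shapes come from the proposal.

```typescript
// Hypothetical sketch: handleHeartbeatGet and ApiResponse are illustrative
// names; only the two URL shapes come from the proposal.
interface SessionHeartbeat {
  sessionKey: string;
  state: "idle" | "awaiting_model" | "streaming" | "tool_exec" | "tool_wait" | "complete";
  turnId?: string;
  model?: string;
  toolName?: string;
  lastTokenAt?: number;
  startedAt: number;
}

interface ApiResponse {
  status: number;
  body: unknown; // serialised as JSON by whatever HTTP layer mounts this handler
}

const SINGLE_SESSION = /^\/api\/sessions\/([^/]+)\/heartbeat$/;

function handleHeartbeatGet(
  path: string,
  store: ReadonlyMap<string, SessionHeartbeat>,
): ApiResponse {
  // GET /api/sessions/heartbeat -> latest heartbeat of every known session (batch)
  if (path === "/api/sessions/heartbeat") {
    return { status: 200, body: Array.from(store.values()) };
  }
  // GET /api/sessions/:key/heartbeat -> latest heartbeat of one session
  const match = SINGLE_SESSION.exec(path);
  if (match === null) return { status: 404, body: { error: "not found" } };
  let key: string;
  try {
    key = decodeURIComponent(match[1] ?? "");
  } catch {
    // decodeURIComponent throws URIError on malformed %-escapes
    return { status: 400, body: { error: "malformed session key" } };
  }
  const hb = store.get(key);
  return hb !== undefined
    ? { status: 200, body: hb }
    : { status: 404, body: { error: `unknown session: ${key}` } };
}
```

Mounting it means passing the request path and the heartbeat store from whichever HTTP layer already serves the Control UI, then writing `status` and `body` back as JSON.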

Open Questions

  • Should heartbeats be pushed over WebSocket or pulled by polling? WebSocket is better for real-time updates; polling is simpler to implement.
  • Should stall thresholds be configurable per-session or global?
  • Should there be an auto-recovery action (restart stalled session) exposed via the UI?

Alternatives Considered

  • Model streaming alone — doesn't cover tool execution gaps or slow time-to-first-token
  • Existing session list polling — already exists but lacks granular state info
  • Log tailing — too heavy, requires parsing, not structured

Impact

This would significantly improve the operational experience of running OpenClaw, especially for users managing multiple sessions or using slower/cheaper models where latency is expected. It turns "is it dead?" from a guessing game into a visible status.
