Problem
When multiple Coder instances are deployed behind a load balancer (e.g., Kubernetes Service) pointing at the same database without a Premium license, users experience intermittent failures. This happens because:
- Each Coder instance registers itself as a replica in the database
- After the grace period (default 1 minute), the entitlements check detects multiple replicas without an HA entitlement
- An error is recorded: "You have multiple replicas but high availability is an Enterprise feature. You will be unable to connect to workspaces."
- However, the `/healthz` endpoint still returns `200 OK`
- The load balancer continues routing traffic to all nodes, but only one can properly serve workspace connections
The result is that ~50% of requests fail unpredictably with two replicas; with N replicas, only one node can serve workspace connections, so roughly (N-1)/N of those attempts land on a node that cannot.
Current Behavior
The `/healthz` endpoint (`coderd/coderd.go:909`) unconditionally returns "OK":

```go
r.Get("/healthz", func(w http.ResponseWriter, _ *http.Request) { _, _ = w.Write([]byte("OK")) })
```

The entitlement error is detected and stored (`enterprise/coderd/license/license.go:431-449`), but it is only exposed via:
- The authenticated `/api/v2/entitlements` endpoint
- Warning headers on authenticated responses
Neither of these can be used for Kubernetes readiness probes.
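For illustration, here is roughly what it takes to read those errors today through the public `codersdk` client; the deployment URL and token env var below are placeholders, and holding a session token is exactly what a Kubernetes probe cannot reasonably do:

```go
package main

import (
	"context"
	"fmt"
	"net/url"
	"os"

	"github.com/coder/coder/v2/codersdk"
)

func main() {
	serverURL, err := url.Parse("https://coder.example.com") // placeholder
	if err != nil {
		panic(err)
	}
	client := codersdk.New(serverURL)
	// A probe would need this token provisioned and rotated for it.
	client.SetSessionToken(os.Getenv("CODER_SESSION_TOKEN"))

	ent, err := client.Entitlements(context.Background())
	if err != nil {
		fmt.Println("entitlements query failed:", err)
		return
	}
	// The multi-replica/HA message surfaces here as a plain string.
	for _, msg := range ent.Errors {
		fmt.Println("entitlement error:", msg)
	}
}
```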
Proposed Solution
Add a `/readyz` endpoint (unauthenticated) that returns:
- `200 OK` when the node is fully operational
- `503 Service Unavailable` when the node has critical issues that should exclude it from load balancing
The readiness check should verify:
- Database connectivity - Can the node reach the database?
- No blocking entitlement errors - Specifically, the error raised when multiple replicas run without an HA license
This follows Kubernetes conventions where:
- `/healthz` (liveness) = "Is the process alive?" → restart if failing
- `/readyz` (readiness) = "Can this instance serve traffic?" → remove from load balancer if failing
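To make the liveness/readiness split concrete, an HTTP readiness probe boils down to an unauthenticated GET; a rough Go equivalent of what the kubelet does (address and port are placeholders):

```go
package main

import (
	"fmt"
	"net/http"
)

// isReady mirrors the kubelet's HTTP probe semantics: an unauthenticated
// GET where any status in [200, 400) counts as success.
func isReady(baseURL string) bool {
	resp, err := http.Get(baseURL + "/readyz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400
}

func main() {
	// A pod failing readiness is removed from Service endpoints but not
	// restarted; a failing liveness probe triggers a container restart.
	fmt.Println("ready:", isReady("http://127.0.0.1:3000"))
}
```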
Implementation Notes
Key code areas:
- Entitlements tracking: `coderd/entitlements/entitlements.go` - Add a method like `HasBlockingErrors() bool` that checks for errors that should make the node unready (a sketch follows this list)
- New endpoint: `coderd/coderd.go` - Add the `/readyz` route
- Error detection: The replica error is already generated in `enterprise/coderd/license/license.go` in the `Errors` slice
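A minimal sketch of that method, assuming the entitlements set guards a `codersdk.Entitlements` value behind a mutex; the struct fields and the substring match below are illustrative stand-ins, not the real internals:

```go
package entitlements

import (
	"strings"
	"sync"

	"github.com/coder/coder/v2/codersdk"
)

// Set is a simplified stand-in for the existing entitlements.Set.
type Set struct {
	mu           sync.RWMutex
	entitlements codersdk.Entitlements
}

// HasBlockingErrors reports whether any recorded entitlement error should
// mark this node unready. Substring matching is a placeholder; a real
// implementation would prefer a structured error code over string
// comparison.
func (s *Set) HasBlockingErrors() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for _, msg := range s.entitlements.Errors {
		if strings.Contains(msg, "high availability") {
			return true
		}
	}
	return false
}
```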
Example implementation sketch:
r.Get("/readyz", func(w http.ResponseWriter, r *http.Request) {
ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
defer cancel()
// Check database connectivity
if _, err := api.Database.Ping(ctx); err != nil {
w.WriteHeader(http.StatusServiceUnavailable)
_, _ = w.Write([]byte("database unreachable"))
return
}
// Check for blocking entitlement errors
if api.Entitlements.HasBlockingErrors() {
w.WriteHeader(http.StatusServiceUnavailable)
_, _ = w.Write([]byte("entitlement error"))
return
}
_, _ = w.Write([]byte("OK"))
})Alternatives Considered
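To show the contract a probe or load balancer would observe, a self-contained toy (not coderd code) that serves the same 503-until-healthy behavior:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// blocked is a toy stand-in; in coderd it would be derived from the
// database ping and Entitlements.HasBlockingErrors().
var blocked = true

func readyz(w http.ResponseWriter, _ *http.Request) {
	if blocked {
		w.WriteHeader(http.StatusServiceUnavailable)
		_, _ = w.Write([]byte("entitlement error"))
		return
	}
	_, _ = w.Write([]byte("OK"))
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(readyz))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.StatusCode) // 503 while blocked; 200 once the error clears
}
```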
Alternatives Considered
- Modify `/healthz` directly - Rejected because changing liveness probe behavior could cause restart loops instead of just removing the node from the load balancer
- Require authentication on the readiness probe - Rejected because Kubernetes probes typically run without application-level auth, and managing secrets for probes adds operational complexity
- External monitoring of `/api/v2/entitlements` - Works, but requires additional infrastructure (a sidecar or an external health checker with credentials)
Additional Context
This issue particularly affects:
- Kubernetes deployments using `replicas > 1` without realizing HA requires a Premium license
- Blue-green or rolling deployments where multiple pods temporarily coexist
- Development/staging environments that mirror production topology without licenses
The current workaround is to manually ensure only one replica runs, but this defeats the purpose of high availability and is error-prone.