fix(gateway): rate-limit compression warning messages to once per hour #3786

Open
dlkakbs wants to merge 1 commit into NousResearch:main from dlkakbs:fix/compression-warn-rate-limit

Contributor

dlkakbs commented Mar 29, 2026

What does this PR do?

The post-compression warning had no cooldown. When a session stayed above the 95% token threshold, it fired on every message — spamming users on long-running bots with no way to stop it.

The root cause was a missing rate-limit, not a missing config toggle. This PR adds a 1-hour cooldown per chat_id on GatewayRunner, gating both warning paths: "still large after compression" and "compression failed".

Why rate-limit instead of an on/off toggle? A toggle would silence a genuinely useful signal entirely, including the first occurrence. Rate-limiting preserves the warning for users who haven't seen it yet while eliminating the spam for those who have — making the config option unnecessary.
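A minimal sketch of the cooldown described above, assuming a per-chat timestamp dict checked against a monotonic clock. Only the names `_compression_warn_sent` and `_compression_warn_cooldown` and the 3600-second value come from this PR; the class `WarnGate`, the method `should_warn`, and the `now` parameter are hypothetical illustrations, not the actual code in gateway/run.py.

```python
import time
from typing import Dict, Optional


class WarnGate:
    """Hypothetical stand-in for the rate-limit state kept on GatewayRunner."""

    def __init__(self, cooldown: float = 3600.0) -> None:
        # chat_id -> time the last compression warning was sent
        self._compression_warn_sent: Dict[str, float] = {}
        self._compression_warn_cooldown = cooldown

    def should_warn(self, chat_id: str, now: Optional[float] = None) -> bool:
        """Return True and record the time if a warning may be sent now."""
        # `now` is injectable only to make the sketch easy to test.
        now = time.monotonic() if now is None else now
        last = self._compression_warn_sent.get(chat_id)
        if last is not None and now - last < self._compression_warn_cooldown:
            return False  # still inside the cooldown window: suppress
        self._compression_warn_sent[chat_id] = now
        return True
```

Under this sketch, both warn sites ("still large after compression" and "compression failed") would call the same check before sending, so they share one cooldown per chat rather than each getting its own.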

Related Issue

#3784

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • gateway/run.py: added _compression_warn_sent dict and _compression_warn_cooldown (3600s) to GatewayRunner.__init__; both warn sites now check and update the rate-limit before sending
  • tests/gateway/test_session_hygiene.py: added TestCompressionWarnRateLimit with 4 tests covering first warn allowed, suppression within cooldown, allowed after cooldown, and per-chat isolation
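The four cases listed above could look roughly like the following. This is an illustrative sketch, not the contents of TestCompressionWarnRateLimit: the helper `allow_warn`, the module-level `COOLDOWN`, and the explicit `now` argument are assumptions made so the example runs on its own without the gateway code.

```python
from typing import Dict

COOLDOWN = 3600.0  # seconds; the cooldown value stated in this PR


def allow_warn(sent: Dict[str, float], chat_id: str, now: float) -> bool:
    """Hypothetical helper: True if a warning may be sent; records the time."""
    last = sent.get(chat_id)
    if last is not None and now - last < COOLDOWN:
        return False
    sent[chat_id] = now
    return True


def test_first_warn_allowed() -> None:
    assert allow_warn({}, "chat-1", now=0.0)


def test_suppressed_within_cooldown() -> None:
    sent: Dict[str, float] = {}
    allow_warn(sent, "chat-1", now=0.0)
    assert not allow_warn(sent, "chat-1", now=COOLDOWN - 1)


def test_allowed_after_cooldown() -> None:
    sent: Dict[str, float] = {}
    allow_warn(sent, "chat-1", now=0.0)
    assert allow_warn(sent, "chat-1", now=COOLDOWN)


def test_per_chat_isolation() -> None:
    sent: Dict[str, float] = {}
    allow_warn(sent, "chat-1", now=0.0)
    # A different chat_id has its own cooldown and is not suppressed.
    assert allow_warn(sent, "chat-2", now=1.0)
```

Passing the clock value in explicitly keeps the cases deterministic; the real tests may instead patch the time source.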

How to Test

  1. Configure a session to stay above the 95% token threshold after compression
  2. Send multiple messages: the warning should appear at most once per hour per chat, not on every message
  3. pytest tests/gateway/test_session_hygiene.py -q — 23 passed

Checklist

  • Commit messages follow Conventional Commits
  • PR contains only changes related to this fix
  • All tests pass
  • Tests added for new behaviour

Documentation & Housekeeping

  • No config changes — N/A
  • Cross-platform impact considered — N/A

The post-compression warning ("Session is still very large") had no
cooldown, so it fired on every message as long as the session remained
above the 95% token threshold — spamming users on long-running bots
(Telegram, Discord, etc.).

Adds _compression_warn_sent (dict keyed by chat_id) and a 1-hour
cooldown on GatewayRunner. Both warn paths (compression ran but still
large, and compression failed) are gated by the same rate-limit.

Fixes NousResearch#3784