fix(gateway): rate-limit compression warning messages to once per hour #3786
Open
dlkakbs wants to merge 1 commit into NousResearch:main from
The post-compression warning ("Session is still very large") had no
cooldown, so it fired on every message as long as the session remained
above the 95% token threshold — spamming users on long-running bots
(Telegram, Discord, etc.).
Adds _compression_warn_sent (dict keyed by chat_id) and a 1-hour
cooldown on GatewayRunner. Both warn paths (compression ran but still
large, and compression failed) are gated by the same rate-limit.
Fixes NousResearch#3784
What does this PR do?
The post-compression warning had no cooldown. When a session stayed above the 95% token threshold, it fired on every message — spamming users on long-running bots with no way to stop it.
The root cause was a missing rate-limit, not a missing config toggle. This PR adds a 1-hour cooldown per chat_id on GatewayRunner, gating both warning paths: "still large after compression" and "compression failed".
Why rate-limit instead of an on/off toggle? A toggle would silence a genuinely useful signal entirely, including the first occurrence. Rate-limiting preserves the warning for users who haven't seen it yet while eliminating the spam for those who have — making the config option unnecessary.
Related Issue
#3784
Type of Change
Changes Made
How to Test
Checklist
Documentation & Housekeeping