Status | Subtype | Assigned | Task
---|---|---|---
In Progress | | xcollazo | T368098 Dumps generation cause disruption to the production environment
In Progress | | Ladsgroup | T368289 Incredible amount of logs from Wikimedia\Rdbms\LoadBalancer::runPrimaryTransactionIdleCallbacks
Resolved | | aaron | T391364 [SPIKE] Add a statslib counter for the LoadBalancer::runPrimaryTransactionIdleCallbacks errors
Event Timeline
Change #1137040 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):
[mediawiki/core@master] rdbms: add StatsLib counter for danging writes during post-transaction callbacks
If the current log emission were bumped back to WARNING (and thus sent to logstash), any infinite loop could cause downtime, as before, by flooding logstash and saturating the network enough to affect other requests. While some fixes were made, I've identified a few infinite loop scenarios that still need to be fixed. Even if the statslib data shows few spikes, it only takes one to cause a lot of trouble, so I'd be hesitant to re-enable logging without those fixes being merged *and* the statslib counter showing no spikes.
The current WIP patch to fix them is both too big and incomplete. It needs to be split up and also based atop more refactoring patches that make the code practical to reason about (the fiddly bits around DBO_TRX as well as callback exception handling).
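To make the concern concrete, here is a rough Python sketch of the general idea, not the actual rdbms code: errors raised by idle callbacks bump a counter instead of emitting a log line, and the drain loop is capped so that a callback which keeps re-queueing itself cannot spin forever. Every name here (`run_idle_callbacks`, `MAX_ROUNDS`, the counter keys) is hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical in-process stand-in for a metrics backend such as a stats counter.
stats = Counter()

# Hypothetical cap on how many times the callback queue may be re-drained.
MAX_ROUNDS = 10


def run_idle_callbacks(queue, max_rounds=MAX_ROUNDS):
    """Drain callbacks, counting failures instead of logging each one.

    A callback may return a list of follow-up callbacks; these are run in the
    next round. Returns the number of rounds that were executed.
    """
    rounds = 0
    while queue and rounds < max_rounds:
        rounds += 1
        pending, queue = queue, []
        for callback in pending:
            try:
                queue.extend(callback() or [])
            except Exception:
                # One cheap counter increment per failure, no log line.
                stats["idle_callback_errors"] += 1
    if queue:
        # Work was still pending when the cap was hit: likely a loop.
        stats["idle_callback_rounds_exceeded"] += 1
    return rounds


def looping():
    # Always schedules itself again, which would loop forever without the cap.
    return [looping]


def failing():
    raise RuntimeError("write left dangling")
```

With this shape, a runaway `looping` callback costs a bounded number of rounds and a single counter increment, whereas per-iteration WARNING logs would grow without bound.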
Change #1137040 merged by jenkins-bot:
[mediawiki/core@master] rdbms: add StatsLib counter for danging writes during post-transaction callbacks