[SPIKE] Add a statslib counter for the LoadBalancer::runPrimaryTransactionIdleCallbacks errors
Closed, Resolved | Public | 2 Estimated Story Points

Description

Make ourselves less blind to the potential impact of re-enabling logging in LoadBalancer::runPrimaryTransactionIdleCallbacks in light of T368289.

Conditions of acceptance

  • Create new counter for logging.

Event Timeline

HCoplin-WMF triaged this task as High priority.
HCoplin-WMF updated the task description.
HCoplin-WMF set the point value for this task to 5.
HCoplin-WMF added a subscriber: Ladsgroup.
aaron renamed this task from [SPIKE] Add a statslib counter for the incredible amount of logs to [SPIKE] Add a statslib counter for the LoadBalancer::runPrimaryTransactionIdleCallbacks errors. Apr 11 2025, 6:10 PM
aaron updated the task description.
HCoplin-WMF changed the point value for this task from 5 to 2.
HCoplin-WMF set Final Story Points to 5.

Change #1137040 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: add StatsLib counter for danging writes during post-transaction callbacks

https://gerrit.wikimedia.org/r/1137040

If the current log emission were bumped back to WARNING (and thus sent to Logstash), any infinite loop could cause downtime, as before, by flooding Logstash and saturating the network enough to affect other requests. While some fixes were made, I've identified a few infinite-loop scenarios that still need to be fixed. Even if the statslib data shows few spikes, it only takes one to cause a lot of trouble, so I'd be hesitant to re-enable logging until those fixes are merged *and* the statslib counter shows no spikes.
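To illustrate the concern, here is a minimal Python sketch (not MediaWiki code, and not the StatsLib API) contrasting per-event log emission with an aggregated counter when an error path is hit in a runaway loop. The class `InProcessCounter`, the function `simulate_runaway_loop`, the metric name `dangling_writes_total`, and the label `source` are all hypothetical names chosen for illustration. The sketch assumes the counter aggregates in-process and is flushed once, which may not match how the real counter is transported.

```python
from collections import Counter


class InProcessCounter:
    """Hypothetical stand-in for a metrics counter; NOT the StatsLib API."""

    def __init__(self):
        self._counts = Counter()

    def increment(self, name, **labels):
        # Aggregate by metric name plus a stable, hashable view of the labels.
        key = (name, tuple(sorted(labels.items())))
        self._counts[key] += 1

    def flush(self):
        """Return one aggregated sample per distinct key, then reset."""
        samples = [
            (name, dict(labels), count)
            for (name, labels), count in self._counts.items()
        ]
        self._counts.clear()
        return samples


def simulate_runaway_loop(iterations, stats, log_sink):
    """Hit the same error path `iterations` times, as a runaway loop would."""
    for _ in range(iterations):
        # Per-event logging: one message leaves the process per iteration.
        log_sink.append("dangling write detected in idle callback")
        # Counter: only an in-memory integer changes per iteration.
        stats.increment("dangling_writes_total", source="idle_callback")


stats = InProcessCounter()
logs = []
simulate_runaway_loop(10_000, stats, logs)
samples = stats.flush()
print(len(logs), "log messages vs", len(samples), "counter sample(s)")
```

Under these assumptions the output volume of the log path grows with the number of loop iterations, while the counter path stays at one sample per distinct label set, which is why a spike is visible in the counter without the flooding risk.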

The current WIP patch to fix them is both too big and incomplete. It needs to be split up and rebased atop more refactoring patches that make the code practical to reason about (the fiddly bits around DBO_TRX as well as callback exception handling).

Change #1137040 merged by jenkins-bot:

[mediawiki/core@master] rdbms: add StatsLib counter for danging writes during post-transaction callbacks

https://gerrit.wikimedia.org/r/1137040