
colewhite (cwhite)
User

User Details

User Since
Aug 21 2018, 6:05 PM (352 w, 19 h)
Availability
Available
LDAP User
Cwhite
MediaWiki User
CWhite (WMF)

Recent Activity

Yesterday

colewhite added a comment to T228380: Tech debt: sunsetting of Graphite.

Is there an example I can follow of how to do this? The documentation has an example for PHP but not for client side.
I'm wondering if this is all I have to do or if I am missing something else.

Tue, May 20, 11:18 PM · SRE Observability (FY2024/2025-Q4), MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Patch-For-Review, Technical-Debt, Observability-Metrics

Thu, May 15

colewhite added a subtask for T377018: Curator forcemerges active weekly and yearly indexes every day: T394460: Conflict in index naming convention.
Thu, May 15, 9:37 PM · Patch-For-Review, Observability-Logging
colewhite added a parent task for T394460: Conflict in index naming convention: T377018: Curator forcemerges active weekly and yearly indexes every day.
Thu, May 15, 9:37 PM · Observability-Logging
colewhite created T394460: Conflict in index naming convention.
Thu, May 15, 9:36 PM · Observability-Logging
colewhite changed the status of T377018: Curator forcemerges active weekly and yearly indexes every day from Open to In Progress.
Thu, May 15, 8:58 PM · Patch-For-Review, Observability-Logging
colewhite merged T391661: Weekly indices are force-merged by curator every day for a week into T377018: Curator forcemerges active weekly and yearly indexes every day.
Thu, May 15, 8:58 PM · Patch-For-Review, Observability-Logging
colewhite merged task T391661: Weekly indices are force-merged by curator every day for a week into T377018: Curator forcemerges active weekly and yearly indexes every day.
Thu, May 15, 8:58 PM · SRE Observability (FY2024/2025-Q4), Observability-Logging
colewhite created T394364: Consider enabling log sampling for StatsLib.
Thu, May 15, 12:20 AM · MediaWiki-libs-Stats

Wed, May 14

colewhite updated the task description for T394363: Determine and enforce a sane maximum length for a label value.
Wed, May 14, 11:22 PM · MediaWiki-libs-Stats
colewhite created T394363: Determine and enforce a sane maximum length for a label value.
Wed, May 14, 11:17 PM · MediaWiki-libs-Stats
colewhite created T394362: More closely follow Prometheus' data model requirements.
Wed, May 14, 11:10 PM · MediaWiki-libs-Stats
colewhite created T394361: Consider reducing usage of StatsUtils::normalizeString.
Wed, May 14, 11:07 PM · Observability-Metrics, MediaWiki-libs-Stats
lmata awarded T359267: Migrate MediaWiki.timing to statslib a Party Time token.
Wed, May 14, 3:38 PM · SRE Observability (FY2024/2025-Q3), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Observability-Metrics
lmata awarded T359385: Migrate MediaWiki.arclamp to statslib a Love token.
Wed, May 14, 3:38 PM · SRE Observability (FY2024/2025-Q3), Observability-Metrics

Tue, May 13

colewhite removed a project from T383563: mw.track: support for histogram metrics: SRE Observability (FY2024/2025-Q3).
Tue, May 13, 11:07 PM · patch-welcome, MediaWiki-Engineering, Observability-Metrics, Data-Engineering-Radar, MediaWiki-Platform-Team (Radar), Data-Engineering, MediaWiki-extensions-WikimediaEvents, Grafana, Growth-Team, GrowthExperiments
colewhite closed T359267: Migrate MediaWiki.timing to statslib, a subtask of T350592: EPIC: migrate in use metrics and dashboards to statslib, as Resolved.
Tue, May 13, 11:06 PM · SRE Observability (FY2024/2025-Q4), MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
colewhite closed T359267: Migrate MediaWiki.timing to statslib as Resolved.

Calling this one done!

Tue, May 13, 11:06 PM · SRE Observability (FY2024/2025-Q3), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Observability-Metrics
colewhite updated the task description for T359267: Migrate MediaWiki.timing to statslib.
Tue, May 13, 11:05 PM · SRE Observability (FY2024/2025-Q3), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Observability-Metrics
colewhite closed T359385: Migrate MediaWiki.arclamp to statslib, a subtask of T350592: EPIC: migrate in use metrics and dashboards to statslib, as Resolved.
Tue, May 13, 10:55 PM · SRE Observability (FY2024/2025-Q4), MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
colewhite closed T359385: Migrate MediaWiki.arclamp to statslib as Resolved.

Dashboard looks migrated thanks to @andrea.denisse!

Tue, May 13, 10:55 PM · SRE Observability (FY2024/2025-Q3), Observability-Metrics
colewhite closed T364240: Investigate making TimingMetric unit-aware as Resolved.

I'm going to call this resolved as we have a few methods available now.

Tue, May 13, 10:51 PM · MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), SRE Observability (FY2024/2025-Q3), MediaWiki-Platform-Team (Radar), MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), Patch-For-Review, MediaWiki-libs-Stats
colewhite edited projects for T390215: Logstash is overwhelmed, added: SRE Observability (FY2024/2025-Q4); removed SRE Observability (FY2024/2025-Q3).
Tue, May 13, 10:50 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging
colewhite added a comment to T390215: Logstash is overwhelmed.

Checking back on this, it seems the situation hasn't improved much.

Tue, May 13, 9:58 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging
colewhite added a comment to T391677: Audit dashboards using histogram_quantile on mediawiki_WikimediaEvents_editResponseTime.

I imported and modified a dashboard to help us see what's going on. It shows that up until a couple hours ago, we were still at 46k unique timeseries.

Tue, May 13, 4:39 PM · MediaWiki-Platform-Team, MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), SRE Observability (FY2024/2025-Q4), Observability-Metrics

Fri, May 9

colewhite edited projects for T393630: Cookbook downtiming does not work, continues anyway, added: SRE Observability; removed Observability-Alerting.
Fri, May 9, 8:32 PM · SRE Observability

Thu, May 8

colewhite closed T266886: Augment NEL reports with a computed timestamp-of-generation, a subtask of T257527: automatically collect network error reports from users' browsers (Network Error Logging API), as Resolved.
Thu, May 8, 5:11 PM · Product-Data-Infrastructure, SRE, Goal, Epic
colewhite closed T266886: Augment NEL reports with a computed timestamp-of-generation as Resolved.

The "generated" field is now calculated from meta.dt - age and stored.
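The subtraction described above can be sketched as follows. This is only an illustration, not the deployed pipeline code: the function name is hypothetical, and it assumes meta.dt is an ISO 8601 UTC timestamp and that age is expressed in milliseconds.

```python
from datetime import datetime, timedelta, timezone


def generated_timestamp(meta_dt: str, age_ms: int) -> str:
    """Estimate when a report was generated: receive time minus its age.

    Assumptions (not confirmed by the task): meta_dt is ISO 8601 in UTC,
    and age_ms is the report's age in milliseconds.
    """
    # Older Python versions do not accept a trailing "Z", so normalise it.
    received = datetime.fromisoformat(meta_dt.replace("Z", "+00:00"))
    generated = received - timedelta(milliseconds=age_ms)
    text = generated.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


print(generated_timestamp("2025-05-08T17:11:00.000Z", 90_500))
# prints 2025-05-08T17:09:29.500Z
```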

Thu, May 8, 5:11 PM · SRE Observability (FY2024/2025-Q3), Data-Engineering-Icebox, Data-Engineering, Observability-Logging, Analytics

Fri, May 2

colewhite created T393186: Remove dependency on liuggio/statsd-php-client.
Fri, May 2, 2:48 PM · MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), Patch-For-Review, MediaWiki-libs-Stats
colewhite added a comment to T326607: Future of liuggio/statsd-php-client?.

Now that Graphite is fully deprecated, I don't think we need this library anymore.

Fri, May 2, 2:45 PM · MediaWiki-libs-Stats, SRE Observability, observability, serviceops-radar, Technical-Debt

Wed, Apr 30

colewhite added a comment to T228380: Tech debt: sunsetting of Graphite.

Yes, in Pushgateway you have a "grouping key" (for example, job=foo), and you can then replace all metrics and their labels pushed under that grouping key.

Would the statsd-exporter TTL help in this case to keep metrics from lingering?
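As a rough illustration of the grouping key mentioned above: in the Pushgateway HTTP API the grouping key is encoded in the URL path (/metrics/job/<job>/<label>/<value>), and a PUT to that path replaces every metric in the group. The sketch below only builds the request and sends nothing. The helper names and the pushgateway.example host are hypothetical, and label values containing "/" (which Pushgateway expects in a base64 form) are simply rejected.

```python
from typing import Mapping, Optional
from urllib.request import Request


def grouping_key_path(job: str, labels: Optional[Mapping[str, str]] = None) -> str:
    """Build the URL path that identifies one Pushgateway metric group."""
    parts = ["metrics", "job", job]
    for name, value in (labels or {}).items():
        parts += [name, value]
    for part in parts:
        # Keep the sketch simple: no empty segments, no embedded slashes.
        if not part or "/" in part:
            raise ValueError(f"unsupported grouping key segment: {part!r}")
    return "/" + "/".join(parts)


def replace_group_request(base_url: str, job: str, body: str,
                          labels: Optional[Mapping[str, str]] = None) -> Request:
    """Prepare (but do not send) a PUT that replaces the whole group."""
    url = base_url.rstrip("/") + grouping_key_path(job, labels)
    return Request(url, data=body.encode("utf-8"), method="PUT")


req = replace_group_request("http://pushgateway.example:9091", "foo", "my_metric 1\n")
print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would perform the push. Using POST instead of PUT would replace only the metrics that share a name with the pushed ones.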

Wed, Apr 30, 9:49 PM · SRE Observability (FY2024/2025-Q4), MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Patch-For-Review, Technical-Debt, Observability-Metrics
colewhite added a comment to T391532: Migrate wikibase.queryService.ui metrics to Prometheus.

The new query GUI version got merged and deployed, and (in a new browser profile) I can see requests being sent to e.g. https://www.wikidata.org/beacon/stats?wikibase_queryService_ui_app_init_total:1|c, but so far the metrics aren’t showing up in Thanos or Grafana…
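For readers unfamiliar with the beacon format, the query string in the URL above follows the StatsD-style name:value|type shape. The parser below is a minimal sketch of how such a request could be decoded. It is not the statsv implementation, and the Sample type and parse_beacon name are made up for illustration.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlsplit


class Sample(NamedTuple):
    name: str
    value: float
    kind: str  # e.g. "c" for a counter


def parse_beacon(url: str) -> Sample:
    """Split a beacon URL's query string into metric name, value and type."""
    query = unquote(urlsplit(url).query)
    name, _, rest = query.partition(":")
    value, _, kind = rest.partition("|")
    if not name or not value or not kind:
        raise ValueError(f"not a name:value|type payload: {query!r}")
    return Sample(name, float(value), kind)


sample = parse_beacon(
    "https://www.wikidata.org/beacon/stats?wikibase_queryService_ui_app_init_total:1|c"
)
print(sample)
```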

Wed, Apr 30, 3:26 PM · SRE Observability, Wikidata Analytics (Kanban), Wikidata

Tue, Apr 29

colewhite created T392931: Performance.timing is deprecated and should be removed from Vector skin..
Tue, Apr 29, 4:57 PM · Vector 2022 (Desktop improvements), Web-Team

Mon, Apr 28

colewhite added a comment to T228380: Tech debt: sunsetting of Graphite.

Is there a best practice for how to query these correctly such that when multiple are found, the correct/most recent is returned for any given interval point?

Mon, Apr 28, 9:30 PM · SRE Observability (FY2024/2025-Q4), MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Patch-For-Review, Technical-Debt, Observability-Metrics

Thu, Apr 24

colewhite added a comment to T392230: Explore improved isolation of non-ECS k8s log topics/indices.

The fields explosion event that prompted this task substantially slowed ingest across all topics. Given that, I do not think splitting indexes on the OpenSearch side would have isolated the problem logs from the rest of the log streams.

Thu, Apr 24, 10:56 PM · SRE Observability (FY2024/2025-Q3), Observability-Logging
colewhite added a comment to T391687: Consider sharding big logging indices.

Another option is partitioning the data into more indexes to reduce index size.

Thu, Apr 24, 10:19 PM · Observability-Logging

Apr 18 2025

colewhite added a comment to P54551 2025-04-18 snapshot MediaWiki Metrics in Grafana Dashboards.

codesearch mw.track counters
codesearch mw.track timers

Apr 18 2025, 9:42 PM
colewhite edited P54551 2025-04-18 snapshot MediaWiki Metrics in Grafana Dashboards.
Apr 18 2025, 9:33 PM
colewhite edited P54551 2025-04-18 snapshot MediaWiki Metrics in Grafana Dashboards.
Apr 18 2025, 9:26 PM
colewhite archived P75287 2025-04-18 snapshot of graphite metrics in Grafana dashboards.
Apr 18 2025, 8:55 PM
colewhite updated the title for P54551 2025-04-18 snapshot MediaWiki Metrics in Grafana Dashboards from "2024-01-08 snapshot MediaWiki Metrics in Grafana Dashboards" to "2025-04-18 snapshot MediaWiki Metrics in Grafana Dashboards".
Apr 18 2025, 8:54 PM
colewhite created P75287 2025-04-18 snapshot of graphite metrics in Grafana dashboards.
Apr 18 2025, 8:52 PM

Apr 17 2025

colewhite closed T348796: MediaWiki: Define new metric type - Histogram, a subtask of T354908: evaluate and migrate in-use parsoid metrics to statslib, as Resolved.
Apr 17 2025, 8:48 PM · MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), Content-Transform-Team, OKR-Work, Content-Transform-Team-WIP, MediaWiki-Platform-Team (Radar), Observability-Metrics
colewhite closed T348796: MediaWiki: Define new metric type - Histogram as Resolved.

The feature has landed. Thanks, all!

Apr 17 2025, 8:48 PM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), SRE Observability (FY2024/2025-Q3), Patch-For-Review, Observability-Metrics, MediaWiki-libs-Stats
colewhite updated the task description for T348806: Rethink how metric label values are sanitized.
Apr 17 2025, 6:23 PM · Observability-Metrics, MediaWiki-libs-Stats

Apr 16 2025

colewhite added a comment to T390215: Logstash is overwhelmed.

Update: we saw immediate ingest improvement after removing the out_request and outRequest fields which were generated by mobileapps. We should watch for more inexplicable dips in throughput in the coming days if this was an incomplete mitigation.

Apr 16 2025, 9:55 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging

Apr 15 2025

colewhite added a comment to T228380: Tech debt: sunsetting of Graphite.

Hi all, quick question as we address this Graphite migration work:
Can someone confirm whether the following metric families have already been instrumented and are available in Prometheus?

PagePreviewsApiResponse.*
PagePreviewsPreviewShow.*
These power several Web Team dashboard widgets (e.g., API response time, TTP, preview count, etc.), and we're trying to determine if this data is already flowing into Prometheus, or if we need to reinstrument it ourselves.

Appreciate any guidance—thanks so much!

cc: @ssingh @Jdrewniak

Apr 15 2025, 10:44 PM · SRE Observability (FY2024/2025-Q4), MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Patch-For-Review, Technical-Debt, Observability-Metrics
colewhite updated the task description for T359267: Migrate MediaWiki.timing to statslib.
Apr 15 2025, 10:25 PM · SRE Observability (FY2024/2025-Q3), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Observability-Metrics
colewhite updated the task description for T359267: Migrate MediaWiki.timing to statslib.
Apr 15 2025, 10:20 PM · SRE Observability (FY2024/2025-Q3), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Observability-Metrics
colewhite changed the status of T348796: MediaWiki: Define new metric type - Histogram from In Progress to Stalled.

Is this task stalled? Parsoid is having trouble migrating its byte-count metrics to Prometheus, so this is a blocker for us at this time. I note that @Michael's blocker seems to have been solved by adding bucket support to the JavaScript side in mw.track, but we really need PHP-side support.

Apr 15 2025, 7:49 PM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), SRE Observability (FY2024/2025-Q3), Patch-For-Review, Observability-Metrics, MediaWiki-libs-Stats
colewhite changed the status of T348796: MediaWiki: Define new metric type - Histogram, a subtask of T354908: evaluate and migrate in-use parsoid metrics to statslib, from In Progress to Stalled.
Apr 15 2025, 7:49 PM · MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), Content-Transform-Team, OKR-Work, Content-Transform-Team-WIP, MediaWiki-Platform-Team (Radar), Observability-Metrics
colewhite added a comment to T372856: Configure graphite to be read only.

@fgiunchedi noted

Apr 15 2025, 2:57 PM · SRE Observability (FY2024/2025-Q4), Technical-Debt, Observability-Metrics

Apr 14 2025

colewhite added a comment to T372856: Configure graphite to be read only.

Affected dashboards: https://grafana.wikimedia.org/d/K6DEOo5Ik/grafana-graphite-datasource-utilization

Apr 14 2025, 4:15 PM · SRE Observability (FY2024/2025-Q4), Technical-Debt, Observability-Metrics

Apr 9 2025

colewhite placed T359246: [GRAFMIGR] Migrate MediaWiki.wikibase.quality.constraints.* to statslib up for grabs.
Apr 9 2025, 8:55 PM · Wikidata Analytics (Kanban), MW-1.44-notes (1.44.0-wmf.19; 2025-03-04), wmde-wikidata-tech (Wikidata Omega Triage), Wikidata, Observability-Metrics
colewhite added a comment to T359246: [GRAFMIGR] Migrate MediaWiki.wikibase.quality.constraints.* to statslib.

The final step in the migration is to update the dashboards using these metrics to use the Prometheus metrics provided by the Thanos datasource. I found these dashboards still referencing Graphite for this data:

Apr 9 2025, 2:45 PM · Wikidata Analytics (Kanban), MW-1.44-notes (1.44.0-wmf.19; 2025-03-04), wmde-wikidata-tech (Wikidata Omega Triage), Wikidata, Observability-Metrics
colewhite added a comment to T391005: Add a log view to SpiderPig.

A couple points for awareness as you plan the project:

Apr 9 2025, 2:25 PM · Patch-For-Review, Release-Engineering-Team (Yak Shaving 🐃🪒), Observability-Logging, logspam-watch, observability, User-brennen, Scap (SpiderPig 🕸️)

Apr 3 2025

colewhite awarded T322448: Volumes stuck in "Reserved" state a Love token.
Apr 3 2025, 10:24 PM · cloud-services-team, Cloud-VPS
colewhite closed T389469: No metrics from JS arriving in Prometheus/Graphite since around 11:48 UTC Wed. 2025-03-19 as Resolved.

We have alerting now and we know a simple restart of statsv brings it back. Optimistically closing.

Apr 3 2025, 10:02 PM · SRE Observability (FY2024/2025-Q3), Observability-Metrics

Apr 2 2025

colewhite added a comment to T390596: Backend logging fields are getting dropped.

I'm wondering if there are fields that SRE is additionally dropping before Logstash emission

We do a lot of filtering on the legacy pipeline for fields that cause problems. All filtering we do there is manual and changes regularly.

Apr 2 2025, 7:38 PM · Abstract Wikipedia team (25Q4 (Apr–Jun)), Abstract Wikipedia Fix-It tasks, function-schemata, function-evaluator, function-orchestrator

Mar 28 2025

colewhite added a comment to T390140: Eventstreams 'assignments' logstash field type.

We've rolled out a logstash filter to check for name KafkaSSE and to cast the assignments field into a string. This can be undone when it is no longer needed.
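The logic of that filter can be approximated in Python as below. This is only a sketch of the idea, not the actual Logstash configuration: the function name is hypothetical, and serialising the value as JSON is an assumption about what "cast into a string" means here.

```python
import json
from typing import Any, Dict


def normalize_assignments(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return the event with `assignments` stringified for KafkaSSE logs.

    Events from other loggers, or events whose field is already a string,
    are returned unchanged.
    """
    value = event.get("assignments")
    if event.get("name") != "KafkaSSE" or value is None or isinstance(value, str):
        return event
    fixed = dict(event)  # copy so the caller's event is not mutated
    fixed["assignments"] = json.dumps(value, sort_keys=True)
    return fixed


example = normalize_assignments(
    {"name": "KafkaSSE", "assignments": [{"topic": "a", "partition": 0}]}
)
print(example["assignments"])
```

Keeping the field as a single string avoids the mapping conflict that arises when the same field arrives as an object in some events and as text in others.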

Mar 28 2025, 11:22 PM · Data-Engineering (Q4 2025 April 1st - June 30th), SRE Observability, EventStreams
colewhite updated subscribers of T390215: Logstash is overwhelmed.

Checking back today, istio-ingressgateway is logging about 1000 logs/sec these days.

Mar 28 2025, 7:05 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging

Mar 27 2025

colewhite added a comment to T359471: Migrate MediaWiki.extension.PageTriage to statslib.

I think it's because the object is the second argument rather than the measurement.

Mar 27 2025, 8:48 PM · MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Moderator-Tools-Team, PageTriage, Observability-Metrics
colewhite changed the status of T390215: Logstash is overwhelmed from In Progress to Open.

Log volume is down quite a bit since raising the production-ratelimit log level filter. Will check back in on this in a few days to re-evaluate in case there are other applications doing the same thing.

Mar 27 2025, 5:10 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging
colewhite updated the task description for T390215: Logstash is overwhelmed.
Mar 27 2025, 4:43 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging
colewhite updated the task description for T390215: Logstash is overwhelmed.
Mar 27 2025, 4:43 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging
colewhite added a comment to T390215: Logstash is overwhelmed.

First suspect is kubernetes.container_name:"production-ratelimit" generating debug logs at around 1500/sec.

Mar 27 2025, 4:37 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging
colewhite changed the status of T390215: Logstash is overwhelmed from Open to In Progress.
Mar 27 2025, 4:37 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging
colewhite created T390215: Logstash is overwhelmed.
Mar 27 2025, 4:35 PM · SRE Observability (FY2024/2025-Q4), Patch-For-Review, Observability-Logging

Mar 25 2025

MSantos awarded T387344: Migrate RESTBase dashboards to prometheus a Love token.
Mar 25 2025, 11:24 AM · Observability-Metrics, SRE Observability (FY2024/2025-Q3), RESTBase Sunsetting

Mar 21 2025

colewhite closed T387344: Migrate RESTBase dashboards to prometheus as Resolved.

Dashboards migrated

Mar 21 2025, 10:09 PM · Observability-Metrics, SRE Observability (FY2024/2025-Q3), RESTBase Sunsetting

Mar 20 2025

colewhite added a comment to T389469: No metrics from JS arriving in Prometheus/Graphite since around 11:48 UTC Wed. 2025-03-19.

From around the same time:

Mar 19 11:45:21 webperf1003 python3[3604232]: Process Process-1:
Mar 19 11:45:21 webperf1003 python3[3604232]: Traceback (most recent call last):
Mar 19 11:45:21 webperf1003 python3[3604232]:   File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
Mar 19 11:45:21 webperf1003 python3[3604232]:     self.run()
Mar 19 11:45:21 webperf1003 python3[3604232]:   File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
Mar 19 11:45:21 webperf1003 python3[3604232]:     self._target(*self._args, **self._kwargs)
Mar 19 11:45:21 webperf1003 python3[3604232]:   File "/srv/deployment/statsv/statsv/statsv.py", line 268, in process_queue
Mar 19 11:45:21 webperf1003 python3[3604232]:     emit(sock, statsd_addr, statsd_message)
Mar 19 11:45:21 webperf1003 python3[3604232]:   File "/srv/deployment/statsv/statsv/statsv.py", line 195, in emit
Mar 19 11:45:21 webperf1003 python3[3604232]:     sock.sendto(payload.encode('utf-8'), addr)
Mar 19 11:45:21 webperf1003 python3[3604232]: socket.gaierror: [Errno -2] Name or service not known
Mar 20 2025, 3:03 PM · SRE Observability (FY2024/2025-Q3), Observability-Metrics
colewhite claimed T389469: No metrics from JS arriving in Prometheus/Graphite since around 11:48 UTC Wed. 2025-03-19.

It seems the statsv process wedged itself. After restarting the process, metrics are now flowing again.
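The traceback in the related comment shows socket.gaierror escaping from sock.sendto inside emit, which ended the worker process. One possible mitigation, sketched here under the assumption that a transient DNS failure was the trigger, is to catch the error and retry instead of letting it propagate. emit_with_retry and FlakySocket are hypothetical names; this is not the statsv code.

```python
import socket
import time


def emit_with_retry(sock, addr, payload: str, attempts: int = 3, delay: float = 0.0) -> bool:
    """Send a payload, retrying when name resolution fails.

    Returns True once the datagram is handed to the socket, False if every
    attempt raised socket.gaierror.
    """
    data = payload.encode("utf-8")
    for attempt in range(attempts):
        try:
            sock.sendto(data, addr)
            return True
        except socket.gaierror:
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False


class FlakySocket:
    """Test double that fails name resolution a fixed number of times."""

    def __init__(self, failures: int):
        self.failures = failures
        self.sent = []

    def sendto(self, data, addr):
        if self.failures > 0:
            self.failures -= 1
            raise socket.gaierror(-2, "Name or service not known")
        self.sent.append((data, addr))
        return len(data)


flaky = FlakySocket(failures=2)
print(emit_with_retry(flaky, ("statsd.example", 8125), "foo:1|c"))
```

Whether dropping the sample after the final attempt is acceptable depends on the metric. The point of the sketch is only that the worker keeps running.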

Mar 20 2025, 2:51 PM · SRE Observability (FY2024/2025-Q3), Observability-Metrics

Mar 18 2025

colewhite added a project to T359497: StatsD Exporter: gracefully handle metric signature changes: SRE Observability (FY2024/2025-Q3).
Mar 18 2025, 11:55 PM · SRE Observability (FY2024/2025-Q3), Observability-Metrics
colewhite closed T359497: StatsD Exporter: gracefully handle metric signature changes as Resolved.

Config deployed!

Mar 18 2025, 11:54 PM · SRE Observability (FY2024/2025-Q3), Observability-Metrics

Mar 14 2025

colewhite closed T359347: (mw.track) Migrate MediaWiki.echo.* to statslib, a subtask of T350592: EPIC: migrate in use metrics and dashboards to statslib, as Resolved.
Mar 14 2025, 3:57 PM · SRE Observability (FY2024/2025-Q4), MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
colewhite closed T359347: (mw.track) Migrate MediaWiki.echo.* to statslib as Resolved.

Dashboards migrated

Mar 14 2025, 3:57 PM · MW-1.44-notes (1.44.0-wmf.17; 2025-02-18), Growth-Team (Current Sprint), Notifications (Echo), Observability-Metrics

Mar 13 2025

colewhite closed T359481: Migrate MediaWiki.FileImporter to statslib, a subtask of T350592: EPIC: migrate in use metrics and dashboards to statslib, as Resolved.
Mar 13 2025, 5:49 PM · SRE Observability (FY2024/2025-Q4), MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
colewhite closed T359481: Migrate MediaWiki.FileImporter to statslib as Resolved.

Dashboards migrated

Mar 13 2025, 5:49 PM · MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), WMDE-TechWish-Maintenance, Move-Files-To-Commons, Observability-Metrics
colewhite closed T359458: Create Prometheus-backed dashboard for BlockNotices, a subtask of T350592: EPIC: migrate in use metrics and dashboards to statslib, as Resolved.
Mar 13 2025, 3:58 PM · SRE Observability (FY2024/2025-Q4), MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
colewhite closed T359458: Create Prometheus-backed dashboard for BlockNotices as Resolved.

Dashboards migrated.

Mar 13 2025, 3:57 PM · Trust and Safety Product Sprint (Sprint Lemon Meringue (March 3 - 21)), MW-1.44-notes (1.44.0-wmf.13; 2025-01-21), Trust and Safety Product Team, Observability-Metrics
colewhite updated the task description for T359458: Create Prometheus-backed dashboard for BlockNotices.
Mar 13 2025, 3:57 PM · Trust and Safety Product Sprint (Sprint Lemon Meringue (March 3 - 21)), MW-1.44-notes (1.44.0-wmf.13; 2025-01-21), Trust and Safety Product Team, Observability-Metrics

Mar 12 2025

colewhite added a comment to T388629: puppet error at the end of the run on prometheus2008: Could not autoload puppet/reports/logstash: Cannot invoke "jnr.netdb.Service.getName()" because "service" is null.

Linking this task here in case it helps with the investigation: T385058: logstash.rb uses deprecated Socket.gethostbyname

Mar 12 2025, 11:50 PM · SRE, Puppet

Mar 7 2025

colewhite updated the task description for T359251: [REPO][SW][GRAFMIGR] (mw.track) Migrate MediaWiki.wikibase.repo.* to statslib.
Mar 7 2025, 5:23 PM · MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Wikidata Analytics (Kanban), WMDE-Analytics-Engineering, wmde-wikidata-tech (Wikidata Omega Triage), Patch-For-Review, MediaWiki-extensions-Wikibase-Repo, Wikidata, Observability-Metrics

Mar 5 2025

colewhite awarded T387899: WikimediaDebug no longer available in Chrome web store? a Party Time token.
Mar 5 2025, 6:23 PM · User-bd808, Upstream, WikimediaDebug
colewhite added a comment to T387343: Enable native prometheus metrics in RESTBase.

Restbase metrics are now being ingested by Prometheus.

Mar 5 2025, 6:06 PM · SRE Observability, Content-Transform-Team (Work In Progress), observability, RESTBase Sunsetting

Mar 4 2025

colewhite renamed T387842: Large volume of indexing errors from "wikifeeds logs err field as object not text causing indexing errors" to "Large volume of indexing errors".
Mar 4 2025, 8:27 PM · Observability-Logging
colewhite created T387899: WikimediaDebug no longer available in Chrome web store?.
Mar 4 2025, 5:41 PM · User-bd808, Upstream, WikimediaDebug

Feb 26 2025

colewhite placed T383287: decommission logstash102[6-9] up for grabs.
Feb 26 2025, 2:14 PM · SRE, DC-Ops, ops-eqiad, Observability-Logging, decommission-hardware
colewhite updated the task description for T383287: decommission logstash102[6-9].
Feb 26 2025, 2:14 PM · SRE, DC-Ops, ops-eqiad, Observability-Logging, decommission-hardware
colewhite created T387261: Logstash: ~410 msgs a sec from tegola-vector-tiles.
Feb 26 2025, 12:43 AM · Observability-Logging

Feb 25 2025

colewhite reopened T359239: (mw.track) Migrate MediaWiki.cx.publish.*.sum to statslib, a subtask of T350592: EPIC: migrate in use metrics and dashboards to statslib, as Open.
Feb 25 2025, 11:28 PM · SRE Observability (FY2024/2025-Q4), MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Epic, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MediaWiki-Platform-Team (Radar), Observability-Metrics
colewhite reopened T359239: (mw.track) Migrate MediaWiki.cx.publish.*.sum to statslib as "Open".

Hi @colewhite,

Should we remove the metrics from MWTrack?

Reference: MWTrack usage in ContentTranslation

Let me know your thoughts.

Feb 25 2025, 11:28 PM · LPL Essential (LPL Essential 2024 Jul-Oct), Observability-Metrics
colewhite closed T385058: logstash.rb uses deprecated Socket.gethostbyname as Resolved.

The change is deployed. I'm not sure there is anything left to do, but optimistically resolving.

Feb 25 2025, 4:14 PM · Infrastructure-Foundations, Observability-Logging

Feb 21 2025

colewhite assigned T359237: (mw.track) Migrate MediaWiki.cx.campaign.*.accept to statslib to andrea.denisse.
Feb 21 2025, 11:32 PM · MW-1.44-notes (1.44.0-wmf.19; 2025-03-04), Language and Product Localization, ContentTranslation, Observability-Metrics
colewhite claimed T359462: Migrate MediaWiki.Cognate.Repo to statslib.
Feb 21 2025, 11:31 PM · MW-1.44-notes (1.44.0-wmf.18; 2025-02-25), Wikidata, Cognate, Observability-Metrics

Feb 18 2025

Michael awarded T368740: Simplify asserting expected stats in unit / integration tests a Love token.
Feb 18 2025, 7:23 PM · MW-1.44-notes (1.44.0-wmf.24; 2025-04-08), MediaWiki-libs-Stats

Feb 10 2025

colewhite renamed T359386: (Analytics?) Migrate MediaWiki.articleplaceholder to statslib from "Migrate MediaWiki.articleplaceholder to statslib" to "(Analytics?) Migrate MediaWiki.articleplaceholder to statslib".
Feb 10 2025, 5:50 PM · Wikidata Analytics (Kanban), MW-1.44-notes (1.44.0-wmf.6; 2024-12-03), Wikidata, Observability-Metrics
colewhite added a comment to T359386: (Analytics?) Migrate MediaWiki.articleplaceholder to statslib.

The rest of the metrics on the dashboard appear to be generated by Airflow.

Feb 10 2025, 5:50 PM · Wikidata Analytics (Kanban), MW-1.44-notes (1.44.0-wmf.6; 2024-12-03), Wikidata, Observability-Metrics

Feb 7 2025

colewhite added a comment to T381607: Talk page notification alert bar: Improve data legibility with a clear grafana dashboard.

I'm trying to build a panel (Echo notifications) that allows filtering by wiki and "user types", and I'm failing to write a query that also works for the "All" option. I've customized the All option to have the value .* and used the regex operator =~

Feb 7 2025, 6:45 PM · MW-1.44-notes (1.44.0-wmf.13; 2025-01-21), Growth-Team (Current Sprint), Notifications (Echo)
colewhite added a project to T383563: mw.track: support for histogram metrics: MediaWiki-Platform-Team.
Feb 7 2025, 1:44 PM · patch-welcome, MediaWiki-Engineering, Observability-Metrics, Data-Engineering-Radar, MediaWiki-Platform-Team (Radar), Data-Engineering, MediaWiki-extensions-WikimediaEvents, Grafana, Growth-Team, GrowthExperiments
colewhite renamed T383563: mw.track: support for histogram metrics from "statslib: statsv support for histogram metrics" to "mw.track: support for histogram metrics".
Feb 7 2025, 1:42 PM · patch-welcome, MediaWiki-Engineering, Observability-Metrics, Data-Engineering-Radar, MediaWiki-Platform-Team (Radar), Data-Engineering, MediaWiki-extensions-WikimediaEvents, Grafana, Growth-Team, GrowthExperiments