
mw.track: support for histogram metrics
Open, Low, Public

Description

While most uses of the Graphite metric type "timing" were indeed for measures of time, it was also sometimes used to track other quantities. The motivation was that we still want to track, calculate, and visualize percentiles and related quantities in Grafana.
With Graphite that was de facto no problem, because Graphite does not know units, only numbers, and so it dutifully created histograms of anything that was submitted under the label of a timing.
However, with Prometheus, as I understand the conversation starting from T355837#10431981, units now do matter, and thus bytes should no longer be tracked as "timings" to get the 95th percentile of transferred data.
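
To make the request concrete, here is a minimal sketch of what recording a non-timing quantity as a histogram could look like. It is plain JavaScript and does not use any existing mw.track or statsv API; `createHistogram`, `observe`, and the bucket bounds are hypothetical names and values chosen for illustration. The buckets are cumulative, following the Prometheus convention where each bucket counts all observations less than or equal to its upper bound.

```javascript
// Hypothetical bucket upper bounds in bytes (assumption, not from any real config).
const BYTE_BUCKETS = [ 1024, 10240, 102400, 1048576 ];

// Create an empty histogram with cumulative buckets.
function createHistogram( bounds ) {
	const sorted = bounds.slice().sort( ( a, b ) => a - b );
	return {
		bounds: sorted,
		// One counter per finite bound, plus a final "+Inf" counter.
		counts: new Array( sorted.length + 1 ).fill( 0 ),
		sum: 0,
		count: 0
	};
}

// Record one observation: increment every bucket whose bound is >= value.
function observe( histogram, value ) {
	histogram.bounds.forEach( ( bound, i ) => {
		if ( value <= bound ) {
			histogram.counts[ i ]++;
		}
	} );
	// The "+Inf" bucket counts every observation.
	histogram.counts[ histogram.bounds.length ]++;
	histogram.sum += value;
	histogram.count++;
}

// Usage: record four transfer sizes in bytes.
const transferSize = createHistogram( BYTE_BUCKETS );
[ 512, 2048, 50000, 2000000 ].forEach( ( bytes ) => observe( transferSize, bytes ) );
console.log( transferSize.counts ); // [ 1, 2, 3, 3, 4 ]
```

The point of the sketch is that the unit (bytes) lives in the bucket bounds and the metric name rather than being implied by a "timing" type.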

Examples for existing metrics in JavaScript that are not actually timings:

Note:

Event Timeline

Michael moved this task from Inbox to Tracking on the Growth-Team board.

Reopening and clarifying that this is about tracking histograms in JavaScript and thus distinct from T348796: MediaWiki: Define new metric type - Histogram and functionally closer to T383953: Statsv support for timer metrics.

Michael renamed this task from statslib: Add support for tracking histograms for non-timings to statslib: statsv support for histogram metricas. Feb 7 2025, 1:00 PM
Michael renamed this task from statslib: statsv support for histogram metricas to statslib: statsv support for histogram metrics.
Michael updated the task description.
colewhite renamed this task from statslib: statsv support for histogram metrics to mw.track: support for histogram metrics. Feb 7 2025, 1:42 PM
colewhite added a subscriber: Krinkle.

One use-case for this was "migrated" from Graphite to Prometheus in homepage: Add homepage_transfersize_bytes_total metric. With additional manual work in Grafana, we got to something similar-ish to a histogram: https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?from=now-24h&orgId=1&to=now&viewPanel=201, which is good enough for now. (Thanks to @Krinkle for looking into this 🙏)

So from the Growth-Team perspective, this task is no longer blocking making Graphite read-only as described in T228380: Tech debt: sunsetting of Graphite. It should still remain open, because we still need a better solution that results in less friction on both the js/mw.track side and the Grafana side.
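
As an illustration of what proper histogram support would make possible on the query side, the sketch below estimates a percentile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. This is not what the Grafana panel linked above does; `estimateQuantile` and the sample numbers are hypothetical, and the interpolation only approximates the true percentile.

```javascript
// Estimate quantile q (0..1) from cumulative bucket counts.
// `bounds` are finite upper bounds; `cumulativeCounts` has one extra
// trailing entry for the "+Inf" bucket (the total number of observations).
function estimateQuantile( q, bounds, cumulativeCounts ) {
	const total = cumulativeCounts[ cumulativeCounts.length - 1 ];
	if ( total === 0 ) {
		return NaN;
	}
	const rank = q * total;
	// First bucket whose cumulative count reaches the target rank.
	const i = cumulativeCounts.findIndex( ( c ) => c >= rank );
	if ( i >= bounds.length ) {
		// Rank falls into "+Inf": report the highest finite bound.
		return bounds[ bounds.length - 1 ];
	}
	const lowerBound = i === 0 ? 0 : bounds[ i - 1 ];
	const lowerCount = i === 0 ? 0 : cumulativeCounts[ i - 1 ];
	const inBucket = cumulativeCounts[ i ] - lowerCount;
	if ( inBucket === 0 ) {
		return lowerBound;
	}
	// Linear interpolation between the bucket's lower and upper bound.
	return lowerBound + ( bounds[ i ] - lowerBound ) * ( rank - lowerCount ) / inBucket;
}

// Usage with made-up data: 100 observations across four byte buckets.
const sampleBounds = [ 1024, 10240, 102400, 1048576 ];
const sampleCounts = [ 10, 60, 95, 100, 100 ];
console.log( estimateQuantile( 0.5, sampleBounds, sampleCounts ) ); // roughly 8396.8
```

The accuracy of such an estimate depends entirely on how well the bucket bounds match the real distribution, which is one reason the choice of buckets would need to be part of any histogram API.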

Krinkle triaged this task as Low priority. Edited Apr 10 2025, 2:48 PM
From the task description:

timing.growthExperiments.specialHomepage.navigationTransferSize is measuring bytes.

I've migrated this, per:

Change #1131821 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] homepage: Add `homepage_transfersize_bytes_total` metric

https://gerrit.wikimedia.org/r/1131821

I've detached this from T382003, thus it is no longer blocking the T343020: Converting MediaWiki Metrics to StatsLib and T350592: EPIC: migrate in use metrics and dashboards to statslib goals.

@MSantos I see you moved this task to "Needs Input (waiting)" on the MediaWiki-Engineering board. Could you clarify which input is needed for this to be actionable?

MSantos moved this task from Needs Input (waiting) to Radar on the MediaWiki-Engineering board.

Hi @Michael, thank you for asking this question. This has been moved to 'needs input', and a meeting between observability and MediaWiki-Engineering has been scheduled to discuss ownership questions.

During this meeting, we realized that there is a gap in domain ownership, and neither team is well-equipped to fulfill requests related to this component at this moment, due to resource constraints. This is being escalated to an ownership forum.

Meanwhile, patches are welcome, and we are happy to review and do our best to provide guidance.

For now, I'm moving this to Radar to reflect the outcome of the meeting while this is sorted out in a different forum.