User Details
- User Since
- Oct 7 2014, 5:34 AM (560 w, 4 d)
- Availability
- Available
- IRC Nick
- subbu
- LDAP User
- Subramanya Sastry
- MediaWiki User
- SSastry (WMF)
Wed, Jul 2
The idea sounded fine in theory, but in practice, things get complicated. The above patch makes a good first pass that works quite well for the quote and list handlers, and decently for the indent-pre handler.
Okay, we've filed separate tasks for specific issues and fixed some of them. The only uninvestigated issue is the timeouts on the User and Wikipedia namespaces, both of which point to problems with large pages rather than something substantially broken. I'm going to close this task; when we get to a second round of performance work, we can revisit the state of OOMs and timeouts and file fresh tasks as needed.
Mon, Jun 30
Not that this explains why the tests are failing in AbuseFilter, which presumably only invokes the HTML-producing REST API endpoints, but perhaps the observation above helps someone debug this.
The wikitext/to/lint endpoint doesn't emit any HTML; it only generates lints and returns the lint output. So no tests that require HTML should be run against this endpoint.
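For reference, this is roughly how lint-only output is produced through Parsoid's PHP entry point (a minimal sketch; setup of the config objects is assumed, and the shape of the lint records is from memory):

```php
use Wikimedia\Parsoid\Parsoid;

// $siteConfig, $dataAccess, and $pageConfig are assumed to be set up elsewhere.
$parsoid = new Parsoid( $siteConfig, $dataAccess );

// wikitext2lint() returns an array of lint records; no HTML is generated
// along the way, so tests that assert on HTML cannot target this path.
$lints = $parsoid->wikitext2lint( $pageConfig, [] );

foreach ( $lints as $lint ) {
	// Each record carries a lint type and a source offset range ("dsr").
	printf( "%s @ [%s]\n", $lint['type'], implode( ', ', $lint['dsr'] ) );
}
```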
Fri, Jun 6
gerrit 114980 linked above yields about a 0.5% perf improvement. I'm not sure what we think about it, since it introduces one more specialized traverser. It might be worth considering, but given the modest benefit, it is definitely lower priority and something to revisit once all the big-ticket fixes are done.
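For context, each such pass is another full walk of the DOM, which is why a ~0.5% win has to justify its traversal cost. A rough sketch of what an additional specialized traverser looks like (handler wiring and traverse() arguments are illustrative, assuming Parsoid's DOMTraverser utility):

```php
use Wikimedia\Parsoid\Utils\DOMTraverser;

$traverser = new DOMTraverser();

// Visit only <a> nodes; everything else is skipped by the dispatcher.
$traverser->addHandler( 'a', static function ( $node ) {
	// ... specialized per-node work goes here ...
	return true; // continue the traversal with the next node
} );

$traverser->traverse( $extApi, $body ); // argument list is illustrative
```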
Jun 4 2025
At this point, a number of patches have sped up the parse time on this page by over 35% compared to when this phab task was filed. Some more in-flight patches are likely to improve performance by an additional 3-5%. I am going to resolve this task now.
This is now set up, and some compound tokens are being created. It also improves performance on some bigger pages. I am going to resolve this.
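A hypothetical sketch of the general idea (class and field names invented for illustration): collapse a run of adjacent tokens into one container token so later handlers can step over it in one go instead of visiting every constituent.

```php
// Invented for illustration; not Parsoid's actual token class.
class CompoundToken {
	/** @var array Constituent tokens, in source order */
	public array $parts;

	public function __construct( array $parts ) {
		$this->parts = $parts;
	}
}

// A handler that doesn't care about the internals can pass the whole
// compound token through in O(1) instead of O(n) per-token visits.
```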
Jun 3 2025
https://www.mediawiki.org/wiki/Parsing/Notes/Wikitext_2.0/Typed_Templates is relevant in this context (not necessarily the proposal itself, since that is a half-thought-out straw man, but at least the discussion).
May 29 2025
Okay, with some more instrumentation, it looks like AddRedLinks is the culprit for crhwiki:
With an additional hacked-up version of that patch on parsoidtest1001, here is some info
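The instrumentation here amounts to timing Parsoid's individual DOM passes; a minimal sketch of that idea (the timer plumbing and log target are invented for illustration, not the actual patch):

```php
$start = hrtime( true );

// ... run the AddRedLinks DOM pass over the document here ...

$elapsedMs = ( hrtime( true ) - $start ) / 1e6;
error_log( sprintf( 'AddRedLinks: %.1f ms', $elapsedMs ) );
```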
We could more easily change the external representation of data-parsoid without worrying about spec version bumps, and maybe it makes sense from an HTML size / parser cache usage point of view. But there are separate phab tasks for optimizing HTML size. So, should we just decline this, then?
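For reference, the external representation in question is the per-node JSON blob serialized into the data-parsoid attribute, along these lines (contents abbreviated; the dsr offsets are illustrative):

```html
<b data-parsoid='{"dsr":[0,10,3,3]}'>bold</b>
```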
May 28 2025
It looks like the message localization ($msg->parse() / $msg->parseAsBlock()) maybe returned an empty string? An edge case, but something we should probably catch. Flagging @ihurbain since she understands this better.
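A minimal sketch of the kind of guard this would need (the logging is illustrative, not a proposed fix):

```php
$html = $msg->parseAsBlock();
if ( $html === '' ) {
	// The message parsed to nothing; log it rather than silently
	// propagating empty output downstream.
	wfDebugLog( 'Parsoid', 'Message ' . $msg->getKey() . ' parsed to an empty string' );
}
```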
They are all from pswiki, from requests to the Action API to parse a page with Parsoid: 580 errors in all.
Regarding OOMs, I filed T395492: Infinite loop in Cite's linter code. For a bunch of titles collected from this task and from logstash (I skipped the enwiki User namespace), here is some data on peak memory usage (as reported on parsoidtest1001 via Parsoid's parse.php script and the --benchmark option):
enwiki:Template:Syrian_Civil_War_detailed_map ------ 1964.85 MiB
enwiki:Wikipedia:User_scripts/Ranking ------ 1553.93 MiB
enwiki:Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Publisher1 ------ 1517.26 MiB
enwiki:Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Publisher2 ------ 1518.95 MiB
enwiki:Wikipedia:WikiProject_Spam/LinkReports/elections.alifailaan.pk ------ 1631.92 MiB
frwiki:Projet:Football/Maintenance ------ 1794.44 MiB
srwiki:Malacostraca ------ 2013.71 MiB
crhwiki:Brânsk_rayonı ------ 1211.06 MiB
crhwiki:Suniy_zekâ ------ 1929.90 MiB
crhwiki:Noyabr ------ 4733.70 MiB
crhwiki:Elektrik ------ 1891.67 MiB
crhwiki:Silâ ------ 1736.65 MiB
Two of the pages from T392261#10849861 above (originally reported in T366082 in May 2024) are no longer a problem. They parse within the memory and time limits.
enwiki:Timeline_of_the_COVID-19_pandemic_in_Canada ------ 255.49 MiB
srwiki:Naučno-stručno_društvo_za_upravljanje_rizicima_u_vanrednim_situacijama ------ 105.36 MiB
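The numbers above come from invocations along these lines (the --benchmark flag is as described above; the other flag names are from memory and may differ):

```
php bin/parse.php --domain crh.wikipedia.org --pageName 'Noyabr' --benchmark
```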
May 22 2025
WMDE has done most of this work already.
This task is effectively resolved by WMDE's rework of the output of Cite's Parsoid implementation, which eliminates the need for this.
We should actually look at some test pages to determine whether this is useful before we go full steam ahead on it. But the kernel of the idea (wasteful internal representation) might yield something.
We won't do this; it just adds maintenance headaches for us.
This is from the Parsoid/JS days, and it is unclear whether it is still an issue for Parsoid/PHP. It doesn't show up in any of our profiling at this time.
We have some of this information in our dashboards and I am not sure it is actually providing any useful insights!
This is effectively handled by the work done in T392261: Investigate crashers (out of memory, timeouts), so there is no need to do this.