ssastry (Subbu)
User

User Details

User Since
Oct 7 2014, 5:34 AM (554 w, 1 d)
Availability
Available
IRC Nick
subbu
LDAP User
Subramanya Sastry
MediaWiki User
SSastry (WMF)

Recent Activity

Today

ssastry raised the priority of T394808: DOMTraverser performance experimentations: Maybe create optimized / specialized DOMTraversers from Low to Medium.
Wed, May 21, 3:23 AM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team, Parsoid
ssastry added a comment to T394808: DOMTraverser performance experimentations: Maybe create optimized / specialized DOMTraversers.

DisplaySpace is another one that could be refactored to fit the 'simple' pattern.

Wed, May 21, 3:05 AM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team, Parsoid
ssastry triaged T394808: DOMTraverser performance experimentations: Maybe create optimized / specialized DOMTraversers as Low priority.

I did a quick inspection of the handlers that lead to the most simplification of the DOMTraverser -- and there are just 3 of them (dedupe-ids, gen-anchors, add-link-attributes), and looking at --profile output of a few pages, those handlers account for < 1% of total time in most profiles. So, even if those handlers sped up 25%, the total page speedup is going to be marginal.
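
A back-of-the-envelope check of that estimate, as a minimal sketch. The 1% share and the 25% figure come from the comment above; reading "sped up 25%" as "handler time drops by 25%" is an assumption, and the function name is made up for illustration. Since the handlers account for less than 1% in most profiles, the result is an upper bound.

```python
def total_time_saved(handler_share: float, handler_reduction: float) -> float:
    """Fraction of total parse time saved when only the handlers get faster.

    handler_share: fraction of total time spent in the handlers (assumed 0.01).
    handler_reduction: fraction by which handler time shrinks (assumed 0.25).
    """
    return handler_share * handler_reduction


saving = total_time_saved(handler_share=0.01, handler_reduction=0.25)
print(f"{saving:.2%} of total parse time saved")
```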

Wed, May 21, 2:47 AM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team, Parsoid

Yesterday

ssastry added a comment to T394736: Beta cluster: data-list-id is not being added on bookmark link.

This is now reproducible on testwiki as well. Looking at the line that throws the error seen on testwiki, it is only reached if you got a valid list id and $silent is not set to true. Who passes $silent? Is it from the JS code?
From this 'git grep' output, I don't see any call to setupForUser that passes a true value:

maintenance/populateWithTestData.php:                           $repository->setupForUser();
src/Api/ApiReadingListsSetup.php:               $list = $this->getReadingListRepository( $this->getUser() )->setupForUser();
src/ReadingListRepository.php:  public function setupForUser( $silent = false ) {
src/ReadingListRepository.php:   * Check whether reading lists have been set up for the given user (i.e. setupForUser() was
src/Rest/SetupHandler.php:                      $this->getRepository()->setupForUser();
Tue, May 20, 10:13 PM · Patch-For-Review, Beta-Cluster-reproducible, MediaWiki-extensions-ReadingLists, Web-Team
ssastry added a comment to T392261: Investigate crashers (out of memory, timeouts).

Memory limit is set to 1400 MiB in the config repo.

Tue, May 20, 8:29 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry added a comment to T394270: LogicException: Title not found!.

While this patch isn't yet deployed everywhere on wmf.1 (I see that the backport to wmf.1 is scheduled for a late backport window today), I can confirm the old failure on enwiki where this change isn't yet live.

curl -X POST -H "Content-Type: application/json" --data '{ "wikitext": "== Hello Jupiter ==" }' 'https://en.wikipedia.org/w/rest.php/v1/transform/wikitext/to/html/||DBMS_PIPE.RECEIVE_MESSAGE(CHR(98)||CHR(98)||CHR(98)%2C15)||' 
{"message":"Error: exception of type LogicException","httpCode":500,"httpReason":"Internal Server Error"}%
Tue, May 20, 7:23 PM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error
ssastry updated subscribers of T394270: LogicException: Title not found!.

In an attempt to verify and close this task, I ran into this. This is not from @mszabo's change but this response should not have been HTTP 403.

$ curl "https://en.wikipedia.org/w/rest.php/v1/page/||DBMS_PIPE.RECEIVE_MESSAGE(CHR(98)||CHR(98)||CHR(98)%2C15)||/html"
{"errorKey":"rest-permission-denied-title","messageTranslations":{"en":"The user does not have rights to read title (||DBMS_PIPE.RECEIVE_MESSAGE(CHR(98)||CHR(98)||CHR(98),15)||)"},"httpCode":403,"httpReason":"Forbidden"}% 
Tue, May 20, 7:18 PM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error
ssastry closed T317018: Make HtmlOutputRendererHelper use ParserOutputAccess, a subtask of T367074: Deprecate and remove ParsoidOutputAccess, as Resolved.
Tue, May 20, 7:09 PM · MW-1.44-release, MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), Parsoid, Essential-Work
ssastry closed T317018: Make HtmlOutputRendererHelper use ParserOutputAccess as Resolved.
Tue, May 20, 7:09 PM · Essential-Work, Content-Transform-Team (Work In Progress), Technical-Debt
ssastry closed T346196: Wikimedia\Assert\InvariantException: Invariant failed: Expected valid DSR as Resolved.
Tue, May 20, 7:08 PM · Content-Transform-Team (Work In Progress), OKR-Work, Patch-For-Review, Parsoid, Wikimedia-production-error
ssastry added a comment to T346196: Wikimedia\Assert\InvariantException: Invariant failed: Expected valid DSR.

itwikisource:Pagina%3ATempesta.djvu%2F16 is another page that triggers this error.

Tue, May 20, 7:08 PM · Content-Transform-Team (Work In Progress), OKR-Work, Patch-For-Review, Parsoid, Wikimedia-production-error
ssastry added a comment to T392261: Investigate crashers (out of memory, timeouts).

Regarding OOMs, after excluding user pages and FST-based langconversion pages (which have known issues), I found at least two pages that are legitimate OOMs (haven't looked at others closely):

Tue, May 20, 6:01 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry created T394808: DOMTraverser performance experimentations: Maybe create optimized / specialized DOMTraversers.
Tue, May 20, 4:54 PM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team, Parsoid

Thu, May 15

ssastry created T394436: Pathological WrapTemplates performance.
Thu, May 15, 4:48 PM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry added a comment to T268785: IDEA: Move parallel tag parsing logic from Math to core.

T352451: Parsoid runs ParserAfterTidy & ParserAfterParse hooks multiple times, causing problems for DiscussionTools is related

Thu, May 15, 4:46 PM · Content-Transform-Team (Work In Progress), Parsoid, Math
ssastry added a project to T268785: IDEA: Move parallel tag parsing logic from Math to core: Content-Transform-Team (Work In Progress).

I think we will need to solve some version of this for Parsoid since the current solution doesn't help Parsoid mitigate latencies (See T392261#10824804 for example)

Thu, May 15, 4:42 PM · Content-Transform-Team (Work In Progress), Parsoid, Math
ssastry added a comment to T392261: Investigate crashers (out of memory, timeouts).

Spot-checking other wikis for last month:

  • nlwiki: all user pages
  • kowiki: no timeouts
  • jawiki: 14 across all namespaces, one user page & rest wikipedia namespace
  • frwiki: user pages OR project pages like this with large lists
  • itwiki: excluding user pages, wikipedia pages, and project pages, there are 12 entries -- all of them are small pages that use timeline charts, and all seem to have been transient (so could have been a transient timeline outage).
Thu, May 15, 4:21 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry added a comment to T392261: Investigate crashers (out of memory, timeouts).

It might have been the same thing with https://en.wikipedia.org/w/index.php?title=Yuri%27s_Night&action=history and https://en.wikipedia.org/w/index.php?title=Gagarin%27s_Start&action=history which show a number of deleted revisions.

Thu, May 15, 3:19 AM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry added a comment to T392261: Investigate crashers (out of memory, timeouts).

Aha .. so, revid 1288359999 on enwiki:Sputnik_1 is a vandalized version and has 15323 uses of Template:Chem_name and 15835 uses of Template:Sic. Using --profile, it turns out that WrapTemplates explodes in time usage on that page and takes 35s! So, that is worth fixing.

Thu, May 15, 3:12 AM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry claimed T392261: Investigate crashers (out of memory, timeouts).
Thu, May 15, 2:59 AM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry added a comment to T392261: Investigate crashers (out of memory, timeouts).

That turned out to be a nothingburger for the most part. Here is the dump of parse.php times on the above titles (after resolving redirects). Except for the two math pages (Filters_in_topology, List_of_set_identities_and_relations), everything else parses pretty quickly, and I confirmed with an "?action=purge" on two of the pages that the pages do render fine. So, except for those two titles, everything else was probably a transient timeout.

Thu, May 15, 2:59 AM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid

Wed, May 14

ssastry added a comment to T392261: Investigate crashers (out of memory, timeouts).

I downloaded the Logstash data from the last month, extracted the exception URLs that had timeouts, stripped the revision ids, and excluded the File, Template, Category, and *Talk namespaces as well:

Ankh_Morpork_City_Watch
Battle_of_khaybar
Fairbanks%2C_Morse_and_Company
Filters_in_topology
Gagarin%27s_Start
Good_Morning%2C_Judge
List_of_Evolve_Tag_Team_Champions
List_of_set_identities_and_relations
Magnum_Airlines_Helicopters
New_Super_Mario_Bros._(series)
Sputnik_1
Yuri%27s_Night
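
The extraction described above can be sketched roughly as follows. This is not the script that was actually used: the index.php?title=...&oldid=... URL shape, the function names, and the sample URLs are assumptions for illustration, and titles come out percent-decoded here.

```python
from typing import Iterable, List, Optional
from urllib.parse import parse_qs, urlsplit

# Namespaces named in the comment; anything ending in "talk" is dropped too.
EXCLUDED_NAMESPACES = {"File", "Template", "Category"}


def title_or_none(url: str) -> Optional[str]:
    """Return the page title with the revision id dropped, or None if excluded."""
    titles = parse_qs(urlsplit(url).query).get("title")
    if not titles:
        return None
    title = titles[0]
    namespace, sep, _ = title.partition(":")
    if sep and (namespace in EXCLUDED_NAMESPACES
                or namespace.lower().endswith("talk")):
        return None
    return title


def unique_titles(urls: Iterable[str]) -> List[str]:
    """Deduplicate titles across revisions and sort them."""
    return sorted({t for t in map(title_or_none, urls) if t is not None})


sample = [
    "https://en.wikipedia.org/w/index.php?title=Sputnik_1&oldid=1288359999",
    "https://en.wikipedia.org/w/index.php?title=Sputnik_1&oldid=1",
    "https://en.wikipedia.org/w/index.php?title=Template:Sic&oldid=2",
    "https://en.wikipedia.org/w/index.php?title=Talk:Sputnik_1&oldid=3",
    "https://en.wikipedia.org/w/index.php?title=Filters_in_topology&oldid=4",
]
print(unique_titles(sample))
```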
Wed, May 14, 9:59 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry added a comment to T392261: Investigate crashers (out of memory, timeouts).

Looking just at enwiki timeouts in our Logstash dashboard for the last 3 months,

  • If I exclude the "User:" and "Wikipedia:" namespaces, we have 2072 timeouts and 1629 OOMs.
  • If I look at just the "User:" namespace, we have ~16000 timeouts, and ~19700 OOMs.
  • If I look at just the "Wikipedia:" namespace, we have ~6900 timeouts and ~5800 OOMs.
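
To put those counts in proportion, here is a small sketch that computes each bucket's share. It assumes the three buckets together cover all enwiki entries in the dashboard, and it treats the approximate figures as exact; the bucket label "other" is made up for illustration.

```python
# Counts copied from the bullets above; "~" figures are approximate.
counts = {
    "other": {"timeouts": 2072, "ooms": 1629},
    "User:": {"timeouts": 16000, "ooms": 19700},
    "Wikipedia:": {"timeouts": 6900, "ooms": 5800},
}


def shares(kind: str) -> dict:
    """Share of each bucket in the total for the given kind of failure."""
    total = sum(bucket[kind] for bucket in counts.values())
    return {name: bucket[kind] / total for name, bucket in counts.items()}


for kind in ("timeouts", "ooms"):
    for name, share in shares(kind).items():
        print(f"{kind:8} {name:11} {share:.1%}")
```

Under these assumptions, pages outside the "User:" and "Wikipedia:" namespaces account for under a tenth of both the logged timeouts and the logged OOMs.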
Wed, May 14, 9:41 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry added a project to T394296: parsoid pcache keys get are too big: Content-Transform-Team.

No, we should at least investigate this.

Wed, May 14, 2:26 PM · Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry added a comment to T394114: Definition list written on a single line cannot start with an image.

Parsoid handles this correctly.

$ echo "; [[File:FAQ icon (Noun like).svg|20px]] Responses to questions : such as defined and free-form responses" | php bin/parse.php
<dl data-parsoid='{"dsr":[0,105,0,0]}'><dt data-parsoid='{"dsr":[0,64,1,0,1,1]}'><span typeof="mw:File" data-parsoid='{"optList":[{"ck":"width","ak":"20px"}],"dsr":[2,40,null,null]}'><a href="./File:FAQ_icon_(Noun_like).svg" class="mw-file-description" data-parsoid="{}"><img resource="./File:FAQ_icon_(Noun_like).svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/1/17/FAQ_icon_%28Noun_like%29.svg/20px-FAQ_icon_%28Noun_like%29.svg.png" decoding="async" data-file-width="38" data-file-height="31" data-file-type="drawing" height="16" width="20" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/1/17/FAQ_icon_%28Noun_like%29.svg/30px-FAQ_icon_%28Noun_like%29.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/1/17/FAQ_icon_%28Noun_like%29.svg/39px-FAQ_icon_%28Noun_like%29.svg.png 2x" class="mw-file-element" data-parsoid='{"a":{"resource":"./File:FAQ_icon_(Noun_like).svg","height":"16","width":"20"},"sa":{"resource":"File:FAQ icon (Noun like).svg"}}'/></a></span> Responses to questions</dt><dd data-parsoid='{"stx":"row","dsr":[64,105,1,0,1,0]}'>such as defined and free-form responses</dd></dl>
Wed, May 14, 2:46 AM · Patch-For-Review, Essential-Work, Content-Transform-Team (Work In Progress), MediaWiki-Parser, Parsoid

Tue, May 13

ssastry added a comment to T393904: Bump memory of testreduce1002.

Thanks!

Tue, May 13, 4:07 PM · Content-Transform-Team, serviceops
ssastry added a comment to T393904: Bump memory of testreduce1002.

Anytime today or tomorrow works. We'll hold off running rt-testing till the reboot happens.

Tue, May 13, 3:44 PM · Content-Transform-Team, serviceops

Mon, May 12

ssastry added a comment to T392260: Investigate performance outliers.

T306679 is the other task I worked on related to performance outliers which had some patches merged and deployed.

Mon, May 12, 9:59 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry claimed T393971: Reduce TokenHandlerPipeline overheads on pages with large token streams.
Mon, May 12, 9:55 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25)
ssastry added a comment to T393971: Reduce TokenHandlerPipeline overheads on pages with large token streams.

T391416#10814194 reports the benefits from focusing on this work so far on an outlier page.

Mon, May 12, 9:55 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25)
ssastry added a comment to T391416: Parsoid takes 8x as much time as legacy parser on this link-heavy page.

On current master (what is going to be tagged as v0.22.0-a2), parse time on this page is 0.68x of what it was on v0.21.0-a26. So, a pretty substantial improvement. Almost all of it comes from efficiencies in the token handler pipeline.

Mon, May 12, 9:54 PM · OKR-Work, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry added a comment to T392260: Investigate performance outliers.

T393971 is another task I just filed.

Mon, May 12, 9:44 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry created T393971: Reduce TokenHandlerPipeline overheads on pages with large token streams.
Mon, May 12, 9:43 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25)
ssastry closed T392260: Investigate performance outliers, a subtask of T392227: [EPIC] Parsoid Performance, as Resolved.
Mon, May 12, 9:36 PM · OKR-Work, Epic, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry closed T392260: Investigate performance outliers as Resolved.

This is what I have been doing with the patches that I've submitted over the last 3 weeks. T391416 and T268584 have a bunch of tagged patches. I've looked at 5 or 10 pages at this point. I'll continue to do so and will file phab tasks based on analyses. I am going to close this task as resolved since this doesn't need any additional action beyond creating specific actionable tasks based on reviewing pages from that performance data spreadsheet.

Mon, May 12, 9:36 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry created T393904: Bump memory of testreduce1002.
Mon, May 12, 3:00 PM · Content-Transform-Team, serviceops

Fri, May 9

ssastry added a comment to T393726: Cache WikiLink processing in WikiLinkHandler.

The goal here is to cache the entire wikilink processing going from a PEG wikilink token --> a-link html tokens. Wikilinks are commonly repeated on pages.

Fri, May 9, 5:17 AM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid

Thu, May 8

ssastry updated the task description for T393726: Cache WikiLink processing in WikiLinkHandler.
Thu, May 8, 11:00 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry added a comment to T393726: Cache WikiLink processing in WikiLinkHandler.

Even a simple and small page like https://en.wikipedia.org/wiki/Hospet has titles that repeat.

Thu, May 8, 6:12 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry added a comment to T393726: Cache WikiLink processing in WikiLinkHandler.

For bonus points, the cache implementation is smart enough that [[Foo|bar]] and [[Foo]] would still benefit from caching.
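
One way such sharing could work, as a toy sketch only: cache the label-independent part of the work by link target and apply the label afterwards. This is not WikiLinkHandler's actual implementation; resolve_target and render_wikilink are hypothetical names, and the "expensive" step is a stand-in.

```python
import html
from functools import lru_cache


@lru_cache(maxsize=None)
def resolve_target(target: str) -> str:
    """Stand-in for the expensive, label-independent part of link processing."""
    return "./" + target.strip().replace(" ", "_")


def render_wikilink(wikitext: str) -> str:
    """Render [[Target]] or [[Target|label]], reusing cached target work."""
    inner = wikitext[2:-2]
    target, sep, label = inner.partition("|")
    href = resolve_target(target)      # cached per target, not per link
    text = label if sep else target    # cheap, done for every link
    return f'<a href="{html.escape(href)}">{html.escape(text)}</a>'


print(render_wikilink("[[Foo|bar]]"))
print(render_wikilink("[[Foo]]"))
print(resolve_target.cache_info())
```

Keying the cache on the target alone means both link forms hit the same entry; a cache keyed on the whole token text would miss on the second link.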

Thu, May 8, 6:11 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry moved T393726: Cache WikiLink processing in WikiLinkHandler from Backlog to Q4 FY24-25 on the Content-Transform-Team (Work In Progress) board.
Thu, May 8, 6:09 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry created T393726: Cache WikiLink processing in WikiLinkHandler.
Thu, May 8, 6:09 PM · Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid

Wed, May 7

ssastry added a comment to T268584: Introduce compound tokens in the parsing pipeline.

The patches above create compound tokens for List & Indent-Pre. Tables are a bit trickier -- I haven't looked into it.

Wed, May 7, 10:10 PM · Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry claimed T268584: Introduce compound tokens in the parsing pipeline.
Wed, May 7, 8:05 PM · Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid

Mon, May 5

ssastry added projects to T268584: Introduce compound tokens in the parsing pipeline: Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress).
Mon, May 5, 8:07 PM · Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry merged T392766: Optimize TokenHandlers to skip past large chunks of tokens en masse where not applicable into T268584: Introduce compound tokens in the parsing pipeline.
Mon, May 5, 8:07 PM · Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry merged task T392766: Optimize TokenHandlers to skip past large chunks of tokens en masse where not applicable into T268584: Introduce compound tokens in the parsing pipeline.
Mon, May 5, 8:07 PM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry moved T268584: Introduce compound tokens in the parsing pipeline from Tech Debt / Big changes to Performance on the Parsoid board.
Mon, May 5, 8:07 PM · Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry added a comment to T348254: Add ParserOutput::getHtmlHolder().

I am not complaining about the incremental approach. All I am saying is: to deal with performance concerns, you have to eliminate unnecessary format conversions. That means your passes will (have to) fall into format-aligned buckets which you can use to make format a pipeline property.

Mon, May 5, 5:53 PM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), MediaWiki-Parser, Parsoid
ssastry added a comment to T348254: Add ParserOutput::getHtmlHolder().

I think creating a HTMLHolder or ContentHolder is orthogonal to what I am recommending.

Mon, May 5, 4:39 PM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), MediaWiki-Parser, Parsoid

Fri, May 2

ssastry moved T329457: IP Masking Considerations: services/parsoid from Code Review to To Verify on the Content-Transform-Team (Work In Progress) board.
Fri, May 2, 6:35 PM · Content-Transform-Team (Work In Progress), Trust and Safety Product Team, Essential-Work, Temporary accounts
ssastry moved T391788: Parser limit reporting doesn't work on FlaggedRevs pages from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Fri, May 2, 6:35 PM · Content-Transform-Team (Work In Progress), MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), FlaggedRevs
ssastry moved T391869: PHP Warning: Undefined property: Wikimedia\Parsoid\NodeData\DataMw::$caption from Backlog to To Verify on the Content-Transform-Team (Work In Progress) board.
Fri, May 2, 6:06 PM · Essential-Work, Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error
ssastry updated subscribers of T393134: References displaying as "0" in Wikipedia mobile view while showing correctly in desktop view.
Fri, May 2, 5:10 PM · Content-Transform-Team (Work In Progress), Page Content Service, Wikipedia-iOS-App-Backlog (iOS Release FY2024-25), Wikipedia-Android-App-Backlog (Android Release - FY2024-25), Patch-For-Review, Unplanned-Sprint-Work, WMDE-TechWish-Sprint-2025-04-30, Cite

Tue, Apr 29

ssastry created T392939: AttributeExpander::any could be calling PipelineUtils::expandAttrValueToDOM with repeating content.
Tue, Apr 29, 5:41 PM · OKR-Work, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid

Sun, Apr 27

ssastry moved T392766: Optimize TokenHandlers to skip past large chunks of tokens en masse where not applicable from Needs Triage to Performance on the Parsoid board.
Sun, Apr 27, 3:18 AM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry created T392766: Optimize TokenHandlers to skip past large chunks of tokens en masse where not applicable.
Sun, Apr 27, 3:17 AM · Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid

Fri, Apr 25

ssastry added a comment to T306679: (Performance) Excessive backtracking processing template markup.

This is because of TemplateHandlers::convertToString(..) call for invalid template names. That effectively leads to 2^N calls to PipelineUtils::parseContentInPipeline where N is the number of wrappers in that particular template. The real fix is to stop tokenizing template names and args in the PEG tokenizer since Parsoid shuttles template expansions to the preprocessor, so trying to tokenize and resolve template names inside Parsoid is not very useful. In the interim, there are a few patches coming that will mitigate this issue.
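
To see why 2^N is a problem, here is a toy model of the growth. It assumes each wrapper level ends up processing its nested content twice, which is a simplification for illustration and not a description of Parsoid's actual call graph; the function name is hypothetical.

```python
def innermost_parse_calls(wrappers: int) -> int:
    """Toy model: count parses of the innermost content when every
    wrapper level processes its nested content twice."""
    if wrappers == 0:
        return 1
    return 2 * innermost_parse_calls(wrappers - 1)


for n in (5, 10, 20):
    print(n, innermost_parse_calls(n))
```

In this model, 20 nested wrappers already produce about a million parses of the innermost content, so adding even one wrapper doubles the work.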

Fri, Apr 25, 7:35 PM · OKR-Work, Patch-For-Review, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error
ssastry added a comment to T245464: Use php-hrtime monotonic clock instead of microtime for perf measure in MW.

I didn't know about this task beforehand, but https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1138519 fixed this in Parsoid because we needed more precise/accurate profiles than what microtime gave us.
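
For illustration, here is a Python analogue of the idea in the task title: measure elapsed time with a monotonic, high-resolution clock instead of the wall clock, which can jump when the system time is adjusted. This is not the Parsoid patch; the timed helper is a hypothetical name.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed nanoseconds).

    time.perf_counter_ns() is monotonic, so the difference between two
    readings is never negative, unlike differences of time.time().
    """
    start = time.perf_counter_ns()
    result = fn(*args)
    elapsed_ns = time.perf_counter_ns() - start
    return result, elapsed_ns


result, elapsed_ns = timed(sum, range(1_000_000))
print(f"sum={result} took {elapsed_ns / 1e6:.3f} ms")
```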

Fri, Apr 25, 6:04 PM · MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), MediaWiki-Platform-Team (Roadmap), MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), MW-1.42-notes (1.42.0-wmf.3; 2023-10-31), Patch-For-Review, MediaWiki-libs-Stats, Wikimedia-Performance-recommendation, User-jijiki, MediaWiki-libs-BagOStuff

Thu, Apr 24

ssastry moved T391416: Parsoid takes 8x as much time as legacy parser on this link-heavy page from Backlog to In Progress on the Content-Transform-Team (Work In Progress) board.
Thu, Apr 24, 2:34 PM · OKR-Work, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry edited projects for T391416: Parsoid takes 8x as much time as legacy parser on this link-heavy page, added: Content-Transform-Team (Work In Progress); removed Content-Transform-Team.
Thu, Apr 24, 2:34 PM · OKR-Work, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid
ssastry claimed T391416: Parsoid takes 8x as much time as legacy parser on this link-heavy page.
Thu, Apr 24, 2:34 PM · OKR-Work, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid

Tue, Apr 22

ssastry moved T306679: (Performance) Excessive backtracking processing template markup from Backlog to Performance on the Parsoid-Read-Views (Performance and Cache research Q4 FY24-25) board.
Tue, Apr 22, 6:35 PM · OKR-Work, Patch-For-Review, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error
ssastry added a project to T306679: (Performance) Excessive backtracking processing template markup: Parsoid-Read-Views (Performance and Cache research Q4 FY24-25).
Tue, Apr 22, 6:34 PM · OKR-Work, Patch-For-Review, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error
ssastry moved T306679: (Performance) Excessive backtracking processing template markup from Backlog to In Progress on the Content-Transform-Team (Work In Progress) board.
Tue, Apr 22, 6:31 PM · OKR-Work, Patch-For-Review, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error
ssastry claimed T306679: (Performance) Excessive backtracking processing template markup.
Tue, Apr 22, 6:31 PM · OKR-Work, Patch-For-Review, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error

Apr 20 2025

ssastry added a comment to T388935: HTTP 500 Timeout trying to reach page with >50000 links.
Apr 20 2025, 4:05 PM · MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Essential-Work, Content-Transform-Team (Work In Progress), Performance Issue, MassMessage
ssastry added a comment to T384151: DiscussionTools gives error for the second message being written.

Is this your local wiki? That patch hasn't yet been deployed to the WMF production wikis and will only go out the week of April 28th.

Apr 20 2025, 3:48 PM · OKR-Work, Patch-For-Review, MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid, Content-Transform-Team (Work In Progress), DiscussionTools

Apr 18 2025

ssastry added a project to T325322: Performance implications of using dynamic properties in NodeData in newer versions of PHP: Content-Transform-Team.
Apr 18 2025, 1:05 AM · Patch-For-Review, OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid

Apr 17 2025

ssastry created T392273: MassMessage: Add reasonable resource limits on the size of mass message lists.
Apr 17 2025, 9:53 PM · Performance Issue, MassMessage
ssastry closed T388935: HTTP 500 Timeout trying to reach page with >50000 links as Resolved.
Apr 17 2025, 9:49 PM · MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Essential-Work, Content-Transform-Team (Work In Progress), Performance Issue, MassMessage
ssastry added a comment to T388935: HTTP 500 Timeout trying to reach page with >50000 links.

This page now renders, so this task is technically resolved.

Apr 17 2025, 8:51 PM · MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Essential-Work, Content-Transform-Team (Work In Progress), Performance Issue, MassMessage
ssastry placed T389623: Parsoid fails to parse [[{}]] single curly brackets inside double rectangular brackets up for grabs.
Apr 17 2025, 8:44 PM · OKR-Work, Parsoid-Read-Views (Wiktionary Q3 FY2024-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry moved T389623: Parsoid fails to parse [[{}]] single curly brackets inside double rectangular brackets from To Verify to Backlog on the Content-Transform-Team (Work In Progress) board.
Apr 17 2025, 8:44 PM · OKR-Work, Parsoid-Read-Views (Wiktionary Q3 FY2024-25), Content-Transform-Team (Work In Progress), Parsoid
ssastry moved T392078: Fix broken parsoid grafana dashboards from Backlog to Observability on the Parsoid-Read-Views (Performance and Cache research Q4 FY24-25) board.
Apr 17 2025, 8:42 PM · MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), OKR-Work, Patch-For-Review, Content-Transform-Team (Work In Progress), Parsoid, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25)
ssastry moved T392079: Investigate open telemetry support for parsoid from Backlog to Observability on the Parsoid-Read-Views (Performance and Cache research Q4 FY24-25) board.
Apr 17 2025, 8:42 PM · OKR-Work, Content-Transform-Team (Work In Progress), Parsoid, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25)
ssastry added a comment to T392079: Investigate open telemetry support for parsoid.

Here are two possibilities:

  • Take a look at ParserPipelineFactory and the definition for the "fullparse-wikitext-to-dom" pipeline for the top-level stages to create spans for. The only difference would be that I would collapse TokenTransform2 and TokenTransform3 into a single span called "TokenTransforms".
  • Alternatively, take a look at the output emitted (and embedded in a HTML comment at the bottom of the page) by parse.php --profile (recommend running this on parsoidtest1001) and use that output as a recipe for what spans to emit.
Apr 17 2025, 8:41 PM · OKR-Work, Content-Transform-Team (Work In Progress), Parsoid, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25)
ssastry updated the task description for T392263: Look at [[Special:LongPages]] and try to extract a performance benchmark.
Apr 17 2025, 6:39 PM · OKR-Work, Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Content-Transform-Team (Work In Progress), Parsoid

Apr 15 2025

ssastry closed T387694: PHP Fatal Error from line 146 of /srv/mediawiki/php-1.44.0-wmf.18/vendor/wikimedia/parsoid/src/Wt2Html/TT/TokenHandler.php: Maximum execution time of 210 seconds exceeded as Declined.
Apr 15 2025, 9:17 PM · Content-Transform-Team (Work In Progress), Performance Issue, Parsoid, Wikimedia-production-error
ssastry claimed T387694: PHP Fatal Error from line 146 of /srv/mediawiki/php-1.44.0-wmf.18/vendor/wikimedia/parsoid/src/Wt2Html/TT/TokenHandler.php: Maximum execution time of 210 seconds exceeded.

Seems like a largish page. Even with legacy parser, the page times out --> www.wikidata.org/wiki/Wikidata:Database reports/Constraint violations/P17?useparsoid=0 .. this is another instance of a page used by a Bot as a log file / database table.

Apr 15 2025, 9:16 PM · Content-Transform-Team (Work In Progress), Performance Issue, Parsoid, Wikimedia-production-error
ssastry added a comment to T391416: Parsoid takes 8x as much time as legacy parser on this link-heavy page.

That is good then and explains why RedLinking time isn't worse. But, given that both legacy and parsoid need to do redlinking ... parsoid taking 2.5s just for redlinking when legacy takes 3.5s for the entire parse indicates some other inefficiency lurking there, but it is not going to be the biggest bang for the buck. Maybe worth looking at accounting for THP time since it makes up 50% of it.

Apr 15 2025, 7:58 PM · OKR-Work, Content-Transform-Team (Work In Progress), Parsoid-Read-Views (Performance and Cache research Q4 FY24-25), Parsoid

Apr 14 2025

ssastry added a comment to T391760: Wikitext constructs across table and templats are not properly parsed.
Apr 14 2025, 4:17 AM · Parsoid-Read-Views (Small Size Wikipedias), Parsoid

Apr 13 2025

ssastry added a comment to T391760: Wikitext constructs across table and templats are not properly parsed.

Investigation notes: getReparseType() and other code have incorrect checks for "in extension content". When the first node of a template's output is also an extension tag, the wrapper node is both from an extension *and* from a template, so the check skips over the entire template content. T87274: DOM nodes with multiple typeof values is related.
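
A minimal sketch of the multiple-typeof situation described in these notes, in Python rather than Parsoid's PHP. It is not the getReparseType() logic; the function names are hypothetical, and it only assumes that a wrapper's typeof attribute is a space-separated list that can carry both mw:Transclusion and an mw:Extension/... value. It shows how a check that looks only for an extension typeof also matches a node that is a template wrapper.

```python
def wrapper_kinds(typeof_attr: str) -> set[str]:
    """Return which kinds of wrapper a typeof attribute describes.

    typeof is treated as a space-separated list of values, so one node
    can be both a template wrapper and an extension wrapper.
    """
    values = set(typeof_attr.split())
    kinds = set()
    if "mw:Transclusion" in values:
        kinds.add("template")
    if any(v.startswith("mw:Extension/") for v in values):
        kinds.add("extension")
    return kinds


def naive_in_extension_content(typeof_attr: str) -> bool:
    # Hypothetical over-broad check: any extension typeof counts as
    # "in extension content", even if the node also wraps a template.
    return "extension" in wrapper_kinds(typeof_attr)


def extension_only(typeof_attr: str) -> bool:
    # Stricter check: true only when the node is an extension wrapper
    # and is not also a template wrapper.
    return wrapper_kinds(typeof_attr) == {"extension"}


both = "mw:Transclusion mw:Extension/ref"
print(naive_in_extension_content(both))  # matches, so template content would be skipped
print(extension_only(both))              # does not match, template content is still visited
```

Whether the real fix should treat the combined case as template content, extension content, or both is a design decision this sketch leaves open.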

Apr 13 2025, 10:46 PM · Parsoid-Read-Views (Small Size Wikipedias), Parsoid
ssastry added a project to T391760: Wikitext constructs across table and templats are not properly parsed: Content-Transform-Team.
Apr 13 2025, 6:53 PM · Parsoid-Read-Views (Small Size Wikipedias), Parsoid

Apr 11 2025

ssastry assigned T391729: maplink stripmarker seen on frwikivoyage map to cscott.
Apr 11 2025, 9:52 PM · Content-Transform-Team (Work In Progress)
ssastry created T391729: maplink stripmarker seen on frwikivoyage map.
Apr 11 2025, 9:50 PM · Content-Transform-Team (Work In Progress)
ssastry closed T391728: Wikimedia\Assert\InvariantException: Invariant failed: Bad UTF-8 at end of string (2 byte sequence) as Resolved.

Cannot reproduce anymore. The page renders.

Apr 11 2025, 9:48 PM · Parsoid, Content-Transform-Team, Wikimedia-production-error
ssastry closed T389641: PHP Warning: Attempt to read property "start" on null as Resolved.

Cannot reproduce this anymore -- must have been fixed by some deployed code.

Apr 11 2025, 9:43 PM · Essential-Work, Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error
ssastry closed T389642: PHP Warning: Undefined property: Wikimedia\Parsoid\NodeData\DataParsoid::$dsr, a subtask of T379874: ☂ PHP 8.1 issues found during WMF rollout/ramp up, as Resolved.
Apr 11 2025, 9:43 PM · Content-Transform-Team, MediaWiki-Platform-Team (Radar), PHP 8.1 support, WMF-General-or-Unknown
ssastry closed T389642: PHP Warning: Undefined property: Wikimedia\Parsoid\NodeData\DataParsoid::$dsr as Resolved.

Cannot reproduce this anymore -- must have been fixed by some deployed code.

Apr 11 2025, 9:43 PM · Content-Transform-Team (Work In Progress), PHP 8.0 support, Parsoid, Wikimedia-production-error
ssastry created T391728: Wikimedia\Assert\InvariantException: Invariant failed: Bad UTF-8 at end of string (2 byte sequence).
Apr 11 2025, 9:38 PM · Parsoid, Content-Transform-Team, Wikimedia-production-error
ssastry moved T391042: CTT tasks week of 2025-04-04 from Backlog to In Progress on the Content-Transform-Team (Work In Progress) board.
Apr 11 2025, 9:31 PM · MW-1.44-notes (1.44.0-wmf.24; 2025-04-08), Content-Transform-Team (Work In Progress), Essential-Work
ssastry closed T380768: Deploy Parsoid Read Views to incubator (week of 2025-03-31), a subtask of T378477: [EPIC] Roll-out Parsoid to Incubator Wikis and newly created wikis, as Resolved.
Apr 11 2025, 7:38 PM · Parsoid-Read-Views (Incubator Wiki Q3 FY2024-25), incubator.wikimedia.org, Epic
ssastry closed T380768: Deploy Parsoid Read Views to incubator (week of 2025-03-31) as Resolved.
Apr 11 2025, 7:38 PM · Parsoid-Read-Views (Incubator Wiki Q3 FY2024-25), Content-Transform-Team (Work In Progress)
ssastry moved T380768: Deploy Parsoid Read Views to incubator (week of 2025-03-31) from Q4 FY24-25 to To Verify on the Content-Transform-Team (Work In Progress) board.
Apr 11 2025, 7:38 PM · Parsoid-Read-Views (Incubator Wiki Q3 FY2024-25), Content-Transform-Team (Work In Progress)
ssastry closed T390499: TypeError: Argument 1 passed to Wikimedia\Parsoid\Utils\DOMDataUtils::getBag() must be an instance of Wikimedia\Parsoid\DOM\Document, instance of Wikimedia\Parsoid\DOM\DocumentFragment given as Resolved.
Apr 11 2025, 7:35 PM · Content-Transform-Team (Work In Progress), Parsoid
ssastry moved T390499: TypeError: Argument 1 passed to Wikimedia\Parsoid\Utils\DOMDataUtils::getBag() must be an instance of Wikimedia\Parsoid\DOM\Document, instance of Wikimedia\Parsoid\DOM\DocumentFragment given from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Apr 11 2025, 7:35 PM · Content-Transform-Team (Work In Progress), Parsoid
ssastry closed T381182: Cite error reporting discrepancy (for auto-generated references section with a group attribute) between Parsoid & legacy versions, a subtask of T372709: Missing cite error message and category, as Resolved.
Apr 11 2025, 7:34 PM · MW-1.44-notes (1.44.0-wmf.3; 2024-11-12), Parsoid-Read-Views (Phase 1 - DiscussionTools support), Content-Transform-Team-WIP, Patch-For-Review, WMDE-TechWish-Sprint-2024-08-21, WMDE-References-FocusArea, WMDE-TechWish-Sprint-2024-08-13, Parsoid
ssastry closed T381182: Cite error reporting discrepancy (for auto-generated references section with a group attribute) between Parsoid & legacy versions as Resolved.

After purge, this issue looks fixed.

Apr 11 2025, 7:34 PM · MW-1.44-notes (1.44.0-wmf.24; 2025-04-08), Parsoid-Read-Views (Incubator Wiki Q3 FY2024-25), Content-Transform-Team (Work In Progress), Cite
ssastry moved T381182: Cite error reporting discrepancy (for auto-generated references section with a group attribute) between Parsoid & legacy versions from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Apr 11 2025, 7:32 PM · MW-1.44-notes (1.44.0-wmf.24; 2025-04-08), Parsoid-Read-Views (Incubator Wiki Q3 FY2024-25), Content-Transform-Team (Work In Progress), Cite