Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.
On github: https://github.com/cscott
See https://en.wikipedia.org/wiki/User:cscott for more.
a9f2434e6f88afafbe1ec099bd21a2363a0df692 is one example where a uniform mechanism would have been useful; instead, we manually recreated the trigger_error mechanism.
"Extra update MedaiWiki jobs due to Wikifunctions content" will be the mediawiki_refreshlinks_parsercache_operations_total metric with status=cache_miss and has_async_content=true. This the total number of refresh links jobs with async content. If you look at the label async_not_ready=true then these are jobs which are going to need to be repeated once the async content is ready, and so they are "extra" update jobs. In addition, there will be a few extra update jobs with async_not_ready=false when entries fall out of the parser cache, do to the way we currently handle updating async content. So the range of "extra" jobs is between the lower bound of the # of jobs with async_not_ready=true and the upper bound of the # of jobs with has_async_content=true. We can refine this metric further if/when the upper bound gets close to our SLI limit.
Closing this task as (1) category sorting was enabled as an option in core, and (2) investigation revealed that many editors currently expect categories to be an ordered list, *not* a set. So we'll have to apply that learning to metadata updates when we work on selective update.
Verified in parser tests; can't verify in production because production doesn't (yet) use Parsoid for RefreshLinksJob.
Verified that the limit report appears on https://de.wikipedia.org/wiki/Johanne_Karoline_Wilhelmine_Spazier
Added a patch to add 'href'.
Currently, https://en.wikipedia.org/w/index.php?title=H2O&redirect=no&useparsoid=1 contains <link rel="mw:PageProp/redirect" href="./Water" id="mwAg">, while https://en.wikipedia.org/w/index.php?title=H2O&redirect=no&useparsoid=0 contains <link rel="mw:PageProp/redirect">. I'm not sure why the href is missing in the latter.
https://www.wikifunctions.org/wiki/Special:Version lists {{#function}} as a Parsoid-only module, and it seems to work.
The goal here is to move to the more standards-compliant PHP 8.4 implementations as soon as WMF production is ready for them. We intend to switch to the PHP 8.4 DOM classes in CI right away (https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/704745) so that we don't regress.
I wonder about the security (DoS) implications of allowing clients to control cacheability. I think 'no-store' is the potentially problematic one, given that you are interpreting 'no-cache' as "serve nothing stale" rather than literally "no cache". But 'no-store' could be used to DoS us, and I feel like we should put some guardrails around its use.
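A minimal sketch of the kind of guardrail I have in mind, assuming some per-client rate-limiting state; all names here are invented for illustration and are not an existing MediaWiki API:

```php
// Hypothetical: honor 'no-store' only within a small per-client budget,
// so a single client can't force every request to bypass the cache.
// $budget maps a client identifier to its remaining allowance for the
// current time window.
function shouldHonorNoStore( string $clientId, array &$budget, int $max = 10 ): bool {
	$budget[$clientId] ??= $max;
	if ( $budget[$clientId] > 0 ) {
		$budget[$clientId]--;
		return true; // within budget: respect the client's no-store
	}
	return false; // budget exhausted: fall back to normal caching
}
```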
Tag #ctt-attention on Gerrit if you upload a patch for this.
I am recommending a rollback of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1133094, rather than a new release of Parsoid, to address this for this week's train, since the strict label checking will cause a stats dropout for other affected code in addition to causing logspam.
It turns out this is an upstream spec bug: see https://github.com/whatwg/dom/issues/849 and https://github.com/whatwg/dom/issues/769.
In T393983#10814639, @matmarex wrote: There is a class_alias, but Composer doesn't support generating autoload data for class aliases, so the class is not loaded when the alias is used.
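To illustrate the failure mode (with placeholder class names, not the actual classes involved): Composer's generated classmap only records real class definitions, so a runtime alias is invisible to the autoloader.

```php
// NewClass.php -- placeholder names for illustration.
// Composer maps NewClass to this file, but it has no entry for OldClass:
// code that references OldClass first triggers the autoloader, which
// finds nothing, and this alias is never defined.
class NewClass {
}
class_alias( NewClass::class, 'OldClass' );

// One known workaround is to put the class_alias() call in a file listed
// under "autoload.files" in composer.json, since those files are loaded
// unconditionally on every request.
```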
Closing this task as we renamed a bunch of these with the Prometheus migration.
We've added a lot of new performance metrics as part of the integration with core, inheriting most of core's limit report metrics, adding new metrics from the parser cache, etc. Closing.
That design seems feasible. Just a note that in addition to the "report visual bug" UX, we're also looking for a replacement design for the "parsoid icon" indicator that shows Parsoid is in use on the page -- the replacement could be "no replacement", a message in the footer, a revised indicator, etc. -- we should just document what the proposed replacement is.
Without "category" being sortable, you can't recreate the old organization of the page (grouped by category), which seems like a regression in functionality.
Note that the work I'm doing for T393391: Refactor PEG grammar for transclusions can also result in "parsed less" [[...]] and [...] tokens. So maybe it makes sense to wait to tackle this one until I've landed that task, so that we're not creating conflicts?
I did a quick test in the above patch of removing dynamic properties in DataMw and replacing them with an associative array and __get/__set magic methods.
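Roughly this shape (a minimal sketch, not the actual patch; presumably motivated by PHP 8.2's deprecation of dynamic properties):

```php
// Sketch: back formerly-dynamic properties with an associative array.
class DataMw {
	/** @var array<string,mixed> */
	private array $extra = [];

	public function __get( string $name ) {
		return $this->extra[$name] ?? null;
	}

	public function __set( string $name, $value ): void {
		$this->extra[$name] = $value;
	}

	public function __isset( string $name ): bool {
		return isset( $this->extra[$name] );
	}

	public function __unset( string $name ): void {
		unset( $this->extra[$name] );
	}
}
```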
At the last engineering offsite, we decided that putting lints into ParserOutput was the future direction here. I'm going to close this as a dupe of T393717: Put lints in ParserOutput/RefreshLinksJob to reflect this consensus.
We decided at the last engineering offsite that Lints are going to be put into ParserOutput, and that we'll move the DB maintenance to RefreshLinksJob, although not until RLJ is powered by Parsoid (T393716). I'm going to close this as a duplicate of T393717: Put lints in ParserOutput/RefreshLinksJob to reflect this consensus.
And just for the record, my response to this:
Design is going to synthesize a concrete recommendation here by end of May and CTT will implement it in June.
In T392775#10775623, @hector.arroyo wrote: I think the work done in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1139193 for this task may already be covered by the patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1138396 we are working on for T389474: CheckUser: Special:GlobalContributions should highlight temporary accounts, since we are moving the logic that determines the CSS classes to apply from CheckUser's GlobalContributions (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1137000) to core.
Yes, but the whole point of ContentHolder was to avoid having to boil the ocean in a single day. We have existing passes out there, and existing users of text-based passes via the extension postprocess hook (including DiscussionTools and ParserOutput serialization/deserialization). By creating an abstraction we can work toward the goal of moving everything to DOM without having to do it all at once. And, as @ihurbain notes, the existing pipelines we have to work with are very sensitive to stage order (unfortunately!), so "just" moving all the DOM-based passes to the front/back isn't an easy solution.
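To make that concrete, here's a purely illustrative sketch of the kind of abstraction I mean (the real ContentHolder API may well differ): the holder keeps whichever representation is current and converts lazily, so text-based and DOM-based passes can be interleaved without forcing every caller to migrate at once.

```php
// Illustrative sketch only; not the actual ContentHolder implementation.
class ContentHolder {
	private ?string $html = null;
	private ?DOMDocument $dom = null;

	public static function fromHtml( string $html ): self {
		$holder = new self();
		$holder->html = $html;
		return $holder;
	}

	// Text-based passes call this...
	public function getAsHtml(): string {
		if ( $this->html === null ) {
			$this->html = $this->dom->saveHTML();
			$this->dom = null; // the string is now authoritative
		}
		return $this->html;
	}

	// ...and DOM-based passes call this; conversion happens only when
	// adjacent passes actually disagree about representation.
	public function getAsDom(): DOMDocument {
		if ( $this->dom === null ) {
			$this->dom = new DOMDocument();
			$this->dom->loadHTML( $this->html );
			$this->html = null; // the DOM is now authoritative
		}
		return $this->dom;
	}
}
```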