Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.
On github: https://github.com/cscott
See https://en.wikipedia.org/wiki/User:cscott for more.
I'm still getting PhanCompatibleTrailingCommaParameterList on https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1166266 for some reason (PS).
Sorry for wasting your time! I assumed that wikipeg got a bump to mediawiki-phan-config at the same time @Jdforrester-WMF bumped the min version in the composer.json.
It appears that Parsoid allows much more complex nesting of <ref> tags. This was intentionally blocked in the classic parser.
We don't have to manually review 4000 method signatures. We can assume that the original authors gave their method parameters reasonable names, and review the smaller set of cases in the Make method parameter names consistent for PHP 8 named arguments patch where there seems to be some disagreement. If even that is too hard, we can add @no-named-arguments only to those classes where there is an existing disagreement, as highlighted by that patch.
This is still causing some production errors due to bad content stuck in the ParserCache, but that should resolve itself with time.
Yeah, almost certainly fixed, the bad content (html with content outside of the <body>) just takes a while to leave the parser cache.
Almost certainly a dup of T390629: Wikimedia\Assert\UnreachableException: Trying to fetch node data without loading! which I thought we'd fixed.
This is a dup of T390344: v3 parserfunction serialization doesn't properly support named arguments and is caused by using named arguments with wikifunctions, which is not (yet) supported.
To be clear, under current conventions PHP (and phan) will allow named argument calls unless there is an explicit @no-named-arguments annotation. There's a lot of work whichever way -- either someone needs to write a new phan plugin to enforce a new "no named arguments unless ..." and a new annotation type (unique to mediawiki) for permitting named arguments ... or we can use the existing tooling to sync up our argument names in short order with a small number of explicit @no-named-arguments annotations where appropriate (@Tacsipacsi makes the case that SpecialPage::execute() may be one such exception).
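To make the convention above concrete, here is a minimal sketch; the class and method names are hypothetical (only the @no-named-arguments annotation itself, which phan recognizes, is real):

```php
<?php
// Hypothetical classes illustrating the "named arguments allowed by
// default, opt out with @no-named-arguments" convention.

class TitleFormatter {
	// No annotation: callers may use named arguments, so the
	// parameter name is part of the stable interface.
	public function format( string $title ): string {
		return str_replace( ' ', '_', $title );
	}
}

class MySpecialPage {
	/**
	 * Subclasses disagree on what this parameter is called, so
	 * named-argument calls are explicitly opted out.
	 * @no-named-arguments
	 */
	public function execute( ?string $subPage ): string {
		return $subPage ?? '';
	}
}

$formatter = new TitleFormatter();
// Named-argument call (PHP 8.0+); phan would allow this:
echo $formatter->format( title: 'New York City' ), "\n";

$page = new MySpecialPage();
// phan would flag a named-argument call here; positional is fine:
echo $page->execute( 'subpage' ), "\n";
```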
One option would be to make displaytitle a separate metadata property independent of the page wikitext, like language links are now. Then the parser function could echo the metadata without causing circularity issues with wikitext parsing.
Updated the patchdemo with the render information as a separate sentence: https://patchdemo.wmcloud.org/wikis/d6bfc596bf/wiki/New_York_City
My wild guess would be that Parsoid is logging something (probably something inconsequential) when linting is turned on, and that's interfering with the naive way that the AF test is looking for its log entries:
In T398175#10958561, @Daimona wrote: The AF test in question is only making edits and then checking what change tags were applied to that edit. From a quick look at the code (getActionTags), it grabs the logging entry for the given page with the maximum log_id and matching type. This can obviously fail if tags were not applied, but also if it happens to be picking the wrong log entry.
In T397789#10958985, @ArielGlenn wrote: Speaking just out of self-interest here, I would love this feature to be widely used for the simple sake of readability. I really appreciate reading a call fooSomething($page=blah, $user=blat, $showAll=true) instead of fooSomething(blah, blat, true). So whatever we can reasonably do to get to "widely used", I'm in favour of.
Discussed this task at the Content-Transform-Team tech forum today. A few notes:
It's also worth noting briefly that by default psalm propagates the name of the base class/interface to all of its subclasses/implementations. It's relatively easy to do a bulk rename by renaming the parameter on the base class/interface and then re-running psalm --alter --issues=ParamNameMismatch; you can cherry-pick the base psalm configuration from DNM: add psalm checker. So we can have a discussion about what 'good' parameter names are during the 1.45 cycle and make updates fairly easily before we fix the names in place going forward.
In T397789#10957318, @daniel wrote: This would immediately cement the parameter names of all existing stable methods. These names were not chosen with long term stability and consistency in mind. We'd have to go over all of them and fix at least the really bad ones, since changing them later is rather fiddly.
Let's also consider how you would safely rename a parameter.
function foo($oldName) { ... }
first add the new name, making it available for invocation as a named parameter, while preserving argument order:
function foo($newName=null, $oldName=null) { if ($oldName!==null) { $newName = $oldName; wfDeprecated(__METHOD__ . ' with $oldName', '1.xx'); } } (note $newName needs a default so that old callers using foo(oldName: ...) don't hit an ArgumentCountError)
then after the appropriate amount of time, the trailing $oldName parameter can be removed.
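Spelled out as a self-contained sketch of the transitional step (wfDeprecated() is swapped for trigger_error() here so the example runs outside MediaWiki):

```php
<?php
// Transitional signature: $newName takes the old positional slot;
// a trailing $oldName keeps old named-argument callers working.
function foo( $newName = null, $oldName = null ) {
	if ( $oldName !== null ) {
		// Stand-in for wfDeprecated( __METHOD__ . ' with $oldName', '1.xx' )
		trigger_error( 'foo with $oldName is deprecated', E_USER_DEPRECATED );
		$newName = $oldName;
	}
	return "got: $newName";
}

// Positional callers are unaffected:
echo foo( 'value' ), "\n";
// Old named-argument callers keep working, with a deprecation notice:
echo @foo( oldName: 'value' ), "\n";
// New callers use the new name:
echo foo( newName: 'value' ), "\n";
```

All three calls behave identically; once the deprecation period ends, the trailing $oldName parameter (and the fallback branch) can be deleted.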
This can be based on the code we added to the Linter to do real time performance comparisons: T393399.
Just as a straw proposal: why not do the opposite? Parameter names should be considered part of the API unless you have an explicit @no-named-arguments annotation?
In T396813#10945301, @matmarex wrote: In T396813#10943490, @cscott wrote: Named arguments require that all implementations/overrides of a method use the same name for the method arguments. psalm has a sniff for this and an automatic fixer, I don't know if phan/phpcs do? This should be turned on IMO as it is a prerequisite to using named arguments in our codebase. (I've already merged patches to fix Parsoid's argument naming, written by psalm.)
phpcs wouldn't really do it, it's mostly for syntax-level code conventions, and doing anything that needs to analyze the world like this is terribly unpleasant.
phan could do it, but it doesn't quite: it checks function/method calls (which are always a runtime error), but doesn't check method overrides (which only cause a runtime error at call time, but the overriding itself is fine according to PHP, just ill-advised – see Parameter name changes during inheritance). I could probably write a plugin or an upstream patch to warn about this too.
See also T397789: Define stable interface policy and coding conventions for named arguments (I just created it based on discussion elsewhere).
Patch demo at https://patchdemo.wmcloud.org/wikis/e5a16a7d5c/wiki/New_York_(magazine) -- no back end yet, but this is a good place to check the UX layout of the dialog.
In his review, @Arlolra pointed out that the footer text is not used in the MinervaNeue skin. (I tested every other skin available on patchdemo and they looked fine).
(I've also looked at the code for Extension:KeepTitle, which actually seems like a relatively small and efficient way to do what you want, using existing Hooks in completely appropriate ways. I'm inclined to think it would be easier to have WMF adopt that extension than to have WMF resource a new #redirect functionality, but ¯\_(ツ)_/¯.)
For the record, I'm on a personal quest (T204370) to eventually replace all the "magic words" in wikitext with a uniform alternative {{#....}} syntax. This would make it easier to add additional arguments, as was noted. So:
{{#redirect|NewTitle|special message}}
is a somewhat reasonable straw-proposal. That would put 'old title' => 'special message' in a db table (maybe a new column of the existing redirects table), and then the existing code which puts the redirect header at the top (OutputTransform/Stages/AddRedirectHeader.php and the post-cache part of WikitextContentHandler::fillParserOutput which calls setRedirectHeader) could swap in a custom redirect header where appropriate. Since this modification was done post-cache it wouldn't require splitting the parser cache.
As I think @bvibber probably notes, implementing this literally as the bug description states would be A Bad Idea, since it would mean that each redirected title would have to be cached as an entirely separate object, since the content of each page could depend in arbitrary ways on the "original title". That's a nonstarter, and I suspect the task description should be changed to describe /what/ is wanted, ie "text at the top that depends on the original redirect title", not the /how/ (a specific magic word).
This was probably related to the fix I landed in 4cd16aa25622b2f63ba57dc8fef9c344ae2e4e5b?
This is partially done: we have ParserOutput::setFromParserOptions(ParserOptions $po) now which transfers a number of things: wrapper class (T303615), section edit link flag, the collapsible sections flag (T363496) and the preview flag (T341010). The missing piece is to add a list of these in ParserOptions akin to the existing $initialCacheVaryingOptionsHash, $initialLazyOptions, etc lists so that these "only for ParserOutput" options aren't included in the cache key -- right now collapsibleSections and suppressSectionEditLinks are included in the cache key, and all of these are included in ParserOptions::matches() etc.
Wrapclass was removed from the parser cache key in 8b0e7298ac25ccbb83d382bcb81f6c9433d053ca in 2017.
Likely an extension returning bad UTF-8 data. The stack trace seems to indicate that the bad UTF-8 was wrapped as a strip marker. https://he.wikipedia.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%A4%D7%93%D7%99%D7%94%3A%D7%97%D7%93%D7%A9%D7%95%D7%AA doesn't display any errors at present, and during February we were still actively working on the fragment/strip marker code.
I suspect this has been fixed by our continued work in this area. In particular, we don't round-trip through HTML strings much (if at all) any more, so the issue with round-tripping through the string form should be fixed.
We've implemented a reasonable strategy for now; see patches above. We also have metrics in place (part of the wikifunctions SLO/SLA) which will allow us to determine if our caching strategy works/backfires/etc. So we can resolve this for now, and open a new task if the metrics show that we need to revisit our initial caching strategy.
Verified that the last instance of this exception was on Jun 19.
The Content-Transform-Team in theory owns Reading Lists, but none of us have actually touched that code AFAIK. I think clearly communicating the sunset is preferable to maintaining an illusion of support.
Space normalization (replace all spaces with _, compress runs of spaces) is part of the wiki title normalization and shouldn't be changed.
<p><wiki-chart><div>......</div></wiki-chart></p> is parsed by every browser as <p><wiki-chart></wiki-chart></p><div>......</div><p></p> so that's not a parsoid bug per se. It's one of the dangers of using custom elements, which are always parsed as inline AFAIK.
Named arguments require that all implementations/overrides of a method use the same name for the method arguments. psalm has a sniff for this and an automatic fixer, I don't know if phan/phpcs do? This should be turned on IMO as it is a prerequisite to using named arguments in our codebase. (I've already merged patches to fix Parsoid's argument naming, written by psalm.)
I like the second idea (add useparsoid=1 to ApiQueryMapData).
It is stuck awaiting review. It got to a C+1 and never managed to get merged. It's not clear which team is ultimately responsible for the review.
Sorry, you'll need the parser migration extension installed, and to have parsoid read views enabled for your account:
https://www.mediawiki.org/wiki/Help:Extension%3AParserMigration
That is installed on beta and testwiki, I believe, so it should just be a matter of enabling parsoid read views in your user preferences.
{T4700: Pre-save transform skips extensions using wikitext (gallery, references, footnotes, Cite, status indicators, pipe trick, subst, signatures)}, which has a patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/981193 but https://gerrit.wikimedia.org/r/c/mediawiki/core/+/607367 is the better patch I think.
Because the legacy parser is pre-cache, we might already be splitting the parser cache by user variant, but parsoid expects to do language conversion post cache. It is more technically correct to use user language here because user language strings are also pre-converted, excluded from language converter, etc. That said, your content might be hidden behind a strip marker and hidden from language converter anyway.
Welcome to my favorite misfeature of PHP: numeric strings used as keys to associative arrays are silently converted to ints. I count 16 (string) casts in ParserOutput alone which are all due to this same misfeature. phan doesn't flag this, likely because the false positive rate would be high. I've changed a number of APIs to make the problem less likely to occur for clients; for example, ParserOutput::getCategoryNames() returns a list<string>, which, in combination with ParserOutput::getCategorySortKey(), avoids the problems of the previous interface (::getCategoryMap()), which exported an array<string,string> directly that was really/secretly an array<string|int,string>, trapping the unwary (::getCategoryMap() at least documents the correct type now).
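A minimal demonstration of the coercion (standalone PHP, nothing MediaWiki-specific; the $sortKeys variable is just for illustration):

```php
<?php
// A canonical numeric-string key is silently converted to an int key.
$sortKeys = [ '2025' => 'sort-key' ];
var_dump( array_key_first( $sortKeys ) ); // int(2025), not string("2025")

// Only canonical decimal integers are converted; '007' stays a string:
var_dump( array_key_first( [ '007' => 'x' ] ) ); // string("007")

// Hence the defensive casts when reading keys back out:
foreach ( $sortKeys as $name => $key ) {
	$name = (string)$name; // without this, $name is int for '2025'
}
```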
This orphan tag shows up locally using the API as well, but you have to use body_only=false:
$ php bin/parse.php --domain da.wiktionary.org --page mercredi --wrapSections --body_only=false
... </section></section></body>
Something is adding bogus HTML to the bottom of the document before the OutputTransform stages get started:
<!DOCTYPE html> <html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: https://da.wiktionary.org/wiki/Special:Redirect/"><meta charset="utf-8"/><meta property="mw:pageId" content="0"/><meta property="mw:pageNamespace" content="0"/> ... </section></li></section></section></body>
I can reproduce on parsoidtest1001:
$ sudo -u www-data php /srv/mediawiki/multiversion/MWScript.php view.php --wiki=dawiktionary mercredi > mercredi.wt
$ sudo -u www-data php /srv/mediawiki/multiversion/MWScript.php parse.php --wiki=dawiktionary -p < mercredi.wt
Seems like I should add a --page option to core's parse.php to save a step here. Regardless, now I have something to look at...
Unreachable exceptions shouldn't be reached. Not sure it makes sense to try to normalize this, as opposed to fixing the bug.
Reproducible: https://da.wiktionary.org/wiki/mercredi?useparsoid=1 ; note that https://da.wiktionary.org/wiki/mercredi?useparsoid=0 is fine. Also, running via the API in integrated mode on parsoidtest1001 is fine; this is a bug in the OutputTransform pipeline after Parsoid is done parsing.
Indeed, it looks like we're using some information in data-parsoid to indicate whether an explicit sort key should be provided, and the id attribute is telling selser to use the data-parsoid from the previous element. We should be flagging this as modified (and thus not using the old data-parsoid) but apparently are not. Yep, probably a bug in Parsoid.
Ok, Bug fix for 'Rule reference variables should not be affected by failed rules' and the patch it follows up, Rule reference variables should not be affected by failed rules, will probably fix this once we move Parsoid to the new wikipeg.
To briefly update: part of the issue here is how we sanitize content cut-and-pasted into VisualEditor. VE uses a somewhat restrictive sanitizer. Supporting custom components naively would require updating the sanitizer every time to include each custom tag name defined by an extension.
@mwilliams anything learned from the "bit of time next week to explore a couple other possible solutions", or is this design ready to implement?
The proposed summary fails to mention a couple of additional implementations, including in Parsoid. This strikes me as a https://xkcd.com/927/ issue.
https://en.wikipedia.org/wiki/User:Cscott/Opportunities_for_Content_Transform_Team mentions both Subbu's Typed Templates proposal as well as a number of other possible initial clients/use cases.
An effective reproduction of the backtracking is this snippet:
* a {{bar|1 * b