Wikimedia cross-wiki coordination and L10n/i18n. Mainly active on Wikiquote, Wiktionary, Wikisource, Commons, Wikidata, Wikibooks. And of course Meta-Wiki, translatewiki.net.
Contact me by MediaWiki.org email or user talk.
Nowadays we only use Unpaywall and the number of ResearchGate or Academia.edu suggestions is negligible.
Given how many years it has taken us to babysit OAbot on the English Wikipedia to do a fraction of what was originally envisioned, I'm starting to wonder whether this should instead be done with an extension, similar to the SecureLinkFixer extension. After all, sending visitors to the websites of legacy publishers is a clear and present danger. The Unpaywall snapshot could be imported in a way similar to what the Tor extension does, or redirection could be delegated to oadoi.org. Ways can be devised to leave more control to local wikis.
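For illustration only, a minimal sketch of the kind of lookup such an extension (or the oadoi.org delegation) would boil down to, using the public Unpaywall v2 API; the helper name and the contact email are placeholders, not part of any existing code.

```python
# Minimal sketch (not the proposed extension): resolve a DOI to its best
# open-access URL via the Unpaywall (formerly oadoi.org) v2 API, so readers
# can be sent to a repository copy instead of the publisher's paywall.
import requests

UNPAYWALL_API = "https://api.unpaywall.org/v2/"
CONTACT_EMAIL = "example@example.org"  # placeholder; Unpaywall requires a contact email


def best_oa_url(doi: str) -> str | None:
    """Return the best open-access URL for a DOI, or None if it is closed."""
    response = requests.get(UNPAYWALL_API + doi, params={"email": CONTACT_EMAIL}, timeout=10)
    response.raise_for_status()
    location = response.json().get("best_oa_location") or {}
    return location.get("url_for_pdf") or location.get("url")


if __name__ == "__main__":
    # DOI mentioned elsewhere on this page as an open-access example.
    print(best_oa_url("10.3897/zookeys.43.390"))
```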
Generally speaking, this has been working fine for a while. Example: https://en.wikipedia.org/w/index.php?title=MIM_Museum&diff=prev&oldid=1291114435
Current most popular DOI prefixes
$ find ~/www/python/src/bot_cache -maxdepth 1 -type f -print0 | xargs -0 -P16 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=|")) | .orig_string' | grep doi | grep -Eo 'doi *= [^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40
jq: error: Could not open file /data/project/oabot/www/python/src/bot_cache/ISO#IEC_2022.json: No such file or directory
parse error: Invalid numeric literal at line 1, column 6
   1933 10.1074/jbc.
   1194 10.1038/sj.onc
    705 10.1126/science.
    512 10.1098/rsbm.
    396 10.4049/jimmunol.
    385 10.1093/hmg
    370 10.1111/syen.
    304 10.1096/fj.
    284 10.1001/jama.
    250 10.1242/jcs.
    213 10.1096/fasebj.
    204 10.11646/zootaxa.
    202 10.1182/blood
    162 10.1016/j.febslet
    138 10.1038/sj.mp
    127 10.1182/blood.
    111 10.1242/dev.
    103 10.1016/s
    100 10.1111/j.
    100 10.1002/art.
     87 10.1210/jcem.
     87 10.1167/iovs.
     85 10.1111/j.1432-1033
     81 10.1093/brain
     81 10.1016/j.
     80 10.1038/onc.
     80 10.1001/archinte.
     77 10.1093/humupd
     76 10.1038/sj.leu
     75 10.1242/jeb.
     75 10.1098/rstl.
     74 10.1093/mnras
     74 10.1002/ijc.
     73 10.1001/archneur.
     72 10.1007/s
     70 10.4269/ajtmh.
     70 10.1146/annurev
     66 10.1016/j.cell
     64 10.1542/peds.
     62 10.1124/pr.
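For comparison, a rough Python sketch of the same count, which also sidesteps the jq failures above (the file name containing "#" and the non-JSON cache entry); the cache field names are taken from the jq filter, the rest is illustrative.

```python
# Rough Python equivalent of the shell pipeline above: count the most common
# DOI prefixes among proposed doi-access edits in the bot cache, skipping
# unreadable or non-JSON cache files instead of erroring out.
import json
import re
from collections import Counter
from pathlib import Path

CACHE_DIR = Path.home() / "www/python/src/bot_cache"
DOI_RE = re.compile(r'doi *= *([^"|]+)')
PREFIX_RE = re.compile(r"10\.[0-9]+/[a-z]+(?:\.(?:[a-z]{0,8}|[0-9-]{9})\b)?")

counts = Counter()
for path in CACHE_DIR.glob("*"):
    if not path.is_file():
        continue
    try:
        cached = json.loads(path.read_text())
    except (ValueError, OSError):
        continue  # skip non-JSON or unreadable cache entries
    for edit in cached.get("proposed_edits", []):
        if "doi-access=|" not in edit.get("proposed_change", ""):
            continue
        for doi in DOI_RE.findall(edit.get("orig_string", "")):
            counts.update(PREFIX_RE.findall(doi))

for prefix, count in counts.most_common(40):
    print(count, prefix)
```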
Rare cases of removal of url-access=subscription do not seem very useful: https://en.wikipedia.org/w/index.php?title=Economy_of_Russia&diff=prev&oldid=1291484551
The most common proposed changes in the bot queue, currently not acted upon, are:
I think we should decline this for good, since the Wikidata graph split has been completed and the future of WikiCite data on Wikidata remains uncertain.
As rephrased, the issue has been solved.
Thanks for sharing this discussion between those two users. The tool already avoids adding URLs when doi-access=true is confirmed to be correct (cf. T344114). Links to repositories are added for additional safety when the DOI link appears to be closed.
A [February 2025 RfC](https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(policy)/Archive_201#h-RFC:_Allow_for_bots_(e.g._Citation_bot)_to_remove_redundant_URLs_known_to_not_ho-20250217092500) on the English Wikipedia has explicitly endorsed removing PubMed and OCLC URLs which do not provide a full text.
All the cases mentioned above now seem fine in Unpaywall, judging from some spot checks.
No longer relevant as Dissemin has closed: T394853.
Not sure whether this is still happening.
In the past year, redundant links have grown from 90k to 120k, so clearly this is more necessary than ever...
I've finally merged the PR as oabot has been running that code for over a year without problems now. https://github.com/dissemin/oabot/pull/91
34 more examples which seem to be bronze OA from my manual check, out of 71 cases OAbot found where Unpaywall says they are closed (the rest I mostly couldn't verify).
Thanks for the report and sorry for the annoying experience. Errors about individual DOIs are best reported to Unpaywall directly. The issue has since been fixed, as doi:10.1007/BF02124750 is now considered closed access.
It's true it would be good to link e.g. doi:10.3897/zookeys.43.390 if it weren't linked already, but the bot already does that.
Thanks for the report. Next time please apply the edit and revert it, or include the suggested citation, or at least mention the DOI. Links to suggestions expire after a few weeks, as they get deleted from the cache.
OABot adds URLs to http://pdfs.semanticscholar.org/8775/3fa9d86e28e1fb332f1509f3519e5b3a9c0d.pdf which redirects
The s2cid parameter does not autolink, so it's not a substitute for the url parameter. See also Why does the oabot tool make edits the bot doesn't?.
It seems clear to me that we need a mirror of Wikimedia Commons files. Ideally we would have kept both the media tarballs at your.org and the WikiTeam collection at the Internet Archive up to date, but we have not managed to keep them up to date since 2012 and 2016 respectively.
Also, I've tried the link from a recent post and it doesn't even work: it produces an empty post after one or two redirects. It seems nobody is using those links, since nobody noticed.
Another reason to do this is that Facebook doesn't even allow sharing links to some Wikimedia projects.
Thanks for the update on the XML data dumps list. I see there's progress on the other side: https://phabricator.wikimedia.org/T382947#10476420 . Hopefully this will allow the dumps to be re-enabled soon.
IIRC these (and the OAI feeds) were added back in the day when the WMF got some corporate contribution to provide specialised data feeds. I imagine any contractual obligations have long expired (if they even existed), but I don't know who could verify that.
The query itself will remain, so getting fresh results should be nothing more than a query submission away.
By running more tests and using the Mann–Whitney U test, we know whether a performance regression is statistically significant. That way we can make sure that we only alert on real regressions, which decreases the number of false alerts and the time spent investigating them.
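For reference, a minimal sketch of that kind of check in Python with scipy; the sample timings and the 0.05 threshold are made up for illustration.

```python
# Minimal sketch of a Mann-Whitney U significance check between two sets of
# timing samples; only alert when the difference is statistically significant.
from scipy.stats import mannwhitneyu

# Made-up example data: page load times in milliseconds.
baseline = [512, 498, 530, 505, 520, 515, 499, 508, 525, 510]
current = [560, 548, 571, 555, 567, 559, 550, 563, 572, 558]

# One-sided test: is the current run slower than the baseline?
statistic, p_value = mannwhitneyu(baseline, current, alternative="less")

ALPHA = 0.05  # illustrative significance threshold
if p_value < ALPHA:
    print(f"Likely regression (p = {p_value:.4f})")
else:
    print(f"No significant difference (p = {p_value:.4f})")
```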
We certainly don't want to be in the way. Feel free to delete the VMs. I was hoping to double check there's nothing to salvage in the local mounts but usually there shouldn't be anyway.
As an update, I created the account and luckily we were still in time for this round of submissions (CLDR 46). It's always a good time to ask me for a CLDR account! Six months tend to fly by.
Maybe it could be retrieved from a very early dump or by some other means.
@Hydriz Can I upgrade the VMs to Debian 11 one of these weekends? The only reason against it that I can think of is that some scripts may require Python 2, but that's still available in Debian 11.
@HShaikh Please don't propagate myths. https://aeon.co/essays/the-tragedy-of-the-commons-is-a-false-and-dangerous-myth
I'm closing this task as unclear and not pertaining to MediaWiki core, mostly because it mixes different user groups and permissions some of which are Wikimedia-specific.
This reminds me a bit of the primary sources tool (https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool), which I believe focused on identifying easy concepts like numbers. I've not used it in years.
https://www.mediawiki.org/wiki/Special:RecentChanges?useskin=vector&uselang=ksh after disabling JavaScript:
@Mazevedo Here's an example old ticket which may or may not be relevant any more. :)
Do you want to focus on the exonyms in languages which are supported by MediaWiki core (or at least translatewiki.net) but not in CLDR?
That was with all namespaces.
Current status
After the latest run
Mostly fixed upstream.
Not clear to me why this doi:10.1038/s41586-023-06291-2 got an arXiv ID but not a PMC ID: https://en.wikipedia.org/w/index.php?title=PubMed&diff=prev&oldid=1195324840
The new round seems to be going fine so far: https://en.wikipedia.org/w/index.php?title=Special:Contributions/OAbot&target=OAbot&dir=prev&offset=20240107000000&limit=50
The non-Unpaywall side continues at T228702.
We're still discarding excess merges from Dissemin, similarly to the 2019 logic: https://github.com/dissemin/oabot/commit/e3c74bff735c1ef16ee333dde2ac4bdd20949635 . We're not currently using the Dissemin title matches, but if we did, checking for a title, author and year match would not be enough: https://en.wikipedia.org/w/index.php?title=User_talk%3AOAbot&diff=1194216712&oldid=1193993325 .
There are about 6.5 million PMC matches and only some 640k matches by title and author, of which about 62k appear without a PMCID match, so perhaps we can just ignore those europepmc matches:
$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep '"is_oa": true' | grep pmc | grep -c "oa repository (via pmcid lookup)"
6499014
$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep '"is_oa": true' | grep pmc | grep -c "oa repository (via OAI-PMH title and first author match)"
637491
$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep '"is_oa": true' | grep pmc | grep "oa repository (via OAI-PMH title and first author match)" | grep -vc "oa repository (via pmcid lookup)"
62310
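The same kind of count can also be done per record rather than per line, which is more robust when a record has several OA locations; this sketch only assumes the snapshot fields visible above and in the jq filter further down (is_oa, oa_locations, evidence).

```python
# Sketch: count Unpaywall OA locations per evidence type, reading the snapshot
# record by record instead of grepping lines (a record can hold several
# oa_locations, so line-level grep can over- or under-count).
import bz2
import json
from collections import Counter

SNAPSHOT = "unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2"

evidence_counts = Counter()
title_match_without_pmcid = 0

with bz2.open(SNAPSHOT, "rt") as snapshot:
    for line in snapshot:
        record = json.loads(line)
        if not record.get("is_oa"):
            continue
        evidences = [loc.get("evidence", "") for loc in record.get("oa_locations") or []]
        evidence_counts.update(evidences)
        if ("oa repository (via OAI-PMH title and first author match)" in evidences
                and "oa repository (via pmcid lookup)" not in evidences):
            title_match_without_pmcid += 1

print(evidence_counts.most_common(10))
print("title/author matches without a PMCID match:", title_match_without_pmcid)
```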
Both papers on Unpaywall have the evidence "oa repository (via OAI-PMH title and first author match)", although the PMC side exposes a link to the correct DOI. The CrossRef API has the page range (such as "113-128" or "283-288"), so it may be possible to check the number of pages.
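A sketch of that page-count idea against the CrossRef REST API; the "page" field with ranges like "113-128" is what the API exposes for journal articles, while the helper itself is only illustrative and not part of OAbot.

```python
# Sketch: fetch the page range from the CrossRef REST API and derive a page
# count, which could then be compared against the repository record before
# trusting a title/author match.
import requests


def crossref_page_count(doi: str) -> int | None:
    """Return the number of pages for a DOI, if CrossRef has a numeric page range."""
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    response.raise_for_status()
    pages = response.json()["message"].get("page")  # e.g. "113-128"
    if not pages or "-" not in pages:
        return None
    try:
        first, last = (int(p) for p in pages.split("-", 1))
    except ValueError:
        return None  # non-numeric pages such as article IDs
    return last - first + 1


if __name__ == "__main__":
    # DOI mentioned above, used here only as an example call.
    print(crossref_page_count("10.1007/BF02124750"))
```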
So we won't suggest edits like this either https://en.wikipedia.org/w/index.php?title=Saccharomyceta&curid=68064105&diff=1194087545&oldid=1182890284 as we don't get non-repository URLs from other sources.
A sample of the kind of URLs we're talking about:
Only 35k or so of these are in the best_oa_location (sometimes even when a separate match for arxiv exists, like doi:10.1002/rsa.20071 / oai:CiteSeerX.psu:10.1.1.237.8456 / oai:arXiv.org:math/0209357 ).
Not sure how to narrow this down; we're talking about some 500k matches from CiteSeerX (out of about 890k):
$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep citeseerx | grep "oa repository (via OAI-PMH doi match)" | jq -r 'select(.oa_locations | .[] | .endpoint_id == "CiteSeerX.psu" and .evidence == "oa repository (via OAI-PMH doi match)" )|.doi' | wc -l
505747
$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep -c citeseerx
887759
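One possible way to narrow it down, as a sketch assuming only the snapshot fields used in the jq filter above: split the CiteSeerX DOI matches by whether they are the best_oa_location or only a secondary location, since the secondary ones would presumably matter less.

```python
# Sketch: split the CiteSeerX DOI matches by whether they are the
# best_oa_location or only a secondary location, to see how many records
# would actually be affected if we skipped them.
import bz2
import json

SNAPSHOT = "unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2"


def is_citeseerx_doi_match(location: dict) -> bool:
    return (location.get("endpoint_id") == "CiteSeerX.psu"
            and location.get("evidence") == "oa repository (via OAI-PMH doi match)")


best, secondary = 0, 0
with bz2.open(SNAPSHOT, "rt") as snapshot:
    for line in snapshot:
        record = json.loads(line)
        locations = record.get("oa_locations") or []
        if not any(is_citeseerx_doi_match(loc) for loc in locations):
            continue
        if is_citeseerx_doi_match(record.get("best_oa_location") or {}):
            best += 1
        else:
            secondary += 1

print("CiteSeerX DOI match is the best location:", best)
print("CiteSeerX DOI match only as a secondary location:", secondary)
```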
Another example where URL priorities changed: https://en.wikipedia.org/w/index.php?title=Balbinot_1&diff=prev&oldid=1193722831 (but there was no doi-access=free).