@lfoppiano
Collaborator

This PR fixes URL extraction when the regular expression matches less text than the actual target (the annotated URL).
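
As a rough illustration of the case described above (a sketch, not the code changed in this PR), suppose the regex stops early and captures only a prefix of the clickable link. The pattern `URL_PATTERN` and the helper `consolidate_url` are hypothetical names introduced for this example, and the prefix check is an assumed consolidation rule.

```python
import re
from typing import Optional

# Deliberately simple pattern for illustration only: it stops at whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def consolidate_url(text: str, annotation_target: str) -> Optional[str]:
    """Prefer the annotated URL when the regex match is only a prefix of it.

    Hypothetical helper: `annotation_target` stands for the destination of
    the clickable link stored in the PDF annotation.
    """
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    candidate = match.group(0)
    # The regex captured less than the annotation: use the annotation target.
    if annotation_target.startswith(candidate):
        return annotation_target
    # Otherwise the annotation is unrelated; keep what the regex found.
    return candidate


# The extracted text contains a stray space, so the regex stops too early.
text = "See https://example.org/path/to/ resource for details."
print(consolidate_url(text, "https://example.org/path/to/resource"))
```

A prefix check is only one possible way to decide that a match and an annotation refer to the same link; comparing the positions of the match and the annotation on the page would be another.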

@coveralls

coveralls commented Oct 24, 2024

Coverage Status

coverage: 40.768% (+0.01%) from 40.755%
when pulling 35ec905 on fix-url-extraction-regex-shorter
into be44579 on master.

lfoppiano marked this pull request as ready for review October 26, 2024 04:54
lfoppiano linked an issue Oct 31, 2024 that may be closed by this pull request
lfoppiano requested a review from kermitt2 November 12, 2024 17:36
@lfoppiano
Collaborator Author

lfoppiano commented Nov 13, 2024

Added a fix for the following edge case:

[image]

Here, genius editors add a - to break up a URL over two lines.

Here is the document: https://doi.org/10.1038/s41588-024-01785-9
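
To make the edge case concrete, here is a minimal sketch (again, not the code from this PR) of how a hyphen inserted at a line break could be tolerated when comparing the extracted text with the annotated URL. The function name `matches_ignoring_break_hyphen` and the example URLs are assumptions made for illustration.

```python
def matches_ignoring_break_hyphen(extracted: str, annotation_target: str) -> bool:
    """Check whether the text equals the target once a line-break hyphen is dropped.

    Hypothetical helper: `extracted` is the URL text as printed, with "\n"
    marking the line break; `annotation_target` is the clickable link.
    """
    # First assume the hyphen belongs to the URL and only join the lines.
    joined = extracted.replace("\n", "")
    if joined == annotation_target:
        return True
    # Then assume the hyphen was added by the editor and drop it as well.
    dehyphenated = extracted.replace("-\n", "")
    return dehyphenated == annotation_target


printed = "https://example.org/some-long-\npath"
# The hyphen before the line break is not part of the real URL here.
print(matches_ignoring_break_hyphen(printed, "https://example.org/some-longpath"))
```

Trying the plain join first keeps URLs with a genuine hyphen at the break intact, since the text alone cannot tell the two cases apart and only the annotation can.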

lfoppiano self-assigned this Nov 21, 2024
lfoppiano added this to the 0.8.2 milestone Nov 21, 2024
lfoppiano merged commit 61162e7 into master Nov 25, 2024
8 checks passed
lfoppiano deleted the fix-url-extraction-regex-shorter branch November 25, 2024 12:40

Development

Successfully merging this pull request may close these issues.

URLs where the regex capture less than the annotations are not consolidated with the clickable links from the PDF document

4 participants