Fix re.fullmatch POSSESSIVE_REPEAT#7187
Conversation
|
No actionable comments were generated in the recent review. 🎉 📝 WalkthroughWalkthroughThis PR modifies the SRE (Regex) engine to ensure explicit next-context handling when breaking to subsequent matching paths for possessive repeat and atomic group operations. The changes set Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin closes-#7183 |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/sre_engine/Cargo.toml (1)
17-19: Remove the redundant explicit[[test]]entry or document the reason for making it explicit.In Rust 2024 edition with
autotests = true(the default), Cargo automatically discoverstests/tests.rsas an integration test target. The explicit[[test]] name = "tests"without apathfield resolves to the same file that would already be discovered, making this entry redundant. If the intent is to make it explicit for documentation or consistency with the[[bench]]declaration above, that's acceptable—but if it serves no purpose, removing it would clean up unnecessary boilerplate.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sre_engine/Cargo.toml` around lines 17 - 19, Remove the redundant explicit test table: delete the `[[test]]` block that only contains `name = "tests"` (the `[[test]] name = "tests"` entry) from Cargo.toml, or if you intend it for documentation/consistency, add a short comment above that block explaining why it's explicit (e.g., clarifying it's intentional to mirror the `[[bench]]` declaration); do not add a `path` field since Cargo already autodiscovers tests in Rust 2024 with `autotests = true`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sre_engine/src/engine.rs`:
- Around line 473-476: Typo fix: change the comment text "herited" to
"inherited" in all occurrences; locate the comments near the
ctx.next_offset(...) usage and the 'context label (e.g., the comment above the
let mut next = ctx.next_offset(4, Jump::PossessiveRepeat2); block and the two
other comments that reference "herited" around similar code paths) and update
the three comment instances to read "inherited".
---
Nitpick comments:
In `@crates/sre_engine/Cargo.toml`:
- Around line 17-19: Remove the redundant explicit test table: delete the
`[[test]]` block that only contains `name = "tests"` (the `[[test]] name =
"tests"` entry) from Cargo.toml, or if you intend it for
documentation/consistency, add a short comment above that block explaining why
it's explicit (e.g., clarifying it's intentional to mirror the `[[bench]]`
declaration); do not add a `path` field since Cargo already autodiscovers tests
in Rust 2024 with `autotests = true`.
youknowone
left a comment
There was a problem hiding this comment.
Looks good in general. Thank you for contributing! Please also check coderabbitai's review comments.
wvmscs
left a comment
There was a problem hiding this comment.
Recommended changes done
|
@wvmscs probably you forgot to push the changes. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
extra_tests/snippets/stdlib_re.py (1)
80-80: Unescaped.in pattern matches any character, not only literal dot.
(?:.[0-9]+)will match1X25Y38just as readily as1.25.38. The test still exercises the correct code path for the issue, but using\.better expresses intent.✏️ Proposed fix
-assert re.fullmatch(r"([0-9]++(?:.[0-9]+)*+)", "1.25.38") -assert re.fullmatch(r"([0-9]++(?:.[0-9]+)*+)", "1.25.38").group(0) == "1.25.38" +assert re.fullmatch(r"([0-9]++(?:\.[0-9]+)*+)", "1.25.38") +assert re.fullmatch(r"([0-9]++(?:\.[0-9]+)*+)", "1.25.38").group(0) == "1.25.38"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extra_tests/snippets/stdlib_re.py` at line 80, The regex in the re.fullmatch call uses an unescaped dot in the non-capturing group (pattern "([0-9]++(?:.[0-9]+)*+)"), so it will match any character instead of a literal period; update the pattern passed to re.fullmatch to escape the dot in the group (i.e., change the "(?:.[0-9]+)" portion to "(?:\.[0-9]+)") so the test enforces literal dots (reference the re.fullmatch invocation and the pattern string).crates/sre_engine/src/engine.rs (1)
473-473: Comments read as change descriptions rather than code explanations.
// modified next.toplevel from inherited to falsedescribes what was changed (commit-message style) rather than why the invariant exists. A future reader maintaining this code won't immediately understand the significance.✏️ Suggested wording (apply to all three sites)
- // modified next.toplevel from inherited to false - let mut next = ctx.next_offset(4, Jump::PossessiveRepeat2); - next.toplevel = false; + let mut next = ctx.next_offset(4, Jump::PossessiveRepeat2); + next.toplevel = false; // sub-pattern must not apply match_all/must_advance constraints- let mut next = ctx.next_offset(4, Jump::PossessiveRepeat4); - next.toplevel = false; // modified next.toplevel from inherited to false + let mut next = ctx.next_offset(4, Jump::PossessiveRepeat4); + next.toplevel = false; // sub-pattern must not apply match_all/must_advance constraints- let mut next_ctx = ctx.next_offset(2, Jump::AtomicGroup1); - next_ctx.toplevel = false; // modified next.toplevel from inherited to false + let mut next_ctx = ctx.next_offset(2, Jump::AtomicGroup1); + next_ctx.toplevel = false; // sub-pattern must not apply match_all/must_advance constraintsAlso applies to: 501-501, 841-841
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sre_engine/src/engine.rs` at line 473, Replace the commit-style comment "// modified next.toplevel from inherited to false" with a concise explanation of the invariant and rationale: state why next.toplevel must be set to false (what property/state would break if it remained inherited), what callers or code paths rely on this being false, and any assumptions about downstream behavior or performance; apply the same explanatory replacement at the other two occurrences that reference next.toplevel so future maintainers understand the intent and not just the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@extra_tests/snippets/stdlib_re.py`:
- Line 81: The assertion uses re.fullmatch(...).groups(0) which returns a tuple,
causing the comparison to always fail; change the call to use the singular
.group(0) (or .group(1) if you prefer the first capture) on the Match object
returned by re.fullmatch so the expression compares a string ("1.25.38") to a
string rather than a tuple; locate the call to re.fullmatch in the snippet and
replace .groups(0) with .group(0).
---
Nitpick comments:
In `@crates/sre_engine/src/engine.rs`:
- Line 473: Replace the commit-style comment "// modified next.toplevel from
inherited to false" with a concise explanation of the invariant and rationale:
state why next.toplevel must be set to false (what property/state would break if
it remained inherited), what callers or code paths rely on this being false, and
any assumptions about downstream behavior or performance; apply the same
explanatory replacement at the other two occurrences that reference
next.toplevel so future maintainers understand the intent and not just the
change.
In `@extra_tests/snippets/stdlib_re.py`:
- Line 80: The regex in the re.fullmatch call uses an unescaped dot in the
non-capturing group (pattern "([0-9]++(?:.[0-9]+)*+)"), so it will match any
character instead of a literal period; update the pattern passed to re.fullmatch
to escape the dot in the group (i.e., change the "(?:.[0-9]+)" portion to
"(?:\.[0-9]+)") so the test enforces literal dots (reference the re.fullmatch
invocation and the pattern string).
youknowone
left a comment
There was a problem hiding this comment.
Thank you so much! Welcome to RustPython project
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] lib: cpython/Lib/re dependencies:
dependent tests: (62 tests)
Legend:
|
This PR implements the corrections for the POSSESSIVE_REPEAT i.e. "(x)++)" given in #7183.
3 tests in the file test_re.py (in Lib/test) failed for unexpected success
UNEXPECTED SUCCESS: test_bug_gh91616 (test.test_re.ReTests.test_bug_gh91616)
UNEXPECTED SUCCESS: test_fullmatch_atomic_grouping (test.test_re.ReTests.test_fullmatch_atomic_grouping)
UNEXPECTED SUCCESS: test_fullmatch_possessive_quantifiers (test.test_re.ReTests.test_fullmatch_possessive_quantifiers)
I believe this was the old behavior before applying the correction, as the result is now the same as CPython 3.14.3 .
So, I also corrected the test file and add one new test for POSSESSIVE_REPEAT.
please review.
best regards
Summary by CodeRabbit
Bug Fixes
Tests