more algorithm-independent GC infra #7194
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Implements a CPython-style generational GC, refactors reference counting into a unified atomic state with deferred-drop support, adds a per-thread trashcan for deferred deallocation, and reorders interpreter shutdown and module finalization.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Interpreter
    participant VM as VirtualMachine
    participant GC as GcState
    participant Heap as HeapObjects
    participant Modules

    Interpreter->>VM: finalize()
    VM->>VM: run atexit handlers
    VM->>VM: set finalizing flag
    VM->>GC: collect_force(generation)
    GC->>Heap: build refs map & mark reachable
    GC->>Heap: identify unreachable, run weakref callbacks
    GC->>Heap: finalize unreachable (handle resurrection)
    VM->>Modules: finalize_remove_modules() -> Vec<(name, weak)>
    VM->>Modules: finalize_clear_module_dicts()
    VM->>GC: collect_force(generation) (post-removal)
    VM->>Modules: clear sys & builtins dicts
    VM->>GC: collect_force(generation) (final)
```
```mermaid
sequenceDiagram
    participant Dealloc as Deallocator
    participant Trashcan
    participant Queue as DeferredQueue
    participant GC as GcState

    Dealloc->>Trashcan: begin(obj, dealloc_fn)
    alt depth <= threshold
        Trashcan-->>Dealloc: allow immediate dealloc
        Dealloc->>GC: untrack object
        Dealloc->>Dealloc: tp_clear / drop children
        Dealloc->>Trashcan: end()
        Trashcan->>Queue: process deferred items iteratively
    else depth > threshold
        Trashcan-->>Dealloc: defer
        Trashcan->>Queue: push (obj, dealloc_fn)
        Dealloc-->>Trashcan: return (deferred)
    end
```
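The deferral flow in the diagram can be sketched as a standalone program. This is a hypothetical minimal version, not the PR's actual `Trashcan` API: a thread-local depth counter plus a deferred queue, with the threshold value assumed. The key property is that an arbitrarily deep deallocation chain uses only bounded native stack.

```rust
use std::cell::RefCell;

const TRASHCAN_THRESHOLD: usize = 8; // assumed limit; CPython uses 50

thread_local! {
    static DEPTH: RefCell<usize> = RefCell::new(0);
    static QUEUE: RefCell<Vec<Box<dyn FnOnce()>>> = RefCell::new(Vec::new());
    static FREED: RefCell<usize> = RefCell::new(0);
}

/// Run `dealloc` now if the dealloc chain is shallow, otherwise defer it.
/// The outermost call drains the deferred queue iteratively.
fn dealloc_with_trashcan(dealloc: Box<dyn FnOnce()>) {
    if DEPTH.with(|d| *d.borrow()) >= TRASHCAN_THRESHOLD {
        // Too deep: push (obj, dealloc_fn) and return without recursing.
        QUEUE.with(|q| q.borrow_mut().push(dealloc));
        return;
    }
    DEPTH.with(|d| *d.borrow_mut() += 1);
    dealloc(); // may recursively re-enter for child objects
    DEPTH.with(|d| *d.borrow_mut() -= 1);
    if DEPTH.with(|d| *d.borrow()) == 0 {
        // end(): process deferred items iteratively at the outermost frame
        while let Some(f) = QUEUE.with(|q| q.borrow_mut().pop()) {
            DEPTH.with(|d| *d.borrow_mut() += 1);
            f();
            DEPTH.with(|d| *d.borrow_mut() -= 1);
        }
    }
}

/// Simulate dropping a singly linked chain of `remaining + 1` nodes.
fn free_chain(remaining: usize) {
    dealloc_with_trashcan(Box::new(move || {
        FREED.with(|f| *f.borrow_mut() += 1);
        if remaining > 0 {
            free_chain(remaining - 1);
        }
    }));
}

fn run_demo() -> usize {
    free_chain(9_999); // deep enough to overflow the stack without deferral
    FREED.with(|f| *f.borrow())
}

fn main() {
    let freed = run_demo();
    assert_eq!(freed, 10_000);
    println!("freed {freed} nodes");
}
```

Without the depth check, dropping the 10,000-node chain would recurse 10,000 frames deep; with it, native recursion never exceeds the threshold.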
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/vm/interpreter.rs (1)
386-395: ⚠️ Potential issue | 🟡 Minor

Doc comment is out of sync with the actual shutdown sequence.
The numbered steps in the doc comment don't match the code:
| Doc step | Doc says | Code actually does |
| --- | --- | --- |
| 3 | Wait for thread shutdown | ✅ threading._shutdown |
| 4 | Set finalizing flag | ❌ atexit runs here |
| 5 | Call threading._shutdown() | ❌ duplicate of step 3; finalizing flag set here |
| 6 | Run atexit | ❌ GC collect_force here |
| 7 | GC pass and module cleanup | ✅ finalize_modules |
| 8 | Final GC pass | ❌ code does flush_std, no separate final GC |

Suggested doc fix
```diff
-    /// 1. Set finalizing flag (suppresses unraisable exceptions).
-    /// 1. Call threading._shutdown() to join non-daemon threads.
-    /// 1. Run atexit exit functions.
-    /// 1. GC pass and module cleanup (finalize_modules).
-    /// 1. Final GC pass.
+    /// 1. Run atexit exit functions.
+    /// 1. Set finalizing flag (suppresses unraisable exceptions from __del__).
+    /// 1. Forced GC collection pass (collect cycles while builtins are available).
+    /// 1. Module finalization (finalize_modules).
+    /// 1. Final stdout/stderr flush.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/interpreter.rs` around lines 386 - 395, Update the "Finalization steps" doc comment so it matches the actual shutdown sequence in this file: enumerate the real order (flush stdout/stderr via flush_std, handle exit exception -> exit code, call threading._shutdown to wait for threads, set the finalizing flag (suppresses unraisable exceptions), run atexit exit functions, run GC collect_force (where the doc previously claimed atexit ran elsewhere), call finalize_modules for module cleanup, and note any final flush/GC behavior; remove the duplicate threading._shutdown step and correct the step labels to reference the actual calls (threading._shutdown, atexit, collect_force, finalize_modules, flush_std) so the numbered list matches the code flow.

crates/vm/src/vm/mod.rs (1)
1122-1149: ⚠️ Potential issue | 🔴 Critical

`scopeguard::defer!` is placed after execution, defeating panic safety.

The tracing + execution block (lines 1122–1132) runs before the `scopeguard::defer!` guard is created (line 1136). If `f(frame)` or `trace_event` panics, the guard doesn't exist yet, so cleanup (restoring owner, popping exception, popping frame, decrementing recursion depth) never runs — leaving the VM in a corrupted state.

Compare with `with_frame_exc` (line 1056), where the guard is created before the execution block (line 1078). This method should follow the same pattern: SETUP → GUARD → EXECUTE.

🐛 Proposed fix: move scopeguard before execution
```diff
     let old_owner = frame.owner.swap(
         crate::frame::FrameOwner::Thread as i8,
         core::sync::atomic::Ordering::AcqRel,
     );
+    // Ensure cleanup on panic: restore owner, pop exception, frame chain, frames Vec,
+    // and recursion depth.
+    scopeguard::defer! {
+        frame.owner.store(old_owner, core::sync::atomic::Ordering::Release);
+        self.exceptions.borrow_mut().stack
+            .pop()
+            .expect("pop_exception() without nested exc stack");
+        crate::vm::thread::set_current_frame(old_frame);
+        self.frames.borrow_mut().pop();
+        #[cfg(feature = "threading")]
+        crate::vm::thread::pop_thread_frame();
+
+        self.recursion_depth.update(|d| d - 1);
+    }
+
     use crate::protocol::TraceEvent;
     let result = match self.trace_event(TraceEvent::Call, None) {
         Ok(()) => {
             let result = f(frame);
             if result.is_ok() {
                 let _ = self.trace_event(TraceEvent::Return, None);
             }
             result
         }
         Err(e) => Err(e),
     };
-    // Ensure cleanup on panic: restore owner, pop exception, frame chain, frames Vec,
-    // and recursion depth.
-    scopeguard::defer! {
-        frame.owner.store(old_owner, core::sync::atomic::Ordering::Release);
-        self.exceptions.borrow_mut().stack
-            .pop()
-            .expect("pop_exception() without nested exc stack");
-        crate::vm::thread::set_current_frame(old_frame);
-        self.frames.borrow_mut().pop();
-        #[cfg(feature = "threading")]
-        crate::vm::thread::pop_thread_frame();
-
-        self.recursion_depth.update(|d| d - 1);
-    }
-
     result
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 1122 - 1149, The defer cleanup (scopeguard::defer!) must be created before running the trace/execution so panics still trigger cleanup; move the scopeguard::defer! block to just after setting up the frame/owner/exception/recursion state and before calling self.trace_event(TraceEvent::Call, None) and f(frame). Ensure TraceEvent::Return is still emitted on successful result (keep the existing trace_event call after f(frame)), and preserve the same cleanup actions inside the guard (frame.owner.store(...), self.exceptions.borrow_mut().stack.pop(), crate::vm::thread::set_current_frame(old_frame), self.frames.borrow_mut().pop(), cfg thread pop_thread_frame(), and self.recursion_depth.update(|d| d - 1)) so any panic during trace_event or f(frame) will run the guard.
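The SETUP → GUARD → EXECUTE ordering can be demonstrated without the scopeguard crate, using a minimal Drop-based guard. This is a stand-in sketch, not the VM's actual types: `with_frame`, `Guard`, and the log are illustrative.

```rust
use std::cell::RefCell;
use std::panic::{catch_unwind, AssertUnwindSafe};

thread_local! {
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

/// Runs its closure when dropped, including during panic unwinding
/// (the same mechanism scopeguard::defer! relies on).
struct Guard<F: FnMut()>(F);
impl<F: FnMut()> Drop for Guard<F> {
    fn drop(&mut self) {
        (self.0)();
    }
}

fn with_frame(body: impl FnOnce()) {
    LOG.with(|l| l.borrow_mut().push("push frame")); // SETUP
    // GUARD is created *before* execution, so cleanup runs even if body panics.
    let _guard = Guard(|| LOG.with(|l| l.borrow_mut().push("pop frame")));
    body(); // EXECUTE
}

fn run_demo() -> Vec<&'static str> {
    // Panic inside the frame; the guard must still restore state.
    let _ = catch_unwind(AssertUnwindSafe(|| with_frame(|| panic!("boom"))));
    LOG.with(|l| l.borrow().clone())
}

fn main() {
    assert_eq!(run_demo(), vec!["push frame", "pop frame"]);
    println!("cleanup ran despite panic");
}
```

If the guard were instead constructed after `body()`, the panic would unwind past the point where the guard would have been created and "pop frame" would never be logged, which is exactly the hazard flagged above.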
🧹 Nitpick comments (4)
crates/common/src/refcount.rs (2)
43-46: Dead arithmetic: `/ COUNT` and `* COUNT` where `COUNT = 1`

`COUNT` is defined as `1`, so the division in `strong()` and the multiplication in `add_strong()` are no-ops that add noise. Simplify to direct mask/add:

♻️ Proposed simplification

```diff
 #[inline]
 fn strong(self) -> u32 {
-    ((self.inner & STRONG) / COUNT) as u32
+    (self.inner & STRONG) as u32
 }

 #[inline]
 fn add_strong(self, val: u32) -> Self {
-    Self::from_raw(self.inner + (val as usize) * COUNT)
+    Self::from_raw(self.inner + val as usize)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 43 - 46, The code does unnecessary arithmetic because COUNT is 1; simplify by removing the redundant / COUNT in strong() and * COUNT in add_strong(): in the strong() method use ((self.inner & STRONG) as u32) (or equivalent masking) to extract the strong count directly, and in add_strong() add the increment directly to the strong bits instead of multiplying by COUNT; update any related bit-manipulation comments to reflect the simplification and keep the STRONG and COUNT constants as needed (or remove COUNT if unused).
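The simplification is easy to check against a toy version of the packed state word. The bit layout and mask values here are assumptions for illustration, not the crate's actual constants:

```rust
// Hypothetical packed refcount word: low 30 bits hold the strong count,
// flag bits sit above the count field.
const STRONG: usize = (1 << 30) - 1; // assumed strong-count mask

#[derive(Copy, Clone, PartialEq, Debug)]
struct State {
    inner: usize,
}

impl State {
    fn from_raw(inner: usize) -> Self {
        Self { inner }
    }
    // After the simplification: a plain mask, no division by COUNT = 1.
    fn strong(self) -> u32 {
        (self.inner & STRONG) as u32
    }
    // After the simplification: a plain add, no multiplication by COUNT = 1.
    fn add_strong(self, val: u32) -> Self {
        Self::from_raw(self.inner + val as usize)
    }
}

fn main() {
    let leaked_flag = 1usize << 30; // assumed flag bit above the count field
    let s = State::from_raw(leaked_flag | 5);
    assert_eq!(s.strong(), 5); // flag bits are masked out of the count
    assert_eq!(s.add_strong(3).strong(), 8); // count arithmetic unchanged
    assert_eq!(s.add_strong(3).inner & leaked_flag, leaked_flag); // flag preserved
    println!("mask/add behave identically without COUNT");
}
```

Since the count occupies the low bits with unit stride, dividing or multiplying by `COUNT = 1` cannot change any value; the mask and add alone are sufficient.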
250-252: `#[inline]` on a `Vec`-allocating function is a misleading hint

`flush_deferred_drops` drains and collects into a new `Vec` on every call. The heap allocation prevents meaningful inlining and the attribute gives the compiler a misleading signal. Consider removing it, or if the goal is to inline the fast-path empty-queue check, restructure accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 250 - 252, The #[inline] attribute on flush_deferred_drops is misleading because the function always allocates a new Vec when draining, preventing meaningful inlining; remove the #[inline] on the flush_deferred_drops function or restructure it so the cheap fast-path (e.g., an early is_empty check on the deferred queue) is performed in a small #[inline] helper that returns early while the heavier Vec-allocating work remains in a non-inlined function; locate flush_deferred_drops and either drop the attribute or split out the empty-queue check into an inline helper to preserve the intended optimization.

crates/vm/src/gc_state.rs (2)
464-480: Traversal of referents is called twice over the collecting set — consider caching.

`gc_get_referent_ptrs()` is an `unsafe` call that traverses each object's references. It's invoked once in the gc_refs subtraction loop (lines 470–479) and again in the reachability traversal (lines 500–507) for every reachable object. For large object graphs this doubles the traversal work.

If `gc_get_referent_ptrs()` is non-trivial (e.g., it allocates or scans), caching the adjacency list (`HashMap<GcObjectPtr, Vec<GcObjectPtr>>`) during the first pass and reusing it in the second would halve the unsafe calls and improve collection latency.

Also applies to: 496-509
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 464 - 480, The code currently calls unsafe obj.gc_get_referent_ptrs() twice per object; change to compute and store each object's adjacency once and reuse it: during the initial loop over collecting (the block that iterates over collecting, reads obj.gc_get_referent_ptrs(), and updates gc_refs) build a HashMap<GcObjectPtr, Vec<GcObjectPtr>> (or similar) keyed by gc_ptr and populate it from obj.gc_get_referent_ptrs(); then in the later reachability traversal (the code that walks reachable objects) read referents from that cache instead of calling gc_get_referent_ptrs() again. Update lookups to use the cache where you currently call gc_get_referent_ptrs(), and ensure you still respect the same alive/strong_count checks and membership tests against collecting and gc_refs.
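The suggested caching can be sketched with simplified types: plain `usize` pointers and pre-supplied referent lists standing in for `gc_get_referent_ptrs()`. The adjacency map is built once and reused by both passes of the CPython-style algorithm:

```rust
use std::collections::{HashMap, HashSet};

type Ptr = usize;

/// Input: (ptr, referents, strong_count). Output: pointers reachable only
/// through internal cycles, i.e. cyclic garbage.
fn find_cyclic_garbage(objects: &[(Ptr, Vec<Ptr>, u32)]) -> Vec<Ptr> {
    // Traverse each object's referents exactly once, caching the adjacency.
    let adj: HashMap<Ptr, Vec<Ptr>> =
        objects.iter().map(|(p, refs, _)| (*p, refs.clone())).collect();

    // Pass 1: gc_refs starts at the strong count; subtract internal refs.
    let mut gc_refs: HashMap<Ptr, i64> =
        objects.iter().map(|(p, _, sc)| (*p, *sc as i64)).collect();
    for refs in adj.values() {
        for r in refs {
            if let Some(c) = gc_refs.get_mut(r) {
                *c -= 1;
            }
        }
    }

    // Pass 2: objects with gc_refs > 0 have external references; everything
    // they reach is alive. Referents come from `adj`, not a second traversal.
    let mut stack: Vec<Ptr> = gc_refs
        .iter()
        .filter(|(_, c)| **c > 0)
        .map(|(p, _)| *p)
        .collect();
    let mut reachable: HashSet<Ptr> = stack.iter().copied().collect();
    while let Some(p) = stack.pop() {
        for r in adj.get(&p).into_iter().flatten() {
            if reachable.insert(*r) {
                stack.push(*r);
            }
        }
    }

    let mut dead: Vec<Ptr> = adj
        .keys()
        .filter(|p| !reachable.contains(*p))
        .copied()
        .collect();
    dead.sort();
    dead
}

fn main() {
    // 1 <-> 2 form a cycle with no external refs; 3 is held externally.
    let objs = vec![(1, vec![2], 1), (2, vec![1], 1), (3, vec![], 1)];
    assert_eq!(find_cyclic_garbage(&objs), vec![1, 2]);
    println!("cycle {{1, 2}} detected as garbage");
}
```

In the real code the cache trades memory (one `Vec<GcObjectPtr>` per collecting object) for halving the unsafe traversal calls, which is usually a good exchange during a stop-the-world pass.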
598-614: TOCTOU between `is_finalized` read and `mark_finalized` write — consider a single write-lock check.

The read lock at line 600 and write lock at line 609 are separate acquisitions. While the `collecting` mutex prevents concurrent GC, `untrack_object` (which removes from `finalized_objects`) could run between the two — potentially allowing a re-finalization of an object that was concurrently untracked and re-tracked. Consolidating into one write-lock acquisition eliminates the window:
Proposed consolidation

```diff
-    let already_finalized = if let Ok(finalized) = self.finalized_objects.read() {
-        finalized.contains(&ptr)
-    } else {
-        false
-    };
-
-    if !already_finalized {
-        // Mark as finalized BEFORE calling __del__
-        // This ensures is_finalized() returns True inside __del__
-        if let Ok(mut finalized) = self.finalized_objects.write() {
-            finalized.insert(ptr);
-        }
+    let newly_inserted = self
+        .finalized_objects
+        .write()
+        .map(|mut set| set.insert(ptr))
+        .unwrap_or(false);
+
+    if newly_inserted {
         obj_ref.try_call_finalizer();
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 598 - 614, Consolidate the separate read and write lock around finalized_objects into a single write-lock per object: in the loop over unreachable_refs (using GcObjectPtr), acquire self.finalized_objects.write() once, check if the pointer is already present, and if not insert it; then drop the write lock and call obj_ref.try_call_finalizer() only when you performed the insert. This removes the TOCTOU window with untrack_object while preserving the behavior that the object is marked finalized before __del__ runs (is_finalized()/mark_finalized semantics).
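The consolidated pattern relies on `HashSet::insert` returning `true` only for the first caller, which turns check-and-mark into a single step under one lock. A minimal standalone sketch with a plain `usize` standing in for the object pointer:

```rust
use std::collections::HashSet;
use std::sync::RwLock;

/// Returns true exactly once per pointer: the caller that gets `true`
/// is the one that should run the finalizer. Check and mark happen
/// under a single write-lock acquisition, so there is no TOCTOU window.
fn mark_finalized_once(finalized: &RwLock<HashSet<usize>>, ptr: usize) -> bool {
    finalized
        .write()
        .map(|mut set| set.insert(ptr))
        .unwrap_or(false)
}

fn main() {
    let finalized = RwLock::new(HashSet::new());
    assert!(mark_finalized_once(&finalized, 0xDEAD)); // first caller wins
    assert!(!mark_finalized_once(&finalized, 0xDEAD)); // second caller skips
    assert!(mark_finalized_once(&finalized, 0xBEEF)); // independent object
    println!("finalizer would run exactly once per object");
}
```

Separate read-then-write acquisitions would let another thread mutate the set between the two locks; collapsing both into the single `insert` under the write lock is what closes the race.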
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/common/src/refcount.rs`:
- Around line 4-6: The layout comment mentions a "[1 bit: weaked]" flag that has
no corresponding implementation; either remove that token from the comment to
match the current code or add the missing bit definition and accessors. To fix,
decide whether to actually reserve the bit: if unused, update the comment to
drop "weaked" and adjust bit position explanations (remove any reference to
WEAKED/State::weaked); if intended, add a WEAKED constant (e.g., WEAKED_MASK)
and implement State::weaked(&self) -> bool plus setter/clearer logic where State
bits are manipulated (matching how destructed/leaked are handled), and ensure
weak_count/strong_count masks/shift constants are unchanged. Use existing
symbols like State, destructed, leaked, weak_count, strong_count, and follow the
pattern for other flag constants so the comment and code remain consistent.
- Around line 144-153: The dec() method currently does an unconditional
self.state.fetch_sub(COUNT, Ordering::SeqCst) then checks
State::from_raw(...).leaked(), which allows underflow to clear the LEAKED bit;
fix by detecting leaked() after the fetch and immediately restoring the
subtraction (call self.state.fetch_add(COUNT, Ordering::SeqCst)) before
returning false so the LEAKED flag and count aren't corrupted; update dec() to
perform the undo restore when old.leaked() is true and then return false
(optionally note you could replace with a CAS loop to avoid the transient
window).
- Around line 179-193: The deferred-drop thread-local and APIs must be compiled
only when the standard library is available: wrap the thread_local! block
(IN_DEFERRED_CONTEXT and DEFERRED_QUEUE), the DeferredDropGuard struct, and the
functions with_deferred_drops, try_defer_drop, and flush_deferred_drops with
#[cfg(feature = "std")] so they are excluded from no_std builds; ensure the
attribute is applied to each item (or to a surrounding mod) referencing those
symbols so the crate compiles without std just like refcount_overflow() does.
- Around line 99-110: Update the doc comment and safety semantics for pub fn
inc(&self) in refcount.rs: clarify that inc() assumes the caller already holds a
valid strong reference (e.g., used in to_owned()) and is not a resurrection path
— recommend safe_inc() for resurrecting from weak refs; replace the vague
comment "// The previous fetch_add created a permission to run decrement again"
with a brief protocol description (double fetch_add is an optimization to avoid
CAS when caller already had a strong ref and may race with a destructor that is
already in-flight), and make overflow handling consistent with inc_by() by
adding the same strong-count overflow check (use State::strong()/STRONG logic
and call refcount_overflow() on overflow) or, if you intentionally omit overflow
detection, document why it is safe to omit it; reference the symbols inc,
safe_inc, inc_by, State, COUNT, STRONG, destructed, and refcount_overflow in the
change.
In `@crates/vm/src/gc_state.rs`:
- Around line 439-444: The unconditional reset of generations[0].count loses
increments from concurrent track_object calls after gen_locks are dropped;
change the logic in the collection exit paths (where
generations[0].count.store(0, ...) is called) to either (A) switch maybe_collect
to use alloc_count as the authoritative allocation counter and reset alloc_count
there instead of touching generations[0].count, ensuring track_object increments
alloc_count, or (B) recompute gen0's true count after promotions by reading
generation_objects[0] (via its read lock) and store its length into
generations[0].count before releasing locks; update all early-return branches
(the places with generations[0].count.store(0, ...)) to use the chosen approach
so concurrent increments are not lost.
- Around line 734-758: promote_survivors currently removes an object from its
source generation and then silently skips adding it to next_gen if acquiring the
write lock for self.generation_objects[next_gen] fails, orphaning the object;
change the target-generation lock acquisition to propagate a poisoned-lock panic
instead of silently failing by replacing the conditional try-lock for next_gen
with an unconditional write() followed by .expect(...) (e.g. let mut next_set =
self.generation_objects[next_gen].write().expect("poisoned
generation_objects[next_gen] lock");) and then perform next_set.insert(ptr) and
the fetch_add on self.generations[next_gen].count, so that a poisoned lock
triggers a panic and the invariant that every tracked object is in exactly one
generation is preserved.
In `@crates/vm/src/vm/mod.rs`:
- Line 880: The doc comment above the block that clears the `sys` and `builtins`
dicts is mislabeled as "Phase 6" but `finalize_modules` actually treats that
block as Phase 7; update the comment to read "Phase 7" (or match the existing
phase numbering used in `finalize_modules`) so the phase labels are consistent
with the earlier Phase 6 GC collection and the surrounding phase comments.
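The undo-restore fix for `dec()` described in the refcount prompts above can be sketched against an assumed bit layout; the flag position and masks are illustrative, not the crate's real constants:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const COUNT: usize = 1; // one strong reference
const LEAKED: usize = 1 << 31; // assumed LEAKED flag bit
const STRONG: usize = LEAKED - 1; // assumed strong-count mask

/// Decrement one strong reference. Returns true if this was the last one.
/// For leaked objects the decrement is immediately undone, so repeated
/// dec() calls can no longer underflow the count into the LEAKED bit.
fn dec(state: &AtomicUsize) -> bool {
    let old = state.fetch_sub(COUNT, Ordering::SeqCst);
    if old & LEAKED != 0 {
        // Restore the count; a leaked object is never dropped. (A CAS loop
        // would avoid even the transient window, at slightly higher cost.)
        state.fetch_add(COUNT, Ordering::SeqCst);
        return false;
    }
    (old & STRONG) == 1
}

fn main() {
    // Normal object with 2 strong refs: second dec() reports "last ref".
    let normal = AtomicUsize::new(2);
    assert!(!dec(&normal));
    assert!(dec(&normal));

    // Leaked object: state is left intact no matter how often dec() runs.
    let leaked = AtomicUsize::new(LEAKED | 1);
    for _ in 0..1_000 {
        assert!(!dec(&leaked));
    }
    assert_eq!(leaked.load(Ordering::SeqCst), LEAKED | 1);
    println!("LEAKED bit survives repeated dec()");
}
```

Without the `fetch_add` restore, 2^31 decrements on a leaked object would eventually borrow out of the count field and clear the flag bit, which is the corruption the prompt describes.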
---
Outside diff comments:
In `@crates/vm/src/vm/interpreter.rs`:
- Around line 386-395: Update the "Finalization steps" doc comment so it matches
the actual shutdown sequence in this file: enumerate the real order (flush
stdout/stderr via flush_std, handle exit exception -> exit code, call
threading._shutdown to wait for threads, set the finalizing flag (suppresses
unraisable exceptions), run atexit exit functions, run GC collect_force (where
the doc previously claimed atexit ran elsewhere), call finalize_modules for
module cleanup, and note any final flush/GC behavior; remove the duplicate
threading._shutdown step and correct the step labels to reference the actual
calls (threading._shutdown, atexit, collect_force, finalize_modules, flush_std)
so the numbered list matches the code flow.
In `@crates/vm/src/vm/mod.rs`:
- Around line 1122-1149: The defer cleanup (scopeguard::defer!) must be created
before running the trace/execution so panics still trigger cleanup; move the
scopeguard::defer! block to just after setting up the
frame/owner/exception/recursion state and before calling
self.trace_event(TraceEvent::Call, None) and f(frame). Ensure TraceEvent::Return
is still emitted on successful result (keep the existing trace_event call after
f(frame)), and preserve the same cleanup actions inside the guard
(frame.owner.store(...), self.exceptions.borrow_mut().stack.pop(),
crate::vm::thread::set_current_frame(old_frame), self.frames.borrow_mut().pop(),
cfg thread pop_thread_frame(), and self.recursion_depth.update(|d| d - 1)) so
any panic during trace_event or f(frame) will run the guard.
---
Nitpick comments:
In `@crates/common/src/refcount.rs`:
- Around line 43-46: The code does unnecessary arithmetic because COUNT is 1;
simplify by removing the redundant / COUNT in strong() and * COUNT in
add_strong(): in the strong() method use ((self.inner & STRONG) as u32) (or
equivalent masking) to extract the strong count directly, and in add_strong()
add the increment directly to the strong bits instead of multiplying by COUNT;
update any related bit-manipulation comments to reflect the simplification and
keep the STRONG and COUNT constants as needed (or remove COUNT if unused).
- Around line 250-252: The #[inline] attribute on flush_deferred_drops is
misleading because the function always allocates a new Vec when draining,
preventing meaningful inlining; remove the #[inline] on the flush_deferred_drops
function or restructure it so the cheap fast-path (e.g., an early is_empty check
on the deferred queue) is performed in a small #[inline] helper that returns
early while the heavier Vec-allocating work remains in a non-inlined function;
locate flush_deferred_drops and either drop the attribute or split out the
empty-queue check into an inline helper to preserve the intended optimization.
In `@crates/vm/src/gc_state.rs`:
- Around line 464-480: The code currently calls unsafe
obj.gc_get_referent_ptrs() twice per object; change to compute and store each
object's adjacency once and reuse it: during the initial loop over collecting
(the block that iterates over collecting, reads obj.gc_get_referent_ptrs(), and
updates gc_refs) build a HashMap<GcObjectPtr, Vec<GcObjectPtr>> (or similar)
keyed by gc_ptr and populate it from obj.gc_get_referent_ptrs(); then in the
later reachability traversal (the code that walks reachable objects) read
referents from that cache instead of calling gc_get_referent_ptrs() again.
Update lookups to use the cache where you currently call gc_get_referent_ptrs(),
and ensure you still respect the same alive/strong_count checks and membership
tests against collecting and gc_refs.
- Around line 598-614: Consolidate the separate read and write lock around
finalized_objects into a single write-lock per object: in the loop over
unreachable_refs (using GcObjectPtr), acquire self.finalized_objects.write()
once, check if the pointer is already present, and if not insert it; then drop
the write lock and call obj_ref.try_call_finalizer() only when you performed the
insert. This removes the TOCTOU window with untrack_object while preserving the
behavior that the object is marked finalized before __del__ runs
(is_finalized()/mark_finalized semantics).
```rust
if collecting.is_empty() {
    // Reset gen0 count even if nothing to collect
    self.generations[0].count.store(0, Ordering::SeqCst);
    self.generations[generation].update_stats(0, 0);
    return (0, 0);
}
```
Resetting gen0.count to 0 loses objects tracked during collection.
After dropping gen_locks (line 534), other threads can call track_object, which increments generations[0].count. The unconditional store(0) at line 718 (and early-exit paths at lines 441, 527, 554) discards those increments, causing maybe_collect (line 376) to under-count gen0 and potentially miss triggering the next collection.
Note that alloc_count (line 120) is already incremented in track_object but is never read — it looks like it was intended as a dedicated allocation counter for threshold checking (matching CPython's approach), but maybe_collect uses generations[0].count instead.
Consider either:
- Using `alloc_count` for threshold checking in `maybe_collect` and resetting it here, or
- Recomputing gen0 count from the actual set size after promotion:

```rust
let actual = self.generation_objects[0].read().map(|s| s.len()).unwrap_or(0);
self.generations[0].count.store(actual, Ordering::SeqCst);
```
Also applies to: 716-722
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/gc_state.rs` around lines 439 - 444, The unconditional reset of
generations[0].count loses increments from concurrent track_object calls after
gen_locks are dropped; change the logic in the collection exit paths (where
generations[0].count.store(0, ...) is called) to either (A) switch maybe_collect
to use alloc_count as the authoritative allocation counter and reset alloc_count
there instead of touching generations[0].count, ensuring track_object increments
alloc_count, or (B) recompute gen0's true count after promotions by reading
generation_objects[0] (via its read lock) and store its length into
generations[0].count before releasing locks; update all early-return branches
(the places with generations[0].count.store(0, ...)) to use the chosen approach
so concurrent increments are not lost.
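Option (A) above can be sketched with an atomic swap-based reset, so `track_object` increments racing with the reset land in the new epoch instead of being discarded. Field and method names here are illustrative, not the actual `GcState` layout:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A dedicated allocation counter for gen0 threshold checks, reset with an
// atomic swap at the end of collection rather than an unconditional store(0).
struct Gen0 {
    alloc_count: AtomicUsize,
    threshold: usize,
}

impl Gen0 {
    fn track_object(&self) {
        self.alloc_count.fetch_add(1, Ordering::SeqCst);
    }

    fn should_collect(&self) -> bool {
        self.alloc_count.load(Ordering::SeqCst) >= self.threshold
    }

    /// Called when collection finishes: atomically take-and-reset, so an
    /// increment racing with the reset is counted toward the next cycle.
    fn end_collection(&self) -> usize {
        self.alloc_count.swap(0, Ordering::SeqCst)
    }
}

fn main() {
    let gen0 = Gen0 {
        alloc_count: AtomicUsize::new(0),
        threshold: 3,
    };
    for _ in 0..3 {
        gen0.track_object();
    }
    assert!(gen0.should_collect());
    // An allocation sneaking in "during" collection...
    gen0.track_object();
    assert_eq!(gen0.end_collection(), 4); // ...is observed, not lost
    assert!(!gen0.should_collect());
    println!("racing increment accounted for");
}
```

The `swap(0)` makes the reset and the read one atomic step; a separate `load` followed by `store(0)` would reintroduce the lost-increment window the comment describes.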
```rust
// Remove from current generation
for gen_idx in 0..=from_gen {
    if let Ok(mut gen_set) = self.generation_objects[gen_idx].write()
        && gen_set.remove(&ptr)
    {
        // Decrement count for source generation
        let count = self.generations[gen_idx].count.load(Ordering::SeqCst);
        if count > 0 {
            self.generations[gen_idx]
                .count
                .fetch_sub(1, Ordering::SeqCst);
        }

        // Add to next generation
        if let Ok(mut next_set) = self.generation_objects[next_gen].write()
            && next_set.insert(ptr)
        {
            // Increment count for target generation
            self.generations[next_gen]
                .count
                .fetch_add(1, Ordering::SeqCst);
        }
        break;
    }
}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Check how RwLock errors are handled across gc_state.rs
rg -n 'if let Ok\(' crates/vm/src/gc_state.rs
```

Repository: RustPython/RustPython
Length of output: 1394

🏁 Script executed:

```shell
# Get the full context of the promote_survivors function
sed -n '700,800p' crates/vm/src/gc_state.rs | cat -n
```

Repository: RustPython/RustPython
Length of output: 4494

🏁 Script executed:

```shell
# Check if there are any other error handling patterns or comments about lock poisoning
rg -n -i 'poison|panic|expect|unwrap' crates/vm/src/gc_state.rs | head -30
```

Repository: RustPython/RustPython
Length of output: 142

🏁 Script executed:

```shell
# Check the GcState struct definition and comments about the invariant
sed -n '1,200p' crates/vm/src/gc_state.rs | head -150
```

Repository: RustPython/RustPython
Length of output: 5096

🏁 Script executed:

```shell
# Look for any cleanup, recovery, or assertion code related to the invariant
rg -n 'tracked_objects|generation_objects' crates/vm/src/gc_state.rs | head -40
```

Repository: RustPython/RustPython
Length of output: 1219

🏁 Script executed:

```shell
# Check if there are comments about lock poisoning or error handling strategy
rg -B2 -A2 'if let Ok\(mut' crates/vm/src/gc_state.rs | head -60
```

Repository: RustPython/RustPython
Length of output: 2198

🏁 Script executed:

```shell
# Check if there are any panic!() or other operations that could poison locks
rg -n 'panic!|unwrap\(\)|expect\(' crates/vm/src/gc_state.rs
```

Repository: RustPython/RustPython
Length of output: 47

🏁 Script executed:

```shell
# Look at the track_object implementation to understand the full flow
sed -n '235,260p' crates/vm/src/gc_state.rs | cat -n
```

Repository: RustPython/RustPython
Length of output: 1142

🏁 Script executed:

```shell
# Check if there's any mechanism to repair or validate the invariant between tracked_objects and generation_objects
rg -n 'fn.*invariant|fn.*valid|fn.*check' crates/vm/src/gc_state.rs
```

Repository: RustPython/RustPython
Length of output: 47

🏁 Script executed:

```shell
# Check if there could be any panics during GC collection that might poison locks
# Look at the collection phase and see if there are any operations that could panic
sed -n '600,750p' crates/vm/src/gc_state.rs | cat -n
```

Repository: RustPython/RustPython
Length of output: 7501

🏁 Script executed:

```shell
# Check if PyObject operations (like gc_clear) could panic
rg -n 'gc_clear\|gc_has_clear' crates/vm/src/gc_state.rs
```

Repository: RustPython/RustPython
Length of output: 47

🏁 Script executed:

```shell
# Look for how RwLock is used and if there's documentation about error handling strategy
rg -B3 -A3 'RwLock' crates/vm/src/gc_state.rs | head -80
```

Repository: RustPython/RustPython
Length of output: 2130

🏁 Script executed:

```shell
# Check if try_call_finalizer could panic/poison locks
rg -n 'try_call_finalizer' crates/vm/src/
```

Repository: RustPython/RustPython
Length of output: 213

🏁 Script executed:

```shell
# Verify line numbers in the actual file - check the exact location of promote_survivors
sed -n '726,760p' crates/vm/src/gc_state.rs | cat -n
```

Repository: RustPython/RustPython
Length of output: 1652
`promote_survivors` silently loses objects if a write-lock acquisition fails.

If the write lock on `generation_objects[next_gen]` fails (lines 749–750), the object has already been removed from its source generation (lines 737–745) but is never added to the target — it becomes orphaned in `tracked_objects` with no generation, violating the invariant that every tracked object belongs to exactly one generation. Generational collection will never visit it again.

The `if let Ok(mut ...) = ...write()` pattern is used throughout this file (17 instances) and silently swallows poisoned-lock errors. While a panic during normal operations is rare, `try_call_finalizer()` is called during collection and could trigger one, poisoning locks. For `promote_survivors` the consequence is data loss; for other operations like `track_object`/`untrack_object` it causes inconsistent counts.

Use `.expect("...")` or `.unwrap()` on the target-generation insertion to propagate the poison panic and preserve the invariant.
Proposed fix
```diff
     // Add to next generation
-    if let Ok(mut next_set) = self.generation_objects[next_gen].write()
-        && next_set.insert(ptr)
-    {
-        // Increment count for target generation
-        self.generations[next_gen]
-            .count
-            .fetch_add(1, Ordering::SeqCst);
-    }
+    let mut next_set = self.generation_objects[next_gen]
+        .write()
+        .expect("failed to acquire lock for next generation");
+    if next_set.insert(ptr) {
+        // Increment count for target generation
+        self.generations[next_gen]
+            .count
+            .fetch_add(1, Ordering::SeqCst);
+    }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
// Remove from current generation
for gen_idx in 0..=from_gen {
    if let Ok(mut gen_set) = self.generation_objects[gen_idx].write()
        && gen_set.remove(&ptr)
    {
        // Decrement count for source generation
        let count = self.generations[gen_idx].count.load(Ordering::SeqCst);
        if count > 0 {
            self.generations[gen_idx]
                .count
                .fetch_sub(1, Ordering::SeqCst);
        }
        // Add to next generation
        let mut next_set = self.generation_objects[next_gen]
            .write()
            .expect("failed to acquire lock for next generation");
        if next_set.insert(ptr) {
            // Increment count for target generation
            self.generations[next_gen]
                .count
                .fetch_add(1, Ordering::SeqCst);
        }
        break;
    }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/gc_state.rs` around lines 734 - 758, promote_survivors
currently removes an object from its source generation and then silently skips
adding it to next_gen if acquiring the write lock for
self.generation_objects[next_gen] fails, orphaning the object; change the
target-generation lock acquisition to propagate a poisoned-lock panic instead of
silently failing by replacing the conditional try-lock for next_gen with an
unconditional write() followed by .expect(...) (e.g. let mut next_set =
self.generation_objects[next_gen].write().expect("poisoned
generation_objects[next_gen] lock");) and then perform next_set.insert(ptr) and
the fetch_add on self.generations[next_gen].count, so that a poisoned lock
triggers a panic and the invariant that every tracked object is in exactly one
generation is preserved.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/vm/interpreter.rs (1)
385-395: ⚠️ Potential issue | 🟡 Minor — Finalize step list is out of sync with the new order.
The doc still mentions a second _shutdown() step and a "Final GC pass" after finalize_modules, which no longer happens.
📝 Suggested comment update
- /// 1. Wait for thread shutdown (call threading._shutdown).
- /// 1. Set finalizing flag (suppresses unraisable exceptions).
- /// 1. Call threading._shutdown() to join non-daemon threads.
- /// 1. Run atexit exit functions.
- /// 1. GC pass and module cleanup (finalize_modules).
- /// 1. Final GC pass.
+ /// 1. Wait for thread shutdown (call threading._shutdown).
+ /// 1. Run atexit exit functions.
+ /// 1. Set finalizing flag (suppresses unraisable exceptions).
+ /// 1. GC pass and module cleanup (finalize_modules, which includes the final GC phase).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/interpreter.rs` around lines 385 - 395, Update the finalization doc comment in vm::interpreter.rs (the block describing the steps for finalizing the VM—see the comment above the finalize function) to match the current implementation: remove the duplicated threading._shutdown() step and remove the “Final GC pass” entry after finalize_modules, and reorder or renumber the listed steps so they accurately reflect the actual sequence (flush stdout/stderr; handle exit exception; wait for thread shutdown/set finalizing flag/call threading._shutdown to join non-daemon threads; run atexit functions; GC pass and module cleanup via finalize_modules). Ensure the comment references threading._shutdown and finalize_modules exactly as named.
🧹 Nitpick comments (1)
crates/vm/src/signal.rs (1)
66-75: Pre-existing: shared RefCell borrow held across Python invocation may panic.
signal_handlers is borrowed immutably at line 66 and kept alive through callable.invoke(...) at line 73. If the Python signal handler calls signal.signal() (which would borrow_mut() the same RefCell), the runtime borrow-check will panic.
IN_SIGNAL_HANDLER only prevents trigger_signals re-entry via check_signals; it does not guard against the Python-level signal.signal() call path mutating the RefCell. Consider dropping the borrow before invoking, or collecting the handlers to invoke into a local Vec first:
♻️ Suggested approach — snapshot handlers before invoking
- let signal_handlers = vm.signal_handlers.get().unwrap().borrow();
- for (signum, trigger) in TRIGGERS.iter().enumerate().skip(1) {
-     let triggered = trigger.swap(false, Ordering::Relaxed);
-     if triggered
-         && let Some(handler) = &signal_handlers[signum]
-         && let Some(callable) = handler.to_callable()
-     {
-         callable.invoke((signum, vm.ctx.none()), vm)?;
-     }
- }
+ let to_invoke: Vec<(usize, _)> = {
+     let signal_handlers = vm.signal_handlers.get().unwrap().borrow();
+     TRIGGERS
+         .iter()
+         .enumerate()
+         .skip(1)
+         .filter_map(|(signum, trigger)| {
+             if trigger.swap(false, Ordering::Relaxed) {
+                 signal_handlers[signum]
+                     .as_ref()
+                     .and_then(|h| h.to_callable())
+                     .map(|c| (signum, c))
+             } else {
+                 None
+             }
+         })
+         .collect()
+ }; // borrow released here
+ for (signum, callable) in to_invoke {
+     callable.invoke((signum, vm.ctx.none()), vm)?;
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/signal.rs` around lines 66 - 75, The current code holds an immutable RefCell borrow from vm.signal_handlers.get().unwrap().borrow() across the Python call (callable.invoke), which can panic if the handler calls signal.signal() (a borrow_mut); fix by snapshotting the handlers to invoke and dropping the RefCell borrow before any Python invocation: iterate TRIGGERS and for each triggered entry, read and clone/cloneable-convert the handler reference (or store indices/signum + handler.to_callable() results) into a local Vec (e.g., Vec<(usize, Callable)>) while the borrow is held, then drop the borrow and loop over that Vec to call callable.invoke((signum, vm.ctx.none()), vm) so no RefCell borrow is held during invocation; reference functions/fields: vm.signal_handlers.get(), signal_handlers.borrow(), TRIGGERS, trigger.swap, handler.to_callable(), and callable.invoke.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/common/src/refcount.rs`:
- Around line 120-138: safe_inc currently increments without checking
strong-count overflow and can wrap into weak bits; before attempting the
compare_exchange in safe_inc, detect the overflow the same way inc_by does: read
old via State::from_raw(self.state.load(...)), and if adding 1 would exceed the
maximum strong count (e.g., compare old.strong() against State's max or check
that old.add_strong(1) would overflow/wrap), return false; only proceed to
compute new_state = old.add_strong(1) and call
self.state.compare_exchange(old.as_raw(), new_state.as_raw(), ...) when the
increment is safe and old.destructed() is false. Ensure you reference
State::from_raw, destructed, add_strong, as_raw and the existing
compare_exchange call when making the change.
---
Outside diff comments:
In `@crates/vm/src/vm/interpreter.rs`:
- Around line 385-395: Update the finalization doc comment in vm::interpreter.rs
(the block describing the steps for finalizing the VM—see the comment above the
finalize function) to match the current implementation: remove the duplicated
threading._shutdown() step and remove the “Final GC pass” entry after
finalize_modules, and reorder or renumber the listed steps so they accurately
reflect the actual sequence (flush stdout/stderr; handle exit exception; wait
for thread shutdown/set finalizing flag/call threading._shutdown to join
non-daemon threads; run atexit functions; GC pass and module cleanup via
finalize_modules). Ensure the comment references threading._shutdown and
finalize_modules exactly as named.
---
Duplicate comments:
In `@crates/common/src/refcount.rs`:
- Around line 98-108: inc() must explicitly prevent strong-count overflow and
clarify the safety assumption; replace the current blind fetch_add pattern with
a compare-and-swap loop (e.g., using fetch_update or compare_exchange) on
self.state to read current State::from_raw, verify !val.destructed() and that
val.strong() + COUNT does not exceed the strong-capacity (call
refcount_overflow() if it would), then atomically store the incremented raw
state; also add an explicit comment or debug_assert in inc() that the caller
already holds a strong reference (the function relies on that invariant) and
remove the ambiguous double-fetch_add; keep references to State::from_raw,
COUNT, self.state, destructed(), strong(), and refcount_overflow() so reviewers
can locate the change.
In `@crates/vm/src/gc_state.rs`:
- Around line 733-755: promote_survivors currently removes `ptr` from the source
generation (`generation_objects[gen_idx]`) before trying to acquire the target
generation write lock (`generation_objects[next_gen].write()`), so if the
next-gen lock acquisition fails (poisoned) the object is lost; change the logic
in `promote_survivors` to either acquire the `next_gen` write lock before
removing from the source, or if you must remove first then handle the `Err` from
`generation_objects[next_gen].write()` by reinserting `ptr` into the original
`gen_set` and restoring the corresponding `generations[gen_idx].count` (and
return/continue) so the object is not orphaned; reference the symbols
`promote_survivors`, `generation_objects`, `generations`, `from_gen`,
`next_gen`, `gen_set`, `next_set`, and `ptr` when making the change.
- Around line 439-444: The unconditional self.generations[0].count.store(0,
Ordering::SeqCst) can clobber increments from concurrent track_object() calls;
remove the unconditional reset in early-return paths (e.g., the branch where
collecting.is_empty()) and only adjust the atomic counter when you actually
remove objects — perform an atomic adjustment (use
AtomicUsize::fetch_sub(amount, Ordering::SeqCst) when you know how many were
collected, or AtomicUsize::swap(0, Ordering::SeqCst) to atomically
grab-and-reset) in the collection code paths (referencing generations[0].count,
track_object(), and the methods that call update_stats) so concurrent increments
aren’t lost.
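The swap-based grab-and-reset named in this prompt can be sketched as a standalone counter; `Counter`, `track`, and `drain` are illustrative names, not the actual `gc_state.rs` API:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for generations[0].count.
struct Counter(AtomicUsize);

impl Counter {
    // Called from track_object(): one new object entered generation 0.
    fn track(&self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }

    // Called from the collector: swap(0) reads and resets atomically, so
    // increments racing with the collection are kept for the next cycle
    // rather than clobbered by an unconditional store(0).
    fn drain(&self) -> usize {
        self.0.swap(0, Ordering::SeqCst)
    }
}
```

Unlike `store(0)`, the `swap` returns exactly the increments it erased, so nothing observed between the read and the reset can be lost.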
---
Nitpick comments:
In `@crates/vm/src/signal.rs`:
- Around line 66-75: The current code holds an immutable RefCell borrow from
vm.signal_handlers.get().unwrap().borrow() across the Python call
(callable.invoke), which can panic if the handler calls signal.signal() (a
borrow_mut); fix by snapshotting the handlers to invoke and dropping the RefCell
borrow before any Python invocation: iterate TRIGGERS and for each triggered
entry, read and clone/cloneable-convert the handler reference (or store
indices/signum + handler.to_callable() results) into a local Vec (e.g.,
Vec<(usize, Callable)>) while the borrow is held, then drop the borrow and loop
over that Vec to call callable.invoke((signum, vm.ctx.none()), vm) so no RefCell
borrow is held during invocation; reference functions/fields:
vm.signal_handlers.get(), signal_handlers.borrow(), TRIGGERS, trigger.swap,
handler.to_callable(), and callable.invoke.
  /// Returns true if successful
  #[inline]
  pub fn safe_inc(&self) -> bool {
-     self.strong
-         .fetch_update(AcqRel, Acquire, |prev| (prev != 0).then_some(prev + 1))
-         .is_ok()
+     let mut old = State::from_raw(self.state.load(Ordering::SeqCst));
+     loop {
+         if old.destructed() {
+             return false;
+         }
+         let new_state = old.add_strong(1);
+         match self.state.compare_exchange(
+             old.as_raw(),
+             new_state.as_raw(),
+             Ordering::SeqCst,
+             Ordering::SeqCst,
+         ) {
+             Ok(_) => return true,
+             Err(curr) => old = State::from_raw(curr),
+         }
+     }
Guard safe_inc() against strong-count overflow.
safe_inc() can overflow into the weak bits when the strong count is at max; align it with inc_by()’s overflow checks.
♻️ Suggested fix
pub fn safe_inc(&self) -> bool {
let mut old = State::from_raw(self.state.load(Ordering::SeqCst));
loop {
if old.destructed() {
return false;
}
+ if (old.strong() as usize) >= STRONG {
+ refcount_overflow();
+ }
let new_state = old.add_strong(1);
match self.state.compare_exchange(
old.as_raw(),
new_state.as_raw(),
Ordering::SeqCst,
Ordering::SeqCst,
) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/common/src/refcount.rs` around lines 120 - 138, safe_inc currently
increments without checking strong-count overflow and can wrap into weak bits;
before attempting the compare_exchange in safe_inc, detect the overflow the same
way inc_by does: read old via State::from_raw(self.state.load(...)), and if
adding 1 would exceed the maximum strong count (e.g., compare old.strong()
against State's max or check that old.add_strong(1) would overflow/wrap), return
false; only proceed to compute new_state = old.add_strong(1) and call
self.state.compare_exchange(old.as_raw(), new_state.as_raw(), ...) when the
increment is safe and old.destructed() is false. Ensure you reference
State::from_raw, destructed, add_strong, as_raw and the existing
compare_exchange call when making the change.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/common/src/refcount.rs (1)
93-96: Consider #[must_use] on dec() and safe_inc(). Both return a bool that must be acted upon: ignoring dec()'s return value causes a memory leak (the object is never freed), and ignoring safe_inc()'s false return means proceeding with an invalid reference to a destructed object. Annotating them with #[must_use] surfaces these mistakes at compile time.
♻️ Proposed addition
+ #[must_use]
  pub fn safe_inc(&self) -> bool {
+ /// Decrement strong count. Returns true when count drops to 0.
+ #[must_use]
  pub fn dec(&self) -> bool {
Also applies to: 120-122, 141-143
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 93 - 96, Add the #[must_use] attribute to the reference-counting methods that return a bool so callers cannot accidentally ignore their result: annotate the dec() and safe_inc() methods (and any other overloads/variants of these methods in the same module/class) with #[must_use] to force checking the returned boolean; locate the implementations by name (dec and safe_inc in the RefCount/State impl) and add the attribute above their signatures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/vm/mod.rs`:
- Around line 824-825: The doc comment above the module-dict clearing block is
mislabeled "Phase 4" while finalize_modules runs it as Phase 5; update the
comment to read "Phase 5: Clear module dicts in reverse import order. Skip
builtins and sys — those are cleared last in Phase 7." so the phase numbering
matches finalize_modules/GC sequencing.
- Around line 723-733: Update the doc comment above the method
finalize_clear_module_dicts to correct the phase label from "Phase 4" to "Phase
5" so it matches the sequencing in finalize_modules (where the first GC collect
is Phase 4 and the dict-clearing pass is Phase 5); locate the comment that
currently precedes the finalize_clear_module_dicts function and change the phase
number text only, leaving the rest of the explanation intact.
---
Duplicate comments:
In `@crates/common/src/refcount.rs`:
- Around line 120-139: safe_inc() can overflow the strong count and corrupt WEAK
bits because it computes new_state = old.add_strong(1) without checking limits;
before calling State::add_strong or performing compare_exchange, check that
old.strong() < STRONG (the max strong sentinel) and return false if not, so you
never attempt to increment past the strong limit; update the logic in safe_inc
(referencing safe_inc, State::strong(), State::add_strong(), STRONG and the
compare_exchange on self.state) to perform this guard and only compute/add the
strong count when it's safe.
- Around line 98-109: inc() currently increments the strong count
unconditionally and can overflow into the weak bits; mirror the overflow guard
used in inc_by(): read State::from_raw(self.state.fetch_add(COUNT,
Ordering::SeqCst)) into val, check if (val.strong() as usize) > STRONG - 1 and
call refcount_overflow() before performing the increment (or bail), and keep the
existing destructed() and permission re-grant logic (the second
self.state.fetch_add(COUNT, Ordering::SeqCst)) unchanged so the function
enforces the same strong-count ceiling as inc_by(); reference functions/values:
inc(), inc_by(), State, STRONG, COUNT, refcount_overflow, and
self.state.fetch_add.
- Around line 141-152: The current dec() unconditionally calls
self.state.fetch_sub(COUNT) which can underflow and clear the LEAKED bit;
replace that unconditional subtraction with an atomic CAS loop that first reads
state via self.state.load(Ordering::SeqCst), checks
State::from_raw(raw).leaked() and returns false if leaked, and otherwise
attempts a compare_exchange_weak from raw to raw - COUNT (using
Ordering::SeqCst) until it succeeds, then compute and return whether the
previous strong count was 1 (use State::from_raw(raw).strong() == 1); update
references to self.state.fetch_sub(COUNT) and keep using State::from_raw,
leaked(), and LEAKED in the logic.
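The CAS-loop shape this prompt asks for can be sketched in isolation; the bit layout below (top bit = LEAKED, COUNT = 1) is an assumption for illustration and need not match the real State encoding in refcount.rs:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumed layout: high bit marks LEAKED, low bits hold the strong count.
const LEAKED: usize = 1usize << (usize::BITS - 1);
const COUNT: usize = 1;

// Decrement via CAS loop: a leaked state is returned unchanged, so the
// LEAKED bit can never be cleared by an underflowing subtraction.
// Caller must hold a strong reference (count >= 1 when not leaked).
fn dec(state: &AtomicUsize) -> bool {
    let mut raw = state.load(Ordering::SeqCst);
    loop {
        if raw & LEAKED != 0 {
            return false;
        }
        match state.compare_exchange_weak(
            raw,
            raw - COUNT,
            Ordering::SeqCst,
            Ordering::SeqCst,
        ) {
            // True when this call dropped the strong count to zero.
            Ok(prev) => return (prev & !LEAKED) == 1,
            Err(curr) => raw = curr,
        }
    }
}
```

The leaked check happens before any mutation, which is what an unconditional fetch_sub cannot guarantee.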
In `@crates/vm/src/vm/mod.rs`:
- Line 880: The Phase 7 label comment in vm::mod.rs (the doc/comment "Phase 7:
Clear sys and builtins dicts last.") has been corrected; no code changes are
required—keep the existing comment as-is and approve the change around the Phase
7 section in the VM teardown logic (look for the Phase 7 label near the VM
cleanup functions in vm::mod.rs).
---
Nitpick comments:
In `@crates/common/src/refcount.rs`:
- Around line 93-96: Add the #[must_use] attribute to the reference-counting
methods that return a bool so callers cannot accidentally ignore their result:
annotate the dec() and safe_inc() methods (and any other overloads/variants of
these methods in the same module/class) with #[must_use] to force checking the
returned boolean; locate the implementations by name (dec and safe_inc in the
RefCount/State impl) and add the attribute above their signatures.
Actionable comments posted: 1
🧹 Nitpick comments (3)
crates/common/src/refcount.rs (2)
258-270: flush_deferred_drops is not panic-safe: remaining ops are silently discarded on panic.
Once the queue is drained into ops, any op() that panics unwinds through the for loop — the remaining Box<dyn FnOnce()> items in ops are dropped (not called). Since these closures are GC untrack operations, silently skipping them leaves the GC holding stale pointers to freed objects, which can cause use-after-free in subsequent collections.
♻️ Proposed panic-safe drain using a RAII call-on-drop wrapper
struct CallOnDrop(Option<Box<dyn FnOnce()>>);

impl Drop for CallOnDrop {
    fn drop(&mut self) {
        if let Some(f) = self.0.take() {
            f();
        }
    }
}

pub fn flush_deferred_drops() {
    DEFERRED_QUEUE.with(|q| {
        let ops: Vec<_> = q
            .borrow_mut()
            .drain(..)
            .map(|f| CallOnDrop(Some(f)))
            .collect();
        // Each op runs either in the loop or on drop during panic unwinding
        for mut wrapper in ops {
            if let Some(f) = wrapper.0.take() {
                f();
            }
        }
    });
}
Alternatively, a simpler approach is to catch panics per-op using std::panic::catch_unwind (requires the closures to be UnwindSafe), or to document that callers must ensure deferred closures do not panic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 258 - 270, flush_deferred_drops currently drains DEFERRED_QUEUE into a Vec and calls each op, but if any op() panics the remaining closures are dropped and never run (leading to GC stale pointers); fix by making the drain panic-safe: wrap each Box<dyn FnOnce()> into an RAII wrapper (e.g., struct CallOnDrop(Option<Box<dyn FnOnce()>>) with a Drop impl that calls the closure if still present), change the drain to map each closure into CallOnDrop(Some(f)) and collect that Vec, then iterate the Vec and take().call each wrapper to run closures normally—any closures not yet taken will be invoked from CallOnDrop::drop during panic unwinding, ensuring no deferred drop is silently skipped.
43-45: / COUNT is a no-op — simplify strong().
Since COUNT = 1, the division is always a no-op and only adds noise.
♻️ Proposed simplification
  fn strong(self) -> u32 {
-     ((self.inner & STRONG) / COUNT) as u32
+     (self.inner & STRONG) as u32
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 43 - 45, The strong() method performs an unnecessary division by COUNT; update the strong function (fn strong(self) -> u32) to compute ((self.inner & STRONG) as u32) (or equivalent masking and casting) without dividing by COUNT since COUNT == 1, and ensure the result is still cast to u32; reference the symbols strong, self.inner, STRONG and COUNT when making the change.
crates/vm/src/object/core.rs (1)
434-452: Missing re-check for existing generic ref under the re-acquired lock.
Between releasing the first lock (line 421) and re-acquiring it (line 435), another thread may have allocated and inserted its own generic weakref. The is_generic insertion branch unconditionally overwrites self.generic without re-checking, resulting in two generic weakref nodes in the list. While this is memory-safe (both nodes are cleared when the object dies), it creates a redundant allocation that the two-phase refactor was intended to avoid.
♻️ Proposed fix — re-check under the re-acquired lock
  // Re-acquire lock for linked list insertion
  let _lock = weakref_lock::lock(obj as *const PyObject as usize);
+ // Re-check: another thread may have inserted a generic ref while we were
+ // allocating outside the lock. If so, reuse it and let `weak` drop.
+ if is_generic {
+     let generic_ptr = self.generic.load(Ordering::Relaxed);
+     if !generic_ptr.is_null() {
+         let generic = unsafe { &*generic_ptr };
+         if generic.0.ref_count.safe_inc() {
+             return unsafe { PyRef::from_raw(generic_ptr) };
+         }
+     }
+ }
+ // Insert into linked list under stripe lock
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/object/core.rs` around lines 434 - 452, Between releasing and re-acquiring the weakref stripe lock (weakref_lock::lock) the code unconditionally treats the new weakref as generic and overwrites self.generic, which can allow two generic nodes; after re-acquiring the lock (before setting self.generic and self.head) re-check whether a generic already exists (inspect self.generic/head via NonNull::new or equivalent) and if one is present avoid marking this node as the generic head—insert it as a normal/basic weakref (update pointers accordingly) or update the existing generic pointer only if still None; ensure the check uses the same pointer types/ordering as the current insertion logic (node_ptr, WeakLink::pointers, self.head, self.generic) to preserve atomic consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/object/core.rs`:
- Around line 128-133: The end() function can underflow the per-thread
DEALLOC_DEPTH by doing d.get() - 1; change the decrement to a safe operation
(e.g., use saturating_sub or checked_sub with a debug_assert) so depth never
wraps to usize::MAX, then set the new depth and return it; reference
DEALLOC_DEPTH and end(), and keep begin() semantics intact while optionally
adding a debug_assert that ensures a matching begin() existed (and/or log an
error) instead of allowing silent release-mode wraparound that trips
TRASHCAN_LIMIT logic.
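A minimal sketch of the runtime-checked decrement; `DEPTH`, `begin`, and `end` here are stand-ins for the per-thread `DEALLOC_DEPTH` cell and the trashcan functions:

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the per-thread DEALLOC_DEPTH counter.
    static DEPTH: Cell<usize> = Cell::new(0);
}

fn begin() {
    DEPTH.with(|d| d.set(d.get() + 1));
}

// checked_sub turns an unmatched end() into a loud panic in every build,
// instead of wrapping to usize::MAX in release mode and confusing the
// depth-limit check.
fn end() -> usize {
    DEPTH.with(|d| {
        let new = d
            .get()
            .checked_sub(1)
            .expect("trashcan::end called without matching begin");
        d.set(new);
        new
    })
}
```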
---
Duplicate comments:
In `@crates/common/src/refcount.rs`:
- Around line 148-157: The dec() method unconditionally calls
self.state.fetch_sub(COUNT, Ordering::SeqCst) before checking
State::from_raw(...).leaked(), which can underflow the packed counter for LEAKED
states; restore the prior value if leaked() is true by re-adding COUNT (e.g.
call self.state.fetch_add(COUNT, Ordering::SeqCst)) and return false, so update
dec() to fetch_sub, compute old = State::from_raw(...), check old.leaked() and
if true call fetch_add(COUNT) to undo the subtraction and return false; keep the
final return old.strong() == 1 as-is.
- Around line 98-109: The inc() implementation must add the same strong-count
overflow guard used by inc_by(): after computing val =
State::from_raw(self.state.fetch_add(COUNT, ...)) check whether (val.strong() as
usize) > STRONG - 1 (i.e. would overflow by adding 1) and call
refcount_overflow() if so, then preserve the existing destructed() check and the
conditional extra fetch_add for when val.strong() == 0; update function inc() to
mirror inc_by()’s overflow logic using the same COUNT, STRONG and State symbols.
In `@crates/vm/src/vm/mod.rs`:
- Around line 880-884: The phase label in the finalize_clear_sys_builtins_dict
function comment has been corrected to "Phase 7" already; no code changes are
required—leave the comment and the function
Self::module_clear_dict(&self.sys_module.dict(), self) and
Self::module_clear_dict(&self.builtins.dict(), self) as-is in
finalize_clear_sys_builtins_dict.
---
Nitpick comments:
In `@crates/common/src/refcount.rs`:
- Around line 258-270: flush_deferred_drops currently drains DEFERRED_QUEUE into
a Vec and calls each op, but if any op() panics the remaining closures are
dropped and never run (leading to GC stale pointers); fix by making the drain
panic-safe: wrap each Box<dyn FnOnce()> into an RAII wrapper (e.g., struct
CallOnDrop(Option<Box<dyn FnOnce()>>) with a Drop impl that calls the closure if
still present), change the drain to map each closure into CallOnDrop(Some(f))
and collect that Vec, then iterate the Vec and take().call each wrapper to run
closures normally—any closures not yet taken will be invoked from
CallOnDrop::drop during panic unwinding, ensuring no deferred drop is silently
skipped.
- Around line 43-45: The strong() method performs an unnecessary division by
COUNT; update the strong function (fn strong(self) -> u32) to compute
((self.inner & STRONG) as u32) (or equivalent masking and casting) without
dividing by COUNT since COUNT == 1, and ensure the result is still cast to u32;
reference the symbols strong, self.inner, STRONG and COUNT when making the
change.
In `@crates/vm/src/object/core.rs`:
- Around line 434-452: Between releasing and re-acquiring the weakref stripe
lock (weakref_lock::lock) the code unconditionally treats the new weakref as
generic and overwrites self.generic, which can allow two generic nodes; after
re-acquiring the lock (before setting self.generic and self.head) re-check
whether a generic already exists (inspect self.generic/head via NonNull::new or
equivalent) and if one is present avoid marking this node as the generic
head—insert it as a normal/basic weakref (update pointers accordingly) or update
the existing generic pointer only if still None; ensure the check uses the same
pointer types/ordering as the current insertion logic (node_ptr,
WeakLink::pointers, self.head, self.generic) to preserve atomic consistency.
Actionable comments posted: 3
🧹 Nitpick comments (3)
crates/common/src/refcount.rs (2)
94-96: get() uses SeqCst while is_leaked() uses Acquire — inconsistent ordering for reads.
Both are simple flag/count reads with no dependent write. Acquire (or even Relaxed for a non-synchronising diagnostic read like get()) is sufficient and consistent with is_leaked().
♻️ Suggested change
  pub fn get(&self) -> usize {
-     State::from_raw(self.state.load(Ordering::SeqCst)).strong() as usize
+     State::from_raw(self.state.load(Ordering::Acquire)).strong() as usize
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 94 - 96, The get() method currently uses Ordering::SeqCst for the atomic load while is_leaked() uses Ordering::Acquire, creating inconsistent read orderings; change the ordering in get() (the call to self.state.load and subsequent State::from_raw(...).strong()) to use the same weaker ordering as is_leaked() (e.g., Ordering::Acquire, or Ordering::Relaxed if this is purely a diagnostic read) so both reads are consistent.
43-45: / COUNT division in State::strong() is a no-op.
Since COUNT = 1, (self.inner & STRONG) / COUNT is always identical to self.inner & STRONG. The division adds no value and slightly obscures intent.
♻️ Suggested simplification
  fn strong(self) -> u32 {
-     ((self.inner & STRONG) / COUNT) as u32
+     (self.inner & STRONG) as u32
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 43 - 45, The State::strong() method performs a redundant division by COUNT (which equals 1) making "((self.inner & STRONG) / COUNT) as u32" a no-op; simplify the expression by removing the "/ COUNT" so the method returns "(self.inner & STRONG) as u32" (referencing State::strong, the inner field, STRONG and COUNT) to make the intent clearer.
crates/vm/src/object/core.rs (1)
157-166: Deferred objects invoke drop_slow_inner twice.
When trashcan::begin returns false, drop_slow_inner (finalizer + weakref clearing) has already run. The second default_dealloc invocation, issued from trashcan::end's drain loop, calls drop_slow_inner again. The double-call is safe — gc_finalized() prevents __del__ re-entry and wrl.clear is idempotent on an already-cleared list — but it is redundant work.
Consider moving trashcan::begin ahead of drop_slow_inner so that deferred objects are queued before any finalization occurs, and the finalizer runs exactly once when the item is eventually processed from the queue.
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/object/core.rs` around lines 157 - 166, Move the trashcan check before running finalizers: call trashcan::begin(obj, default_dealloc::<T>) at the top of default_dealloc and if it returns false, return immediately (so the object is queued) instead of running drop_slow_inner; only when trashcan::begin returns true should you call drop_slow_inner and later call trashcan::end (so objects deferred by trashcan::begin are queued before any finalizer/weakref work and drop_slow_inner is executed exactly once when processed).
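The defer-before-finalize ordering suggested here can be sketched with a toy dealloc; `LIMIT`, `QUEUE`, and `FINALIZED` are illustrative, and `u32` ids stand in for object pointers. Because the trashcan is consulted before any finalization, a deferred id's "finalizer" runs exactly once, when the queue is drained:

```rust
use std::cell::{Cell, RefCell};

const LIMIT: usize = 2; // stand-in for TRASHCAN_LIMIT

thread_local! {
    static DEPTH: Cell<usize> = Cell::new(0);
    static QUEUE: RefCell<Vec<u32>> = RefCell::new(Vec::new());
    static FINALIZED: RefCell<Vec<u32>> = RefCell::new(Vec::new());
}

fn dealloc(id: u32) {
    // Consult the trashcan first: past the depth limit, queue untouched.
    let deferred = DEPTH.with(|d| {
        if d.get() >= LIMIT {
            QUEUE.with(|q| q.borrow_mut().push(id));
            true
        } else {
            d.set(d.get() + 1);
            false
        }
    });
    if deferred {
        return; // no finalization has happened yet
    }
    FINALIZED.with(|f| f.borrow_mut().push(id)); // finalizer runs once
    // end(): drop the depth; at depth 0, drain anything queued meanwhile.
    DEPTH.with(|d| d.set(d.get() - 1));
    if DEPTH.with(|d| d.get()) == 0 {
        while let Some(next) = QUEUE.with(|q| q.borrow_mut().pop()) {
            dealloc(next);
        }
    }
}
```

In the real code the depth grows through recursive drops rather than manual `set` calls; the point is only the ordering of the trashcan check relative to finalization.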
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/common/src/refcount.rs`:
- Around line 208-219: The Drop impl for DeferredDropGuard currently always
calls flush_deferred_drops() which can panic during unwinding; update
DeferredDropGuard::drop to check std::thread::panicking() after
restoring IN_DEFERRED_CONTEXT.with(|in_ctx| in_ctx.set(self.was_in_context));
only call flush_deferred_drops() when !self.was_in_context &&
!std::thread::panicking() so flush is skipped during unwinding; keep restoring
IN_DEFERRED_CONTEXT unconditionally and do not change other logic.
- Around line 262-270: flush_deferred_drops currently drains DEFERRED_QUEUE and
calls each op, but if any op panics the rest are lost and callers can invoke it
while IN_DEFERRED_CONTEXT is true; update flush_deferred_drops to first assert
(or debug_assert) that IN_DEFERRED_CONTEXT is false to prevent reentrancy, then
take the ops vec and execute each op inside std::panic::catch_unwind so a panic
from one op does not skip others, and if a panic occurs re-push any remaining
ops back into DEFERRED_QUEUE (using DEFERRED_QUEUE.with) before resuming the
panic unwind; reference flush_deferred_drops, DEFERRED_QUEUE,
IN_DEFERRED_CONTEXT and ensure any untrack_object semantics are preserved by
requeuing unexecuted closures.
In `@crates/vm/src/object/core.rs`:
- Around line 441-449: In the re-check path where you return a resurrected
generic (the block using is_generic, generic_ptr, generic and calling
generic.0.ref_count.safe_inc()), clear the weak's wr_object before the early
return so drop of the local weak won't call unlink_weakref on an uninserted
node; i.e., set the weak/inner wr_object to null (use the same atomic store
semantics as other places, e.g., Ordering::Relaxed) on generic.0 (the PyWeak
inner) immediately prior to returning PyRef::from_raw(generic_ptr).
---
Duplicate comments:
In `@crates/common/src/refcount.rs`:
- Around line 148-157: The dec() implementation unconditionally does
self.state.fetch_sub(COUNT, Ordering::SeqCst), which can underflow and clear the
LEAKED bit; change dec() (and the fetch_sub usage) to use a CAS loop that loads
State::from_raw(self.state.load(...)), checks state.leaked() and if leaked
returns false without performing the subtraction, otherwise attempts a
compare_exchange to subtract COUNT (or retry on failure); ensure the code
references the COUNT constant and uses State::from_raw and leaked() to detect
and skip mutation when LEAKED is set so the LEAKED bit is never cleared by
underflow.
In `@crates/vm/src/object/core.rs`:
- Around line 128-135: The current debug_assert in end() (inside
DEALLOC_DEPTH.with in pub(super) unsafe fn end) prevents usize underflow only in
debug builds; change it to a runtime check so release builds can't underflow:
read d.get() into a local, if it's zero call panic! with "trashcan::end called
without matching begin" (or use checked_sub(1).unwrap_or_else(|| panic!(...)))
before calling d.set and returning the decremented depth. Ensure the check and
decrement happen inside the same DEALLOC_DEPTH.with closure so you update the
same thread-local value.
---
Nitpick comments:
In `@crates/common/src/refcount.rs`:
- Around line 94-96: The get() method currently uses Ordering::SeqCst for the
atomic load while is_leaked() uses Ordering::Acquire, creating inconsistent read
orderings; change the ordering in get() (the call to self.state.load and
subsequent State::from_raw(...).strong()) to use the same weaker ordering as
is_leaked() (e.g., Ordering::Acquire, or Ordering::Relaxed if this is purely a
diagnostic read) so both reads are consistent.
- Around line 43-45: The State::strong() method performs a redundant division by
COUNT (which equals 1) making "((self.inner & STRONG) / COUNT) as u32" a no-op;
simplify the expression by removing the "/ COUNT" so the method returns
"(self.inner & STRONG) as u32" (referencing State::strong, the inner field,
STRONG and COUNT) to make the intent clearer.
In `@crates/vm/src/object/core.rs`:
- Around line 157-166: Move the trashcan check before running finalizers: call
trashcan::begin(obj, default_dealloc::<T>) at the top of default_dealloc and if
it returns false, return immediately (so the object is queued) instead of
running drop_slow_inner; only when trashcan::begin returns true should you call
drop_slow_inner and later call trashcan::end (so objects deferred by
trashcan::begin are queued before any finalizer/weakref work and drop_slow_inner
is executed exactly once when processed).
#[cfg(feature = "std")]
impl Drop for DeferredDropGuard {
    fn drop(&mut self) {
        IN_DEFERRED_CONTEXT.with(|in_ctx| {
            in_ctx.set(self.was_in_context);
        });
        // Only flush if we're the outermost context
        if !self.was_in_context {
            flush_deferred_drops();
        }
    }
}
flush_deferred_drops() called from drop() during unwinding risks double-panic → process abort.
If the closure passed to with_deferred_drops panics, DeferredDropGuard::drop runs during stack unwinding and unconditionally calls flush_deferred_drops(). Should any queued operation also panic, Rust's "panic while panicking" rule causes an unconditional process abort, losing all panic context.
🛡️ Proposed fix — skip flush during unwinding
impl Drop for DeferredDropGuard {
fn drop(&mut self) {
IN_DEFERRED_CONTEXT.with(|in_ctx| {
in_ctx.set(self.was_in_context);
});
// Only flush if we're the outermost context
- if !self.was_in_context {
+ if !self.was_in_context && !std::thread::panicking() {
flush_deferred_drops();
}
}
}

Skipping the flush during unwinding means deferred untrack operations will be lost on panic, which is acceptable given GC state is already inconsistent in a panic scenario.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/common/src/refcount.rs` around lines 208 - 219, The Drop impl for
DeferredDropGuard currently always calls flush_deferred_drops() which can panic
during unwinding; update DeferredDropGuard::drop to check
std::panic::panicking() (or equivalent) after restoring
IN_DEFERRED_CONTEXT.with(|in_ctx| in_ctx.set(self.was_in_context)); only call
flush_deferred_drops() when !self.was_in_context && !std::panic::panicking() so
flush is skipped during unwinding; keep restoring IN_DEFERRED_CONTEXT
unconditionally and do not change other logic.
pub fn flush_deferred_drops() {
    DEFERRED_QUEUE.with(|q| {
        // Take all queued operations
        let ops: Vec<_> = q.borrow_mut().drain(..).collect();
        // Execute them outside the borrow
        for op in ops {
            op();
        }
    });
}
flush_deferred_drops silently skips remaining ops if one panics, and is unsafe to call from within an active deferred context.
Two concerns:
1. Incomplete flush on panic: If `op()` at index `i` panics, the remaining ops in the local `ops` vec are dropped (not called). Any `untrack_object()` calls they contained will be silently lost, leaving the GC tracking table in an inconsistent state for the duration of the thread's life.
2. Callable within a deferred context: Since `flush_deferred_drops` is `pub`, a caller can invoke it while `IN_DEFERRED_CONTEXT` is `true`, prematurely draining the queue and defeating the deferral guarantee. Consider a `debug_assert` guard.
🛡️ Suggested hardening
pub fn flush_deferred_drops() {
+ debug_assert!(
+ !IN_DEFERRED_CONTEXT.with(|c| c.get()),
+ "flush_deferred_drops called from within an active deferred context"
+ );
DEFERRED_QUEUE.with(|q| {
// Take all queued operations
let ops: Vec<_> = q.borrow_mut().drain(..).collect();
// Execute them outside the borrow
for op in ops {
op();
}
});
}

For the panic-completeness concern, consider iterating with a RAII sentinel that re-pushes un-executed ops back to the queue on drop, though this adds complexity that may not be warranted for a GC untrack path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/common/src/refcount.rs` around lines 262 - 270, flush_deferred_drops
currently drains DEFERRED_QUEUE and calls each op, but if any op panics the rest
are lost and callers can invoke it while IN_DEFERRED_CONTEXT is true; update
flush_deferred_drops to first assert (or debug_assert) that IN_DEFERRED_CONTEXT
is false to prevent reentrancy, then take the ops vec and execute each op inside
std::panic::catch_unwind so a panic from one op does not skip others, and if a
panic occurs re-push any remaining ops back into DEFERRED_QUEUE (using
DEFERRED_QUEUE.with) before resuming the panic unwind; reference
flush_deferred_drops, DEFERRED_QUEUE, IN_DEFERRED_CONTEXT and ensure any
untrack_object semantics are preserved by requeuing unexecuted closures.
if is_generic {
    let generic_ptr = self.generic.load(Ordering::Relaxed);
    if !generic_ptr.is_null() {
        let generic = unsafe { &*generic_ptr };
        if generic.0.ref_count.safe_inc() {
            return unsafe { PyRef::from_raw(generic_ptr) };
        }
    }
}
🧩 Analysis chain
🏁 Script executed:
# Find the linked_list module and inspect Pointers::new()
fd "linked_list" -t f -e rs
🏁 Script executed:
# Read the actual code at the specified location in core.rs
cat -n crates/vm/src/object/core.rs | sed -n '410,450p'
🏁 Script executed:
# Get more context on the WeakRefList::add method to understand the full flow
cat -n crates/vm/src/object/core.rs | sed -n '400,460p'
🏁 Script executed:
# Read the Pointers struct and new() implementation in linked_list.rs
cat -n crates/common/src/linked_list.rs | head -100
🏁 Script executed:
# Search for Pointers::new() definition
rg "impl.*Pointers|fn new\(\)" crates/common/src/linked_list.rs -A 5
🏁 Script executed:
# Search for unlink_weakref function in core.rs to verify the corruption path
rg "fn unlink_weakref" crates/vm/src/object/core.rs -A 20
Dropping the uninserted weak in the re-check path corrupts wrl.head and causes a subsequent use-after-free.
Rust drops locals in reverse declaration order. _lock (line 437) is declared after weak (line 434), so on the return at line 446 the drop sequence is:
1. `_lock` dropped → stripe lock released.
2. `weak` dropped → `PyWeak::drop_inner` runs → acquires the stripe lock (succeeds; `_lock` already gone) → calls `unlink_weakref(wrl, node_ptr)`.
weak was never inserted into the list, so its Pointers fields are in their Pointers::new() initial state (both prev and next are None). unlink_weakref interprets prev == None as "this node is the head" and executes:
wrl.head.store(ptr::null_mut(), Ordering::Relaxed); // ← corrupts the list

This wipes wrl.head to null while the real list nodes are still alive. When obj is eventually deallocated, wrl.clear iterates from head == null and skips every existing weakref. Those weakrefs retain a non-null wr_object pointing to the now-freed obj memory. Their eventual drop_inner then acquires a stripe lock based on a dangling pointer and calls unlink_weakref on already-freed memory → use-after-free.
Fix: before the early return, null out wr_object so drop_inner exits at its first guard check:
🛡️ Proposed fix
if is_generic {
let generic_ptr = self.generic.load(Ordering::Relaxed);
if !generic_ptr.is_null() {
let generic = unsafe { &*generic_ptr };
if generic.0.ref_count.safe_inc() {
+ // `weak` was never inserted; nullify wr_object so drop_inner
+ // returns early and does not call unlink_weakref.
+ weak.wr_object.store(ptr::null_mut(), Ordering::Relaxed);
return unsafe { PyRef::from_raw(generic_ptr) };
}
}
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/object/core.rs` around lines 441 - 449, In the re-check path
where you return a resurrected generic (the block using is_generic, generic_ptr,
generic and calling generic.0.ref_count.safe_inc()), clear the weak's wr_object
before the early return so drop of the local weak won't call unlink_weakref on
an uninserted node; i.e., set the weak/inner wr_object to null (use the same
atomic store semantics as other places, e.g., Ordering::Relaxed) on generic.0
(the PyWeak inner) immediately prior to returning PyRef::from_raw(generic_ptr).
…n debug_assert, weakref generic re-check
📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies and dependent tests:

- [ ] lib: cpython/Lib/asyncio, dependent tests: (7 tests)
- [ ] test: cpython/Lib/test/test_descr.py (TODO: 47), dependent tests: (no tests depend on descr)
- [x] test: cpython/Lib/test/test_faulthandler.py (TODO: 4), dependent tests: (1 tests)
- [x] lib: cpython/Lib/functools.py, dependent tests: (39 tests)
- [x] lib: cpython/Lib/inspect.py, dependent tests: (45 tests)
- [x] lib: cpython/Lib/io.py, dependent tests: (89 tests)
- [ ] lib: cpython/Lib/logging, dependent tests: (10 tests)
- [ ] lib: cpython/Lib/collections, dependent tests: (196 tests)
- [ ] lib: cpython/Lib/subprocess.py, dependent tests: (53 tests)
- [x] lib: cpython/Lib/threading.py, dependent tests: (113 tests)
- [x] lib: cpython/Lib/traceback.py, dependent tests: (35 tests)
- [x] lib: cpython/Lib/warnings.py, dependent tests: (49 tests)
- [x] lib: cpython/Lib/weakref.py, dependent tests: (158 tests)

Legend:
separated from #6849
Summary by CodeRabbit
New Features
Bug Fixes
Chores