Add DeleteRange Support with Range Tombstones #99
feichai0017 merged 7 commits into feichai0017:main from
Conversation
Codecov Report: ❌ Patch coverage is
Pull request overview
Adds DeleteRange support to NoKV by representing range deletions as LSM-level “range tombstone” entries and teaching the read path, iterators, and compaction to honor and eventually drop them.
Changes:
- Introduces `DB.DeleteRange`/`DB.DeleteRangeCF` APIs that write range tombstones using a new `BitRangeDelete` entry meta flag.
- Updates the read path (`Get` + `DBIterator`) to hide range tombstone records and filter keys covered by range tombstones.
- Updates compaction to track range tombstones and drop covered keys (and drop tombstones at the max level), plus adds dedicated tests for correctness and WAL recovery.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| db.go | Adds DeleteRange APIs, range-tombstone coverage checks on Get, and ensures range tombstones stay inline in the LSM (not vlog). |
| iterator.go | Filters out range tombstone entries and skips keys covered by range tombstones during iteration. |
| kv/const.go | Adds the BitRangeDelete meta flag. |
| kv/entry.go | Adds helpers IsRangeDelete, RangeEnd, and KeyInRange. |
| lsm/executor.go | Applies range tombstones during compaction to drop covered keys and drop tombstones at the max level. |
| db_deleterange_test.go | Adds tests for core semantics, validation, CF isolation, compaction interaction, and WAL recovery. |
lsm/executor.go (Outdated)

```diff
 var rangeTombstones []*kv.Entry
 isMaxLevel := cd.nextLevel != nil && cd.nextLevel.levelNum == lm.opt.MaxLevelNum

 for ; it.Valid(); it.Next() {
-	key := it.Item().Entry().Key
-	//version := kv.ParseTs(key)
-	isExpired := IsDeletedOrExpired(it.Item().Entry())
+	entry := it.Item().Entry()
+	key := entry.Key
+	isExpired := IsDeletedOrExpired(entry)

 	if entry.IsRangeDelete() {
 		if isMaxLevel {
 			updateStats(entry)
 			continue
 		}
 		rangeTombstones = append(rangeTombstones, entry)
 	}
```
rangeTombstones stores *kv.Entry pointers returned by the iterator. Table/block iterators reuse a single kv.Entry instance as they advance, so the pointers captured here can later point to mutated data, breaking coverage checks. Store a stable copy of the tombstone’s start/end (e.g., copy key/value bytes into a dedicated struct) instead of keeping the iterator-owned entry pointer.
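The fix the reviewer describes can be sketched as a small struct that snapshots the tombstone's boundaries into freshly allocated slices (names here are hypothetical, not the project's actual types):

```go
package main

import "fmt"

// rangeTombstone holds a stable copy of a tombstone's boundaries so it
// cannot be invalidated when the iterator reuses its entry buffer.
type rangeTombstone struct {
	start []byte // inclusive
	end   []byte // exclusive
}

// copyTombstone snapshots iterator-owned key/value bytes into fresh slices.
func copyTombstone(key, value []byte) rangeTombstone {
	return rangeTombstone{
		start: append([]byte(nil), key...),
		end:   append([]byte(nil), value...),
	}
}

func main() {
	buf := []byte("a") // simulates an iterator-owned, reused buffer
	rt := copyTombstone(buf, []byte("c"))
	buf[0] = 'z' // iterator advances and mutates its buffer
	fmt.Println(string(rt.start)) // the copy is unaffected: "a"
}
```

Because `append([]byte(nil), b...)` always allocates, later mutation of the iterator's buffer cannot corrupt the stored range.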
lsm/executor.go (Outdated)

```go
_, userKey, version := kv.SplitInternalKey(key)
for _, rt := range rangeTombstones {
	_, rtStart, rtVersion := kv.SplitInternalKey(rt.Key)
	if rtVersion >= version && kv.KeyInRange(userKey, rtStart, rt.RangeEnd()) {
```
Coverage filtering ignores column family: SplitInternalKey results for both the current key and the range tombstone discard the returned CF, so a tombstone from one CF could incorrectly discard keys from another CF during compaction. Include CF equality in the predicate before applying KeyInRange.
Suggested change:

```diff
-_, userKey, version := kv.SplitInternalKey(key)
-for _, rt := range rangeTombstones {
-	_, rtStart, rtVersion := kv.SplitInternalKey(rt.Key)
-	if rtVersion >= version && kv.KeyInRange(userKey, rtStart, rt.RangeEnd()) {
+cf, userKey, version := kv.SplitInternalKey(key)
+for _, rt := range rangeTombstones {
+	rtCF, rtStart, rtVersion := kv.SplitInternalKey(rt.Key)
+	if cf == rtCF && rtVersion >= version && kv.KeyInRange(userKey, rtStart, rt.RangeEnd()) {
```
db.go (Outdated)

```go
func (db *DB) isKeyCoveredByRangeTombstone(cf kv.ColumnFamily, userKey []byte, version uint64) bool {
	opt := &utils.Options{IsAsc: true}
	iters := db.lsm.NewIterators(opt)
	for _, it := range iters {
		if it == nil {
			continue
		}
		it.Rewind()
		for it.Valid() {
			item := it.Item()
			if item == nil {
				it.Next()
				continue
			}
			e := item.Entry()
			if e == nil || !e.IsRangeDelete() {
				it.Next()
				continue
			}
			rangeCF, rangeStart, rangeVersion := kv.SplitInternalKey(e.Key)
			if rangeCF != cf || rangeVersion < version {
				it.Next()
				continue
			}
			rangeEnd := e.RangeEnd()
			if kv.KeyInRange(userKey, rangeStart, rangeEnd) {
				return true
			}
			it.Next()
		}
		it.Close()
	}
```
isKeyCoveredByRangeTombstone returns true from inside the iterator loop without closing the current iterator (and any remaining iterators), which can leak resources/file handles. Ensure iterators are closed on all exit paths (e.g., defer it.Close() per iterator, and close previously created iterators before returning).
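The close-on-all-paths pattern the reviewer asks for can be sketched as follows; `fakeIter` is a hypothetical stand-in for the LSM iterator so the example runs standalone:

```go
package main

import "fmt"

// fakeIter is a hypothetical iterator that counts how often it is closed.
type fakeIter struct {
	closed  *int
	covered bool
}

func (f *fakeIter) Close() error { *f.closed++; return nil }

// isCovered defers Close immediately after each iterator is obtained, so
// every iterator is closed on all exit paths, including early returns.
func isCovered(iters []*fakeIter) bool {
	for _, it := range iters {
		if it == nil {
			continue
		}
		// Deferred per iterator; runs even when we return true below.
		defer func(i *fakeIter) { _ = i.Close() }(it)
	}
	for _, it := range iters {
		if it != nil && it.covered {
			return true // previously this path leaked open iterators
		}
	}
	return false
}

func main() {
	closed := 0
	iters := []*fakeIter{{&closed, false}, {&closed, true}, {&closed, false}}
	fmt.Println(isCovered(iters), closed) // all three iterators closed
}
```

In the real code, `it.Close()` returns an error worth at least logging; the deferred closure is also where an `errcheck`-clean handler would go.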
db.go (Outdated)

```go
// Store range tombstone: Entry.Key=start, Entry.Value=end
entry := kv.NewEntryWithCF(cf, start, end)
entry.Meta = kv.BitRangeDelete
defer entry.DecrRef()
return db.setEntry(entry)
```
Range tombstones are written using the same internal-key scheme as point entries (Entry.Key=start + fixed nonTxnMaxVersion). This collides with normal writes to the start key: a subsequent Set(start, ...) will overwrite the tombstone record and can unintentionally remove the deletion for other keys in the range. To preserve correct semantics, encode range tombstones under a distinct internal key kind/prefix so they can coexist with point keys at the same start key.
db.go (Outdated)

```go
func (db *DB) isKeyCoveredByRangeTombstone(cf kv.ColumnFamily, userKey []byte, version uint64) bool {
	opt := &utils.Options{IsAsc: true}
	iters := db.lsm.NewIterators(opt)
	for _, it := range iters {
		if it == nil {
			continue
		}
		it.Rewind()
		for it.Valid() {
			item := it.Item()
```
This helper scans from Rewind() across every LSM iterator for each lookup, which is O(total-keys) per Get/iterator step and can become a major regression once tombstones exist. Consider indexing active range tombstones (per-CF) or at least short-circuiting the scan based on key ordering; also consider using Seek to avoid full scans when possible.
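The "index active range tombstones" suggestion can be sketched as a sorted interval list queried with binary search (all names hypothetical; a real version would keep one index per CF and per version snapshot):

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// interval is a [start, end) range tombstone.
type interval struct {
	start, end []byte
	version    uint64
}

// index keeps tombstones sorted by start key so the common case is a
// binary search instead of a full LSM scan.
type index struct{ ivs []interval }

func (ix *index) add(iv interval) {
	ix.ivs = append(ix.ivs, iv)
	sort.Slice(ix.ivs, func(i, j int) bool {
		return bytes.Compare(ix.ivs[i].start, ix.ivs[j].start) < 0
	})
}

// covered reports whether key falls inside any tombstone whose version
// is >= v. It scans backward from the insertion point; an interval tree
// with max-end augmentation would bound this for heavily overlapping sets.
func (ix *index) covered(key []byte, v uint64) bool {
	i := sort.Search(len(ix.ivs), func(i int) bool {
		return bytes.Compare(ix.ivs[i].start, key) > 0
	})
	for j := i - 1; j >= 0; j-- {
		iv := ix.ivs[j]
		if iv.version >= v && bytes.Compare(key, iv.end) < 0 {
			return true
		}
	}
	return false
}

func main() {
	ix := &index{}
	ix.add(interval{[]byte("a"), []byte("c"), 5})
	fmt.Println(ix.covered([]byte("b"), 3), ix.covered([]byte("c"), 3))
}
```

The end bound stays exclusive, matching the PR's [start, end) semantics; a lookup that used to touch every key now costs O(log n) plus the overlap scan.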
```go
// Skip range tombstone entries themselves
if src.IsRangeDelete() {
	return false
}
iter.entry = *src
cf, userKey, ts := kv.SplitInternalKey(iter.entry.Key)
// Check if this key is covered by a range tombstone
if iter.db != nil && iter.db.isKeyCoveredByRangeTombstone(cf, userKey, ts) {
	return false
}
```
materialize calls db.isKeyCoveredByRangeTombstone for every entry produced by the iterator, which currently performs a full LSM scan per key. This can make iteration O(n²) and can also amplify the resource-leak risk if iterators aren’t always closed. Consider reworking iterator-side filtering to share a precomputed/seekable tombstone view rather than doing per-key full scans.
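The "shared precomputed tombstone view" idea can be sketched as an iterator that snapshots the active tombstones once at construction, so each materialized key checks an in-memory slice instead of re-scanning the LSM (names hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
)

type tombstone struct{ start, end []byte } // [start, end)

// dbIterator takes one snapshot of active tombstones when it is created.
type dbIterator struct {
	tombs []tombstone
}

func newDBIterator(active []tombstone) *dbIterator {
	// Copy once; per-key checks below never touch the LSM again.
	return &dbIterator{tombs: append([]tombstone(nil), active...)}
}

// covered is the per-key filter used during materialization.
func (it *dbIterator) covered(key []byte) bool {
	for _, t := range it.tombs {
		if bytes.Compare(key, t.start) >= 0 && bytes.Compare(key, t.end) < 0 {
			return true
		}
	}
	return false
}

func main() {
	it := newDBIterator([]tombstone{{[]byte("b"), []byte("d")}})
	fmt.Println(it.covered([]byte("c")), it.covered([]byte("e")))
}
```

This turns iteration from O(n²) back to O(n × tombstones); sorting the snapshot (as in the indexing sketch earlier in this thread) would tighten the per-key check further. A snapshot also sidesteps the iterator-leak risk, since no nested iterators are opened per key.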
db_deleterange_test.go (Outdated)

```go
// Test 1: Basic deletion with [start, end) semantics
db.Set([]byte("a"), []byte("1"))
db.Set([]byte("b"), []byte("2"))
db.Set([]byte("c"), []byte("3"))

if err := db.DeleteRange([]byte("a"), []byte("c")); err != nil {
	t.Fatal(err)
}
```
The test ignores errors from writes/deletes in multiple places (e.g., db.Set, db.DeleteRange, db.Del). If any of these operations fails, later assertions may produce misleading results. Prefer asserting require.NoError/t.Fatal on these calls to keep failures actionable.
lsm/executor.go (Outdated)

```go
if isMaxLevel {
	updateStats(entry)
	continue
}
rangeTombstones = append(rangeTombstones, entry)
```
In max-level compactions, range tombstones are currently skipped (continue) and not recorded in rangeTombstones, so covered keys won’t be filtered and the deletion can be lost (keys may reappear after compaction). Consider still using max-level tombstones to discard covered keys (but avoid emitting the tombstone record into the output tables).
Suggested change:

```diff
-if isMaxLevel {
-	updateStats(entry)
-	continue
-}
-rangeTombstones = append(rangeTombstones, entry)
+// Track all range tombstones so they can filter subsequent keys.
+rangeTombstones = append(rangeTombstones, entry)
+if isMaxLevel {
+	// At max level, use the tombstone to discard covered keys but do not
+	// emit the tombstone record itself into the output tables.
+	updateStats(entry)
+	continue
+}
```
Note: Reviews paused. This branch is under active development, so to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds range-tombstone (DeleteRange/DeleteRangeCF) support: a new tombstone entry type, an in-memory and on-disk collector, memtable and LSM integration for tombstone recording and coverage checks, iterator/read-path suppression of covered keys, compaction handling to preserve/apply tombstones, and extensive tests for behavior and recovery.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant DB as DB
    participant Mem as Memtable
    participant LSM as LSM
    participant Col as RangeTombstoneCollector
    App->>DB: DeleteRangeCF(cf, start, end)
    DB->>DB: validate start<end
    DB->>Mem: write BitRangeDelete entry (Key=start, Value=end)
    Mem->>Mem: record mem-range-tombstone + wal seq
    Mem-->>DB: ack
    Note over Mem,LSM: On flush
    Mem->>LSM: flush entries incl. range deletes
    LSM->>Col: Add(range tombstone)
    Col-->>LSM: stored
```
```mermaid
sequenceDiagram
    participant App as Application
    participant DB as DB
    participant Mem as Memtable
    participant LSM as LSM
    participant Col as RangeTombstoneCollector
    participant Comp as CompactionEngine
    App->>DB: GetCF(cf, key)
    DB->>Mem: check memtable coverage (isKeyCoveredByRangeTombstone)
    alt covered in memtable
        Mem-->>DB: covered -> ErrKeyNotFound
    else not covered
        DB->>LSM: get entry
        LSM->>Col: IsKeyCovered(cf, key, entryVersion)
        alt covered by collector
            Col-->>LSM: true
            LSM-->>DB: ErrKeyNotFound
        else
            LSM-->>DB: return entry
        end
    end
    Note over Comp,LSM: During compaction
    Comp->>Comp: collect range deletes, skip covered entries
    Comp->>Col: rebuild/register tombstones after max-level rewrite
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@db_deleterange_test.go`:
- Line 16: Tests in db_deleterange_test.go call functions that return errors
which are currently ignored (e.g., db.Close() and various setup calls); add a
small test helper like mustNoErr(t, err) that calls t.Helper() and t.Fatalf on
non-nil errors, then replace all unchecked error-returning calls (the ones
flagged around lines with db.Close() and the other setup/operation calls) by
checking their error via mustNoErr(t, err) (or call mustNoErr immediately after
the call) so every error is handled and tests fail fast with a clear message.
In `@db.go`:
- Around line 624-657: The current isKeyCoveredByRangeTombstone function
performs a full LSM iterator scan (db.lsm.NewIterators) per lookup which is
expensive; instead implement and use a CF-scoped in-memory range-tombstone index
(e.g., an interval tree or sorted interval list) built/updated from memtables
and SSTable metadata and query it in O(log n) from isKeyCoveredByRangeTombstone;
replace the iterator scan in isKeyCoveredByRangeTombstone with a lookup against
that index (keyed by cf and version/sequence) and ensure the index is updated
during memtable inserts, memtable flushes and SSTable load/compaction routines
so range tombstones are visible to Get and iterator materialization without
scanning the LSM.
- Around line 624-655: The function isKeyCoveredByRangeTombstone leaks iterator
resources by returning early without closing the current iterator and remaining
iterators from db.lsm.NewIterators; change the logic so that before any early
return (including when kv.KeyInRange(...) causes a true result) you close the
current iterator and all remaining iterators, and propagate or at least log any
errors returned by it.Close(); ensure the cleanup runs regardless of the return
path (e.g., use defer per-iterator or a single deferred cleanup over the
iterators slice) while keeping the checks around kv.SplitInternalKey and
e.IsRangeDelete() intact.
In `@lsm/executor.go`:
- Around line 1001-1004: The local variable rangeTombstones is being
reinitialized each time addKeys (or the builder chunk loop) runs, causing
tombstones to be lost when a builder splits; instead persist and reuse the slice
across chunks by moving the declaration and initialization out of the per-chunk
scope (or into the builder/compaction state) so entries accumulate between
successive addKeys invocations; update references around the blocks that declare
var rangeTombstones []*kv.Entry (and the related uses near tableKr and
isMaxLevel) so they append to the shared slice rather than resetting it,
ensuring tombstones cover keys across split chunks.
- Around line 1034-1038: The code currently discards column-family info when
checking range tombstone coverage; capture and compare CFs before marking a key
as covered. Change the kv.SplitInternalKey call in the loop to capture the CF
(e.g., cf, userKey, version := kv.SplitInternalKey(key)) and likewise extract
the CF for each tombstone (e.g., rtCF, rtStart, rtVersion :=
kv.SplitInternalKey(rt.Key)) and only set covered = true when cf == rtCF &&
rtVersion >= version && kv.KeyInRange(userKey, rtStart, rt.RangeEnd()). This
ensures a tombstone in one column-family does not affect keys in another.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
db.go, db_deleterange_test.go, iterator.go, kv/const.go, kv/entry.go, lsm/executor.go
Thanks for the contribution. The feature direction is good, but I think we should fix a few blocking issues before merge: compaction range coverage must include CF match,
Thank you for the detailed review and the helpful suggestions! I'll address the blocking issues you've listed: compaction range coverage must include CF match, avoid storing *kv.Entry tombstones (the iterator entry is reused), keep tombstone state across builder splits in subcompact, fix iterator cleanup and the current full-scan read-path overhead, and resolve lint (errcheck) failures. Regarding the performance point (iterator cleanup and full-scan read-path overhead), I'll fix the immediate issues as required. However, to keep the changes focused and avoid scope creep, I plan to add a TODO comment in the code to highlight potential further optimizations in that area. That way, we can track it as a future improvement. I'll update the PR shortly. Thanks again for the feedback!
🧹 Nitpick comments (1)
lsm/executor.go (1)
1051-1065: CF matching and coverage check logic looks correct. The implementation properly:

- Extracts CF from both the entry (`cf`) and tombstone (`rt.cf`)
- Compares CF before applying coverage: `rt.cf == cf`
- Checks version ordering: `rt.version >= version` (newer tombstone shadows older entry)
- Validates range coverage: `kv.KeyInRange(userKey, rt.start, rt.end)`

This addresses the previous review concern about cross-CF tombstone application.
Consider adding a TODO for performance optimization.
The O(entries × tombstones) complexity may become a concern with many active range tombstones. Per the PR objectives and reviewer feedback, a TODO was planned for read-path optimization—consider adding one here for the compaction path as well.
💡 Optional: Add TODO for future optimization

```diff
 if !entry.IsRangeDelete() {
 	covered := false
 	cf, userKey, version := kv.SplitInternalKey(key)
+	// TODO: Optimize range tombstone lookup. Consider:
+	// - Interval tree for O(log n) coverage checks
+	// - Pruning tombstones whose range is entirely before current key
 	for _, rt := range rangeTombstones {
 		// Check CF match and version/range coverage.
 		if rt.cf == cf && rt.version >= version && kv.KeyInRange(userKey, rt.start, rt.end) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lsm/executor.go` around lines 1051 - 1065, The CF-matching and coverage loop is correct but can be costly O(entries × rangeTombstones); add a clear TODO near the loop that mentions optimizing the compaction/read-path for many active range tombstones and suggests future approaches (e.g., index by CF or use interval tree) so maintainers know to revisit; place the TODO adjacent to the code using entry.IsRangeDelete(), kv.SplitInternalKey(...), rangeTombstones, and updateStats to make it easy to find.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lsm/executor.go`:
- Around line 1051-1065: The CF-matching and coverage loop is correct but can be
costly O(entries × rangeTombstones); add a clear TODO near the loop that
mentions optimizing the compaction/read-path for many active range tombstones
and suggests future approaches (e.g., index by CF or use interval tree) so
maintainers know to revisit; place the TODO adjacent to the code using
entry.IsRangeDelete(), kv.SplitInternalKey(...), rangeTombstones, and
updateStats to make it easy to find.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
db.go, lsm/executor.go
🚧 Files skipped from review as they are similar to previous changes (1)
- db.go
|
Thanks for the PR — really appreciate the work here. The overall direction looks good, but I found a few issues that I think should be addressed before merge:
Happy to re-review after these are addressed — this feature is valuable, and getting these semantics right will make it much safer to land.
Actionable comments posted: 5
♻️ Duplicate comments (1)
db_deleterange_test.go (1)
15-16: ⚠️ Potential issue | 🟠 Major: Check every error-returning DB call in these tests.

The new cases still ignore `Close`, `Set`, `SetCF`, `Del`, `DeleteRange`, and `DeleteRangeCF` errors in multiple places. That preserves the current `errcheck` failure and can turn setup failures into misleading assertions.

🧪 Proposed fix pattern

```diff
+func mustNoErr(t *testing.T, err error) {
+	t.Helper()
+	if err != nil {
+		t.Fatal(err)
+	}
+}
+
 func TestDeleteRangeCore(t *testing.T) {
 	opt := getTestOptions(t.TempDir())
 	db := Open(opt)
-	defer db.Close()
+	t.Cleanup(func() { mustNoErr(t, db.Close()) })

-	db.Set([]byte("a"), []byte("1"))
-	db.Set([]byte("b"), []byte("2"))
-	db.Set([]byte("c"), []byte("3"))
+	mustNoErr(t, db.Set([]byte("a"), []byte("1")))
+	mustNoErr(t, db.Set([]byte("b"), []byte("2")))
+	mustNoErr(t, db.Set([]byte("c"), []byte("3")))

-	db.DeleteRange([]byte("key1"), []byte("key3"))
+	mustNoErr(t, db.DeleteRange([]byte("key1"), []byte("key3")))
```

Also applies to: 19-21, 38-42, 55-56, 70-71, 87-87, 109-114, 128-137, 153-157, 179-186, 208-216, 234-238
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@db_deleterange_test.go` around lines 15 - 16, Every error-returning DB call in these tests (Open/Close, Set, SetCF, Del, DeleteRange, DeleteRangeCF) must have their errors captured and asserted instead of being ignored; change statements like db := Open(opt); defer db.Close() and calls to Set/SetCF/Del/DeleteRange/DeleteRangeCF to capture the returned error (err := db.Set(...), err := db.SetCF(...), etc.) and assert no error (use require.NoError(t, err) or t.Fatalf on err), and change defer db.Close() to defer a closure that checks and fails the test if db.Close() returns an error (defer func(){ if err := db.Close(); err != nil { t.Fatalf(...) } }()). Apply this to all occurrences referencing Open, Close, Set, SetCF, Del, DeleteRange, and DeleteRangeCF in the test file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lsm/executor.go`:
- Around line 552-556: The rebuild vs incremental-update race: serialize
rebuildRangeTombstones()/Rebuild() with concurrent rtCollector.Add() from flush
paths by introducing a shared synchronization point (e.g., a mutex or RWLock)
around lm.rtCollector mutations; in practice, wrap the full rebuild performed in
runCompactDef() (rebuildRangeTombstones()/Rebuild()) in the same lock you use
when calling rtCollector.Add() in lsm/levels.go, or alternatively change the
flush path to build a new collector and swap it under the same lock so no
tombstones can be lost during the swap. Ensure you reference lm.rtCollector,
runCompactDef(), rebuildRangeTombstones(), Rebuild(), and rtCollector.Add() when
adding the locking to guarantee both code paths use the same synchronization.
In `@lsm/memtable.go`:
- Around line 213-221: The memRangeTombstone created for live writes in the
setBatch() path is missing its version, causing isKeyCoveredByRangeTombstone()
to skip version ordering for new tombstones; populate memRangeTombstone.version
when constructing the tombstone (use the memtable's current version field—e.g.,
m.version or the same version value used/restored in openMemTable()) so new
range-deletes have the correct version ordering, mirroring how openMemTable()
restores version for recovered tombstones.
- Around line 131-137: GetMemTables() only pins the memtable index but not
keyWalOffset or rangeTombstones, allowing close() to clear them while a pinned
reader is still using them; change the cleanup so tombstone metadata is retained
until the last pinned reader releases the memtable: introduce/use a
memtable-owned refcount (incremented by GetMemTables pin and decremented by the
reader unpin) and move the calls keyWalOffset.Clear() and the rangeTombstones =
nil assignment (currently under rtMu in close()) to run only when that refcount
reaches zero (or add a dedicated release method that does this when
refcount==0), leaving close() to mark the memtable closed but not free tombstone
metadata prematurely.
In `@lsm/range_tombstone.go`:
- Around line 13-18: Range tombstones currently only store Version which loses
tie-break ordering after flush/rebuild; add a stable tie-breaker field (e.g.,
WalSeq or Seq uint64) to RangeTombstone and persist it so equality of Version
can be resolved deterministically across restarts, then update IsKeyCovered()
and any logic in memTable.isKeyCoveredByRangeTombstone() to compare (Version,
Seq) instead of just Version (use Seq when Versions are equal) so flushed
tombstones keep the same ordering as in-memory walSeqKey; ensure
serialization/deserialization and any constructors/builders for RangeTombstone
(and code paths at the locations mentioned around lines ~39-52 and ~111-117) are
updated to read/write the new field.
- Around line 55-59: The Add method on RangeTombstoneCollector currently appends
caller-owned RangeTombstone slices directly, allowing external mutation of
Start/End; change RangeTombstoneCollector.Add(rt RangeTombstone) to take
ownership by making defensive copies of rt.Start and rt.End (e.g., allocate new
byte slices and copy contents) before locking and appending so the collector
stores its own slices; update any tests if needed and keep this behavior
consistent with rebuildRangeTombstones/Rebuild which already safe-copies.
---
Duplicate comments:
In `@db_deleterange_test.go`:
- Around line 15-16: Every error-returning DB call in these tests (Open/Close,
Set, SetCF, Del, DeleteRange, DeleteRangeCF) must have their errors captured and
asserted instead of being ignored; change statements like db := Open(opt); defer
db.Close() and calls to Set/SetCF/Del/DeleteRange/DeleteRangeCF to capture the
returned error (err := db.Set(...), err := db.SetCF(...), etc.) and assert no
error (use require.NoError(t, err) or t.Fatalf on err), and change defer
db.Close() to defer a closure that checks and fails the test if db.Close()
returns an error (defer func(){ if err := db.Close(); err != nil { t.Fatalf(...)
} }()). Apply this to all occurrences referencing Open, Close, Set, SetCF, Del,
DeleteRange, and DeleteRangeCF in the test file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e8a53532-8152-45ce-91bd-0025926bf553
📒 Files selected for processing (12)
db.go, db_deleterange_test.go, iterator.go, lsm/executor.go, lsm/levels.go, lsm/lsm.go, lsm/lsm_test.go, lsm/memtable.go, lsm/range_tombstone.go, lsm/range_tombstone_test.go, utils/art.go, utils/skiplist.go
🚧 Files skipped from review as they are similar to previous changes (1)
- iterator.go
```go
// After max-level compaction, range tombstone layout may change.
// Rebuild the in-memory range tombstone index to keep read visibility correct.
if cd.nextLevel != nil && cd.nextLevel.levelNum == lm.opt.MaxLevelNum-1 && lm.rtCollector != nil {
	lm.rebuildRangeTombstones()
}
```
Serialize collector rebuilds with concurrent flush updates.
runCompactDef() now does a full rebuildRangeTombstones(), but lsm/levels.go still mutates the same collector incrementally with rtCollector.Add(). If a flush installs a tombstone after the rebuild snapshot is taken but before Rebuild() swaps the slice, that tombstone is dropped from the in-memory index and covered keys can become visible again. These paths need a shared synchronization point, or a single-source rebuild/update strategy.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lsm/executor.go` around lines 552 - 556, The rebuild vs incremental-update
race: serialize rebuildRangeTombstones()/Rebuild() with concurrent
rtCollector.Add() from flush paths by introducing a shared synchronization point
(e.g., a mutex or RWLock) around lm.rtCollector mutations; in practice, wrap the
full rebuild performed in runCompactDef() (rebuildRangeTombstones()/Rebuild())
in the same lock you use when calling rtCollector.Add() in lsm/levels.go, or
alternatively change the flush path to build a new collector and swap it under
the same lock so no tombstones can be lost during the swap. Ensure you reference
lm.rtCollector, runCompactDef(), rebuildRangeTombstones(), Rebuild(), and
rtCollector.Add() when adding the locking to guarantee both code paths use the
same synchronization.
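The locking strategy described above can be sketched as a collector whose Add and Rebuild share one mutex, with the rebuild snapshot taken while Add is excluded (names hypothetical; tombstones simplified to strings):

```go
package main

import (
	"fmt"
	"sync"
)

// collector guards both incremental Add (flush path) and full Rebuild
// (compaction path) with the same mutex, so a tombstone added mid-rebuild
// cannot be dropped by the slice swap.
type collector struct {
	mu         sync.Mutex
	tombstones []string
}

func (c *collector) Add(t string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tombstones = append(c.tombstones, t)
}

// Rebuild takes its snapshot via readState while holding the lock, so no
// Add can slip in between the snapshot and the swap.
func (c *collector) Rebuild(readState func() []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tombstones = readState()
}

func main() {
	c := &collector{}
	c.Add("mem-rt")
	c.Rebuild(func() []string { return []string{"disk-rt", "mem-rt"} })
	fmt.Println(len(c.tombstones))
}
```

The key design point is that `readState` runs inside the critical section; computing the snapshot before taking the lock would reintroduce the lost-update window.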
lsm/memtable.go (Outdated)

```go
// Release the keyWalOffset side map and range tombstone cache to prevent
// unbounded memory growth after flush. Range tombstones have already been
// transferred to rtCollector by the flush path before close is called.
m.keyWalOffset.Clear()
m.rtMu.Lock()
m.rangeTombstones = nil
m.rtMu.Unlock()
```
Keep tombstone metadata alive until the last pinned reader is done.
GetMemTables() pins the memtable index, not keyWalOffset or rangeTombstones. A concurrent flush can remove an immutable, call close(), and clear both before an in-flight read finishes its tombstone check, which reopens the same-version visibility bug on that pinned table.
Possible minimal fix

```diff
 func (m *memTable) close() error {
 	if m == nil {
 		return nil
 	}
-	// Release the keyWalOffset side map and range tombstone cache to prevent
-	// unbounded memory growth after flush. Range tombstones have already been
-	// transferred to rtCollector by the flush path before close is called.
-	m.keyWalOffset.Clear()
-	m.rtMu.Lock()
-	m.rangeTombstones = nil
-	m.rtMu.Unlock()
 	return nil
 }
```

Longer term, if you still want early reclamation, this cleanup needs to move behind a memtable-owned refcount rather than close().
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
func (m *memTable) close() error {
	if m == nil {
		return nil
	}
	return nil
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lsm/memtable.go` around lines 131 - 137, GetMemTables() only pins the
memtable index but not keyWalOffset or rangeTombstones, allowing close() to
clear them while a pinned reader is still using them; change the cleanup so
tombstone metadata is retained until the last pinned reader releases the
memtable: introduce/use a memtable-owned refcount (incremented by GetMemTables
pin and decremented by the reader unpin) and move the calls keyWalOffset.Clear()
and the rangeTombstones = nil assignment (currently under rtMu in close()) to
run only when that refcount reaches zero (or add a dedicated release method that
does this when refcount==0), leaving close() to mark the memtable closed but not
free tombstone metadata prematurely.
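The refcount-based release the prompt describes can be sketched as follows (hypothetical names; the real memtable would guard `keyWalOffset` and `rangeTombstones` the same way):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// memTable keeps a refcount so tombstone metadata is freed only when the
// last pinned reader releases it, not eagerly in close().
type memTable struct {
	refs       int32
	tombstones []string
}

// pin is called by GetMemTables for each reader (and by the flush path).
func (m *memTable) pin() { atomic.AddInt32(&m.refs, 1) }

// unpin frees metadata only when the final reference is dropped.
func (m *memTable) unpin() {
	if atomic.AddInt32(&m.refs, -1) == 0 {
		m.tombstones = nil // safe: no pinned reader can still see it
	}
}

func main() {
	m := &memTable{tombstones: []string{"rt"}}
	m.pin() // reader A
	m.pin() // flush path holds a reference too
	m.unpin() // flush done; reader A is still pinned
	fmt.Println(len(m.tombstones)) // metadata still alive for reader A
	m.unpin()
	fmt.Println(m.tombstones == nil) // freed at zero refs
}
```

`close()` would then only mark the memtable closed; reclamation rides on the last `unpin`, which removes the race with in-flight coverage checks.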
```go
type RangeTombstone struct {
	CF      kv.ColumnFamily
	Start   []byte // inclusive
	End     []byte // exclusive
	Version uint64
}
```
Flushed tombstones still lose equal-version ordering.
memTable.isKeyCoveredByRangeTombstone() uses walSeqKey to break ties when versions are equal, but RangeTombstone and IsKeyCovered() persist only Version. In the non-transactional path, point writes and range tombstones share the same version, so after flush/rebuild a deleted key can become visible again because rt.Version > version is false for the exact case that still needs ordering.
This needs a stable ordering signal that survives flush/rebuild, not just the version.
Also applies to: 39-52, 111-117
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lsm/range_tombstone.go` around lines 13 - 18, Range tombstones currently only
store Version which loses tie-break ordering after flush/rebuild; add a stable
tie-breaker field (e.g., WalSeq or Seq uint64) to RangeTombstone and persist it
so equality of Version can be resolved deterministically across restarts, then
update IsKeyCovered() and any logic in memTable.isKeyCoveredByRangeTombstone()
to compare (Version, Seq) instead of just Version (use Seq when Versions are
equal) so flushed tombstones keep the same ordering as in-memory walSeqKey;
ensure serialization/deserialization and any constructors/builders for
RangeTombstone (and code paths at the locations mentioned around lines ~39-52
and ~111-117) are updated to read/write the new field.
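The (Version, Seq) comparison the prompt asks for can be sketched like this (field names `version`/`seq` are hypothetical; `seq` would carry the WAL sequence number persisted at flush):

```go
package main

import "fmt"

// rangeTombstone carries a WAL sequence alongside the version so
// equal-version writes keep a deterministic order after flush/rebuild.
type rangeTombstone struct {
	version uint64
	seq     uint64 // WAL sequence; breaks ties when versions are equal
}

// shadows reports whether the tombstone orders at or after a point write
// identified by (version, seq). Seq is consulted only on a version tie,
// which is exactly the non-transactional case that loses ordering today.
func (rt rangeTombstone) shadows(version, seq uint64) bool {
	if rt.version != version {
		return rt.version > version
	}
	return rt.seq >= seq
}

func main() {
	rt := rangeTombstone{version: 7, seq: 12}
	fmt.Println(rt.shadows(7, 10), rt.shadows(7, 15)) // tie broken by seq
}
```

With only `rt.Version > version`, both calls above would return false and the deleted key written before the tombstone would resurface after a flush.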
```go
// Add appends a single tombstone to the collector.
func (c *RangeTombstoneCollector) Add(rt RangeTombstone) {
	c.mu.Lock()
	c.tombstones = append(c.tombstones, rt)
	c.mu.Unlock()
}
```
🛠️ Refactor suggestion | 🟠 Major
Make Add() take ownership of Start and End.
rebuildRangeTombstones() safe-copies before Rebuild(), but Add() still stores caller-owned slices verbatim. One iterator-backed caller here is enough to reintroduce mutated tombstone boundaries.
Suggested fix
```diff
 func (c *RangeTombstoneCollector) Add(rt RangeTombstone) {
+	rt.Start = kv.SafeCopy(nil, rt.Start)
+	rt.End = kv.SafeCopy(nil, rt.End)
 	c.mu.Lock()
 	c.tombstones = append(c.tombstones, rt)
 	c.mu.Unlock()
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lsm/range_tombstone.go` around lines 55 - 59, The Add method on
RangeTombstoneCollector currently appends caller-owned RangeTombstone slices
directly, allowing external mutation of Start/End; change
RangeTombstoneCollector.Add(rt RangeTombstone) to take ownership by making
defensive copies of rt.Start and rt.End (e.g., allocate new byte slices and copy
contents) before locking and appending so the collector stores its own slices;
update any tests if needed and keep this behavior consistent with
rebuildRangeTombstones/Rebuild which already safe-copies.
🧹 Nitpick comments (2)
lsm/levels.go (1)
212-228: Consider logging or handling malformed range tombstone keys.

When `kv.SplitInternalKey(entry.Key)` fails for a range tombstone entry (lines 219-220), the code `continue`s without adding the tombstone to `newTombstones`, but the entry is still added to the SSTable via `builder.AddKey(entry)` on line 229. This creates an inconsistency where the SSTable contains a range tombstone that the in-memory collector doesn't know about.

While this is unlikely to happen with well-formed data, a defensive approach would be to either skip adding malformed tombstones to the builder entirely, or log a warning.
Proposed fix
```diff
 if entry != nil && entry.IsRangeDelete() && lm.rtCollector != nil {
 	cf, start, version, ok := kv.SplitInternalKey(entry.Key)
 	if !ok {
+		// Skip malformed range tombstone entries entirely
 		continue
 	}
 	newTombstones = append(newTombstones, RangeTombstone{
 		CF:      cf,
 		Start:   kv.SafeCopy(nil, start),
 		End:     kv.SafeCopy(nil, entry.RangeEnd()),
 		Version: version,
 	})
 }
+if entry == nil {
+	continue
+}
 builder.AddKey(entry)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lsm/levels.go` around lines 212 - 228, The loop currently continues when kv.SplitInternalKey(entry.Key) fails for a range-delete entry but still allows builder.AddKey(entry) later, causing SSTables to include tombstones the in-memory collector (lm.rtCollector / newTombstones) doesn't know about; modify the logic so that if entry.IsRangeDelete() and kv.SplitInternalKey(...) returns ok==false you either (a) skip adding that entry to the builder (avoid calling builder.AddKey(entry)), or (b) emit a warning log including identifying info (e.g., entry.Key, CF info if available) before continuing, and ensure newTombstones and builder remain consistent; update the code paths around newTombstones, kv.SplitInternalKey, entry.IsRangeDelete, and builder.AddKey to implement one of these fixes.

db.go (1)
443-453: Range tombstone coverage checking could be more robust with sequence-based ordering throughout the LSM.

The current implementation uses memtable sequence numbers to determine write order during coverage checks, but falls back to version-based comparison for flushed range tombstones. Since all non-transactional operations share `nonTxnMaxVersion`, the version comparison alone cannot distinguish write ordering for flushed tombstones, though the current code avoids the bug by checking `rt.Version > version`, which returns false when both have the same version.

This is fragile: the correctness depends on memtable-level sequence tracking plus the implicit assumption that flushed tombstones don't incorrectly cover newer writes. The test `TestDeleteRangeVisibilityBug` validates the expected behavior for the scenario `Set("a1", old) → DeleteRange("a0","a9") → Set("a1", new) → Get("a1")`, but relying on version-based checks for flushed tombstones is not a robust long-term solution.

Consider propagating write-sequence information through flushing or using a fully sequence-based ordering mechanism to eliminate the version-based logic for tombstone coverage checks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@db.go` around lines 443 - 453, The range-tombstone coverage check is fragile because DeleteRangeCF creates entries with nonTxnMaxVersion and relies on Version-based comparisons; instead propagate a write sequence and use it for ordering: extend kv.NewInternalEntry / the kv.Entry to carry a sequence number (set by DeleteRangeCF when creating the tombstone) and ensure the flush path preserves that sequence on flushed range tombstones, then change the tombstone-coverage logic to compare entry.Sequence (or the new sequence field) rather than relying on Version; update DeleteRangeCF to populate the sequence (and any memtable/flush plumbing) and ensure ApplyInternalEntries and any coverage checks use sequence-based ordering consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cd14551a-d2fc-4ae6-b688-7cc83e01e9f1
📒 Files selected for processing (10)
- db.go
- db_deleterange_test.go
- iterator.go
- kv/entry.go
- lsm/executor.go
- lsm/levels.go
- lsm/lsm.go
- lsm/lsm_test.go
- lsm/memtable.go
- lsm/range_tombstone.go
🚧 Files skipped from review as they are similar to previous changes (1)
- lsm/range_tombstone.go
Thanks for the detailed review — I addressed each of the issues you called out.
You were right that comparing only rangeVersion >= version is insufficient for the non-transactional path, because point writes and range tombstones can share the same version. I fixed this by introducing explicit in-memory write ordering for memtable entries:
Relevant paths:
This fixes the repro:
Agreed — dropping range tombstones immediately in the max-level path is unsafe when compaction is range-sharded. I fixed this in lsm/executor.go by preserving range tombstones through the max-level compaction scan instead of dropping them eagerly.
Relevant area:
This prevents older covered keys outside a partial shard rewrite from becoming visible again.
Also addressed. I merged from the latest origin/main, resolved the iterator.go conflict, and reran the test suite against the updated branch. Validation run:
So at this point the branch is rebased/merged onto current main, builds cleanly, passes the full test suite, and passes targeted race-detector runs. Happy to push the updated branch for another look.
…tion
- move range tombstone coverage checks into the LSM read path
- track memtable tombstones with WAL ordering for same-version reads
- rebuild flushed tombstone state on startup, flush and max-level compaction
- keep max-level tombstones during partial rewrites and add regressions
force-pushed from c6bbef2 to db95615
Actionable comments posted: 2
♻️ Duplicate comments (2)
lsm/executor.go (1)
556-560: ⚠️ Potential issue | 🔴 Critical

Serialize collector rebuilds with flush-time updates.

`rebuildRangeTombstones()` rebuilds from a snapshot and swaps the collector, while `lsm/levels.go` still appends flushed tombstones via `rtCollector.Add()`. A flush that lands between the scan and `Rebuild()` can be dropped from the in-memory index, making covered keys visible until the next rebuild.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lsm/executor.go` around lines 556 - 560, rebuildRangeTombstones() can race with concurrent flush-time rtCollector.Add() calls causing tombstones to be lost; serialize rebuilds with flush updates by performing the collector swap under the same lock used for flush-time adds (or by adding an atomic swap method on the collector). Concretely, change lm.rebuildRangeTombstones() to acquire the flush/collector mutex (the same lock used around rtCollector.Add()), take the snapshot and replace the collector while holding that lock, or add a ReplaceCollectorAtomic(old,new) on rtCollector and call that from rebuildRangeTombstones() so no rtCollector.Add() can run concurrently and no flush can be dropped.

lsm/range_tombstone.go (1)
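The locking pattern this prompt asks for can be sketched in isolation. The toy below is not NoKV's code; `RebuildFrom` is a hypothetical stand-in for the proposed atomic swap, and the key point is that the same mutex guards both the flush-time `Add` and the whole scan-plus-swap of a rebuild.

```go
package main

import (
	"fmt"
	"sync"
)

type tombstone struct{ start, end string }

// collector is a toy stand-in for the range-tombstone collector.
type collector struct {
	mu         sync.Mutex
	tombstones []tombstone
}

// Add is the flush-time path; it takes the same mutex as RebuildFrom,
// so it cannot interleave with a rebuild.
func (c *collector) Add(t tombstone) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tombstones = append(c.tombstones, t)
}

// RebuildFrom holds the mutex across both the scan and the swap, so a
// concurrent Add can never land between "snapshot" and "replace" and
// then be silently dropped. The scan callback must not call Add.
func (c *collector) RebuildFrom(scan func() []tombstone) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tombstones = scan()
}

func (c *collector) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.tombstones)
}

func main() {
	c := &collector{}
	c.Add(tombstone{"a", "b"})
	c.RebuildFrom(func() []tombstone {
		return []tombstone{{"a", "b"}, {"c", "d"}}
	})
	fmt.Println(c.Len()) // 2
}
```

The unsafe variant would release the lock, scan, and then re-acquire it to swap; any `Add` arriving during the scan would be overwritten by the stale snapshot.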
13-18: ⚠️ Potential issue | 🔴 Critical

Persist the equal-version tie-breaker in the collector.

The collector still reduces ordering to `Version` only. After flush/rebuild, equal-version point writes and range tombstones become indistinguishable again, so a newer point write can be hidden or an older key can resurface. `RangeTombstone` needs the persisted WAL/sequence tie-breaker, `IsKeyCovered()` needs to compare `(Version, Seq)`, and `rebuildRangeTombstones()` needs to restore that field from disk too.

Also applies to: 39-52, 111-121
🧹 Nitpick comments (1)
db_deleterange_test.go (1)
247-291: Add an on-disk equal-version regression.

These tests cover WAL recovery and same-version visibility separately, but not the case after the tombstone leaves the memtable. Please add a `Set -> DeleteRange -> force flush/rotation -> reopen/Get` path so the SST/collector ordering is exercised too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@db_deleterange_test.go` around lines 247 - 291, Update TestDeleteRangeVisibilityBug to exercise the on-disk path: after calling mustSet(... "old") and mustDeleteRange(...), force the tombstone out of the memtable (e.g., trigger a memtable flush/rotation — use the existing helpers such as mustClose/Open to persist memtables to SSTs or any db.Flush/db.Rotate helper if available), then perform the newer mustSet(... "new"), close/reopen the DB with Open, and finally call db.Get on "a1" to assert the value is "new"; reference TestDeleteRangeVisibilityBug, mustSet, mustDeleteRange, mustClose/Open, and db.Get to locate and implement the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 39f58b65-3bc2-4e50-9b2b-0ac08eca8f4a
📒 Files selected for processing (12)
- db.go
- db_deleterange_test.go
- iterator.go
- kv/const.go
- kv/entry.go
- lsm/executor.go
- lsm/levels.go
- lsm/lsm.go
- lsm/lsm_test.go
- lsm/memtable.go
- lsm/range_tombstone.go
- lsm/range_tombstone_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- kv/entry.go
- iterator.go
- lsm/memtable.go
lsm/executor.go
Outdated
```go
// Keep tombstone state across builder splits.
var rangeTombstones []rangeTombstone
```
Seed each split with tombstones that overlap its left bound.
rangeTombstones only accumulates tombstones seen after this worker starts scanning. On later subcompactions, Line 1103 seeks past tombstones whose start key is before kr.Left, so a wide tombstone can stop covering keys in subsequent splits and stale keys can be emitted.
Also applies to: 1101-1118
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lsm/executor.go` around lines 1030 - 1031, The split/subcompaction currently
only accumulates tombstones seen after the worker starts, so wide tombstones
that begin before kr.Left get skipped and cease covering later splits; update
the split initialization to seed rangeTombstones with any existing
rangeTombstone whose span overlaps the split's left bound (kr.Left) before
scanning forward. Concretely: when starting a new builder split/subcompaction
(the code around where rangeTombstones is declared and where code seeks past
tombstones at kr.Left), perform a lookup/scan for tombstones with start<=kr.Left
and end>kr.Left and append them into rangeTombstones so they are carried into
the split, then proceed with the existing forward accumulation logic.
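The seeding step this prompt describes reduces to an overlap filter against the split's left bound. The sketch below uses toy types and string keys (not NoKV's actual code): a tombstone covering `[start, end)` must be pre-loaded into a split whenever `start <= left < end`.

```go
package main

import "fmt"

// tombstone covers the half-open key range [start, end).
type tombstone struct {
	start, end string
}

// seedOverlapping returns the tombstones that begin at or before left
// and still extend past it, i.e. start <= left < end. These must be
// carried into a subcompaction split before it scans forward.
func seedOverlapping(all []tombstone, left string) []tombstone {
	var seeded []tombstone
	for _, t := range all {
		if t.start <= left && left < t.end {
			seeded = append(seeded, t)
		}
	}
	return seeded
}

func main() {
	all := []tombstone{
		{"a", "m"}, // wide tombstone starting before the split's left bound
		{"p", "q"}, // entirely after the split's left bound
	}
	fmt.Println(seedOverlapping(all, "c")) // [{a m}]
}
```

Without this filter, a split whose left bound is `"c"` would seek past the `["a","m")` tombstone entirely and re-emit keys it should have dropped, which is the stale-key bug the comment describes.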
lsm/lsm.go
Outdated
```go
func (lsm *LSM) checkRangeTombstone(cf kv.ColumnFamily, userKey []byte, entryVersion uint64, internalKey []byte, tables []*memTable) bool {
	// Find the entry's write sequence from the side map.
	entrySeq, seqFound := walSeqKey{}, false
	for _, mt := range tables {
		if mt == nil {
			continue
		}
		if seq, ok := mt.walSeqForKey(internalKey); ok {
			entrySeq = seq
			seqFound = true
			break
		}
	}
	// Check memtable tombstones (O(M) where M = number of range tombstones).
	for _, mt := range tables {
		if mt == nil {
			continue
		}
		if mt.isKeyCoveredByRangeTombstone(cf, userKey, entryVersion, entrySeq, seqFound) {
			return true
		}
	}
	// Check flushed range tombstones via collector (version-based).
	if lsm.levels == nil || lsm.levels.rtCollector == nil {
		return false
	}
	return lsm.levels.rtCollector.IsKeyCovered(cf, userKey, entryVersion)
}
```
The SST fallback still loses equal-version ordering.
The memtable path uses WAL sequence to break same-version ties, but the fallback here only asks the collector about Version. Once both records are read from SSTs, Set(k) followed by DeleteRange(...) at nonTxnMaxVersion can become visible again because the collector cannot tell the tombstone is newer. Persist and consult the tombstone sequence in the collector/rebuild path too.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lsm/lsm.go` around lines 89 - 115, checkRangeTombstone currently breaks
same-version ties using WAL sequence from memTables (walSeqForKey) but falls
back to rtCollector.IsKeyCovered which only checks Version, so equal-version
ordering can be lost; persist the range-tombstone WAL/sequence when writing SSTs
and reconstruct that sequence into the collector during rebuild, then update
checkRangeTombstone to pass the tombstone sequence (or have
rtCollector.IsKeyCovered consult stored tombstone sequences) so the collector
can compare entrySeq vs tombstoneSeq the same way
memTable.isKeyCoveredByRangeTombstone does; relevant symbols:
checkRangeTombstone, memTable.walSeqForKey,
memTable.isKeyCoveredByRangeTombstone, levels.rtCollector and
rtCollector.IsKeyCovered — ensure SST writer, rebuild logic, and rtCollector
store and expose tombstone sequence metadata and use it in IsKeyCovered.
Summary
Implements efficient range deletion using LSM-level range tombstones, eliminating the need for expensive point deletes during large key-range cleanup operations.
Motivation
Previously, deleting a large key range required issuing individual point deletes for each key, which was expensive: a separate write and tombstone per key, plus the corresponding WAL and compaction overhead.
Changes
API Layer
- `DB.DeleteRange(start, end []byte)` for the default column family
- `DB.DeleteRangeCF(cf, start, end []byte)` for specific column families
- Range semantics are `[start, end)` (inclusive start, exclusive end)

Storage Layer
- New `BitRangeDelete` flag to mark range tombstone entries
- Tombstone encoding: `Entry.Key=start`, `Entry.Value=end`, `Meta=BitRangeDelete`
- `Entry.IsRangeDelete()` and `Entry.RangeEnd()` helper methods

Read Path
- `Get` operations check whether keys are covered by range tombstones via `isKeyCoveredByRangeTombstone()`

Compaction
WAL & Recovery
Usage Example
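The PR's original example is not preserved in this extract. As an illustration of the half-open `[start, end)` semantics only, here is a toy in-memory model; it is not NoKV's API or implementation, and it deliberately ignores the versioning and write-ordering concerns discussed in the review above.

```go
package main

import "fmt"

// toyDB models range deletion: a tombstone covers keys in [start, end),
// start inclusive and end exclusive. Illustration only, not NoKV.
type toyDB struct {
	data       map[string]string
	tombstones [][2]string // each entry is {start, end}
}

func newToyDB() *toyDB { return &toyDB{data: map[string]string{}} }

func (db *toyDB) Set(k, v string) { db.data[k] = v }

// DeleteRange records a single tombstone instead of deleting keys eagerly,
// which is the core idea behind the LSM range tombstone.
func (db *toyDB) DeleteRange(start, end string) {
	db.tombstones = append(db.tombstones, [2]string{start, end})
}

func (db *toyDB) Get(k string) (string, bool) {
	for _, t := range db.tombstones {
		if t[0] <= k && k < t[1] { // half-open interval check
			return "", false
		}
	}
	v, ok := db.data[k]
	return v, ok
}

func main() {
	db := newToyDB()
	db.Set("user:1", "alice")
	db.Set("user:9", "bob")
	db.DeleteRange("user:0", "user:9") // "user:9" itself is not deleted

	_, ok := db.Get("user:1")
	fmt.Println(ok) // false: covered by the tombstone
	v, _ := db.Get("user:9")
	fmt.Println(v) // bob: the end bound is exclusive
}
```

A single `DeleteRange` call here stands in for what would otherwise be a point delete per key, which is the cost the Motivation section describes.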
Notes
Summary by CodeRabbit