-
Notifications
You must be signed in to change notification settings - Fork 92
[dbsp, compiler] GC for the right-hand side of ASOF joins. #5370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mihaibudiu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a commit with the compiler support
| } | ||
|
|
||
| /// Merger that merges up to 64 batches at a time. | ||
| // FIXME: can we apply GC to the output of the merger instead of individual cursors? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If N is not large, this is not a big issue.
And I suspect in practice it is never large.
| return Ok(true); | ||
| } | ||
| } else { | ||
| // TODO: We currently don't handle the case when value_filter is a LastN filter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we make sure we don't forget this? should an issue be filed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll file an issue
|
|
||
| // Find n'th value below the waterline. | ||
| let mut count = 1; | ||
| while count < *n && cursor.val_valid() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the amortized cost of this? this looks expensive.
|
Partially fixes #1975 |
Not until we actually apply it to top-k operators. |
FilteredMergeCursor was implemented as a wrapper around a MergeCursor. In order to enable more flexible implementations that can do reverse search and lookahead we modify it to take a full-featured `Cursor` as an argument. Signed-off-by: Leonid Ryzhyk <[email protected]>
Add support for a new form of GC described in #1975 that preserves the last N values before the waterline. This can be used to GC the right-hand side of ASOF joins and top-k operators. Limitations: - The new GC mechanism doesn't yet work for PushMerger's. It's not hard to add, but since we only use PushMerger's via devtweaks, this doesn't seem like a priority - The current implementation can postpone discarding unused values quite a bit, potentially keeping several times more state than necessary (even though the state should be bounded): Specifically, it discards up to last the N values in each cursor when merging. The resulting batch may end up with more than N values below the waterline, which won't be discarded until the next merge operation. Moreover, the entire spine can have many more values before the waterline stored in other batches that were not part of the merge. Signed-off-by: Leonid Ryzhyk <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds garbage collection support for the right-hand side of ASOF JOIN operations. It introduces a new DBSPIntegrateTraceRetainValuesLastNOperator that retains the last N values below a waterline threshold, which is necessary for ASOF joins where we need to preserve a fixed number of recent values regardless of how far in the past they are.
Key changes:
- Added
DBSPIntegrateTraceRetainValuesLastNOperatorfor retaining the last N values per key - Implemented
GroupFilterenum to support both simple and LastN filtering strategies - Updated ASOF join implementation to apply GC to both sides of the join
- Added comprehensive test coverage for the new functionality
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/circuit/operator/DBSPIntegrateTraceRetainValuesLastNOperator.java |
New operator for retaining last N values with GC |
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/circuit/operator/GCOperator.java |
Added asOperator() method to interface |
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/monotonicity/InsertLimiters.java |
Updated ASOF join to create GC operators for both sides |
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/LowerAsof.java |
Extended to handle LastN operators in ASOF join lowering |
crates/dbsp/src/trace/filter.rs |
New file defining GroupFilter enum for filtering strategies |
crates/dbsp/src/trace/cursor.rs |
Implemented GroupFilterCursor state machine for LastN filtering |
crates/dbsp/src/operator/dynamic/trace.rs |
Added test coverage for integrate_trace_retain_values_last_n |
Comments suppressed due to low confidence (2)
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/LowerAsof.java:1
- Missing closing parenthesis in comment. The comment should end with
)).
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/LowerAsof.java:1 - Missing closing parenthesis in comment. The comment should end with
)).
| /// Assumes that the predicate is monotonic: once it is satisfied for a value, | ||
| /// it is also satisfied for all subsequent values for the same key. | ||
| /// Also assumed that the values are ordered in some way, so that the last N | ||
| /// values under the cursor ate the ones that need to be preserved. |
Copilot
AI
Jan 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'ate' to 'are'.
| /// values under the cursor ate the ones that need to be preserved. | |
| /// values under the cursor are the ones that need to be preserved. |
Fixes #2010