Skip to content

Conversation

@ryzhyk
Copy link
Contributor

@ryzhyk ryzhyk commented Jan 4, 2026

Fixes #2010

@ryzhyk ryzhyk requested a review from mihaibudiu January 4, 2026 22:51
@ryzhyk ryzhyk added SQL compiler Related to the SQL compiler DBSP core Related to the core DBSP library labels Jan 4, 2026
Copy link
Contributor

@mihaibudiu mihaibudiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a commit with the compiler support

}

/// Merger that merges up to 64 batches at a time.
// FIXME: can we apply GC to the output of the merger instead of individual cursors?
Copy link
Contributor

@mihaibudiu mihaibudiu Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If N is not large, this is not a big issue.
And I suspect in practice it is never large.

return Ok(true);
}
} else {
// TODO: We currently don't handle the case when value_filter is a LastN filter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we make sure we don't forget this? should an issue be filed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file an issue


// Find n'th value below the waterline.
let mut count = 1;
while count < *n && cursor.val_valid() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the amortized cost of this? this looks expensive.

@mihaibudiu
Copy link
Contributor

mihaibudiu commented Jan 6, 2026

Partially fixes #1975
(I think)

@ryzhyk
Copy link
Contributor Author

ryzhyk commented Jan 6, 2026

Fixes #1975 (I think)

Not until we actually apply it to top-k operators.

ryzhyk and others added 3 commits January 6, 2026 10:10
FilteredMergeCursor was implemented as a wrapper around a MergeCursor. In order
to enable more flexible implementations that can do reverse search and
lookahead we modify it to take a full-featured `Cursor` as an argument.

Signed-off-by: Leonid Ryzhyk <[email protected]>
Add support for a new form of GC described in #1975 that preserves the last N
values before the waterline. This can be used to GC the right-hand side of ASOF
joins and top-k operators.

Limitations:
- The new GC mechanism doesn't yet work for PushMerger's. It's not hard to add,
  but since we only use PushMerger's via devtweaks, this doesn't seem like a
  priority

- The current implementation can postpone discarding unused values quite a bit,
  potentially keeping several times more state than necessary (even though the
  state should be bounded): Specifically, it discards up to last the N values in each
  cursor when merging. The resulting batch may end up with more than N values
  below the waterline, which won't be discarded until the next merge operation.
  Moreover, the entire spine can have many more values before
  the waterline stored in other batches that were not part of the merge.

Signed-off-by: Leonid Ryzhyk <[email protected]>
@ryzhyk ryzhyk changed the title [WIP] GC for the right-hand side of ASOF joins. [dbsp, compiler] GC for the right-hand side of ASOF joins. Jan 6, 2026
@ryzhyk ryzhyk marked this pull request as ready for review January 6, 2026 18:29
Copilot AI review requested due to automatic review settings January 6, 2026 18:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds garbage collection support for the right-hand side of ASOF JOIN operations. It introduces a new DBSPIntegrateTraceRetainValuesLastNOperator that retains the last N values below a waterline threshold, which is necessary for ASOF joins where we need to preserve a fixed number of recent values regardless of how far in the past they are.

Key changes:

  • Added DBSPIntegrateTraceRetainValuesLastNOperator for retaining the last N values per key
  • Implemented GroupFilter enum to support both simple and LastN filtering strategies
  • Updated ASOF join implementation to apply GC to both sides of the join
  • Added comprehensive test coverage for the new functionality

Reviewed changes

Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/circuit/operator/DBSPIntegrateTraceRetainValuesLastNOperator.java New operator for retaining last N values with GC
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/circuit/operator/GCOperator.java Added asOperator() method to interface
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/monotonicity/InsertLimiters.java Updated ASOF join to create GC operators for both sides
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/LowerAsof.java Extended to handle LastN operators in ASOF join lowering
crates/dbsp/src/trace/filter.rs New file defining GroupFilter enum for filtering strategies
crates/dbsp/src/trace/cursor.rs Implemented GroupFilterCursor state machine for LastN filtering
crates/dbsp/src/operator/dynamic/trace.rs Added test coverage for integrate_trace_retain_values_last_n
Comments suppressed due to low confidence (2)

sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/LowerAsof.java:1

  • Missing closing parenthesis in comment. The comment should end with )).
    sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/LowerAsof.java:1
  • Missing closing parenthesis in comment. The comment should end with )).

/// Assumes that the predicate is monotonic: once it is satisfied for a value,
/// it is also satisfied for all subsequent values for the same key.
/// Also assumed that the values are ordered in some way, so that the last N
/// values under the cursor ate the ones that need to be preserved.
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'ate' to 'are'.

Suggested change
/// values under the cursor ate the ones that need to be preserved.
/// values under the cursor are the ones that need to be preserved.

Copilot uses AI. Check for mistakes.
@ryzhyk ryzhyk added this pull request to the merge queue Jan 6, 2026
Merged via the queue into main with commit bc7c2b7 Jan 6, 2026
5 of 7 checks passed
@ryzhyk ryzhyk deleted the issue-2010 branch January 6, 2026 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DBSP core Related to the core DBSP library SQL compiler Related to the SQL compiler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GC for AsOf joins

3 participants