-
Notifications
You must be signed in to change notification settings - Fork 92
Adhoc consistency #5396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adhoc consistency #5396
Conversation
It was unused. Signed-off-by: Ben Pfaff <[email protected]>
The cast wasn't necessary in AdHocQueryExecution::children(). The clone written inline in AdHocQueryExecution::with_new_children() can be more easily written as a real clone. Signed-off-by: Ben Pfaff <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes consistency issues with ad-hoc queries by ensuring that queries execute against a consistent snapshot of table data. Previously, the snapshot was accessed directly from a shared mutable reference, which could lead to inconsistent reads during query execution.
Key changes:
- Refactored snapshot management to use immutable snapshots via
Arc<BTreeMap<...>>instead ofArc<TokioMutex<BTreeMap<...>>> - Modified ad-hoc query execution to capture and use a consistent snapshot through DataFusion's session state
- Centralized SQL execution logic into a new
execute_sqlhelper function
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| crates/adapters/src/server.rs | Updated query handler to pass Controller instead of SessionContext to support consistent snapshots |
| crates/adapters/src/controller.rs | Added consistent_snapshot() method and refactored snapshot management to use immutable Arc-wrapped snapshots |
| crates/adapters/src/adhoc/table.rs | Modified AdHocTable to retrieve snapshot from session state instead of storing it as a field; simplified execution plan cloning |
| crates/adapters/src/adhoc.rs | Added execute_sql helper that captures consistent snapshot in session state before query execution |
| async fn scan( | ||
| &self, | ||
| _state: &dyn Session, | ||
| state: &dyn Session, |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter was renamed from _state to state to indicate it's now being used. However, consider a more descriptive name like session or df_session to better reflect that it's a DataFusion session containing the consistent snapshot.
| projection: Option<&Vec<usize>>, | ||
| filters: &[Expr], | ||
| limit: Option<usize>, | ||
| _limit: Option<usize>, |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limit parameter is now prefixed with underscore since it's no longer used, but the parameter remains in the function signature. Consider whether this parameter should be removed entirely from the trait implementation if it's not being utilized.
| Ok(Arc::new(Self { | ||
| children, | ||
| ..Arc::unwrap_or_clone(self) | ||
| })) | ||
| } | ||
|
|
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Arc::unwrap_or_clone with struct spread syntax assumes all fields except children should be cloned from the original. This is fragile if new fields are added to the struct, as they would be automatically included without explicit consideration. Consider explicitly listing all fields that should be preserved to make future modifications safer.
| Ok(Arc::new(Self { | |
| children, | |
| ..Arc::unwrap_or_clone(self) | |
| })) | |
| } | |
| let mut base = Arc::unwrap_or_clone(self); | |
| base.children = children; | |
| Ok(Arc::new(base)) | |
| } |
gz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't say I still understand this well since it's been a few months but at least didn't see anything wrong
Until now, each table scan during an ad-hoc query separately locked the table of snapshots. The table of snapshots could change from one scan to the next. This meant that ad-hoc queries that involved multiple tables, or that scanned a single table multiple times, could work with inconsistent data. This fixes the problem. Signed-off-by: Ben Pfaff <[email protected]>
This fixes the consistency of ad-hoc queries. It was probably difficult for the inconsistency to be visible, but it's still better to fix it.