-
Notifications
You must be signed in to change notification settings - Fork 92
[SQL] Rework the implementation of SAFE_CAST #5407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Mihai Budiu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request reworks the SAFE_CAST implementation in the SQL compiler by transitioning from a boolean safe parameter to an enum-based CastType with three variants (Unsafe, Safe, SafeInner). This enables more sophisticated error handling for structured data types like arrays and maps.
Changes:
- Refactored
DBSPCastExpression.safefrom boolean toCastTypeenum with Unsafe, Safe, and SafeInner variants - Added new
DBSPTypeSqlResulttype to represent sqllib'sSqlResult[T]type - Added
DBSPResultQuestionExpressionfor handling Result type unwrapping with?operator - Introduced
ARRAY_CONVERT_SAFEopcode and corresponding Rust runtime functions for safe array conversions - Updated all cast call sites across the codebase (~150+ locations) to use
CastType.Unsafeexplicitly - Added regression tests for various SAFE_CAST scenarios with structured types
- Updated UUID parsing to use Result types instead of panicking
Reviewed changes
Copilot reviewed 52 out of 52 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| DBSPCastExpression.java | Core refactoring: boolean safe → CastType enum, added enforcement checks |
| DBSPTypeSqlResult.java | New type representing sqllib's SqlResult[T] for safe cast results |
| DBSPResultQuestionExpression.java | New expression type for Result unwrapping with ? operator |
| ExpandCasts.java | Updated to handle SafeInner casts, added array_map_safe support |
| CreateRuntimeErrorWrappers.java | Modified to skip wrapping for safe casts and array_convert_safe |
| ExpressionCompiler.java | Added duplicate MAP key detection, updated cast calls |
| CircuitOptimizer.java | Removed two ExpandCasts passes from optimization pipeline |
| ToRustInnerVisitor.java | Added special handling for SqlResult types in cast code generation |
| Regression1Tests.java | Added comprehensive tests for SAFE_CAST with structured types |
| array.rs | Added array_map_safe functions for safe element-wise conversions |
| uuid.rs | Changed from panicking to Result-based parsing |
| casts.rs | Updated UUID cast to use try_from_ref |
Comments suppressed due to low confidence (1)
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/ir/expression/DBSPZSetExpression.java:222
- The removal of the totalWeight method appears unrelated to the SAFE_CAST refactoring. If this method is truly unused, its removal should be in a separate cleanup commit or PR with proper verification that it's not needed. Otherwise, this could indicate incomplete changes.
| "runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false), | ||
| new Func(SqlStdOperatorTable.REPLACE, "REPLACE", SqlLibrary.STANDARD, "string#replace", | ||
| """ | ||
| runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py | ||
| """, true), | ||
| "runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false), | ||
| // new Func(SqlStdOperatorTable.CONVERT, "CONVERT", SqlLibrary.STANDARD, "casts#casts-and-data-type-conversions", FunctionDocumentation.NO_FILE, false), | ||
| new Func(SqlStdOperatorTable.TRANSLATE, "TRANSLATE", SqlLibrary.STANDARD, "", FunctionDocumentation.NO_FILE, false), | ||
|
|
||
| new Func(SqlStdOperatorTable.OVERLAY, "OVERLAY", SqlLibrary.STANDARD, "string#overlay,binary#overlay", | ||
| """ | ||
| runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py | ||
| """, true), | ||
| "runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false), | ||
| new Func(SqlStdOperatorTable.TRIM, "TRIM", SqlLibrary.STANDARD, "string#trim", | ||
| """ | ||
| runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py | ||
| """, true), | ||
| "runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false), | ||
| new Func(SqlStdOperatorTable.POSITION, "POSITION", SqlLibrary.STANDARD, "string#position", | ||
| """ | ||
| runtime_aggtest/illarg_tests/test_{arr_map_type_fn,str_bin_type_fn,str_unicode_fn}.py | ||
| """, true), | ||
| "runtime_aggtest/illarg_tests/test_{arr_map_type_fn,str_bin_type_fn,str_unicode_fn}.py", false), | ||
| new Func(SqlStdOperatorTable.CHAR_LENGTH, "CHAR_LENGTH", SqlLibrary.STANDARD, "string#char_length", | ||
| """ | ||
| runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py | ||
| """, true), | ||
| "runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false), | ||
| new Func(SqlStdOperatorTable.OCTET_LENGTH, "OCTET_LENGTH", SqlLibrary.STANDARD, "binary#octet_length", | ||
| """ | ||
| runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py | ||
| """, true), | ||
| "runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false), |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These documentation string formatting changes appear unintended. Multi-line text blocks have been changed to single-line strings. This formatting change is inconsistent with the PR's stated purpose of reworking SAFE_CAST implementation. These changes should either be reverted or moved to a separate documentation cleanup PR.
| import org.dbsp.sqlCompiler.ir.expression.DBSPMapExpression; | ||
| import org.dbsp.sqlCompiler.ir.expression.literal.DBSPLiteral; | ||
| import org.dbsp.sqlCompiler.ir.expression.literal.DBSPStringLiteral; |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of the DBSPLiteral import and the comment regarding "N" variable appear unrelated to the SAFE_CAST refactoring. These should be in a separate cleanup commit or verified as intentional changes related to the refactoring.
| public DBSPCompiler compileQuery(String statements, String query) { | ||
| DBSPCompiler compiler = this.testCompiler(); | ||
| compiler.options.languageOptions.optimizationLevel = 0; | ||
| compiler.options.ioOptions.quiet = false; |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the quiet option to false in tests will produce extra output during test runs. This change appears unintended and should be removed unless there's a specific reason for enabling verbose output in this test.
| @Test | ||
| public void testDuplicateKeys() { | ||
| var compiler = this.compileQuery("", "CREATE VIEW V AS SELECT MAP['hi', 1, 'hi', 2]"); | ||
| System.out.println(compiler.messages); |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test contains a System.out.println which should be removed. Test output should use the test framework's logging mechanisms or assertions, not direct console output.
| if (typeName == SqlTypeName.TIMESTAMP) | ||
| return DBSPTypeTimestamp.PRECISION; | ||
| if (typeName == SqlTypeName.TIME) | ||
| else if (typeName == SqlTypeName.TIME) |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change from if to else if appears unrelated to the SAFE_CAST refactoring. While the logic is equivalent (both conditions are mutually exclusive), this style change should be in a separate cleanup commit unless it's required for the SAFE_CAST functionality.
| else if (typeName == SqlTypeName.TIME) | |
| if (typeName == SqlTypeName.TIME) |
| this.add(new AdjustSqlIndex(compiler).circuitRewriter(true)); | ||
| this.add(new OptimizeWithGraph(compiler, g -> new BalancedJoins(compiler, g), 1)); | ||
| this.add(new OptimizeWithGraph(compiler, g -> new ChainVisitor(compiler, g))); | ||
| this.add(new ImplementChains(compiler)); |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of these two ExpandCasts passes in the optimizer pipeline should be carefully verified. The comment "Lowering may surface additional casts that need to be expanded" suggests these were important. If this removal is intentional due to the SAFE_CAST refactoring, it should be explicitly documented in the PR description or code comments explaining why these passes are no longer needed.
| this.add(new ImplementChains(compiler)); | |
| this.add(new ImplementChains(compiler)); | |
| // NOTE: These final ExpandCasts passes are intentionally kept after lowering. | |
| // LowerAsof/LowerCircuitVisitor can surface additional SAFE_CASTs and other casts | |
| // that must be expanded before we finalize joins and views. Earlier ExpandCasts | |
| // passes in the pipeline were simplified as part of the SAFE_CAST refactoring; | |
| // if these remaining passes are modified or removed, re-verify that lowering | |
| // does not introduce casts that need to be expanded at this stage. |
| DBSPExpression index; | ||
| if (sourceType.is(DBSPTypeTupleBase.class)) { | ||
| index = source.field(i); | ||
| index = source.field(i).simplify(); |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a simplify() call here changes the behavior. This could potentially affect performance or correctness. Please verify this change is intentional and necessary for the SAFE_CAST implementation, and consider adding a comment explaining why simplification is now needed at this point.
| index = source.field(i).simplify(); | |
| index = source.field(i); |
This is not ready, I will break it into two PRs. I will mark it as draft for now.