Skip to content

Conversation

@mihaibudiu
Copy link
Contributor

This is not ready, I will break it into two PRs. I will mark it as draft for now.

Copilot AI review requested due to automatic review settings January 10, 2026 18:28
@mihaibudiu mihaibudiu marked this pull request as draft January 10, 2026 18:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request reworks the SAFE_CAST implementation in the SQL compiler by transitioning from a boolean safe parameter to an enum-based CastType with three variants (Unsafe, Safe, SafeInner). This enables more sophisticated error handling for structured data types like arrays and maps.

Changes:

  • Refactored DBSPCastExpression.safe from boolean to CastType enum with Unsafe, Safe, and SafeInner variants
  • Added new DBSPTypeSqlResult type to represent sqllib's SqlResult[T] type
  • Added DBSPResultQuestionExpression for handling Result type unwrapping with ? operator
  • Introduced ARRAY_CONVERT_SAFE opcode and corresponding Rust runtime functions for safe array conversions
  • Updated all cast call sites across the codebase (~150+ locations) to use CastType.Unsafe explicitly
  • Added regression tests for various SAFE_CAST scenarios with structured types
  • Updated UUID parsing to use Result types instead of panicking

Reviewed changes

Copilot reviewed 52 out of 52 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
DBSPCastExpression.java Core refactoring: boolean safe → CastType enum, added enforcement checks
DBSPTypeSqlResult.java New type representing sqllib's SqlResult[T] for safe cast results
DBSPResultQuestionExpression.java New expression type for Result unwrapping with ? operator
ExpandCasts.java Updated to handle SafeInner casts, added array_map_safe support
CreateRuntimeErrorWrappers.java Modified to skip wrapping for safe casts and array_convert_safe
ExpressionCompiler.java Added duplicate MAP key detection, updated cast calls
CircuitOptimizer.java Removed two ExpandCasts passes from optimization pipeline
ToRustInnerVisitor.java Added special handling for SqlResult types in cast code generation
Regression1Tests.java Added comprehensive tests for SAFE_CAST with structured types
array.rs Added array_map_safe functions for safe element-wise conversions
uuid.rs Changed from panicking to Result-based parsing
casts.rs Updated UUID cast to use try_from_ref
Comments suppressed due to low confidence (1)

sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/ir/expression/DBSPZSetExpression.java:222

  • The removal of the totalWeight method appears unrelated to the SAFE_CAST refactoring. If this method is truly unused, its removal should be in a separate cleanup commit or PR with proper verification that it's not needed. Otherwise, this could indicate incomplete changes.

Comment on lines +225 to +240
"runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false),
new Func(SqlStdOperatorTable.REPLACE, "REPLACE", SqlLibrary.STANDARD, "string#replace",
"""
runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py
""", true),
"runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false),
// new Func(SqlStdOperatorTable.CONVERT, "CONVERT", SqlLibrary.STANDARD, "casts#casts-and-data-type-conversions", FunctionDocumentation.NO_FILE, false),
new Func(SqlStdOperatorTable.TRANSLATE, "TRANSLATE", SqlLibrary.STANDARD, "", FunctionDocumentation.NO_FILE, false),

new Func(SqlStdOperatorTable.OVERLAY, "OVERLAY", SqlLibrary.STANDARD, "string#overlay,binary#overlay",
"""
runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py
""", true),
"runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false),
new Func(SqlStdOperatorTable.TRIM, "TRIM", SqlLibrary.STANDARD, "string#trim",
"""
runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py
""", true),
"runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false),
new Func(SqlStdOperatorTable.POSITION, "POSITION", SqlLibrary.STANDARD, "string#position",
"""
runtime_aggtest/illarg_tests/test_{arr_map_type_fn,str_bin_type_fn,str_unicode_fn}.py
""", true),
"runtime_aggtest/illarg_tests/test_{arr_map_type_fn,str_bin_type_fn,str_unicode_fn}.py", false),
new Func(SqlStdOperatorTable.CHAR_LENGTH, "CHAR_LENGTH", SqlLibrary.STANDARD, "string#char_length",
"""
runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py
""", true),
"runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false),
new Func(SqlStdOperatorTable.OCTET_LENGTH, "OCTET_LENGTH", SqlLibrary.STANDARD, "binary#octet_length",
"""
runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py
""", true),
"runtime_aggtest/illarg_tests/test_{str_bin_type_fn,str_unicode_fn}.py", false),
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These documentation string formatting changes appear unintended. Multi-line text blocks have been changed to single-line strings. This formatting change is inconsistent with the PR's stated purpose of reworking SAFE_CAST implementation. These changes should either be reverted or moved to a separate documentation cleanup PR.

Copilot uses AI. Check for mistakes.
Comment on lines 13 to 14
import org.dbsp.sqlCompiler.ir.expression.DBSPMapExpression;
import org.dbsp.sqlCompiler.ir.expression.literal.DBSPLiteral;
import org.dbsp.sqlCompiler.ir.expression.literal.DBSPStringLiteral;
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the DBSPLiteral import and the comment regarding "N" variable appear unrelated to the SAFE_CAST refactoring. These should be in a separate cleanup commit or verified as intentional changes related to the refactoring.

Copilot uses AI. Check for mistakes.
public DBSPCompiler compileQuery(String statements, String query) {
DBSPCompiler compiler = this.testCompiler();
compiler.options.languageOptions.optimizationLevel = 0;
compiler.options.ioOptions.quiet = false;
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the quiet option to false in tests will produce extra output during test runs. This change appears unintended and should be removed unless there's a specific reason for enabling verbose output in this test.

Copilot uses AI. Check for mistakes.
@Test
public void testDuplicateKeys() {
var compiler = this.compileQuery("", "CREATE VIEW V AS SELECT MAP['hi', 1, 'hi', 2]");
System.out.println(compiler.messages);
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test contains a System.out.println which should be removed. Test output should use the test framework's logging mechanisms or assertions, not direct console output.

Copilot uses AI. Check for mistakes.
if (typeName == SqlTypeName.TIMESTAMP)
return DBSPTypeTimestamp.PRECISION;
if (typeName == SqlTypeName.TIME)
else if (typeName == SqlTypeName.TIME)
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change from if to else if appears unrelated to the SAFE_CAST refactoring. While the logic is equivalent (both conditions are mutually exclusive), this style change should be in a separate cleanup commit unless it's required for the SAFE_CAST functionality.

Suggested change
else if (typeName == SqlTypeName.TIME)
if (typeName == SqlTypeName.TIME)

Copilot uses AI. Check for mistakes.
this.add(new AdjustSqlIndex(compiler).circuitRewriter(true));
this.add(new OptimizeWithGraph(compiler, g -> new BalancedJoins(compiler, g), 1));
this.add(new OptimizeWithGraph(compiler, g -> new ChainVisitor(compiler, g)));
this.add(new ImplementChains(compiler));
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of these two ExpandCasts passes in the optimizer pipeline should be carefully verified. The comment "Lowering may surface additional casts that need to be expanded" suggests these were important. If this removal is intentional due to the SAFE_CAST refactoring, it should be explicitly documented in the PR description or code comments explaining why these passes are no longer needed.

Suggested change
this.add(new ImplementChains(compiler));
this.add(new ImplementChains(compiler));
// NOTE: These final ExpandCasts passes are intentionally kept after lowering.
// LowerAsof/LowerCircuitVisitor can surface additional SAFE_CASTs and other casts
// that must be expanded before we finalize joins and views. Earlier ExpandCasts
// passes in the pipeline were simplified as part of the SAFE_CAST refactoring;
// if these remaining passes are modified or removed, re-verify that lowering
// does not introduce casts that need to be expanded at this stage.

Copilot uses AI. Check for mistakes.
DBSPExpression index;
if (sourceType.is(DBSPTypeTupleBase.class)) {
index = source.field(i);
index = source.field(i).simplify();
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a simplify() call here changes the behavior. This could potentially affect performance or correctness. Please verify this change is intentional and necessary for the SAFE_CAST implementation, and consider adding a comment explaining why simplification is now needed at this point.

Suggested change
index = source.field(i).simplify();
index = source.field(i);

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants