-
Notifications
You must be signed in to change notification settings - Fork 92
[SQL] Do not preserve source positions for internally-generated SQL #5308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces the distinction between internal and external source positions in the SQL compiler to prevent internally-generated SQL from polluting profiler views with incorrect source code origins. The compiler now marks source positions as "internal" for SQL code it generates itself (such as error view definitions and UDF implementations), ensuring only user-written code positions are propagated externally.
Key Changes:
- Introduced
ExtendedSqlParserPosto track whether source positions are internal or external - Fixed multiple Calcite SQL node classes to properly implement visitor APIs by overriding
createCall()andgetOperandList()methods - Updated
SourcePositionRangeto filter out internal positions when requested
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| metadataTests-generateDFRecursive.json | Test output showing empty positions array for internal ERROR_VIEW |
| metadataTests-generateDF.json | Test output showing empty positions array for internal ERROR_VIEW |
| CreateTypeStatement.java | Import path change to use custom SqlCreateType |
| SqlViewColumnDeclaration.java | Fixed getOperandList() to include all operands and added createCall() override |
| SqlRemove.java | Fixed getOperandList() to include all three operands |
| SqlPrimaryKey.java | New class replacing Calcite's SqlKeyConstraint with proper visitor support |
| SqlLateness.java | Removed empty TODO validation method |
| SqlForeignKey.java | Added createCall() override for proper visitor support |
| SqlExtendedColumnDeclaration.java | Fixed getOperandList() to include all operands, added createCall() override and constructor |
| SqlDropView.java | New class with proper createCall() override |
| SqlDropTable.java | New class with proper createCall() override |
| SqlDropObject.java | New base class for drop operations |
| SqlDeclareView.java | Added createCall() override for proper visitor support |
| SqlCreateView.java | Added createCall() override, clone() method, and fixed getOperandList() |
| SqlCreateType.java | New class replacing Calcite's version with proper visitor support |
| SqlCreateTable.java | Added createCall() override, clone() method, and fixed getOperandList() |
| SqlCreateIndex.java | Added createCall() override for proper visitor support |
| SqlCreateFunctionDeclaration.java | Added createCall() override and fixed getOperandList() |
| SqlCreateAggregate.java | Added createCall() override and fixed getOperandList() |
| SqlAttributeDefinition.java | New class replacing Calcite's version with proper visitor support |
| SqlToRelCompiler.java | Added ReplacePositions visitor to mark internal positions and updated parsing flow |
| ExtendedSqlParserPos.java | New class extending SqlParserPos to track internal/external distinction |
| CalciteToDBSPCompiler.java | Import path changes to use custom SQL classes |
| SourcePositionRange.java | Added logic to skip internal positions when requested |
| ddl.ftl | Updated parser template to use custom classes instead of Calcite's SqlDdlNodes |
| config.fmpp | Updated imports to include custom SQL classes |
78aa027 to
c919375
Compare
|
This needs apache/calcite#4696 to be merged first |
Signed-off-by: Mihai Budiu <[email protected]>
c919375 to
94f42f9
Compare
Signed-off-by: Mihai Budiu <[email protected]>
94f42f9 to
b4706df
Compare
The compiler generates some dataflow graph fragments by generating SQL strings and recursively invoking itself on these SQL program. These dataflow graph fragments are then stitched into the final dataflow graph.
For example, the error view is obtained by the following declaration:
The
FELDERA_ERROR_TABLEis deleted later and the view is hooked to the rightful sources.SQL User-defined functions are implemented in a similar way.
An unpleasant side-effect of this process is that the source code propagation mechanism of the compiler records source positions for these fictitious programs, which then surface while browsing the profiles (where they point to incorrect source code origins, since these fictitious programs are not recorded in the actual program sources).
This PR distinguished two kinds of source positions: internal and external. Only the external ones are propagated to the outside world.
Unfortunately this PR is much larger than I expected, because I realized that many Calcite classes incorrectly implement the APIs used by the Calcite visitors, and I am using a visitor to rewrite the position information. As a consequence I had to clone and fix all the Calcite classes in question. I may file an issue for Calcite as well, but no one seems to have noticed this problem so far.
The final two resource files in this PR show indeed that the source positions are no longer present for the
ERROR_VIEW.