[Hive] Support TLP oracle #1201
Thanks! Will try to review as soon as possible.
It seems the CI cannot resolve the Maven dependency:
mrigger
left a comment
Thanks, this looks great! Sorry for the delay in reviewing, I was traveling. I added some comments to improve the PR, but I believe we can merge the change soon.
errors.add("cannot recognize input near");
errors.add("Argument type mismatch");
errors.add("Error while compiling statement");
Is this an expected error or could it be an actual bug (e.g., like an internal error)?
These errors are expected: I use an untyped expression generator for Hive, so it produces syntactically invalid statements that trigger these errors, which is why they're ignored.
Just a quick follow-up question on this: for an untyped expression generator, I would expect semantically invalid statements, but not syntactically invalid ones. Would it be possible to show an example where this causes such an error?
Thanks for responding. Yes, agreed! Does the "Invalid Constraint syntax" error below also relate to an actual semantic error?
Oh, thanks for mentioning it. "Invalid Constraint syntax" also relates to mismatched types, but I believe it's no longer needed since it is covered by the HiveErrors above. It's removed in the latest commit.
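For context on the exchange above, the expected-error mechanism can be thought of as substring matching on the exception message. Below is a minimal, self-contained sketch of that idea. The class `ExpectedErrorsSketch` and its methods are hypothetical stand-ins written for illustration, not SQLancer's actual implementation, and the sample messages in `main` are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an expected-errors list; NOT SQLancer's actual class.
class ExpectedErrorsSketch {
    private final List<String> fragments = new ArrayList<>();

    void add(String fragment) {
        fragments.add(fragment);
    }

    // A failure counts as "expected" if its message contains any registered fragment.
    boolean isExpected(String message) {
        if (message == null) {
            return false;
        }
        for (String fragment : fragments) {
            if (message.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    // Registers the three fragments quoted from the diff in this thread.
    static ExpectedErrorsSketch hiveDefaults() {
        ExpectedErrorsSketch e = new ExpectedErrorsSketch();
        e.add("cannot recognize input near");
        e.add("Argument type mismatch");
        e.add("Error while compiling statement");
        return e;
    }

    public static void main(String[] args) {
        ExpectedErrorsSketch e = hiveDefaults();
        // Made-up messages, for illustration only.
        System.out.println(e.isExpected("Error while compiling statement: FAILED"));
        System.out.println(e.isExpected("java.lang.NullPointerException"));
    }
}
```

Because matching is by substring, a broad fragment such as "Error while compiling statement" also swallows any failure whose message happens to contain it, which is the concern behind the review question about internal errors.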
mrigger
left a comment
Thanks for the follow-up changes and sorry for the delayed response. I have another follow-up question on the check constraints. Besides this, the PR looks ready to be merged.
Just to double-check: this is a contribution independent from the GSoC program, right? Just want to be sure that we do not overlook your proposal in case you submitted one for GSoC.
Right, this PR is not part of the GSoC program. It’s just pure interest :)
This problem is a tricky one. It seems related to the JDK version, but I haven't encountered it on my PC using the same JDK 11. I'm trying to solve it by upgrading the Hive dependency versions (3.1.3 -> 4.0.1), because that seems to be the only change related to JDK tools.
mrigger
left a comment
Thanks a lot again for your contribution. The code looks good to me. The CI tests also run successfully.
SQLancer seems to be finding bugs in Apache Hive. Did you already report any of these? If so, would it be possible to disable the bug-inducing patterns for now (see https://github.com/sqlancer/sqlancer/blob/main/CONTRIBUTING.md#unfixed-bugs)? This would prevent the CI test from exiting with a non-zero code.
Here is one example of a bug-inducing test reported by SQLancer:
CREATE TABLE database4.t0 (c0 BOOLEAN NOT NULL);
INSERT INTO t0 VALUES (true);
SELECT * FROM t0; -- cardinality: 1
SELECT * FROM t0 WHERE (t0.c0 IN ((((('q?[Tw5')=(t0.c0)))<=(((t0.c0)=(t0.c0)))))) UNION ALL
SELECT * FROM t0 WHERE (NOT (t0.c0 IN ((((('q?[Tw5')=(t0.c0)))<=(((t0.c0)=(t0.c0))))))) UNION ALL
SELECT * FROM t0 WHERE (((t0.c0 IN ((((('q?[Tw5')=(t0.c0)))<=(((t0.c0)=(t0.c0))))))) IS NULL); -- cardinality: 0
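The test case above follows the TLP pattern: the rows matching a predicate `p`, `NOT p`, and `p IS NULL` should together account for every row of the unfiltered query. Below is a minimal sketch of that invariant in Java, using an in-memory model with nullable `Boolean` standing in for SQL's three-valued logic. `TlpSketch` and its method names are hypothetical and written for illustration; this is not SQLancer's oracle code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Illustrative in-memory model of the TLP check; NOT SQLancer code.
class TlpSketch {
    // Counts rows whose predicate result equals `wanted`
    // (Boolean.TRUE, Boolean.FALSE, or null for SQL NULL).
    static <R> long count(List<R> rows, Function<R, Boolean> p, Boolean wanted) {
        long n = 0;
        for (R row : rows) {
            if (Objects.equals(p.apply(row), wanted)) {
                n++;
            }
        }
        return n;
    }

    // The TLP invariant on observed cardinalities.
    static boolean partitionsSumToTotal(long total, long whereP, long whereNotP, long whereIsNull) {
        return whereP + whereNotP + whereIsNull == total;
    }

    public static void main(String[] args) {
        // One BOOLEAN column c0; the predicate is simply the column value.
        List<Boolean> rows = Arrays.asList(Boolean.TRUE, Boolean.FALSE, null);
        Function<Boolean, Boolean> p = c0 -> c0;
        long whereP = count(rows, p, Boolean.TRUE);     // WHERE p
        long whereNotP = count(rows, p, Boolean.FALSE); // WHERE NOT p
        long whereIsNull = count(rows, p, null);        // WHERE p IS NULL
        System.out.println(partitionsSumToTotal(rows.size(), whereP, whereNotP, whereIsNull));
        // Cardinalities from the report above: 1 row unfiltered, 0 rows from the union.
        System.out.println(partitionsSumToTotal(1, 0, 0, 0));
    }
}
```

In the model each row lands in exactly one partition, so the first check always holds. The reported cardinalities (1 versus 0) violate the invariant, which is why the test case was flagged.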
Merging this first as I think having support for Hive as well as the CI tests is already very valuable. Thanks a lot for contributing the code! As a next step, we should make sure to make the CI tests green so that we will not accidentally break the implementation in the future.
@mrigger Sorry for the late reply, and thank you very much for merging this! I plan to continue adding support for more oracles for Hive. As for the bugs it found, I haven't reported them yet but will do so soon, and I'm also going to disable these patterns in the code to avoid generating similar bug-inducing queries.
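Disabling a bug-inducing pattern, as discussed above, could be done with a boolean flag that the generator consults before emitting the pattern. The following is only a sketch under assumptions: `BugFlagSketch`, the flag name, and the `mayCompare` helper are all hypothetical, and the thread does not establish which part of the reported query triggers the bug. The string-versus-boolean comparison is picked purely for illustration.

```java
// Hypothetical sketch of a bug flag guarding a generator decision.
// Names are made up; the actual root cause of the reported bug is not known here.
class BugFlagSketch {
    enum Type { BOOLEAN, STRING, INT }

    // Assumed convention: true while the (hypothetical) upstream issue is unfixed.
    static boolean stringBooleanComparisonBug = true;

    // Returns whether the generator may emit a comparison between the two types.
    static boolean mayCompare(Type left, Type right) {
        boolean mixesStringAndBoolean =
                (left == Type.STRING && right == Type.BOOLEAN)
                || (left == Type.BOOLEAN && right == Type.STRING);
        if (stringBooleanComparisonBug && mixesStringAndBoolean) {
            return false; // skip the pattern until the bug is fixed
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(mayCompare(Type.STRING, Type.BOOLEAN));
        System.out.println(mayCompare(Type.INT, Type.INT));
    }
}
```

Flipping the flag to `false` once the upstream issue is fixed re-enables the pattern without touching the generator logic.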


This PR creates a basic framework for Hive SQL generation and uses the TLP oracle to test Hive.