
@liyxiris
Contributor

@liyxiris liyxiris commented Apr 2, 2025

This PR creates a basic framework for Hive SQL generation and uses the TLP (Ternary Logic Partitioning) oracle to test Hive.

  • Implements JDBC connection to Hive cluster.
  • Implements creation of Hive tables with basic column types, i.e., BOOLEAN, INT, DOUBLE, and STRING.
  • Implements some Hive table/column constraints, e.g. PRIMARY_KEY_DISABLE, CHECK, NOT_NULL, etc.
  • Implements common unary, binary and other expressions.
  • Implements row insertion.
  • Implements the SELECT ... WHERE clause for TLP (joins and aggregations are on my TODO list).
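For context on the TLP oracle: it checks that a query's result equals the UNION ALL of three variants of that query, restricted to the rows where a random predicate p is TRUE, FALSE, and NULL. The Java sketch below only builds those query strings; `TlpQueryBuilder` and its methods are hypothetical names for illustration, not the code in this PR.

```java
import java.util.List;

/**
 * Hypothetical helper (not the Hive TLP oracle from this PR) that derives the
 * three TLP partition queries from a table name and a predicate p.
 */
final class TlpQueryBuilder {

    private TlpQueryBuilder() {
    }

    /** Reference query: its rows must equal the UNION ALL of the partitions. */
    static String originalQuery(String table) {
        return "SELECT * FROM " + table;
    }

    /** One query per truth value of p: TRUE, FALSE, and NULL. */
    static List<String> partitions(String table, String predicate) {
        String base = "SELECT * FROM " + table + " WHERE ";
        return List.of(
                base + "(" + predicate + ")",
                base + "(NOT (" + predicate + "))",
                base + "((" + predicate + ") IS NULL)");
    }

    /** UNION ALL keeps duplicates, so row counts must match exactly. */
    static String combinedQuery(String table, String predicate) {
        return String.join(" UNION ALL ", partitions(table, predicate));
    }

    public static void main(String[] args) {
        System.out.println(originalQuery("t0"));
        System.out.println(combinedQuery("t0", "t0.c0"));
    }
}
```

Because every row evaluates p to exactly one of the three truth values, the partitions are disjoint and together cover the table, which is what makes any cardinality difference a bug signal.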

@mrigger
Contributor

mrigger commented Apr 3, 2025

Thanks! Will try to review as soon as possible.

@mrigger
Contributor

mrigger commented Apr 8, 2025

It seems the CI cannot resolve the Maven dependency:

Error: Failed to execute goal on project sqlancer: Could not resolve dependencies for project com.sqlancer:sqlancer:jar:2.0.0
Error: dependency: jdk.tools:jdk.tools:jar:1.6

Contributor

@mrigger mrigger left a comment

Thanks, this looks great! Sorry for the delay in reviewing; I was traveling. I added some comments to improve the PR, but I believe we can merge the change soon.


errors.add("cannot recognize input near");
errors.add("Argument type mismatch");
errors.add("Error while compiling statement");
Contributor

Is this an expected error or could it be an actual bug (e.g., like an internal error)?

Contributor Author

These errors are expected: I use an untyped expression generator for Hive, and syntactically invalid statements cause these errors, so they are ignored.
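The `errors.add(...)` calls in the diff register message substrings that the test driver tolerates. A minimal sketch of that idea, assuming plain substring matching; `ExpectedErrorList` is a hypothetical class, not SQLancer's own expected-errors API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified expected-error list: an error is tolerated if its
 * message contains any registered substring.
 */
final class ExpectedErrorList {
    private final List<String> substrings = new ArrayList<>();

    void add(String substring) {
        substrings.add(substring);
    }

    /** Returns true if the message matches a registered pattern; null never matches. */
    boolean isExpected(String errorMessage) {
        if (errorMessage == null) {
            return false;
        }
        for (String s : substrings) {
            if (errorMessage.contains(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ExpectedErrorList errors = new ExpectedErrorList();
        // The three patterns from the diff under review.
        errors.add("cannot recognize input near");
        errors.add("Argument type mismatch");
        errors.add("Error while compiling statement");
        System.out.println(errors.isExpected("SemanticException: Argument type mismatch"));
        System.out.println(errors.isExpected("java.lang.NullPointerException"));
    }
}
```

This is the trade-off behind the reviewer's question: a broad substring such as "Error while compiling statement" may also match messages that wrap a genuine internal error, while narrower substrings leave more real bugs visible.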

Contributor

Just a quick follow-up question on this: for an untyped expression generator, I would expect semantically invalid statements, but not syntactically invalid ones. Would it be possible to show an example where this causes such an error?

Contributor Author

The error details are "type mismatch" and "wrong argument". They are reported as Hive SemanticExceptions, but they are actually all caused by mismatched types. So I believe they may be expected errors when using an untyped expression generator?
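To illustrate how an untyped generator yields type errors rather than syntax errors: operands are picked without looking at their types, so a STRING literal can end up compared with a BOOLEAN column, much like `('q?[Tw5')=(t0.c0)` in the test case reported later in this thread. A minimal sketch; `UntypedExprGenerator`, its operator list, and its literals are hypothetical.

```java
import java.util.List;
import java.util.Random;

/**
 * Hypothetical untyped expression generator: it never checks operand types,
 * so it can compare a STRING literal with a BOOLEAN column.
 */
final class UntypedExprGenerator {
    private static final List<String> BINARY_OPS = List.of("=", "<=", "AND", "+");

    private final Random random;
    private final List<String> columns;

    UntypedExprGenerator(long seed, List<String> columns) {
        this.random = new Random(seed);
        this.columns = columns;
    }

    /** Builds a random, fully parenthesized expression of at most the given depth. */
    String generate(int depth) {
        if (depth <= 0 || random.nextInt(3) == 0) {
            return leaf();
        }
        if (random.nextInt(4) == 0) {
            return "(NOT " + generate(depth - 1) + ")";
        }
        String op = BINARY_OPS.get(random.nextInt(BINARY_OPS.size()));
        return "(" + generate(depth - 1) + " " + op + " " + generate(depth - 1) + ")";
    }

    private String leaf() {
        switch (random.nextInt(4)) {
        case 0:
            return "'q?[Tw5'"; // STRING literal
        case 1:
            return String.valueOf(random.nextInt(100)); // INT literal
        case 2:
            return random.nextBoolean() ? "true" : "false"; // BOOLEAN literal
        default:
            return columns.get(random.nextInt(columns.size())); // column reference
        }
    }

    public static void main(String[] args) {
        UntypedExprGenerator gen = new UntypedExprGenerator(42L, List.of("t0.c0"));
        for (int i = 0; i < 3; i++) {
            System.out.println(gen.generate(3));
        }
    }
}
```

Every expression this sketch emits is syntactically well formed (parentheses are always balanced), so if Hive rejects one, the rejection comes from semantic analysis, consistent with the SemanticException observation above.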

Contributor

Thanks for responding. Yes, agreed! Does the "Invalid Constraint syntax" error below also relate to an actual semantic error?

Contributor Author

Oh, thanks for mentioning it. "Invalid Constraint syntax" also relates to mismatched types, but I believe it is no longer needed since it is covered by the HiveErrors above. It has been removed in the latest commit.

Contributor

@mrigger mrigger left a comment

Thanks for the follow-up changes, and sorry for the delayed response. I have another follow-up question on the check constraints. Besides this, the PR looks ready to be merged.

@mrigger
Contributor

mrigger commented Apr 13, 2025

Just to double-check: this is a contribution independent from the GSoC program, right? Just want to be sure that we do not overlook your proposal in case you submitted one for GSoC.

@liyxiris
Contributor Author

Right, this PR is not part of the GSoC program. It’s just pure interest :)

@liyxiris
Contributor Author

It seems the CI cannot resolve the Maven dependency:

Error: Failed to execute goal on project sqlancer: Could not resolve dependencies for project com.sqlancer:sqlancer:jar:2.0.0
Error: dependency: jdk.tools:jdk.tools:jar:1.6

This problem is a tricky one. It seems related to the JDK version, but I haven't encountered it on my PC with the same JDK 11. I'm trying to solve it by upgrading the Hive dependency versions (3.1.3 -> 4.0.1), because that seems to be the only change related to the JDK tools.

Contributor

@mrigger mrigger left a comment

Thanks a lot again for your contribution. The code looks good to me. The CI tests also run successfully.

SQLancer seems to be finding bugs in Apache Hive. Did you already report any of these? If so, would it be possible to disable the bug-inducing patterns for now (see https://github.com/sqlancer/sqlancer/blob/main/CONTRIBUTING.md#unfixed-bugs)? This would prevent the CI test from exiting with a non-zero code.

Here is one example of a bug-inducing test reported by SQLancer:

CREATE TABLE database4.t0 (c0 BOOLEAN NOT NULL);
INSERT INTO t0 VALUES (true);
SELECT * FROM t0; -- cardinality: 1
SELECT * FROM t0 WHERE (t0.c0 IN ((((('q?[Tw5')=(t0.c0)))<=(((t0.c0)=(t0.c0)))))) UNION ALL SELECT * FROM t0 WHERE (NOT (t0.c0 IN ((((('q?[Tw5')=(t0.c0)))<=(((t0.c0)=(t0.c0))))))) UNION ALL SELECT * FROM t0 WHERE (((t0.c0 IN ((((('q?[Tw5')=(t0.c0)))<=(((t0.c0)=(t0.c0))))))) IS NULL); -- cardinality: 0;
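The check behind this mismatch can be sketched as follows. Whatever truth value the predicate takes for the single row (TRUE, FALSE, or NULL), the row belongs to exactly one partition, so a correct engine returns 1 row from the UNION ALL query, whereas Hive returned 0. `TlpCardinalityCheck` is a hypothetical name, and comparing only cardinalities is a simplification (a full check would also compare the rows themselves).

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the TLP cardinality comparison. */
final class TlpCardinalityCheck {

    /** In SQL's three-valued logic, a predicate is exactly one of these per row. */
    enum TruthValue { TRUE, FALSE, NULL }

    /** Counts how many rows a correct engine places in each partition. */
    static Map<TruthValue, Long> partitionSizes(List<TruthValue> predicatePerRow) {
        Map<TruthValue, Long> sizes = new EnumMap<>(TruthValue.class);
        for (TruthValue v : TruthValue.values()) {
            sizes.put(v, 0L);
        }
        for (TruthValue v : predicatePerRow) {
            sizes.merge(v, 1L, Long::sum);
        }
        return sizes;
    }

    /** The oracle's verdict: the UNION ALL must return as many rows as the original query. */
    static boolean isConsistent(long originalCardinality, long combinedCardinality) {
        return originalCardinality == combinedCardinality;
    }

    public static void main(String[] args) {
        // One row, any truth value: the partitions always sum to 1.
        for (TruthValue v : TruthValue.values()) {
            long combined = partitionSizes(List.of(v)).values().stream()
                    .mapToLong(Long::longValue).sum();
            System.out.println(v + " -> combined cardinality " + combined);
        }
        // Cardinalities reported for Hive: original 1, combined 0.
        System.out.println("consistent: " + isConsistent(1, 0));
    }
}
```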

@mrigger
Contributor

mrigger commented Apr 21, 2025

Merging this first as I think having support for Hive as well as the CI tests is already very valuable. Thanks a lot for contributing the code!

As a next step, we should make the CI tests green so that we do not accidentally break the implementation in the future.

@mrigger mrigger merged commit 3863169 into sqlancer:main Apr 21, 2025
17 of 26 checks passed
@liyxiris
Contributor Author

@mrigger Sorry for the late reply, and thank you very much for merging this! I plan to continue adding support for more oracles for Hive.

About the bugs it found: I haven't reported them yet but will do so soon, and I'm also going to disable these patterns in the code so that the generator stops producing test cases that trigger the same bugs.
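One way to disable such a pattern while the bug is unfixed is a flag that the generator consults before emitting it, in the spirit of the CONTRIBUTING.md guidance on unfixed bugs linked earlier. This sketch is hypothetical: the flag name, the `UnfixedBugGuard` class, and the choice to block IN lists whose element is a top-level comparison (as in the reported test case) are illustrative assumptions, not SQLancer's actual mechanism.

```java
import java.util.Set;

/**
 * Hypothetical guard for an unfixed Hive bug: while the flag is set, the
 * generator skips "x IN (...)" whose list element is a top-level comparison.
 */
final class UnfixedBugGuard {
    // Hypothetical flag: set to false once the upstream Hive bug is fixed.
    static boolean inWithComparisonBug = true;

    private static final Set<String> COMPARISON_OPS = Set.of("=", "<>", "<", "<=", ">", ">=");

    /** Whether an IN list element whose top-level operator is rhsOperator may be generated. */
    static boolean mayGenerateIn(String rhsOperator) {
        return !(inWithComparisonBug && COMPARISON_OPS.contains(rhsOperator));
    }

    public static void main(String[] args) {
        System.out.println(mayGenerateIn("<=")); // blocked while the flag is set
        System.out.println(mayGenerateIn("+"));  // unaffected operator
    }
}
```

Keeping the flag in one place makes it easy to re-enable the pattern once the bug is fixed and to confirm that the fix holds.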
