Parquet: Should not throw exception for filter contains non-reference term#8243
Conversation
|
|
||
| @Override | ||
| public <T> Boolean handleNonReference(Bound<T> term) { | ||
| return ROWS_MIGHT_MATCH; |
There was a problem hiding this comment.
When the column is not referenced, then the column is missing, right? How could it then match?
There was a problem hiding this comment.
I think here the NonReference should be it is not a column reference term. It can be a transform term, for example equal(year(ts), 10).
There was a problem hiding this comment.
@Fokko, I think at this point the columns are bound and transforms have already been removed if they can be (e.g. if they can be translated to a normal predicate on a partition field). This is the case where the transform is left over because there is no corresponding partition field. We could implement future optimizations to run the filter, but returning all rows is good enough for now.
| boolean shouldRead = | ||
| new ParquetMetricsRowGroupFilter(SCHEMA, equal(truncate("required", 2), "some_value"), true) | ||
| .shouldRead(parquetSchema, rowGroupMetadata); | ||
| Assumptions.assumeThat(shouldRead) |
There was a problem hiding this comment.
I think you want to use an assertion rather than an assumption? If you use an assumption the test will be skipped if it fails, instead of failing.
|
Thanks, @ConeyLiu! |
The parquet filter evaluator should not throw exceptions for those filters containing non-reference terms. For example, the filter evaluator apply on
year(ts) = 20will throwValidationException;