Spark 3.3: Optimize metadata-only DELETEs#6899
Conversation
flyrain
left a comment
There was a problem hiding this comment.
+1 Thanks @aokolnychyi for the change!
singhpk234
left a comment
There was a problem hiding this comment.
+1 for the change, Thanks @aokolnychyi !
| } | ||
|
|
||
| return deleteExpr == Expressions.alwaysTrue() || canDeleteUsingMetadata(deleteExpr); | ||
| return selectsPartitions(deleteExpr) || canDeleteUsingMetadata(deleteExpr); |
There was a problem hiding this comment.
[minor] should we still keep deleteExpr == Expressions.alwaysTrue() it can even skip selectPartitions call
There was a problem hiding this comment.
I'd keep a simpler return statement in this case as the selects partition call should be fairly cheap for a true literal. Previously, we wanted to avoid planning in some obvious cases. The new check should be instant.
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Some comments but non-blocking, overall looks great to me Anton thanks!
| } | ||
|
|
||
| public static boolean caseSensitive(SparkSession spark) { | ||
| return Boolean.parseBoolean(spark.conf().get("spark.sql.caseSensitive")); |
There was a problem hiding this comment.
We don't need to take it in this PR, but I see hardcoded "spark.sql.caseSensitive" in a few places in the spark code, is there a good place to define this constant? I know it's a general spark sql option and not Iceberg specific so SparkSQLProperties is probably not the right place
There was a problem hiding this comment.
I tried to partially address that by introducing this utility call and migrating all other places to use it.
There was a problem hiding this comment.
Ah I see I missed that this is public, then that should be good!
|
Thanks for reviewing, @flyrain @singhpk234 @amogh-jahagirdar! |
This change backports PR #6899 into Spark 3.2.
This change backports PR apache#6899 into Spark 3.2.
This change backports PR apache#6899 into Spark 3.2.
This PR avoids planning a scan to check whether a metadata-only DELETE is possible if the delete condition always matches entire partitions.