Skip to content

Spark 3.3: Optimize metadata-only DELETEs#6899

Merged
aokolnychyi merged 6 commits into
apache:masterfrom
aokolnychyi:improve-metadata-delete
Feb 22, 2023
Merged

Spark 3.3: Optimize metadata-only DELETEs#6899
aokolnychyi merged 6 commits into
apache:masterfrom
aokolnychyi:improve-metadata-delete

Conversation

@aokolnychyi
Copy link
Copy Markdown
Contributor

This PR avoids planning a scan to check whether a metadata-only DELETE is possible if the delete condition always matches entire partitions.

@github-actions github-actions Bot added the spark label Feb 22, 2023
@aokolnychyi
Copy link
Copy Markdown
Contributor Author

Comment thread spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java Outdated
Copy link
Copy Markdown
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Thanks @aokolnychyi for the change!

Copy link
Copy Markdown
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the change, Thanks @aokolnychyi !

Comment thread spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java Outdated
Comment thread spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java Outdated
}

return deleteExpr == Expressions.alwaysTrue() || canDeleteUsingMetadata(deleteExpr);
return selectsPartitions(deleteExpr) || canDeleteUsingMetadata(deleteExpr);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] should we still keep deleteExpr == Expressions.alwaysTrue() it can even skip selectPartitions call

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep a simpler return statement in this case as the selects partition call should be fairly cheap for a true literal. Previously, we wanted to avoid planning in some obvious cases. The new check should be instant.

@github-actions github-actions Bot added the API label Feb 22, 2023
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments but non-blocking, overall looks great to me Anton thanks!

}

public static boolean caseSensitive(SparkSession spark) {
return Boolean.parseBoolean(spark.conf().get("spark.sql.caseSensitive"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to take it in this PR, but I see hardcoded "spark.sql.caseSensitive" in a few places in the spark code, is there a good place to define this constant? I know it's a general spark sql option and not Iceberg specific so SparkSQLProperties is probably not the right place

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to partially address that by introducing this utility call and migrating all other places to use it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see I missed that this is public, then that should be good!

@aokolnychyi aokolnychyi merged commit c0555c8 into apache:master Feb 22, 2023
@aokolnychyi
Copy link
Copy Markdown
Contributor Author

Thanks for reviewing, @flyrain @singhpk234 @amogh-jahagirdar!

aokolnychyi added a commit that referenced this pull request Feb 23, 2023
This change backports PR #6899 into Spark 3.2.
haizhou-zhao pushed a commit to haizhou-zhao/iceberg that referenced this pull request Feb 23, 2023
haizhou-zhao pushed a commit to haizhou-zhao/iceberg that referenced this pull request Feb 23, 2023
krvikash pushed a commit to krvikash/iceberg that referenced this pull request Mar 16, 2023
krvikash pushed a commit to krvikash/iceberg that referenced this pull request Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants