Spark: Add isolation level support to DataFrame.overwrite(filter) API #4293
Conversation
Let me take a look. Sorry for the delay.
That's a bug. Let me fix it.
aokolnychyi left a comment:
This change looks correct to me. I had a few nits.
I'll fix OverwriteFiles and it would be great to submit changes to the dynamic overwrites in a separate PR to limit this one to static overwrites.
Force-pushed from d4e6bae to 4f793e7
@aokolnychyi thanks, I will create another PR soon for the refactoring/cleanup of the existing code to make it consistent with this one. Added another test for deleted data files; after the fix in #3581, it works now.
Restarting, as it looks like a temporary build-artifact download failure.
Thanks, @szehon-ho! Sorry it took so long to get back to this PR.
#2925 added isolation level support to the core Iceberg ReplacePartitions API and exposed it via the Spark DataFrame.overwritePartitions() API.
This change extends isolation level support to the Spark DataFrame.overwrite(filter) API for symmetry. The underlying core Iceberg API (OverwriteFiles) already supports isolation-level validation for this case, so the change is smaller.
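As a rough sketch of how a caller might request a validation level when overwriting by filter (not taken from this PR's diff): PySpark's `DataFrameWriterV2` exposes `option()` and `overwrite(condition)`, and the `"isolation-level"` write option with values `"serializable"` / `"snapshot"` follows the pattern introduced in #2925 for overwritePartitions — treat the exact option name and the helper below as assumptions.

```python
# Hypothetical helper: overwrite rows matching `condition`, asking Iceberg
# to validate concurrent changes at the requested isolation level.
# The option name "isolation-level" mirrors the overwritePartitions
# support from #2925 and is an assumption here, not the PR's exact surface.

SUPPORTED_ISOLATION_LEVELS = {"serializable", "snapshot"}

def overwrite_with_isolation(df, table, condition, level="serializable"):
    """Overwrite the rows of `table` matching `condition` via the
    DataFrameWriterV2 API, with conflict validation at `level`."""
    if level not in SUPPORTED_ISOLATION_LEVELS:
        raise ValueError(f"unknown isolation level: {level}")
    (df.writeTo(table)
       .option("isolation-level", level)
       .overwrite(condition))
```

Usage would look like `overwrite_with_isolation(df, "db.table", col("ts") < cutoff, "snapshot")`; the level check happens before any Spark call, so invalid levels fail fast.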
One observation: DF.overwrite(filter) is less aggressive than DF.overwritePartitions() in concurrency validation, because the two APIs take different code paths. OverwriteFiles checks exactly the files that will be rewritten, so it does not throw an exception if a different file was deleted in the same partition. ReplacePartitions, by contrast, throws an exception if any file was deleted in an affected partition, because it tracks whole partitions rather than individual files.
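The difference in validation granularity can be illustrated with a toy model (this is not Iceberg code; all names and the path-based partition scheme are illustrative):

```python
# Toy model of the two validation granularities described above:
# file-level (OverwriteFiles) vs partition-level (ReplacePartitions).

def overwrite_files_conflicts(rewritten_files, concurrently_deleted):
    """File-level check: conflict only if a file we plan to rewrite
    was itself deleted by a concurrent commit."""
    return sorted(set(rewritten_files) & set(concurrently_deleted))

def replace_partitions_conflicts(touched_partitions, concurrently_deleted,
                                 partition_of):
    """Partition-level check: conflict if ANY file in a touched
    partition was deleted, regardless of whether we rewrite it."""
    touched = set(touched_partitions)
    return sorted(f for f in concurrently_deleted if partition_of(f) in touched)

partition_of = lambda f: f.split("/")[0]   # "p1/a.parquet" -> "p1"
deleted = ["p1/other.parquet"]             # concurrent delete in partition p1

# DF.overwrite(filter): rewriting p1/a.parquet does not conflict with a
# delete of a *different* file in the same partition.
assert overwrite_files_conflicts(["p1/a.parquet"], deleted) == []

# DF.overwritePartitions(): touching partition p1 conflicts with any
# delete inside p1.
assert replace_partitions_conflicts(["p1"], deleted, partition_of) == ["p1/other.parquet"]
```

The asserts capture the observation above: under the same concurrent delete, the file-level path commits cleanly while the partition-level path would raise a validation exception.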