Spark: Add isolation level support to DataFrame.overwrite(filter) API #4293
Conversation
Let me take a look. Sorry for the delay.
That's a bug. Let me fix it.
aokolnychyi left a comment:
This change looks correct to me. I had a few nits.
I'll fix OverwriteFiles and it would be great to submit changes to the dynamic overwrites in a separate PR to limit this one to static overwrites.
Force-pushed from d4e6bae to 4f793e7
@aokolnychyi thanks, I will create another PR soon for the refactoring/cleanup of the existing code to make it consistent with this one. Added another test for deleted data files; after the fix in #3581, it works now.
Restarting, as it looks like a temporary build-artifact download failure.
Thanks, @szehon-ho! Sorry it took so long to get back to this PR.
#2925 added isolation level support to the core Iceberg ReplacePartitions API and exposed it via the Spark DataFrame.overwritePartitions() API.
This change extends isolation level support to the Spark DataFrame.overwrite(filter) API for symmetry. The underlying core Iceberg API (OverwriteFiles) already supports isolation-level validation for this case, so the change is smaller.
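As a rough sketch of how a caller might request a validation level when overwriting by filter (not taken from this PR's diff): PySpark's `DataFrameWriterV2` exposes `option()` and `overwrite(condition)`, and the `"isolation-level"` write option with values `"serializable"` / `"snapshot"` follows the pattern introduced in #2925 for overwritePartitions — treat the exact option name and the helper below as assumptions.

```python
# Hypothetical helper: overwrite rows matching `condition`, asking Iceberg
# to validate concurrent changes at the requested isolation level.
# The option name "isolation-level" mirrors the overwritePartitions
# support from #2925 and is an assumption here, not the PR's exact surface.

SUPPORTED_ISOLATION_LEVELS = {"serializable", "snapshot"}

def overwrite_with_isolation(df, table, condition, level="serializable"):
    """Overwrite the rows of `table` matching `condition` via the
    DataFrameWriterV2 API, with conflict validation at `level`."""
    if level not in SUPPORTED_ISOLATION_LEVELS:
        raise ValueError(f"unknown isolation level: {level}")
    (df.writeTo(table)
       .option("isolation-level", level)
       .overwrite(condition))
```

Usage would look like `overwrite_with_isolation(df, "db.table", col("ts") < cutoff, "snapshot")`; the level check happens before any Spark call, so invalid levels fail fast.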
One observation: DF.overwrite(filter) is less aggressive than DF.overwritePartitions() in concurrency validation, because the two APIs take different code paths. OverwriteFiles checks exactly the files that will be rewritten, so it does not throw an exception if a different file was deleted in the same partition. ReplacePartitions, by contrast, throws an exception if any file was deleted in an affected partition, because it tracks whole partitions rather than individual files.
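The difference in validation granularity can be illustrated with a toy model (this is not Iceberg code; all names and the path-based partition scheme are illustrative):

```python
# Toy model of the two validation granularities described above:
# file-level (OverwriteFiles) vs partition-level (ReplacePartitions).

def overwrite_files_conflicts(rewritten_files, concurrently_deleted):
    """File-level check: conflict only if a file we plan to rewrite
    was itself deleted by a concurrent commit."""
    return sorted(set(rewritten_files) & set(concurrently_deleted))

def replace_partitions_conflicts(touched_partitions, concurrently_deleted,
                                 partition_of):
    """Partition-level check: conflict if ANY file in a touched
    partition was deleted, regardless of whether we rewrite it."""
    touched = set(touched_partitions)
    return sorted(f for f in concurrently_deleted if partition_of(f) in touched)

partition_of = lambda f: f.split("/")[0]   # "p1/a.parquet" -> "p1"
deleted = ["p1/other.parquet"]             # concurrent delete in partition p1

# DF.overwrite(filter): rewriting p1/a.parquet does not conflict with a
# delete of a *different* file in the same partition.
assert overwrite_files_conflicts(["p1/a.parquet"], deleted) == []

# DF.overwritePartitions(): touching partition p1 conflicts with any
# delete inside p1.
assert replace_partitions_conflicts(["p1"], deleted, partition_of) == ["p1/other.parquet"]
```

The asserts capture the observation above: under the same concurrent delete, the file-level path commits cleanly while the partition-level path would raise a validation exception.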