Skip to content

Core, Puffin: Add DV file writer#11476

Merged
aokolnychyi merged 3 commits into
apache:mainfrom
aokolnychyi:dv-writer
Nov 6, 2024
Merged

Core, Puffin: Add DV file writer#11476
aokolnychyi merged 3 commits into
apache:mainfrom
aokolnychyi:dv-writer

Conversation

@aokolnychyi
Copy link
Copy Markdown
Contributor

This PR adds a base DV file writer with basic tests. More tests to come once we support DV reads.

Benchmark                                                           (deletedRowsRatio)  (referencedDataFileCount)  Mode  Cnt           Score         Error   Units
DVWriterBenchmark.dv                                                              0.01                          5    ss   10           0.062 ±       0.003    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                        0.01                          5    ss   10           0.219 ±       0.017    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                   0.01                          5    ss   10           0.146 ±       0.006    s/op

DVWriterBenchmark.dv                                                              0.01                         10    ss   10           0.103 ±       0.010    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                        0.01                         10    ss   10           0.435 ±       0.016    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                   0.01                         10    ss   10           0.265 ±       0.014    s/op

DVWriterBenchmark.dv                                                              0.03                          5    ss   10           0.136 ±       0.006    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                        0.03                          5    ss   10           0.479 ±       0.018    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                   0.03                          5    ss   10           0.337 ±       0.018    s/op

DVWriterBenchmark.dv                                                              0.03                         10    ss   10           0.219 ±       0.013    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                        0.03                         10    ss   10           0.918 ±       0.028    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                   0.03                         10    ss   10           0.618 ±       0.014    s/op

DVWriterBenchmark.dv                                                              0.05                          5    ss   10           0.218 ±       0.014    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                        0.05                          5    ss   10           0.696 ±       0.010    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                   0.05                          5    ss   10           0.527 ±       0.008    s/op

DVWriterBenchmark.dv                                                              0.05                         10    ss   10           0.366 ±       0.011    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                        0.05                         10    ss   10           1.389 ±       0.024    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                   0.05                         10    ss   10           0.986 ±       0.007    s/op

DVWriterBenchmark.dv                                                              0.10                          5    ss   10           0.307 ±       0.007    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                        0.10                          5    ss   10           1.179 ±       0.039    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                   0.10                          5    ss   10           1.004 ±       0.010    s/op

DVWriterBenchmark.dv                                                              0.10                         10    ss   10           0.594 ±       0.006    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                        0.10                         10    ss   10           2.260 ±       0.039    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                   0.10                         10    ss   10           2.045 ±       0.054    s/op

DVWriterBenchmark.dv                                                               0.2                          5    ss   10           0.709 ±       0.011    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                         0.2                          5    ss   10           2.151 ±       0.017    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                    0.2                          5    ss   10           1.969 ±       0.046    s/op

DVWriterBenchmark.dv                                                               0.2                         10    ss   10           1.370 ±       0.047    s/op
DVWriterBenchmark.fileScopedParquetDeletes                                         0.2                         10    ss   10           4.386 ±       0.034    s/op
DVWriterBenchmark.partitionScopedParquetDeletes                                    0.2                         10    ss   10           3.849 ±       0.030    s/op

This work is part of #11122.


private final OutputFileFactory fileFactory;
private final Function<CharSequence, PositionDeleteIndex> loadPreviousDeletes;
private final CharSequenceMap<Deletes> deletesByPath = CharSequenceMap.create();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to discuss whether we should accept String or CharSequence in DV writers. We migrated to String for our new DataFile and DeleteFile fields but it is a different use case. We don't have to necessarily serialize these values. If an engine supports custom CharSequence objects, it may benefit from this API and skip the conversion.

It has no benefits for Spark, though. We have to convert to String.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that String makes more sense for now because:

  • to my knowledge, most of query engines support String
  • if a query engine needs CharSequence, the conversion is not very painful

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of any engine that would take advantage of it here, but I'm not sure it really causes any significant complexity to maintain as CharSequence. I'm fine either way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avro Utf8 is the only example I remember.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It's probably a good idea to use String. We can always move to CharSequence later with minimal changes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on using String, which I've also mentioned in https://github.com/apache/iceberg/pull/11302/files#r1824137648

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Let's use String. It will have less overhead for engines like Spark that use String.

import org.apache.iceberg.StructLike;

/** Copy the StructLike's values into a new one. It does not handle list or map values now. */
class StructCopy implements StructLike {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to move this into a utility. Just not sure which at this point. Ideas are welcome.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a "generic" IoUtil ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this even belong in io? This feels like it should be in types.TypeUtil

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it belongs to StructLikeUtil or something like this. Doesn't seem to belong to io.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for StructLikeUtil or similar.

@jbonofre, the io classes typically refer to FileIO and buffer management utils in Iceberg. I'm not quite sure why this is currently in the io package. I think it's probably better to not continue using io so that it's easier to find later.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I like the StructLikeUtil name.

+1 for that. Thanks !

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the logic to the util and deprecated this class not to break anyone.

}

@TestTemplate
public void testBasicDVs() throws IOException {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add more once we support DV reads.

return new Blob(
StandardBlobTypes.DV_V1,
ImmutableList.of(MetadataColumns.ROW_POSITION.fieldId()),
-1 /* snapshot ID is inherited */,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per V1 Puffin spec.

* @param spec the data file partition spec
* @param partition the data file partition
*/
void write(CharSequence path, long pos, PartitionSpec spec, StructLike partition);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little bit of a nit here, but does write make sense? We're actually marking for delete. FileAppender and most "Writers" use add, should this just be delete?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. What about DVFileWriter as the name? That seems OK but if there are alternatives, let me know.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielcweeks from my standpoint, delete() would be confusing. As the class is named DeleteVectorFileWriter (yes, I would prefer a full name instead of DV), I think it makes sense to use write() there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view, delete makes sense now as previously we passed PositionDelete object and used the common writer abstraction. That said, I am OK either way.

I do want to keep the shorter class name. I use DV naming consistently throughout the code and it allows me to stay on one line most of the time. Our 100 character line limit is no joke!

return StructCopy.copy(struct);
}

private static class StructCopy implements StructLike {
Copy link
Copy Markdown
Contributor Author

@aokolnychyi aokolnychyi Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied the existing class but made private. The old one is deprecated and will be removed in 1.9.

Comment on lines +68 to +70
this.tableDir = Files.createTempDirectory(temp, "junit").toFile();
assertThat(tableDir.delete()).isTrue(); // created during table creation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.tableDir = Files.createTempDirectory(temp, "junit").toFile();
assertThat(tableDir.delete()).isTrue(); // created during table creation

FYI this typically isn't needed, because tableDir is already initialized in the super class. See also #11460 where existing tests have been refactored to remove this unnecessary re-initialization

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good call. Missed that change!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also removed the metadata dir and empty line.

@aokolnychyi aokolnychyi merged commit 7938403 into apache:main Nov 6, 2024
@aokolnychyi
Copy link
Copy Markdown
Contributor Author

Thanks for reviewing, @nastra @danielcweeks @jbonofre @rdblue!

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
@dentiny
Copy link
Copy Markdown

dentiny commented Apr 27, 2025

Hi team, may I ask a dummy question?
I'm wondering if this is the official java implementation for deletion vector mentioned at https://iceberg.apache.org/puffin-spec/#deletion-vector-v1-blob-type?
In other words, if I want to write deletion vector (in iceberg V3), should I use BaseDVFileWriter?

My confusion lies in:

  • I see serialized blob should contain magic sequence "D1 D3 39 64", but I didn't find it anywhere
  • The spec mentions blob contains CRC32 for magic bytes and serialized bytes, I also don't find it

Appreciate any answer! I could tell iceberg is evolving really fast, so as a newcomer I'm kinda lost in the (multiple) proposals and different versions of implementation.

@nastra
Copy link
Copy Markdown
Contributor

nastra commented May 5, 2025

Hi team, may I ask a dummy question? I'm wondering if this is the official java implementation for deletion vector mentioned at https://iceberg.apache.org/puffin-spec/#deletion-vector-v1-blob-type? In other words, if I want to write deletion vector (in iceberg V3), should I use BaseDVFileWriter?

My confusion lies in:

  • I see serialized blob should contain magic sequence "D1 D3 39 64", but I didn't find it anywhere
  • The spec mentions blob contains CRC32 for magic bytes and serialized bytes, I also don't find it

Appreciate any answer! I could tell iceberg is evolving really fast, so as a newcomer I'm kinda lost in the (multiple) proposals and different versions of implementation.

@dentiny yes this repository contains the java implementation. You might want to start browsing the code around BitmapPositionDeleteIndex / RoaringPositionBitmap / BaseDVFileWriter, which should contain the info you're looking for.

@dentiny
Copy link
Copy Markdown

dentiny commented May 27, 2025

yes this repository contains the java implementation. You might want to start browsing the code around BitmapPositionDeleteIndex / RoaringPositionBitmap / BaseDVFileWriter, which should contain the info you're looking for.

Thank you for the reply and code pointer!
I did find crc and magic number in BitmapPositionDeleteIndex!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants