Skip to content

Support delete corrupted Iceberg table#5510

Merged
rdblue merged 1 commit into
apache:masterfrom
yabola:drop
Sep 1, 2022
Merged

Support delete corrupted Iceberg table#5510
rdblue merged 1 commit into
apache:masterfrom
yabola:drop

Conversation

@yabola
Copy link
Copy Markdown
Contributor

@yabola yabola commented Aug 12, 2022

When the metadata file doesn't exist any more, drop table operation will fail.
This PR aims to support deleting the table even if the table was corrupted.

@github-actions github-actions Bot added the hive label Aug 12, 2022
@yabola yabola changed the title Support delete corrupted Iceberg table Support delete corrupted Iceberg table in HiveCatalog Aug 12, 2022
@yabola yabola changed the title Support delete corrupted Iceberg table in HiveCatalog Support delete corrupted Iceberg table Aug 12, 2022
lastMetadata = null;
}
} catch (Exception e) {
if (e instanceof NotFoundException) {
Copy link
Copy Markdown
Contributor Author

@yabola yabola Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szehon-ho cc~ does this need to make a separate method ? if separate , it may load table twice.

} else {
lastMetadata = null;
}
} catch (Exception e) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can directly just catch NotFoundException? Instead of catching all the exceptions and throwing again if it is not a NotFoundException?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm sorry, my compiler had a problem before, it has been modified

} catch (Exception e) {
if (e instanceof NotFoundException) {
LOG.warn("The metadata file doesn't exist any more. Continue to drop the table.", e);
lastMetadata = null;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this needs to be supported.

See the line 199,
If the lastMetadata is null, table files are never deleted even with purge = true.

if (purge && lastMetadata != null) {
        CatalogUtil.dropTableData(ops.io(), lastMetadata);
      }

catalog.createTable(identifier, tableSchema);
TableOperations ops = catalog.newTableOps(identifier);

Assert.assertTrue(catalog.dropTable(identifier, true));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I feel no need to test line 473 to 480

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's remove L473 - 478 and just focus on the actual case where we don't have a metadata file for the given table

Copy link
Copy Markdown
Member

@ajantha-bhat ajantha-bhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the NotFoundException is thrown,
we can call drop table again with purge = false which will succeed.

So, I think these changes are not required. Let us wait for others' opinions.

Also, this PR handles only one catalog. Instead, manually calling drop table again with purge = false works for all the catalogs.

Copy link
Copy Markdown
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the approach LGTM, just a few nits to address

lastMetadata = null;
}
} catch (NotFoundException e) {
LOG.warn("The metadata file doesn't exist any more. Continue to drop the table.", e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.warn("The metadata file doesn't exist any more. Continue to drop the table.", e);
LOG.warn("The metadata file doesn't exist anymore. Continuing to drop the table.", e);

NoSuchTableException.class,
() -> catalog.loadTable(identifier));

// delete corrupted table
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just // delete metadata file

Assert.assertTrue(catalog.dropTable(identifier, true));
AssertHelpers.assertThrows(
"Should fail to load table after dropping it",
NoSuchTableException.class,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's use

Assertions.assertThatThrownBy(() -> catalog.loadTable(identifier)).isInstanceOf(NoSuchTableException.class).hasMessage("xx")

catalog.createTable(identifier, tableSchema);
TableOperations ops = catalog.newTableOps(identifier);

Assert.assertTrue(catalog.dropTable(identifier, true));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's remove L473 - 478 and just focus on the actual case where we don't have a metadata file for the given table

}

@Test
public void testDropTable() throws TException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void testDropTable() throws TException {
public void testDropTableWithoutMetadataFile() throws TException {

@nastra nastra added this to the Iceberg 0.14.1 Release milestone Aug 19, 2022
@yabola
Copy link
Copy Markdown
Contributor Author

yabola commented Aug 19, 2022

@nastra I had addressed comments. Thanks for review.

Copy link
Copy Markdown
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this LGTM, @pvary or @rdblue could one of you please review this as well?
This should fix #5584

@gaborkaszab
Copy link
Copy Markdown
Contributor

We also bumped into this when testing through Apache Impala. This fix could be useful for us as well.

if (purge && ops.current() != null) {
lastMetadata = ops.current();
} else {
lastMetadata = null;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: Wondering, while we are in this code, it may be a bit more concise to initialize lastMetadata = null, then we can remove two null assignments in the else/catch blocks? Not sure what people think, though its not a big in any case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a personal preference, but I also prefer initialising to null and then save an else branch of an if.

Copy link
Copy Markdown
Contributor Author

@yabola yabola Aug 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to initialize to null from the concise code, but I think keeping null is easier to understand in terms of readability.
I think It depends on whether the lastMetadata = null has a significant effect on the method . If it is not that important, we can initialise it as null.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastMetadata = null seems not that important for dropTable. So I initialise it as null.

lastMetadata = ops.current();
}
} catch (NotFoundException e) {
LOG.warn("The metadata file doesn't exist anymore. Continuing to drop the table.", e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Iceberg, log messages should be specific statements and should not speculate about underlying causes. In this case, the operations failed to load table metadata. That doesn't necessarily mean that it doesn't exist. The error should state that clearly. Failed to load table metadata for table: %s, continuing drop without purge

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated log message.


@Test
public void testDropTableWithoutMetadataFile() {
Namespace namespace = Namespace.of("dbname_drop");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a new namespace in the HiveCatalog tests we are usually using a static db:

TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it.

@pvary
Copy link
Copy Markdown
Contributor

pvary commented Aug 21, 2022

Quick question: do we need this only for HiveCatalog? Do we need this for other catalogs as well? (Maybe a different PR, but I think we would expect the same behavior for every Catalog implementation, or at least a clean documentation for the differences)

@yabola
Copy link
Copy Markdown
Contributor Author

yabola commented Aug 21, 2022

@pvary I can complete other catalogs in this PR if need.

}
} catch (NotFoundException e) {
LOG.warn(
"Failed to load table metadata for table: {}, continuing drop without purge.",
Copy link
Copy Markdown
Contributor

@rdblue rdblue Aug 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please removing ending punctuation. There's no need for it.

@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Aug 21, 2022

I agree with @pvary. It would be better to add this test to CatalogTests and ensure that all the catalogs behave this way.

@singhpk234
Copy link
Copy Markdown
Contributor

singhpk234 commented Aug 22, 2022

+1, seems like same issue is reported for GlueCatalog : #5604, we should have similar handling for GlueCatalog as well, having tests for all catalogs could have helped detecting :) !!

@yabola
Copy link
Copy Markdown
Contributor Author

yabola commented Aug 22, 2022

@singhpk234 I will complete all catalogs in this PR~

@yabola
Copy link
Copy Markdown
Contributor Author

yabola commented Aug 24, 2022

I have implemented HiveCatalog, GlueCatalog,JdbcCatalog, DynamoDbCatalog.
BaseSessionCatalog , NessieCatalog, RESTCatalog don't have purge table.
EcsCatalog, HadoopCatalog purge logic (fail if metadata doesn't exist) was different from HiveCatalog. So I am not sure if it need to change.

@yabola
Copy link
Copy Markdown
Contributor Author

yabola commented Aug 25, 2022

Also note that, JdbcCatalog, GlueTableOperationsand other will retry 20 times and cost some time when refresh metadata and there is no metadata file. Please take a look at BaseMetastoreTableOperations#refreshFromMetadataLocation.
HiveCatalog will only retry 2 times , please see parameter iceberg.hive.metadata-refresh-max-retries.

@yabola yabola requested review from rdblue and szehon-ho and removed request for rdblue and szehon-ho August 26, 2022 02:19
@yabola
Copy link
Copy Markdown
Contributor Author

yabola commented Aug 29, 2022

@rdblue @szehon-ho @pvary If you have time, please take a look. And I have one question as above. If it is not suitable, can I only change HiveCatalog ?

@szehon-ho
Copy link
Copy Markdown
Member

@yabola I'm ok with some retry, not sure if its worth the additional risk we introduce to turn it off.

It does seem 20 can be a bit excessive, wonder whether we can turn that down or it is by design.. maybe others will have an idea like @singhpk234 for GlueCatalog?

But one concern here, does it have an impact on how long the test takes to run, and can we turn it down at least for this test?

@singhpk234
Copy link
Copy Markdown
Contributor

It does seem 20 can be a bit excessive, wonder whether we can turn that down or it is by design.. maybe others will have an idea like @singhpk234 for GlueCatalog

20 seems to be quite excessive IMHO as well, going by the calculations mentioned in the Hive PR to reduce retries it can lead to hang > 10 min. Unfortunately I don't have much context into what the motivation was to keep it to 20 from the beginning ... The closest I could track it down was this commit 9349a25 from 2018, unfortunately there is not a pr associated with it.

But definitely it makes sense to reduce it to may be 3, and make it configurable like how we did for hive.

@yabola
Copy link
Copy Markdown
Contributor Author

yabola commented Sep 1, 2022

@szehon-ho My newly added testDropTableWithoutMetadataFile method will take 100s in TestJdbcCatalog, the original Test class will not have this test path, so the time remains unchanged. Sorry that I didn't find a way to avoid retry 20 time for JDBC (20 times is a hard code). So should I delete my new added test method for TestJdbcCatalog or any better idea?

@szehon-ho
Copy link
Copy Markdown
Member

szehon-ho commented Sep 1, 2022

Ugh, yea I feel like it would be nice to have this configurable, not just the test but in general, but I feel it may be another pr. If @rdblue is ok to merge, I'm ok.

About the config, it seems like today the way to config things in TableOperations is table property, which could be inherited from CatalogProperties.

@gaborkaszab
Copy link
Copy Markdown
Contributor

@szehon-ho @yabola I created a draft PR to make the number of metadata load retries configurable: #5688
So far I checked JdbcCatalog only but will cover other affected catalogs.

@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Sep 1, 2022

Thanks for taking a look at the metadata retries, @gaborkaszab!

@rdblue rdblue merged commit 26cdc7a into apache:master Sep 1, 2022
@yabola
Copy link
Copy Markdown
Contributor Author

yabola commented Sep 2, 2022

@ajantha-bhat @pvary @rdblue @szehon-ho @nastra @singhpk234 @gaborkaszab Thank you all for careful review

rdblue pushed a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 10, 2023
zhongyujiang added a commit to zhongyujiang/iceberg that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Delete table entry from Glue data catalog if corresponding table metadata files are deleted in s3

8 participants