Support delete corrupted Iceberg table#5510
Conversation
| lastMetadata = null; | ||
| } | ||
| } catch (Exception e) { | ||
| if (e instanceof NotFoundException) { |
There was a problem hiding this comment.
@szehon-ho cc~ does this need to make a separate method ? if separate , it may load table twice.
| } else { | ||
| lastMetadata = null; | ||
| } | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
Can directly just catch NotFoundException? Instead of catching all the exceptions and throwing again if it is not a NotFoundException?
There was a problem hiding this comment.
Yes, I'm sorry, my compiler had a problem before, it has been modified
| } catch (Exception e) { | ||
| if (e instanceof NotFoundException) { | ||
| LOG.warn("The metadata file doesn't exist any more. Continue to drop the table.", e); | ||
| lastMetadata = null; |
There was a problem hiding this comment.
I am not sure if this needs to be supported.
See the line 199,
If the lastMetadata is null, table files are never deleted even with purge = true.
if (purge && lastMetadata != null) {
CatalogUtil.dropTableData(ops.io(), lastMetadata);
}
| catalog.createTable(identifier, tableSchema); | ||
| TableOperations ops = catalog.newTableOps(identifier); | ||
|
|
||
| Assert.assertTrue(catalog.dropTable(identifier, true)); |
There was a problem hiding this comment.
nit: I feel no need to test line 473 to 480
There was a problem hiding this comment.
I agree, let's remove L473 - 478 and just focus on the actual case where we don't have a metadata file for the given table
There was a problem hiding this comment.
When the NotFoundException is thrown,
we can call drop table again with purge = false which will succeed.
So, I think these changes are not required. Let us wait for others' opinions.
Also, this PR handles only one catalog. Instead, manually calling drop table again with purge = false works for all the catalogs.
nastra
left a comment
There was a problem hiding this comment.
the approach LGTM, just a few nits to address
| lastMetadata = null; | ||
| } | ||
| } catch (NotFoundException e) { | ||
| LOG.warn("The metadata file doesn't exist any more. Continue to drop the table.", e); |
There was a problem hiding this comment.
| LOG.warn("The metadata file doesn't exist any more. Continue to drop the table.", e); | |
| LOG.warn("The metadata file doesn't exist anymore. Continuing to drop the table.", e); |
| NoSuchTableException.class, | ||
| () -> catalog.loadTable(identifier)); | ||
|
|
||
| // delete corrupted table |
There was a problem hiding this comment.
maybe just // delete metadata file
| Assert.assertTrue(catalog.dropTable(identifier, true)); | ||
| AssertHelpers.assertThrows( | ||
| "Should fail to load table after dropping it", | ||
| NoSuchTableException.class, |
There was a problem hiding this comment.
nit: let's use
Assertions.assertThatThrownBy(() -> catalog.loadTable(identifier)).isInstanceOf(NoSuchTableException.class).hasMessage("xx")
| catalog.createTable(identifier, tableSchema); | ||
| TableOperations ops = catalog.newTableOps(identifier); | ||
|
|
||
| Assert.assertTrue(catalog.dropTable(identifier, true)); |
There was a problem hiding this comment.
I agree, let's remove L473 - 478 and just focus on the actual case where we don't have a metadata file for the given table
| } | ||
|
|
||
| @Test | ||
| public void testDropTable() throws TException { |
There was a problem hiding this comment.
| public void testDropTable() throws TException { | |
| public void testDropTableWithoutMetadataFile() throws TException { |
|
@nastra I had addressed comments. Thanks for review. |
|
We also bumped into this when testing through Apache Impala. This fix could be useful for us as well. |
| if (purge && ops.current() != null) { | ||
| lastMetadata = ops.current(); | ||
| } else { | ||
| lastMetadata = null; |
There was a problem hiding this comment.
Style nit: Wondering, while we are in this code, it may be a bit more concise to initialize lastMetadata = null, then we can remove two null assignments in the else/catch blocks? Not sure what people think, though its not a big in any case.
There was a problem hiding this comment.
Might be a personal preference, but I also prefer initialising to null and then save an else branch of an if.
There was a problem hiding this comment.
It is better to initialize to null from the concise code, but I think keeping null is easier to understand in terms of readability.
I think It depends on whether the lastMetadata = null has a significant effect on the method . If it is not that important, we can initialise it as null.
There was a problem hiding this comment.
lastMetadata = null seems not that important for dropTable. So I initialise it as null.
| lastMetadata = ops.current(); | ||
| } | ||
| } catch (NotFoundException e) { | ||
| LOG.warn("The metadata file doesn't exist anymore. Continuing to drop the table.", e); |
There was a problem hiding this comment.
In Iceberg, log messages should be specific statements and should not speculate about underlying causes. In this case, the operations failed to load table metadata. That doesn't necessarily mean that it doesn't exist. The error should state that clearly. Failed to load table metadata for table: %s, continuing drop without purge
There was a problem hiding this comment.
I have updated log message.
|
|
||
| @Test | ||
| public void testDropTableWithoutMetadataFile() { | ||
| Namespace namespace = Namespace.of("dbname_drop"); |
There was a problem hiding this comment.
Instead of creating a new namespace in the HiveCatalog tests we are usually using a static db:
TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
|
Quick question: do we need this only for HiveCatalog? Do we need this for other catalogs as well? (Maybe a different PR, but I think we would expect the same behavior for every Catalog implementation, or at least a clean documentation for the differences) |
|
@pvary I can complete other catalogs in this PR if need. |
| } | ||
| } catch (NotFoundException e) { | ||
| LOG.warn( | ||
| "Failed to load table metadata for table: {}, continuing drop without purge.", |
There was a problem hiding this comment.
Please removing ending punctuation. There's no need for it.
|
I agree with @pvary. It would be better to add this test to |
|
+1, seems like same issue is reported for GlueCatalog : #5604, we should have similar handling for GlueCatalog as well, having tests for all catalogs could have helped detecting :) !! |
|
@singhpk234 I will complete all catalogs in this PR~ |
|
I have implemented |
|
Also note that, |
|
@rdblue @szehon-ho @pvary If you have time, please take a look. And I have one question as above. If it is not suitable, can I only change |
|
@yabola I'm ok with some retry, not sure if its worth the additional risk we introduce to turn it off. It does seem 20 can be a bit excessive, wonder whether we can turn that down or it is by design.. maybe others will have an idea like @singhpk234 for GlueCatalog? But one concern here, does it have an impact on how long the test takes to run, and can we turn it down at least for this test? |
20 seems to be quite excessive IMHO as well, going by the calculations mentioned in the Hive PR to reduce retries it can lead to hang > 10 min. Unfortunately I don't have much context into what the motivation was to keep it to 20 from the beginning ... The closest I could track it down was this commit 9349a25 from 2018, unfortunately there is not a pr associated with it. But definitely it makes sense to reduce it to may be 3, and make it configurable like how we did for hive. |
|
@szehon-ho My newly added |
|
Ugh, yea I feel like it would be nice to have this configurable, not just the test but in general, but I feel it may be another pr. If @rdblue is ok to merge, I'm ok. About the config, it seems like today the way to config things in TableOperations is table property, which could be inherited from CatalogProperties. |
|
@szehon-ho @yabola I created a draft PR to make the number of metadata load retries configurable: #5688 |
|
Thanks for taking a look at the metadata retries, @gaborkaszab! |
|
@ajantha-bhat @pvary @rdblue @szehon-ho @nastra @singhpk234 @gaborkaszab Thank you all for careful review |
When the metadata file doesn't exist any more, drop table operation will fail.
This PR aims to support deleting the table even if the table was corrupted.