Skip to content

Core: Retry connections in JDBC catalog with user configured error code list#10140

Merged
amogh-jahagirdar merged 1 commit into
apache:mainfrom
amogh-jahagirdar:jdbc-retries
May 10, 2024
Merged

Core: Retry connections in JDBC catalog with user configured error code list#10140
amogh-jahagirdar merged 1 commit into
apache:mainfrom
amogh-jahagirdar:jdbc-retries

Conversation

@amogh-jahagirdar
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Apr 15, 2024

This change is an alternative to #7561 for retrying JDBC connections.
First, we currently retry SQLNonTransientExceptions which is a bug. Those are non-retryable by definition, we should retry SQLTransientExceptions.

Next, this change also enables configuring multiple retries since just retrying once may not increase the chance of success sufficiently.

Lastly, this change introduces the ability for a user to configure which SQL error codes they want to retry on on top of the standard retryable error codes. The reason is there's so many different databases out there each with their own status codes that Iceberg shouldn't define all those in the library. This approach allows a user to specify the ones they want to retry via a catalog property called "retryable_status_codes" which is a comma separated list of status codes that are considered retryable.

@github-actions github-actions Bot added the core label Apr 15, 2024
@amogh-jahagirdar amogh-jahagirdar changed the title Retry connections in JDBC catalog Retry connections in JDBC catalog with user configured error code list Apr 15, 2024
Comment thread core/src/main/java/org/apache/iceberg/ClientPoolImpl.java Outdated
Comment on lines +59 to +74
retryableStatusCodes.addAll(COMMON_RETRYABLE_CONNECTION_SQL_STATES);
String configuredRetryableStatuses = props.get(JdbcCatalog.RETRYABLE_STATUS_CODES);
if (configuredRetryableStatuses != null) {
Copy link
Copy Markdown
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm presenting this configuration option since at the moment I believe it's the most practical way for people to work around production issues they hit today.

Long term, if we separate JDBC catalog into a separate library (I forgot where we landed on where JDBC fits in in terms of the catalog discussion that's happened in the past) that library could define status codes for each common, industry-standard database so a user doesn't have to specify anything.

See https://www.youtube.com/watch?v=uAQVGd5zV4I&t=1323s for the discussion on catalogs in the future.

@amogh-jahagirdar
Copy link
Copy Markdown
Contributor Author

Need to add tests

@ajantha-bhat
Copy link
Copy Markdown
Member

cc: @jbonofre

implements Configurable<Object>, SupportsNamespaces {

public static final String PROPERTY_PREFIX = "jdbc.";
public static final String RETRYABLE_STATUS_CODES = "retryable_status_codes";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to move this, as properties are defined in JdbcUtil

implements Configurable<Object>, SupportsNamespaces {

public static final String PROPERTY_PREFIX = "jdbc.";
public static final String RETRYABLE_STATUS_CODES = "retryable_status_codes";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent, I would move this property in JdbcUtil as it's where all properties are located.


private Set<String> retryableStatusCodes;

private static final Set<String> COMMON_RETRYABLE_CONNECTION_SQL_STATES =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it by default but also configurable (not necessarily in this PR). For instance, in ActiveMQ JDBC Store, I did some special handling specific to vendor (some SQL states are specific to vendor). I can include that in my PR (in preparation) working with multiple vendors.

Copy link
Copy Markdown
Contributor

@nastra nastra Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to add some javadoc and document what those errors actually mean.

nit: the definition of that var should probably be moved to the top (before the non-static vars)

@cccs-eric
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar I'd be happy with this fix since it will solve the problem my organization is facing. Thanks for looking into this and bringing the football past the goal line. 👍🏻

@amogh-jahagirdar amogh-jahagirdar changed the title Retry connections in JDBC catalog with user configured error code list Core: Retry connections in JDBC catalog with user configured error code list Apr 15, 2024
@amogh-jahagirdar
Copy link
Copy Markdown
Contributor Author

I didn't forget about this (again :) ) I've added a test to verify the error code configuration get parsed and considered correctly and am adding tests for ClientPoolImpl which we can have a mocked implementation (mocking the actual connection action and failure cases) to verify the new retry path works as expected. This combination of tests should give us confidence on the real behavior.

@amogh-jahagirdar amogh-jahagirdar force-pushed the jdbc-retries branch 4 times, most recently from 58e0a9c to 9ef9ed7 Compare April 25, 2024 20:21
@amogh-jahagirdar amogh-jahagirdar marked this pull request as ready for review April 25, 2024 20:21
Comment thread core/src/main/java/org/apache/iceberg/ClientPoolImpl.java
Comment thread core/src/main/java/org/apache/iceberg/jdbc/JdbcClientPool.java
// property to control if view support is added to the existing database
static final String SCHEMA_VERSION_PROPERTY = JdbcCatalog.PROPERTY_PREFIX + "schema-version";

static final String RETRYABLE_STATUS_CODES = "retryable_status_codes";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this have JdbcCatalog.PROPERTY_PREFIX as a prefix like the other properties have?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so , looking at https://iceberg.apache.org/docs/nightly/jdbc/?h=jdbc#iceberg-jdbc-integration

The properties which are prefixed with "jdbc" look to be specifically for configuring the JDBC connection url, like user,password, ssl etc. These are standard options when establishing a JDBC connection.

What we're trying to do here is independent of the JDBC connection URL and more like a typical catalog property.

I do see that we have strict-mode and schema-version catalog properties which kind of break this pattern since those aren't relevant for the JDBC connection URL, but I'm not sure we want to keep following that since it does seem like the original intention of the prefix was pretty clear on having it be for the connection URL. cc @jbonofre if you have a take on this

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take on this (maybe I'm wrong) is that anything specific to JDBC catalog should be prefixed by jdbc, just to be sure we don't have conflict with properties with same name not related to JDBC Catalog.
As JDBC Connection is used in JDBC Catalog, I would prefix with jdbc for consistency. As it's user facing properties, it's important to be consistent here.

Copy link
Copy Markdown
Contributor Author

@amogh-jahagirdar amogh-jahagirdar May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that anything specific to JDBC catalog should be prefixed by jdbc, just to be sure we don't have conflict with properties with same name not related to JDBC Catalog. As JDBC Connection is used in JDBC Catalog, I would prefix with jdbc for consistency. As it's user facing properties, it's important to be consistent here.

I think if we do this it's not really consistent with what we do with other catalogs. For example, REST, Glue, Nessie catalogs don't neccessarily prefix any of their custom configurations with "glue", "rest", or "nessie" respectively; configuring
the catalog configuration is just a matter of setting the property via <catalog_name>.<property>. I also don't think we need to worry about conflicts in property names across different catalogs; in the end we use reflection to load the catalog and it's up to the catalog implementation to respect whichever properties we want, so we just need to make sure the properties are well defined.

The docs seemed pretty clear to me about the original intention of the "jdbc" prefix for the JDBC catalog https://iceberg.apache.org/docs/nightly/jdbc/?h=jdbc#configurations is that it's specifically meant for configuring JDBC connections (username/port/db name that are in the URL itself).

retryable_status_codes is not configuring the JDBC connection URL.
It is a catalog level property which controls when the catalog should retry establishing an entirely new connection. That's why I think it should be namespaced as it's own property (same as any other property that we do for other catalogs).

I feel like that's actually more consistent behavior for a user; they can know that everything under jdbc is part of the actual connection URL, and everything outside of the JDBC prefix is a general catalog level property.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. It makes sense indeed. But it also means that strict-mode and schema-version should not have been prefixed by jdbc then.
We can change that, but as it's user facing, it could be problematic.
I can work on a PR to deprecated jdbc prefix (not still supporting it) for strict-mode and schema-version.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it also means that strict-mode and schema-version should not have been prefixed by jdbc then.
Right, I think they should not have been.

We can change that, but as it's user facing, it could be problematic.
I can work on a PR to deprecated jdbc prefix (not still supporting it) for strict-mode and schema-version

Sure, we can either deprecate the prefix or just leave as is. Since it's relatively early days for strict-mode/schema-version in the catalog (e.g. it'll practically take time for these paths to get exercised, as views get adopted) I think it makes most sense just to deprecate and correct it.

Comment thread core/src/main/java/org/apache/iceberg/jdbc/JdbcClientPool.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java
Comment thread core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java Outdated
public class JdbcClientPool extends ClientPoolImpl<Connection, SQLException> {

/**
* The following are common retryable SQLSTATEs error codes which are generic across vendors.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The following are common retryable SQLSTATEs error codes which are generic across vendors.
* The following are common retryable SQLSTATE error codes which are generic across vendors.

Copy link
Copy Markdown
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only open discussion point is whether retryable_status_codes should have a prefix or not, but everything else LGTM

@amogh-jahagirdar amogh-jahagirdar force-pushed the jdbc-retries branch 2 times, most recently from c89e12a to cb45320 Compare May 2, 2024 04:56
*
* See https://en.wikipedia.org/wiki/SQLSTATE for more details.
*/
static final Set<String> COMMON_RETRYABLE_CONNECTION_SQL_STATES =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth to add 40001 and 40P01 state to the list:

Deadlocks are a candidate to retry imho.

Copy link
Copy Markdown
Contributor Author

@amogh-jahagirdar amogh-jahagirdar May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure it seems like 40001 is typically a transaction rollback prompted by a database's deadlock detector kicking in and essentially killing one of the operations.

I think I'll add 40001 since it can be safely retried and is a common status code across vendors.

I'm a bit more hesitant to add database vendor specific codes (like 40P01) to this default set, since it sets a precedent to be a dumping ground for many states. I'd advocate (at least for now) we keep this default set to be a common set, and users pass in custom codes for other cases.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

40P01 can be provided by the user, if using PostgreSQL. So, OK to add 40001 state only.

properties = props;
retryableStatusCodes = Sets.newHashSet();
retryableStatusCodes.addAll(COMMON_RETRYABLE_CONNECTION_SQL_STATES);
String configuredRetryableStatuses = props.get(JdbcUtil.RETRYABLE_STATUS_CODES);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to make it configuration (as discussed in the PR): it will allow users to define database specific states.

// property to control if view support is added to the existing database
static final String SCHEMA_VERSION_PROPERTY = JdbcCatalog.PROPERTY_PREFIX + "schema-version";

static final String RETRYABLE_STATUS_CODES = "retryable_status_codes";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take on this (maybe I'm wrong) is that anything specific to JDBC catalog should be prefixed by jdbc, just to be sure we don't have conflict with properties with same name not related to JDBC Catalog.
As JDBC Connection is used in JDBC Catalog, I would prefix with jdbc for consistency. As it's user facing properties, it's important to be consistent here.

@amogh-jahagirdar
Copy link
Copy Markdown
Contributor Author

I'll go ahead and merge this since I think we have consensus on #10140 (comment). Thanks for reviewing @nastra @jbonofre !

@amogh-jahagirdar amogh-jahagirdar merged commit 2b21020 into apache:main May 10, 2024
@guyco33
Copy link
Copy Markdown

guyco33 commented Jun 2, 2024

@amogh-jahagirdar When is it going to be released ?

@jbonofre
Copy link
Copy Markdown
Member

jbonofre commented Jun 2, 2024

@guyco33 It will be included in Iceberg 1.6.0, plan in a few weeks.

@guyco33
Copy link
Copy Markdown

guyco33 commented Jun 2, 2024

Thanks @jbonofre
It would be great to have it earlier as a cherry picking patch of 1.5.x on top of pr-10124 that is also needed to avoid conflicts.

@ochanism
Copy link
Copy Markdown

ochanism commented Jun 7, 2024

Oh, I also need this feature for my REST/JDBC catalog. I've been suffering from the JDBC connection failure issue.

@palkx
Copy link
Copy Markdown

palkx commented Jul 2, 2024

Hello @jbonofre.

Can you estimate when the release of v1.6.0 with this fix is planned?
Or maybe at least we can port this fix to the v1.5.x?

Thanks!

@jbonofre
Copy link
Copy Markdown
Member

jbonofre commented Jul 2, 2024

@palkx I sent a message to the mailing list this morning: we have to merge a few PRs and we are good to go with 1.6.0 release. I would like to start the vote end of this week or max beginning of next week.

@palkx
Copy link
Copy Markdown

palkx commented Jul 2, 2024

Thank you for the update! Will wait for the v1.6.0 then.

sasankpagolu pushed a commit to sasankpagolu/iceberg that referenced this pull request Oct 27, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants