Core: Retry connections in JDBC catalog with user configured error code list#10140
Conversation
| retryableStatusCodes.addAll(COMMON_RETRYABLE_CONNECTION_SQL_STATES); | ||
| String configuredRetryableStatuses = props.get(JdbcCatalog.RETRYABLE_STATUS_CODES); | ||
| if (configuredRetryableStatuses != null) { |
There was a problem hiding this comment.
I'm presenting this configuration option since at the moment I believe it's the most practical way for people to work around production issues they hit today.
Long term, if we separate JDBC catalog into a separate library (I forgot where we landed on where JDBC fits in in terms of the catalog discussion that's happened in the past) that library could define status codes for each common, industry-standard database so a user doesn't have to specify anything.
See https://www.youtube.com/watch?v=uAQVGd5zV4I&t=1323s for the discussion on catalogs in the future.
|
Need to add tests |
|
cc: @jbonofre |
| implements Configurable<Object>, SupportsNamespaces { | ||
|
|
||
| public static final String PROPERTY_PREFIX = "jdbc."; | ||
| public static final String RETRYABLE_STATUS_CODES = "retryable_status_codes"; |
There was a problem hiding this comment.
would be good to move this, as properties are defined in JdbcUtil
| implements Configurable<Object>, SupportsNamespaces { | ||
|
|
||
| public static final String PROPERTY_PREFIX = "jdbc."; | ||
| public static final String RETRYABLE_STATUS_CODES = "retryable_status_codes"; |
There was a problem hiding this comment.
To be consistent, I would move this property in JdbcUtil as it's where all properties are located.
|
|
||
| private Set<String> retryableStatusCodes; | ||
|
|
||
| private static final Set<String> COMMON_RETRYABLE_CONNECTION_SQL_STATES = |
There was a problem hiding this comment.
Maybe make it by default but also configurable (not necessarily in this PR). For instance, in ActiveMQ JDBC Store, I did some special handling specific to vendor (some SQL states are specific to vendor). I can include that in my PR (in preparation) working with multiple vendors.
There was a problem hiding this comment.
I think it would be helpful to add some javadoc and document what those errors actually mean.
nit: the definition of that var should probably be moved to the top (before the non-static vars)
|
@amogh-jahagirdar I'd be happy with this fix since it will solve the problem my organization is facing. Thanks for looking into this and bringing the football past the goal line. 👍🏻 |
b96f642 to
86d5788
Compare
|
I didn't forget about this (again :) ) I've added a test to verify the error code configuration get parsed and considered correctly and am adding tests for ClientPoolImpl which we can have a mocked implementation (mocking the actual connection action and failure cases) to verify the new retry path works as expected. This combination of tests should give us confidence on the real behavior. |
58e0a9c to
9ef9ed7
Compare
9ef9ed7 to
c68fc93
Compare
| // property to control if view support is added to the existing database | ||
| static final String SCHEMA_VERSION_PROPERTY = JdbcCatalog.PROPERTY_PREFIX + "schema-version"; | ||
|
|
||
| static final String RETRYABLE_STATUS_CODES = "retryable_status_codes"; |
There was a problem hiding this comment.
should this have JdbcCatalog.PROPERTY_PREFIX as a prefix like the other properties have?
There was a problem hiding this comment.
I don't think so , looking at https://iceberg.apache.org/docs/nightly/jdbc/?h=jdbc#iceberg-jdbc-integration
The properties which are prefixed with "jdbc" look to be specifically for configuring the JDBC connection url, like user,password, ssl etc. These are standard options when establishing a JDBC connection.
What we're trying to do here is independent of the JDBC connection URL and more like a typical catalog property.
I do see that we have strict-mode and schema-version catalog properties which kind of break this pattern since those aren't relevant for the JDBC connection URL, but I'm not sure we want to keep following that since it does seem like the original intention of the prefix was pretty clear on having it be for the connection URL. cc @jbonofre if you have a take on this
There was a problem hiding this comment.
My take on this (maybe I'm wrong) is that anything specific to JDBC catalog should be prefixed by jdbc, just to be sure we don't have conflict with properties with same name not related to JDBC Catalog.
As JDBC Connection is used in JDBC Catalog, I would prefix with jdbc for consistency. As it's user facing properties, it's important to be consistent here.
There was a problem hiding this comment.
that anything specific to JDBC catalog should be prefixed by jdbc, just to be sure we don't have conflict with properties with same name not related to JDBC Catalog. As JDBC Connection is used in JDBC Catalog, I would prefix with jdbc for consistency. As it's user facing properties, it's important to be consistent here.
I think if we do this it's not really consistent with what we do with other catalogs. For example, REST, Glue, Nessie catalogs don't neccessarily prefix any of their custom configurations with "glue", "rest", or "nessie" respectively; configuring
the catalog configuration is just a matter of setting the property via <catalog_name>.<property>. I also don't think we need to worry about conflicts in property names across different catalogs; in the end we use reflection to load the catalog and it's up to the catalog implementation to respect whichever properties we want, so we just need to make sure the properties are well defined.
The docs seemed pretty clear to me about the original intention of the "jdbc" prefix for the JDBC catalog https://iceberg.apache.org/docs/nightly/jdbc/?h=jdbc#configurations is that it's specifically meant for configuring JDBC connections (username/port/db name that are in the URL itself).
retryable_status_codes is not configuring the JDBC connection URL.
It is a catalog level property which controls when the catalog should retry establishing an entirely new connection. That's why I think it should be namespaced as it's own property (same as any other property that we do for other catalogs).
I feel like that's actually more consistent behavior for a user; they can know that everything under jdbc is part of the actual connection URL, and everything outside of the JDBC prefix is a general catalog level property.
There was a problem hiding this comment.
Fair enough. It makes sense indeed. But it also means that strict-mode and schema-version should not have been prefixed by jdbc then.
We can change that, but as it's user facing, it could be problematic.
I can work on a PR to deprecated jdbc prefix (not still supporting it) for strict-mode and schema-version.
There was a problem hiding this comment.
But it also means that strict-mode and schema-version should not have been prefixed by jdbc then.
Right, I think they should not have been.
We can change that, but as it's user facing, it could be problematic.
I can work on a PR to deprecated jdbc prefix (not still supporting it) for strict-mode and schema-version
Sure, we can either deprecate the prefix or just leave as is. Since it's relatively early days for strict-mode/schema-version in the catalog (e.g. it'll practically take time for these paths to get exercised, as views get adopted) I think it makes most sense just to deprecate and correct it.
c68fc93 to
52049a9
Compare
52049a9 to
1a4990f
Compare
| public class JdbcClientPool extends ClientPoolImpl<Connection, SQLException> { | ||
|
|
||
| /** | ||
| * The following are common retryable SQLSTATEs error codes which are generic across vendors. |
There was a problem hiding this comment.
| * The following are common retryable SQLSTATEs error codes which are generic across vendors. | |
| * The following are common retryable SQLSTATE error codes which are generic across vendors. |
nastra
left a comment
There was a problem hiding this comment.
the only open discussion point is whether retryable_status_codes should have a prefix or not, but everything else LGTM
c89e12a to
cb45320
Compare
| * | ||
| * See https://en.wikipedia.org/wiki/SQLSTATE for more details. | ||
| */ | ||
| static final Set<String> COMMON_RETRYABLE_CONNECTION_SQL_STATES = |
There was a problem hiding this comment.
Maybe worth to add 40001 and 40P01 state to the list:
40001is the code used by most databases to identify deadlocks40P01is a PostgreSQL specific code for deadlocks as well (see https://www.postgresql.org/docs/13/errcodes-appendix.html)
Deadlocks are a candidate to retry imho.
There was a problem hiding this comment.
Sure it seems like 40001 is typically a transaction rollback prompted by a database's deadlock detector kicking in and essentially killing one of the operations.
I think I'll add 40001 since it can be safely retried and is a common status code across vendors.
I'm a bit more hesitant to add database vendor specific codes (like 40P01) to this default set, since it sets a precedent to be a dumping ground for many states. I'd advocate (at least for now) we keep this default set to be a common set, and users pass in custom codes for other cases.
There was a problem hiding this comment.
40P01 can be provided by the user, if using PostgreSQL. So, OK to add 40001 state only.
| properties = props; | ||
| retryableStatusCodes = Sets.newHashSet(); | ||
| retryableStatusCodes.addAll(COMMON_RETRYABLE_CONNECTION_SQL_STATES); | ||
| String configuredRetryableStatuses = props.get(JdbcUtil.RETRYABLE_STATUS_CODES); |
There was a problem hiding this comment.
Thanks to make it configuration (as discussed in the PR): it will allow users to define database specific states.
| // property to control if view support is added to the existing database | ||
| static final String SCHEMA_VERSION_PROPERTY = JdbcCatalog.PROPERTY_PREFIX + "schema-version"; | ||
|
|
||
| static final String RETRYABLE_STATUS_CODES = "retryable_status_codes"; |
There was a problem hiding this comment.
My take on this (maybe I'm wrong) is that anything specific to JDBC catalog should be prefixed by jdbc, just to be sure we don't have conflict with properties with same name not related to JDBC Catalog.
As JDBC Connection is used in JDBC Catalog, I would prefix with jdbc for consistency. As it's user facing properties, it's important to be consistent here.
cb45320 to
895f9ee
Compare
895f9ee to
9e6eee5
Compare
|
I'll go ahead and merge this since I think we have consensus on #10140 (comment). Thanks for reviewing @nastra @jbonofre ! |
|
@amogh-jahagirdar When is it going to be released ? |
|
@guyco33 It will be included in Iceberg 1.6.0, plan in a few weeks. |
|
Oh, I also need this feature for my REST/JDBC catalog. I've been suffering from the JDBC connection failure issue. |
|
Hello @jbonofre. Can you estimate when the release of v1.6.0 with this fix is planned? Thanks! |
|
@palkx I sent a message to the mailing list this morning: we have to merge a few PRs and we are good to go with 1.6.0 release. I would like to start the vote end of this week or max beginning of next week. |
|
Thank you for the update! Will wait for the v1.6.0 then. |
This change is an alternative to #7561 for retrying JDBC connections.
First, we currently retry SQLNonTransientExceptions which is a bug. Those are non-retryable by definition, we should retry SQLTransientExceptions.
Next, this change also enables configuring multiple retries since just retrying once may not increase the chance of success sufficiently.
Lastly, this change introduces the ability for a user to configure which SQL error codes they want to retry on on top of the standard retryable error codes. The reason is there's so many different databases out there each with their own status codes that Iceberg shouldn't define all those in the library. This approach allows a user to specify the ones they want to retry via a catalog property called "retryable_status_codes" which is a comma separated list of status codes that are considered retryable.