-
Notifications
You must be signed in to change notification settings - Fork 334
More sensible BasePolarisCatalog retry behavior #1046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More sensible BasePolarisCatalog retry behavior #1046
Conversation
service/common/src/main/java/org/apache/polaris/service/catalog/BasePolarisCatalog.java
Outdated
Show resolved
Hide resolved
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
30b91e2 to
d475c5a
Compare
| public static final PolarisConfiguration<Integer> MAX_METADATA_REFRESH_RETRIES = | ||
| PolarisConfiguration.<Integer>builder() | ||
| .key("MAX_METADATA_REFRESH_RETRIES") | ||
| .description( | ||
| "How many times to retry refreshing metadata when the previous error was retryable") | ||
| .defaultValue(12) | ||
| .build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the backward compatibility,
[doubt] These retries are on top of SDK retries ?
why are we did we go with a such high number ? default for hive catalog impl is 2 more details
: apache/iceberg#5510 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These retries are on top of SDK retries
I believe so. This is done by Iceberg Task under the hood. Is it possible to make the default a reasonable number while changing the current deployment config to 12 for compatibility? The concern is that users normally don't change the default value until they hit issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current deployment config
Curious what you mean by this. Do you mean setting this in one of the k8s/application properties files?
I'd be very much in favor of lowering the default as 12 is quite large and potentially dangerous. I'd be surprised if anyone is getting benefit from it being so high.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean setting this in one of the k8s/application properties files?
I was talking about the Snowfalke's Polaris deployment.
I'd be surprised if anyone is getting benefit from it being so high.
I believe not. The only compatibility concern is the current production deployment, including Snowflake, Zsaler, etc. Given that we don't have a release yet, the impact will manageable. In general, it should be fine to prefer a reasonable retry number(2 or 3) to compatibility. cc @MonkeyCanCode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it to 2!
service/common/src/main/java/org/apache/polaris/service/catalog/BasePolarisCatalog.java
Outdated
Show resolved
Hide resolved
service/common/src/main/java/org/apache/polaris/service/catalog/BasePolarisCatalog.java
Outdated
Show resolved
Hide resolved
service/common/src/main/java/org/apache/polaris/service/exception/IcebergExceptionMapper.java
Show resolved
Hide resolved
| public static final PolarisConfiguration<Integer> MAX_METADATA_REFRESH_RETRIES = | ||
| PolarisConfiguration.<Integer>builder() | ||
| .key("MAX_METADATA_REFRESH_RETRIES") | ||
| .description( | ||
| "How many times to retry refreshing metadata when the previous error was retryable") | ||
| .defaultValue(12) | ||
| .build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These retries are on top of SDK retries
I believe so. This is done by Iceberg Task under the hood. Is it possible to make the default a reasonable number while changing the current deployment config to 12 for compatibility? The concern is that users normally don't change the default value until they hit issues.
flyrain
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @andrew4699 !
| /** Signifies that we could not extract an HTTP code from a given cloud exception */ | ||
| public static final int UNKNOWN_CLOUD_HTTP_CODE = -1; | ||
|
|
||
| @VisibleForTesting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could remove this tag as it needs to be public for both non-test code and test code.
| return true; | ||
| } | ||
| return false; | ||
| /** Helper to retrieve the max number of metadata refresh retries */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I guess this comment isn't necessary as the method name is descriptive enough.
| Arguments.of( | ||
| new HttpResponseException("", new FakeAzureHttpResponse(code), ""), | ||
| true))) | ||
| .flatMap(Function.identity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I believe there's a "concat" method for Streams.
* More sensible BasePolarisCatalog retry behavior * Reuse AWS/GCP retryable property * Update default to 2
…1046)" (apache#32) * More sensible BasePolarisCatalog retry behavior (apache#1046) * More sensible BasePolarisCatalog retry behavior * Reuse AWS/GCP retryable property * Update default to 2 * Fix conflicts --------- Co-authored-by: Andrew Guterman <andrew.guterman1@gmail.com>
Previously BasePolarisCatalog had very aggressive retries. It would do up to 12 retries and would retry unauthorized, not found, and other typically non-retryable errors.
This PR makes the retry behavior more sensible. I added a config for the max number of retries, with a default value of 12 for backward compatibility. This is an improvement over the hard-coded value of 12. I also restricted the retries to timeouts, rate limits, 500, 503, and unknown errors.