[discovery-ec2] Plugin doesn't retry EC2 describe instance calls #50462
Comments
Pinging @elastic/es-distributed (:Distributed/Discovery-Plugins)
@original-brownbear Do you mind if I raise a PR for this? It's a small change, and I'm fine either way.
@Bukhtawar I'm not sure this is entirely a small change. It seems we're not just using the EC2 SDK to make calls to the EC2 metadata service but also use (I ran into this when I tried to test retries by faking
@original-brownbear Please correct me if I misunderstood your point, but aren't the two calls made to different endpoints, and hence logically different entities?
For (1), we need retries for cases where a large cluster (100 nodes or more) loses its master and the PeerFinder on every node then makes an EC2 call each second, causing account-level limits to be breached. Testing
Thanks @Bukhtawar, I think you are correct. (Thanks @DaveCTurner for chiming in here!) I think I found a way to write a proper test and will raise a PR for this one shortly. We have a bunch of HTTP mocking infrastructure from the snapshot repository tests that we can reuse here, and I think I was able to craft a valid test from that :)
Thanks @original-brownbear and @DaveCTurner. I have one more proposal regarding the backoff strategy: use the SDKDefaultBackoffStrategy, which in the newer AWS SDK versions applies equal jitter to throttling exceptions and full jitter to non-throttling exceptions. The defaults for some of its settings can be overridden.
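As a rough illustration of the equal-jitter and full-jitter schemes mentioned in the proposal, here is a minimal standalone sketch. It does not use the AWS SDK and is not the SDK's implementation; the class name `JitterBackoff`, its constructor parameters, and the delay values in `main` are all assumptions chosen for the example.

```java
import java.util.Random;

/**
 * Hypothetical standalone sketch of jittered exponential backoff.
 * Not AWS SDK code: names and default values here are assumptions.
 */
public class JitterBackoff {
    private final long baseDelayMs;          // base delay for non-throttling errors
    private final long throttledBaseDelayMs; // base delay for throttling errors
    private final long maxBackoffMs;         // upper bound on any single delay
    private final Random random;

    public JitterBackoff(long baseDelayMs, long throttledBaseDelayMs,
                         long maxBackoffMs, Random random) {
        this.baseDelayMs = baseDelayMs;
        this.throttledBaseDelayMs = throttledBaseDelayMs;
        this.maxBackoffMs = maxBackoffMs;
        this.random = random;
    }

    /** Exponential ceiling: min(maxBackoff, base * 2^retries), doubling step by step. */
    long ceiling(long base, int retries) {
        long delay = base;
        for (int i = 0; i < retries && delay < maxBackoffMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxBackoffMs);
    }

    /**
     * Equal jitter for throttling errors: half the ceiling plus a random
     * share of the other half. Full jitter otherwise: a random delay
     * between zero and the ceiling.
     */
    public long delayMillis(int retriesAttempted, boolean throttled) {
        if (throttled) {
            long c = ceiling(throttledBaseDelayMs, retriesAttempted);
            long half = c / 2;
            return half + (long) (random.nextDouble() * (c - half));
        }
        long c = ceiling(baseDelayMs, retriesAttempted);
        return (long) (random.nextDouble() * c);
    }

    public static void main(String[] args) {
        // Assumed values: 100 ms base, 500 ms throttled base, 20 s cap.
        JitterBackoff backoff = new JitterBackoff(100, 500, 20_000, new Random());
        for (int retry = 0; retry < 5; retry++) {
            System.out.printf("retry %d: full jitter %d ms, equal jitter %d ms%n",
                    retry, backoff.delayMillis(retry, false), backoff.delayMillis(retry, true));
        }
    }
}
```

Equal jitter guarantees a minimum wait of half the ceiling, which suits throttling errors, while full jitter spreads retries over the whole interval so that many nodes retrying at once are less likely to collide.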
Use the default retry condition in the discovery plugin instead of never retrying, which caused hot retries upstream, and add a test that verifies retrying works. Closes elastic#50462
Problem Description
Discovery EC2 doesn't retry EC2 describe calls on throttling or other exceptions because of NO_RETRY_CONDITION, which has the potential to make the throttling worse. Although there is logic to back off exponentially in such cases, the backoff never kicks in.
elasticsearch/plugins/discovery-ec2/src/main/java/org/elasticsearch/discovery/ec2/AwsEc2ServiceImpl.java
Lines 82 to 90 in 7203cee
AWS reference
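To make the problem concrete, here is a minimal standalone sketch of how a retry condition gates the backoff logic. It is not the plugin's or the AWS SDK's code; `RetryingCaller`, `NEVER_RETRY`, `call`, and the delay constants are hypothetical names and values used only for illustration.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: a retry loop whose backoff only runs
 * when the retry condition allows a retry. Not AWS SDK code.
 */
public class RetryingCaller {
    /** Mirrors the idea of a "no retry" condition: every failure is final. */
    static final Predicate<RuntimeException> NEVER_RETRY = e -> false;

    /** Assumed cap on a single backoff delay. */
    static final long MAX_DELAY_MS = 20_000;

    static <T> T call(Supplier<T> request, Predicate<RuntimeException> shouldRetry,
                      int maxRetries, long baseDelayMs) {
        for (int attempt = 0; ; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                // With NEVER_RETRY this branch always rethrows, so the
                // backoff below is never reached.
                if (attempt >= maxRetries || !shouldRetry.test(e)) {
                    throw e;
                }
                // Exponential backoff: base * 2^attempt, capped.
                long delay = Math.min(MAX_DELAY_MS, baseDelayMs << Math.min(attempt, 20));
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("interrupted while backing off", ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request that is "throttled" twice before succeeding.
        Supplier<String> flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("throttled");
            }
            return "ok";
        };
        String result = call(flaky, e -> true, 5, 10);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Passing `NEVER_RETRY` instead of `e -> true` makes the first failure propagate immediately to the caller, which is the situation the issue describes: the caller then retries on its own schedule without any backoff in between.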