-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(aws-asg): add retry_attempts
configuration to customize the maximum waiting time
#594
Conversation
…ximum waiting time The maximum waiting time for checking the status of the ASG is 2 minutes, which is not configurable. When using the ASG target plugin and downscaling, the ELB connection draining could take more than 2 minutes. If this happens, Nomad will purge the nodes without waiting for the ASG to remove the instances. Logs: ``` 2022-07-28T15:48:51.939Z [ERROR] internal_plugin.aws-asg: failed to ensure all activities completed: action=scale_in asg_name=preproduction/nomad error="reached retry limit" 2022-07-28T15:48:52.123Z [INFO] internal_plugin.aws-asg: successfully purged Nomad node: node_id=34862212-f89a-b107-903d-f4036a44543d nomad_evals=["88a3c562-31e6-f303-cae3-1289bd924958", "b0b08d76-0c2c-e5f3-1b8d-d6f032cd89aa"] ``` So, this commit adds a new config property called `retry_attempts`, which is the maximum number of attempts the plugin will make to ensure that the ASG has completed all its actions. Apart from that, if this PR is accepted, I will update the [documentation](https://www.nomadproject.io/tools/autoscaling/plugins/target/aws-asg#policy-configuration-options) accordingly. Resolves hashicorp#518
@@ -18,10 +19,18 @@ import ( | |||
|
|||
const ( | |||
defaultRetryInterval = 10 * time.Second | |||
defaultRetryLimit = 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this from here since there will be a configuration for it.
|
||
// configValues are the default values used when a configuration key is not | ||
// supplied by the operator that are specific to the plugin. | ||
configValueRegionDefault = "us-east-1" | ||
configValueRegionDefault = "us-east-1" | ||
configValueRetryAttemptsDefault = "15" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config
values must be strings. Hence the default value should also be a string.
retry_attempts
configuration to customize the ma…retry_attempts
configuration to customize the maximum waiting time
By reading and parsing the value of `retry_attempts` earlier in `SetConfig` we can detect invalid configuration values (such as non-interger strings) earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @alopezsanchez, and apologies for the long delay in getting this reviewed.
I pushed a commit with some small refactoring to parse the config value earlier in the SetConfig
method. This allows for quicker detection of invalid values (like non-integer strings), otherwise the problem would only be caught at scaling time.
func getConfigValue(config map[string]string, key string, defaultValue string) string { | ||
value, ok := config[key] | ||
if !ok { | ||
return defaultValue | ||
} | ||
|
||
return value | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice little helper, good one 🙂
The maximum waiting time for checking the status of the ASG is 2 minutes, which is not configurable.
When using the ASG target plugin and downscaling, the ELB connection draining could take more
than 2 minutes. If this happens, Nomad will purge the nodes without waiting for the ASG to remove
the instances.
Logs:
So, this commit adds a new config property called
retry_attempts
, which is the maximum numberof attempts the plugin will make to ensure that the ASG has completed all its actions.
Apart from that, if this PR is accepted, I will update the documentation accordingly.
Resolves #518