Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(aws-asg): add retry_attempts configuration to customize the maximum waiting time #594

Merged
merged 3 commits into from
Dec 1, 2022

Conversation

alopezsanchez
Copy link
Contributor

@alopezsanchez alopezsanchez commented Aug 9, 2022

The maximum waiting time for checking the status of the ASG is 2 minutes, which is not configurable.
When using the ASG target plugin and downscaling, the ELB connection draining could take more
than 2 minutes. If this happens, Nomad will purge the nodes without waiting for the ASG to remove
the instances.

Logs:

2022-07-28T15:48:51.939Z [ERROR] internal_plugin.aws-asg: failed to ensure all activities completed: action=scale_in asg_name=preproduction/nomad error="reached retry limit"
2022-07-28T15:48:52.123Z [INFO]  internal_plugin.aws-asg: successfully purged Nomad node: node_id=34862212-f89a-b107-903d-f4036a44543d nomad_evals=["88a3c562-31e6-f303-cae3-1289bd924958", "b0b08d76-0c2c-e5f3-1b8d-d6f032cd89aa"]

So, this commit adds a new config property called retry_attempts, which is the maximum number
of attempts the plugin will make to ensure that the ASG has completed all its actions.

Apart from that, if this PR is accepted, I will update the documentation accordingly.

Resolves #518

…ximum waiting time

The maximum waiting time for checking the status of the ASG is 2 minutes, which is not configurable.
When using the ASG target plugin and downscaling, the ELB connection draining could take more
than 2 minutes. If this happens, Nomad will purge the nodes without waiting for the ASG to remove
the instances.

Logs:
```
2022-07-28T15:48:51.939Z [ERROR] internal_plugin.aws-asg: failed to ensure all activities completed: action=scale_in asg_name=preproduction/nomad error="reached retry limit"
2022-07-28T15:48:52.123Z [INFO]  internal_plugin.aws-asg: successfully purged Nomad node: node_id=34862212-f89a-b107-903d-f4036a44543d nomad_evals=["88a3c562-31e6-f303-cae3-1289bd924958", "b0b08d76-0c2c-e5f3-1b8d-d6f032cd89aa"]
```

So, this commit adds a new config property called `retry_attempts`, which is the maximum number
of attempts the plugin will make to ensure that the ASG has completed all its actions.

Apart from that, if this PR is accepted, I will update the [documentation](https://www.nomadproject.io/tools/autoscaling/plugins/target/aws-asg#policy-configuration-options) accordingly.

Resolves hashicorp#518
@hashicorp-cla
Copy link

hashicorp-cla commented Aug 9, 2022

CLA assistant check
All committers have signed the CLA.

@@ -18,10 +19,18 @@ import (

const (
defaultRetryInterval = 10 * time.Second
defaultRetryLimit = 15
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this from here since there will be a configuration for it.


// configValues are the default values used when a configuration key is not
// supplied by the operator that are specific to the plugin.
configValueRegionDefault = "us-east-1"
configValueRegionDefault = "us-east-1"
configValueRetryAttemptsDefault = "15"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config values must be strings. Hence the default value should also be a string.

@alopezsanchez alopezsanchez marked this pull request as ready for review August 9, 2022 14:02
@alopezsanchez alopezsanchez changed the title feat(aws-asg): add retry_attempts configuration to customize the ma… feat(aws-asg): add retry_attempts configuration to customize the maximum waiting time Aug 9, 2022
By reading and parsing the value of `retry_attempts` earlier in
`SetConfig` we can detect invalid configuration values (such as
non-interger strings) earlier.
Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @alopezsanchez, and apologies for the long delay in getting this reviewed.

I pushed a commit with some small refactoring to parse the config value earlier in the SetConfig method. This allows for quicker detection of invalid values (like non-integer strings), otherwise the problem would only be caught at scaling time.

Comment on lines +236 to +243
func getConfigValue(config map[string]string, key string, defaultValue string) string {
value, ok := config[key]
if !ok {
return defaultValue
}

return value
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice little helper, good one 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

aws-asg: configureable retry interval/retry
3 participants