Add/lifecycle heartbeat #1116
Merged
LikithaVemulapalli
merged 31 commits into
aws:main
from
hyeong01:add/lifecycle-heartbeat
Jan 29, 2025
d991814
add lifecycle heartbeat
0e4b686
Lifecycle heartbeat unit test
1fbd7cb
Refactor heartbeat logging statements
d0f1ef4
Heartbeat e2e test
df0696f
Merge branch 'aws:main' into add/lifecycle-heartbeat
hyeong01 d7f8e07
Remove error handling for using heartbeat and imds together
a6cfd89
add e2e test for lifecycle heartbeat
64e9cff
Add check heartbeat timeout and compare to heartbeat interval
d3047a0
Add error handling for using heartbeat and imds together
559adc3
fix config error message
7012bab
update error message for heartbeat config
bc79eb7
Fix heartbeat flag explanation
75400a9
Update readme for new heartbeat feature
bbddcfa
Fix readme for heartbeat section
029fdf7
Update readme on the concurrency of heartbeat
56b3f55
fix: stop heartbeat when target is invalid
7221ed2
Added heartbeat test for handling invalid lifecycle action
4bcb916
incorporated unsupported error types for unit testing
4ff40d9
fix unit-test: reset heartbeatCallCount each test
fe7fcc1
Merge branch 'aws:main' into add/lifecycle-heartbeat
hyeong01 265828d
use helper function to reduce repetitive code in heartbeat unit test
044fc3a
Update readme. Moved heartbeat under Queue Processor
2732775
Fix config.go for better readability and check until < interval
0492976
Update heartbeat to have better logging
1631bb6
Update unit test to cover whole process of heartbeat start and closure
b41751d
Update heartbeat e2e test. Auto-value calculations for future modific…
9e3fe77
Add inline comment for heartbeatUntil default behavior
dbdeec1
Fixed e2e variables to have double quotes
80b88a4
fix readme for heartbeat
9c54964
Added new flags in config test
56ea41d
Fixed typo in heartbeat e2e test
@@ -112,6 +112,9 @@ const (
 	queueURLConfigKey = "QUEUE_URL"
 	completeLifecycleActionDelaySecondsKey = "COMPLETE_LIFECYCLE_ACTION_DELAY_SECONDS"
 	deleteSqsMsgIfNodeNotFoundKey = "DELETE_SQS_MSG_IF_NODE_NOT_FOUND"
+	// heartbeat
+	heartbeatIntervalKey = "HEARTBEAT_INTERVAL"
+	heartbeatUntilKey = "HEARTBEAT_UNTIL"
 )

 // Config arguments set via CLI, environment variables, or defaults
@@ -166,6 +169,8 @@ type Config struct {
 	CompleteLifecycleActionDelaySeconds int
 	DeleteSqsMsgIfNodeNotFound bool
 	UseAPIServerCacheToListPods bool
+	HeartbeatInterval int
Review comment: We need to have some coverage around these newly added configs in config-test.go file...
+	HeartbeatUntil int
 }

 // ParseCliArgs parses cli arguments and uses environment variables as fallback values
@@ -230,6 +235,8 @@ func ParseCliArgs() (config Config, err error) {
 	flag.IntVar(&config.CompleteLifecycleActionDelaySeconds, "complete-lifecycle-action-delay-seconds", getIntEnv(completeLifecycleActionDelaySecondsKey, -1), "Delay completing the Autoscaling lifecycle action after a node has been drained.")
 	flag.BoolVar(&config.DeleteSqsMsgIfNodeNotFound, "delete-sqs-msg-if-node-not-found", getBoolEnv(deleteSqsMsgIfNodeNotFoundKey, false), "If true, delete SQS Messages from the SQS Queue if the targeted node(s) are not found.")
 	flag.BoolVar(&config.UseAPIServerCacheToListPods, "use-apiserver-cache", getBoolEnv(useAPIServerCache, false), "If true, leverage the k8s apiserver's index on pod's spec.nodeName to list pods on a node, instead of doing an etcd quorum read.")
+	flag.IntVar(&config.HeartbeatInterval, "heartbeat-interval", getIntEnv(heartbeatIntervalKey, -1), "The time period in seconds between consecutive heartbeat signals. Valid range: 30-3600 seconds (30 seconds to 1 hour).")
+	flag.IntVar(&config.HeartbeatUntil, "heartbeat-until", getIntEnv(heartbeatUntilKey, -1), "The duration in seconds over which heartbeat signals are sent. Valid range: 60-172800 seconds (1 minute to 48 hours).")
 	flag.Parse()

 	if isConfigProvided("pod-termination-grace-period", podTerminationGracePeriodConfigKey) && isConfigProvided("grace-period", gracePeriodConfigKey) {
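The two heartbeat flags follow the same pattern as the flags around them: a CLI value takes precedence, the matching environment variable is the fallback, and -1 means "not set". The sketch below illustrates that precedence in isolation. `intFromEnv` and `parseHeartbeatFlags` are hypothetical names made up for this example; the project's own `getIntEnv` helper is not shown in this diff and may handle malformed values differently (this stand-in silently falls back).

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// intFromEnv is a hypothetical stand-in for the project's getIntEnv helper.
// It returns the integer value of the environment variable key, or fallback
// when the variable is unset or is not a valid integer.
func intFromEnv(key string, fallback int) int {
	raw, ok := os.LookupEnv(key)
	if !ok {
		return fallback
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return fallback
	}
	return n
}

// parseHeartbeatFlags registers the two heartbeat flags on a private FlagSet
// so the precedence (CLI flag, then environment variable, then -1) can be
// exercised without touching the global flag state.
func parseHeartbeatFlags(args []string) (interval, until int, err error) {
	fs := flag.NewFlagSet("heartbeat", flag.ContinueOnError)
	fs.IntVar(&interval, "heartbeat-interval", intFromEnv("HEARTBEAT_INTERVAL", -1), "seconds between heartbeats")
	fs.IntVar(&until, "heartbeat-until", intFromEnv("HEARTBEAT_UNTIL", -1), "seconds over which heartbeats are sent")
	err = fs.Parse(args)
	return interval, until, err
}

func main() {
	// The interval comes from the environment, the duration from the CLI.
	os.Setenv("HEARTBEAT_INTERVAL", "300")
	interval, until, err := parseHeartbeatFlags([]string{"--heartbeat-until=900"})
	fmt.Println(interval, until, err) // prints: 300 900 <nil>
}
```

Using a dedicated `flag.FlagSet` rather than the package-level `flag` functions is a choice made only to keep the example testable; the diff itself registers the flags globally.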
@@ -274,6 +281,27 @@ func ParseCliArgs() (config Config, err error) {
 		panic("You must provide a node-name to the CLI or NODE_NAME environment variable.")
 	}

+	// heartbeat value boundary and compatibility check
+	if !config.EnableSQSTerminationDraining && (config.HeartbeatInterval != -1 || config.HeartbeatUntil != -1) {
+		return config, fmt.Errorf("currently using IMDS mode. Heartbeat is only supported for Queue Processor mode")
+	}
+	if config.HeartbeatInterval != -1 && (config.HeartbeatInterval < 30 || config.HeartbeatInterval > 3600) {
+		return config, fmt.Errorf("invalid heartbeat-interval passed: %d Should be between 30 and 3600 seconds", config.HeartbeatInterval)
+	}
+	if config.HeartbeatUntil != -1 && (config.HeartbeatUntil < 60 || config.HeartbeatUntil > 172800) {
+		return config, fmt.Errorf("invalid heartbeat-until passed: %d Should be between 60 and 172800 seconds", config.HeartbeatUntil)
+	}
+	if config.HeartbeatInterval == -1 && config.HeartbeatUntil != -1 {
+		return config, fmt.Errorf("invalid heartbeat configuration: heartbeat-interval is required when heartbeat-until is set")
+	}
+	if config.HeartbeatInterval != -1 && config.HeartbeatUntil == -1 {
+		config.HeartbeatUntil = 172800
+		log.Info().Msgf("Since heartbeat-until is not set, defaulting to %d seconds", config.HeartbeatUntil)
+	}
+	if config.HeartbeatInterval != -1 && config.HeartbeatUntil != -1 && config.HeartbeatInterval > config.HeartbeatUntil {
+		return config, fmt.Errorf("invalid heartbeat configuration: heartbeat-interval should be less than or equal to heartbeat-until")
+	}
+
 	// client-go expects these to be set in env vars
 	os.Setenv(kubernetesServiceHostConfigKey, config.KubernetesServiceHost)
 	os.Setenv(kubernetesServicePortConfigKey, config.KubernetesServicePort)
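The checks in this hunk can be read as a small decision table: reject heartbeat flags outside Queue Processor mode, bound each value, require the interval whenever a duration is given, default the duration to 172800 seconds, and finally require interval <= until. The sketch below restates those rules as a standalone function so they can be exercised case by case. `heartbeatSettings` and `validateHeartbeat` are hypothetical names for this illustration, the error strings are paraphrased, and the real code operates on the full `Config` struct and logs when it applies the default.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	unset        = -1     // sentinel the diff uses for "flag not provided"
	defaultUntil = 172800 // 48 hours, the default and upper bound in the diff
)

// heartbeatSettings is a hypothetical, trimmed-down stand-in for Config.
type heartbeatSettings struct {
	QueueProcessor bool // mirrors Config.EnableSQSTerminationDraining
	Interval       int  // seconds between heartbeats
	Until          int  // seconds over which heartbeats are sent
}

// validateHeartbeat applies the same rules, in the same order, as the hunk
// above and returns the settings with the default duration filled in.
func validateHeartbeat(s heartbeatSettings) (heartbeatSettings, error) {
	if !s.QueueProcessor && (s.Interval != unset || s.Until != unset) {
		return s, errors.New("heartbeat is only supported in Queue Processor mode")
	}
	if s.Interval != unset && (s.Interval < 30 || s.Interval > 3600) {
		return s, fmt.Errorf("heartbeat-interval %d is outside 30-3600 seconds", s.Interval)
	}
	if s.Until != unset && (s.Until < 60 || s.Until > defaultUntil) {
		return s, fmt.Errorf("heartbeat-until %d is outside 60-172800 seconds", s.Until)
	}
	if s.Interval == unset && s.Until != unset {
		return s, errors.New("heartbeat-interval is required when heartbeat-until is set")
	}
	if s.Interval != unset && s.Until == unset {
		s.Until = defaultUntil
	}
	if s.Interval != unset && s.Interval > s.Until {
		return s, errors.New("heartbeat-interval must be less than or equal to heartbeat-until")
	}
	return s, nil
}

func main() {
	cases := []heartbeatSettings{
		{QueueProcessor: true, Interval: 300, Until: unset},  // duration defaults to 172800
		{QueueProcessor: true, Interval: 3600, Until: 60},    // interval exceeds duration
		{QueueProcessor: false, Interval: 300, Until: unset}, // IMDS mode is rejected
	}
	for _, c := range cases {
		got, err := validateHeartbeat(c)
		fmt.Printf("%+v -> %+v, err=%v\n", c, got, err)
	}
}
```

The second case shows why the final comparison is still needed after the range checks: an interval of 3600 and a duration of 60 are each within bounds but are invalid together.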
@@ -332,6 +360,8 @@ func (c Config) PrintJsonConfigArgs() {
 		Str("ManagedTag", c.ManagedTag).
 		Bool("use_provider_id", c.UseProviderId).
 		Bool("use_apiserver_cache", c.UseAPIServerCacheToListPods).
+		Int("heartbeat_interval", c.HeartbeatInterval).
+		Int("heartbeat_until", c.HeartbeatUntil).
 		Msg("aws-node-termination-handler arguments")
 }

@@ -383,7 +413,9 @@ func (c Config) PrintHumanConfigArgs() {
 			"\tmanaged-tag: %s,\n"+
 			"\tuse-provider-id: %t,\n"+
 			"\taws-endpoint: %s,\n"+
-			"\tuse-apiserver-cache: %t,\n",
+			"\tuse-apiserver-cache: %t,\n"+
+			"\theartbeat-interval: %d,\n"+
+			"\theartbeat-until: %d\n",
 		c.DryRun,
 		c.NodeName,
 		c.PodName,
@@ -424,6 +456,8 @@ func (c Config) PrintHumanConfigArgs() {
 		c.UseProviderId,
 		c.AWSEndpoint,
 		c.UseAPIServerCacheToListPods,
+		c.HeartbeatInterval,
+		c.HeartbeatUntil,
 	)
 }

Review comment: I would add a line that explains when this feature would be useful, e.g. when a customer has pods that have long-running drain tasks.
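To make the reviewer's scenario concrete, here is a rough sketch of how an interval and a total duration could drive a heartbeat loop while a long drain is in progress. It is not the PR's implementation, which is not part of this excerpt. It assumes heartbeats are sent on a fixed tick until the drain finishes, the heartbeat-until budget is spent, or a send fails (the commit titled "fix: stop heartbeat when target is invalid" suggests the last case). `runHeartbeat`, `errInvalidTarget`, and the `send` callback are hypothetical names; in the real handler the callback would presumably wrap an Auto Scaling lifecycle heartbeat call.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errInvalidTarget is a hypothetical error standing in for a failed send.
var errInvalidTarget = errors.New("lifecycle action target is no longer valid")

// runHeartbeat calls send once per interval and returns the number of
// successful sends. It stops when done is closed (for example, the drain
// finished), when the until budget elapses, or when send returns an error.
// interval must be positive, because time.NewTicker panics otherwise.
func runHeartbeat(done <-chan struct{}, interval, until time.Duration, send func() error) (int, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	budget := time.NewTimer(until)
	defer budget.Stop()

	sent := 0
	for {
		select {
		case <-done:
			return sent, nil
		case <-budget.C:
			return sent, nil
		case <-ticker.C:
			if err := send(); err != nil {
				return sent, err
			}
			sent++
		}
	}
}

func main() {
	calls := 0
	send := func() error {
		calls++
		if calls == 3 {
			return errInvalidTarget // simulate the target becoming invalid
		}
		fmt.Println("heartbeat", calls)
		return nil
	}
	// Millisecond values keep the demo fast; the flags in the diff are in seconds.
	sent, err := runHeartbeat(make(chan struct{}), 10*time.Millisecond, time.Second, send)
	fmt.Println("sent:", sent, "stopped because:", err)
}
```

In this sketch the first heartbeat is sent only after one full interval has passed; whether the real handler sends one immediately is not visible from the diff.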