This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

Fix get_dead_nodes query #1054

Merged
merged 2 commits into from
Jul 9, 2021

Conversation

chkeita (Contributor) commented Jul 9, 2021

Partial fix for #1028.

bmc-msft (Contributor) commented Jul 9, 2021

I had not considered using Timestamp (which is effectively "last updated") as a solution. Interesting.

Can this include updating the log and error messages to differentiate between a node with an expired heartbeat and a node that never generated a heartbeat?

I'm thinking about this:

logging.info(
    SCALESET_LOG_PREFIX
    + "reimaging nodes with expired heartbeats. "
    + "scaleset_id:%s nodes:%s",
    self.scaleset_id,
    ",".join(str(x.machine_id) for x in dead_nodes),
)

and this:

errors=[
    "node reimaged due to expired heartbeat",
    f"scaleset_id:{node.scaleset_id} machine_id:{node.machine_id}",
    f"last heartbeat:{node.heartbeat}",
],
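
To make the distinction concrete, here is a minimal sketch of how the two cases might be told apart. It is not the project's actual code: the `Node` dataclass, its `timestamp` field, the helpers `is_dead` and `reimage_errors`, and the "missing heartbeat" wording are all assumptions made for illustration. The idea is to fall back to the record's last-updated time when a node has never sent a heartbeat, and to pick the error text accordingly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class Node:
    # Hypothetical stand-in for the real node model; field names are assumptions.
    machine_id: UUID
    scaleset_id: UUID
    heartbeat: Optional[datetime]  # None if the node never sent a heartbeat
    timestamp: datetime  # when the record was last updated


def is_dead(node: Node, now: datetime, expiration: timedelta) -> bool:
    # Use the last-updated time when no heartbeat was ever recorded.
    last_seen = node.heartbeat if node.heartbeat is not None else node.timestamp
    return now - last_seen > expiration


def reimage_errors(node: Node) -> List[str]:
    # Choose the message based on whether a heartbeat ever existed.
    if node.heartbeat is None:
        reason = "node reimaged due to missing heartbeat"  # hypothetical wording
        detail = f"last updated:{node.timestamp}"
    else:
        reason = "node reimaged due to expired heartbeat"
        detail = f"last heartbeat:{node.heartbeat}"
    return [
        reason,
        f"scaleset_id:{node.scaleset_id} machine_id:{node.machine_id}",
        detail,
    ]
```

With this shape, a node whose `heartbeat` is `None` and whose record is older than the expiration window is still picked up as dead, but it is reported separately from a node whose heartbeat simply lapsed.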

@bmc-msft bmc-msft merged commit 89b7d13 into microsoft:main Jul 9, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 8, 2021