Skip to content
This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

remove nodes from db upon reimage #1005

Merged

Conversation

bmc-msft
Copy link
Contributor

@bmc-msft bmc-msft commented Jun 18, 2021

The flag Node.reimage_queued is intended to stop nodes from reimaging repeatedly.

In #970, in order to work around Azure API failures, this flag was cycled if the node was already set to cleanup. Unfortunately, reimaging can sometimes take a significant amount of time, causing this change to get nodes multiple times.

Instead of using reimage_queued as a flag, this PR deletes the node from the storage table upon reimage. When the node registers OR the next time through Scaleset.cleanup_nodes, the Node will be recreated automatically, whichever comes first.

@bmc-msft bmc-msft changed the title only reset reimage_queued on expired heartbeats remove nodes from db upon reimage Jun 19, 2021
@bmc-msft bmc-msft marked this pull request as ready for review June 21, 2021 21:11
Copy link
Member

@ranweiler ranweiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this moves us from a more explicit model (where we flag reimage-queued nodes) to an implicit one, where we rely on some expected future behavior to cause the right end result. I find the correctness of this very hard to reason about.

Is there another variation we could use? Would it be better to add more state to reimage_request, e.g. some form of counter, or a new state like reimage_finishing?

@ghost
Copy link

ghost commented Jun 23, 2021

Hello @bmc-msft!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 5f8e423 into microsoft:main Jun 23, 2021
@bmc-msft bmc-msft deleted the only-reset-reimage-queued-on-heartbeat-expired branch June 23, 2021 22:25
@ghost ghost locked as resolved and limited conversation to collaborators Jul 24, 2021
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants