Turn `CrashloopBackoff` machines to `Running` quicker #806
/invite @himanshu-kun
Thanks for the quick PR.
2 points:
- Kindly open a draft PR for this kind of review.
- Address these review comments first, and then I'll let you know about the test cases to add.
@@ -109,6 +110,8 @@ func (s *MCServer) AddFlags(fs *pflag.FlagSet) {
fs.StringVar(&s.NodeConditions, "node-conditions", s.NodeConditions, "List of comma-separated/case-sensitive node-conditions which when set to True will change machine to a failed state after MachineHealthTimeout duration. It may further be replaced with a new machine if the machine is backed by a machine-set object.")
fs.StringVar(&s.BootstrapTokenAuthExtraGroups, "bootstrap-token-auth-extra-groups", s.BootstrapTokenAuthExtraGroups, "Comma-separated list of groups to set bootstrap token's \"auth-extra-groups\" field to")

logs.AddFlags(fs)
Why is this needed? Didn't we deal with this in an earlier PR?
This is needed for the provider to work. We didn't handle it in the earlier PR. I didn't think a separate PR was needed for this, so I added it here.
OK, in that case could you add a small comment above it saying that it adds the `--v` flags?
OK, I will add a comment.
} else if clone.Status.CurrentStatus.Phase == v1alpha1.MachineCrashLoopBackOff {
return machineutils.ShortRetry, fmt.Errorf("node object not yet created for Machine %s", machine.Name)
}
Instead of this, we could update the `if` condition where `reconcileMachineHealth` is called, so that it ignores `CrashloopBackoff` machines.
This will save us a `shortRetry`, as `triggerCreationFlow` will be called directly.
It makes sense, as `reconcileMachineHealth` should only be entered for a machine with `Unknown`, `Pending`, or `Running` status.
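A minimal sketch of the gating idea in this comment, assuming a simplified `Phase` type and a hypothetical helper named `shouldReconcileHealth`; neither is taken from the PR's code, and the real condition lives at the call site of `reconcileMachineHealth`:

```go
package main

import "fmt"

// Phase is a simplified stand-in for the machine phase type (assumption).
type Phase string

const (
	PhasePending          Phase = "Pending"
	PhaseRunning          Phase = "Running"
	PhaseUnknown          Phase = "Unknown"
	PhaseCrashLoopBackOff Phase = "CrashLoopBackOff"
)

// shouldReconcileHealth is a hypothetical helper: it reports whether a
// machine in phase p should enter the health reconciliation step.
// Only Unknown, Pending and Running machines qualify; anything else
// (including CrashLoopBackOff) skips straight to the creation flow.
func shouldReconcileHealth(p Phase) bool {
	switch p {
	case PhaseUnknown, PhasePending, PhaseRunning:
		return true
	default:
		return false
	}
}

func main() {
	phases := []Phase{PhasePending, PhaseRunning, PhaseUnknown, PhaseCrashLoopBackOff}
	for _, p := range phases {
		fmt.Printf("%s: reconcile health = %t\n", p, shouldReconcileHealth(p))
	}
}
```

With a check like this, a `CrashLoopBackOff` machine would not pay for an extra short retry before the creation flow is attempted again.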
Let's say we don't call `reconcileMachineHealth`; even then we will have a `longRetry` from the machine reconciliation flow:
return machineutils.LongRetry, nil
Yes, but this is the behaviour that was there before as well.
When the node is created, its event should again send the machine for reconciliation, because the machine object will now be in the `Pending` state.
It won't be in a `Pending` state. Even if the machine creation is successful, the object will still be in a `CrashLoopBackOff` state. This is because of
if machine.Status.CurrentStatus.Phase == "" {
We need to either update `triggerCreationFlow` to change the status, or add an explicit condition in the machine reconciliation flow to have a `shortRetry` for CLBF machines.
The reason is that the condition `status.Node != nodeName` was removed here. This condition enabled the transition `CrashLoopBackoff` -> `Pending` if VM creation is successful.
- So we should fix this.
- Kindly also add a test case for the same.
- Update the docstring for the `CrashLoopBackoff` state, explicitly stating that this state means there are no infra resources.
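To illustrate why the `status.Node != nodeName` check matters, here is a rough sketch under stated assumptions: `MachineStatus` and `statusAfterCreate` are hypothetical, trimmed-down stand-ins rather than the controller's actual types or functions, and `nodeName` stands for the node name returned by a successful create call.

```go
package main

import "fmt"

// Phase is a simplified stand-in for the machine phase type (assumption).
type Phase string

const (
	PhasePending          Phase = "Pending"
	PhaseRunning          Phase = "Running"
	PhaseCrashLoopBackOff Phase = "CrashLoopBackOff"
)

// MachineStatus is a hypothetical, trimmed-down status holding only the
// two fields the discussion refers to.
type MachineStatus struct {
	Node  string
	Phase Phase
}

// statusAfterCreate sketches the status update after a successful create
// call that returned nodeName. Checking only Phase == "" would leave a
// CrashLoopBackOff machine untouched; also comparing Node with nodeName
// lets it move to Pending once the VM exists.
func statusAfterCreate(s MachineStatus, nodeName string) MachineStatus {
	if s.Phase == "" || s.Node != nodeName {
		s.Node = nodeName
		s.Phase = PhasePending
	}
	return s
}

func main() {
	clbf := MachineStatus{Phase: PhaseCrashLoopBackOff}
	fmt.Printf("%+v\n", statusAfterCreate(clbf, "node-1"))

	running := MachineStatus{Node: "node-1", Phase: PhaseRunning}
	fmt.Printf("%+v\n", statusAfterCreate(running, "node-1"))
}
```

In this sketch a machine whose status already points at the same node keeps its phase, so a `Running` machine is not reset to `Pending` on a later reconciliation.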
} else if clone.Status.CurrentStatus.Phase == v1alpha1.MachineCrashLoopBackOff {
return machineutils.ShortRetry, fmt.Errorf("node object not Ready for Machine %s", machine.Name)
}
same as above
/lgtm
/needs cherry-pick rel-v0.48
@himanshu-kun Label needs/rel-v0.48 does not exist.
…unning` quicker (#807) * fix reconcileMachineHealth for clbf machines * address review comments * add unit test for clbf to pending machine
What this PR does / why we need it:
This PR turns `CLBF` machines to `Pending` if the `driver.CreateMachine` call was successful in the previous reconciliation. This will help `reconcileMachineHealth` turn the `Pending` phase to `Running` when the machine is reconciled due to node object creation.
Which issue(s) this PR fixes:
Fixes #805
Special notes for your reviewer:
Release note: