-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid overwriting status changes #172
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this LGTM. But I think I did not understand the following: Do we only want to protect against changes which happen in the k8s API server automagically from outside this process. Then this is probably all OK. Or do we have to protect against changes of our copy of the state against concurrent modifications from a different goroutine? If so, my worries in the comments might be valid. I therefore request changes here, or at least an explanation that satisfies me.
pkg/deployment/context_impl.go
Outdated
// together with the current version of that status. | ||
func (d *Deployment) GetStatus() (api.DeploymentStatus, int32) { | ||
version := atomic.LoadInt32(&d.status.version) | ||
return *d.status.last.DeepCopy(), version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if *d.status.last
(and indeed *d.status.version
) have changed between the execution of line 81 and line 82? Do we not have to check the version again after the DeepCopy?
Or we have to make getting the version and doing a DeepCopy an atomic operation (with a Mutex)...
pkg/deployment/context_impl.go
Outdated
Msg("UpdateStatus version conflict error.") | ||
return maskAny(fmt.Errorf("Status conflict error. Expected version %d, got %d", lastVersion, d.status.version)) | ||
} | ||
d.status.last = *status.DeepCopy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again: What happens if somebody updates version and status between the execution of lines 90 and 98?
if err := d.context.UpdateStatus(status); err != nil { | ||
log.Debug().Err(err).Msg("Failed to update CR status") | ||
return false, maskAny(err) | ||
{ // action.CheckProgress may have changed status, so reload it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this extra pair of braces achieve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It provides an scope in which we're sure we're using the right status
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thanks.
The point of this PR is NOT to make it go-routine safe.
|
Understood, having another quick look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but see question in comment.
return maskAny(fmt.Errorf("Status conflict error. Expected version %d, got %d", lastVersion, d.status.version)) | ||
} | ||
d.status.version++ | ||
d.status.last = *status.DeepCopy() | ||
if err := d.updateCRStatus(force...); err != nil { | ||
return maskAny(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this fails because of network failure, then the version is already incremented. Any retry will now fail with a version mismatch. If we do not retry, we have a changed in memory state which is not written to the API service. Maybe we have to rollback the change in memory to allow for a retry?
The current behaviour might or might not be what we want. If we simply move on then the in memory state is different from the one in the API server. I cannot judge the consequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a retry higher up, it MUST call GetStatus()
first to get the latest status and version.
So while a network error will cause a state diff between in-memory & persistent in kube-apiserver, this will not affect future updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
if err := d.context.UpdateStatus(status); err != nil { | ||
log.Debug().Err(err).Msg("Failed to update CR status") | ||
return false, maskAny(err) | ||
{ // action.CheckProgress may have changed status, so reload it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thanks.
There were some cases where status could be overwritten, resulting in member objects being listed twice.
This PR makes modifying the status a very controlled procedure. When you fetch the status you get a last version number. When updating the status, that last version has to match the current version number.