Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid overwriting status changes #172

Merged
merged 4 commits into from
Jun 12, 2018
Merged

Avoid overwriting status changes #172

merged 4 commits into from
Jun 12, 2018

Conversation

ewoutp
Copy link
Contributor

@ewoutp ewoutp commented Jun 12, 2018

There were some cases where status could be overwritten, resulting in member objects being listed twice.
This PR makes modifying the status a very controlled procedure. When you fetch the status you get a last version number. When updating the status, that last version has to match the current version number.

Copy link
Member

@neunhoef neunhoef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this LGTM. But I think I did not understand the following: Do we only want to protect against changes which happen in the k8s API server automagically from outside this process. Then this is probably all OK. Or do we have to protect against changes of our copy of the state against concurrent modifications from a different goroutine? If so, my worries in the comments might be valid. I therefore request changes here, or at least an explanation that satisfies me.

// together with the current version of that status.
func (d *Deployment) GetStatus() (api.DeploymentStatus, int32) {
version := atomic.LoadInt32(&d.status.version)
return *d.status.last.DeepCopy(), version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if *d.status.last (and indeed *d.status.version) have changed between the execution of line 81 and line 82? Do we not have to check the version again after the DeepCopy?
Or we have to make getting the version and doing a DeepCopy an atomic operation (with a Mutex)...

Msg("UpdateStatus version conflict error.")
return maskAny(fmt.Errorf("Status conflict error. Expected version %d, got %d", lastVersion, d.status.version))
}
d.status.last = *status.DeepCopy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again: What happens if somebody updates version and status between the execution of lines 90 and 98?

if err := d.context.UpdateStatus(status); err != nil {
log.Debug().Err(err).Msg("Failed to update CR status")
return false, maskAny(err)
{ // action.CheckProgress may have changed status, so reload it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this extra pair of braces achieve?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It provides an scope in which we're sure we're using the right status.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks.

@ewoutp
Copy link
Contributor Author

ewoutp commented Jun 12, 2018

The point of this PR is NOT to make it go-routine safe.
It wants to prevent the following:

  • Fetch status into variable A
  • do something, changing a field of A
  • Fetch status into variable B
  • do something, changing a field of B
  • save A
  • save B ... overwriting the change in A

@neunhoef
Copy link
Member

Understood, having another quick look.

Copy link
Member

@neunhoef neunhoef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but see question in comment.

return maskAny(fmt.Errorf("Status conflict error. Expected version %d, got %d", lastVersion, d.status.version))
}
d.status.version++
d.status.last = *status.DeepCopy()
if err := d.updateCRStatus(force...); err != nil {
return maskAny(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails because of network failure, then the version is already incremented. Any retry will now fail with a version mismatch. If we do not retry, we have a changed in memory state which is not written to the API service. Maybe we have to rollback the change in memory to allow for a retry?
The current behaviour might or might not be what we want. If we simply move on then the in memory state is different from the one in the API server. I cannot judge the consequences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a retry higher up, it MUST call GetStatus() first to get the latest status and version.
So while a network error will cause a state diff between in-memory & persistent in kube-apiserver, this will not affect future updates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

if err := d.context.UpdateStatus(status); err != nil {
log.Debug().Err(err).Msg("Failed to update CR status")
return false, maskAny(err)
{ // action.CheckProgress may have changed status, so reload it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks.

@ewoutp ewoutp merged commit 2b86891 into master Jun 12, 2018
@ewoutp ewoutp deleted the bugfix/status-resilience branch June 12, 2018 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants