State race when updating container config #5445
Comments
Update:
Log bundle with portlayer and hostd logs that capture the time of failure. Took a quick scan of hostd but didn't see an obvious issue, barring the fact that the ChangeVersions in the concurrent access reports from hostd don't show up in the portlayer log. Will ensure those are logged. Considering a two-pronged approach to this:
Managed to recreate locally with trivia logging for vimsvc and vmsvc.
hostd: The initial item of interest is the fact that we see the read-only keys being deleted here:
which is the result of us sending a full ExtraConfig during this reconfigure when the VM is on (change version: 2017-06-19T16:10:26.408734Z):
The specific point of interest is that the Handle we have in the port layer with that change version thinks the power state is off:
Hypotheses:
Checking ChangeVersion with default VM created by govc - we can see that power on updates ChangeVersion but power off does not.
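To make the consequence of that observation concrete, here is a toy model in Go. It is only a sketch: `vm`, `handle`, `snapshot` and `reconfigure` are hypothetical names, the integer counter stands in for the ChangeVersion timestamp, and nothing here calls the vSphere API. It assumes only what the govc check showed, that power on bumps the version and power off does not.

```go
package main

import (
	"errors"
	"fmt"
)

// vm is a toy stand-in for the hypervisor's record of a VM. It is not a vSphere type.
type vm struct {
	changeVersion int // stands in for the ChangeVersion timestamp
	poweredOn     bool
}

// powerOn bumps the version, matching the govc observation.
func (v *vm) powerOn() {
	v.poweredOn = true
	v.changeVersion++
}

// powerOff leaves the version untouched, matching the govc observation.
func (v *vm) powerOff() {
	v.poweredOn = false
}

// handle is a hypothetical cached snapshot, loosely modelled on a port layer Handle.
type handle struct {
	changeVersion int
	poweredOn     bool
}

// snapshot copies the VM's current version and power state into a handle.
func snapshot(v *vm) handle {
	return handle{changeVersion: v.changeVersion, poweredOn: v.poweredOn}
}

var errConcurrent = errors.New("concurrent modification")

// reconfigure accepts the handle only when its version matches the VM's.
func (v *vm) reconfigure(h handle) error {
	if h.changeVersion != v.changeVersion {
		return errConcurrent
	}
	v.changeVersion++
	return nil
}

func main() {
	v := &vm{}
	v.powerOn()
	h := snapshot(v) // handle believes the VM is on
	v.powerOff()     // version does not move
	err := v.reconfigure(h)
	fmt.Printf("stale power state accepted: %v\n", err == nil && h.poweredOn != v.poweredOn)

	h = snapshot(v) // handle believes the VM is off
	v.powerOn()     // version moves
	fmt.Printf("power on detected: %v\n", errors.Is(v.reconfigure(h), errConcurrent))
}
```

In this model a version check alone cannot catch a power off that lands between taking the handle and committing it, which is why the power state in a handle cannot be trusted purely on the basis of a matching version.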
Ordering of operations:
power on dispatch:
Getting runtime state for update of Handle:
power on success:
reconfigure dispatch:
start of volatile entry updates:
reconfigure success:
Added #4922 as that should be addressed by the same workaround - modifying the reconfigure spec to only include changed keys.
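A minimal sketch of what "only include changed keys" could look like, assuming the config is held as a flat string map. `deltaKeys` and the key names are hypothetical and are not taken from the VIC codebase; key removal is deliberately left out of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// deltaKeys returns the entries of desired that are new or whose value differs
// from current. Keys present only in current are ignored: removals are out of
// scope for this sketch.
func deltaKeys(current, desired map[string]string) map[string]string {
	delta := make(map[string]string)
	for k, v := range desired {
		if old, ok := current[k]; !ok || old != v {
			delta[k] = v
		}
	}
	return delta
}

func main() {
	// Hypothetical keys, for illustration only.
	current := map[string]string{"example/name": "web", "example/state": "running"}
	desired := map[string]string{"example/name": "web", "example/state": "stopped"}

	delta := deltaKeys(current, desired)

	// Sort the keys so the printed order is stable.
	keys := make([]string, 0, len(delta))
	for k := range delta {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, delta[k])
	}
}
```

Sending only the delta means a reconfigure never resubmits keys it did not intend to touch, so a stale view of the power state has less opportunity to clobber unrelated entries.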
PR is in WIP because it has dependencies on other unmerged PRs. Moving to verify anyway for CI input.
* Updates extraconfig logging config
* Fixes syslogd configuration and reduces some log verbosity
* Adds support for delta sets of extraconfig keys
* Stops overwriting exit code on CLI stop call
* Generate stop event after state refresh to get consistent state
* Adjusts the syslog queue size to deal with excessive overflows.
* Correct handle refreshFromHandle behaviour on the power off path.
* Disables the exit code check for migrated containers until #5653 is addressed.
From: #4872
We have observed one instance in the logs of a container having its state damaged.
From the logs we can see multiple concurrent operations:
exit in the shell
My hypothesis is that the powerOn from the restart (3) is occurring just after the powerOff operation from (2) has succeeded, but before the container configuration is updated with the revised StoppedTime.
This results in us using a full config as that's based on the target power state, not the non-persistent portion suited to a powered on cVM. This hypothesis was generated by code inspection with input from the logs.
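The difference between choosing the config from the target power state and from the observed one can be sketched as follows. This is an illustration of the hypothesis only: `entry`, `entriesFor` and the key names are hypothetical, and the `volatile` flag is an assumed way of marking the non-persistent portion of the config.

```go
package main

import "fmt"

// entry is a hypothetical config item; volatile marks the non-persistent
// portion that is suited to a powered-on VM.
type entry struct {
	key      string
	value    string
	volatile bool
}

// entriesFor returns the full config for a powered-off VM and only the
// volatile entries for a powered-on one.
func entriesFor(poweredOn bool, all []entry) []entry {
	if !poweredOn {
		return append([]entry(nil), all...) // copy so callers cannot alias all
	}
	var out []entry
	for _, e := range all {
		if e.volatile {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	all := []entry{
		{key: "example/persistent", value: "a"},
		{key: "example/volatile", value: "b", volatile: true},
	}

	targetOn := false  // the in-flight stop wants the VM off
	observedOn := true // a racing restart has already powered it on

	fmt.Println("entries chosen by target state:", len(entriesFor(targetOn, all)))
	fmt.Println("entries chosen by observed state:", len(entriesFor(observedOn, all)))
}
```

Under this model, keying the choice on the target state sends the full config to a VM that is in fact running, which is the situation the hypothesis describes.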
@cgtexmex and I have been unable to reliably recreate this failure remotely with the released codebase, so I have injected code to manipulate the timings. This has resulted in the expected SetVolatileEntry lines in the container vmware.log.
Specifically, this adds a sleep of 5s before https://github.com/vmware/vic/blob/master/lib/portlayer/exec/commit.go#L119 and races:
hickeng update: this can be mostly addressed in VIC code, but not completely without bug1898149 - I will let this issue get closed by the workaround, and keep the platform issue open.