Isolator recovery is problematic with legacy mounts. #95
When this was developed, I did a test:
I did not observe a cleanup call related to the task, resulting in a mount leak. What are the guarantees in both a slave service crash and a full cluster node restart? The current implementation is based on the assumption that if the container is still running, it will be in the orphans list on the recover call. If it is on the orphans list, unmount is NOT called. Are you saying that there is a scenario where a container can remain running and not be in the orphans list?
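For context, the "recover call" and "orphans list" refer to the isolator recovery hook in the Mesos module API. A minimal sketch of its shape is below; the class name is a placeholder and the exact signature varies across Mesos versions:

```cpp
// Approximate shape of the isolator recovery hook (mesos/slave/isolator.hpp);
// the class name is a placeholder and exact types may differ across versions.
#include <list>

#include <mesos/mesos.hpp>
#include <mesos/slave/isolator.hpp>

#include <process/future.hpp>

#include <stout/hashset.hpp>
#include <stout/nothing.hpp>

class VolumeIsolator : public mesos::slave::Isolator
{
public:
  // 'states'  : containers the agent still tracks (checkpointed ContainerState).
  // 'orphans' : containers the launcher can still detect (e.g., via cgroups)
  //             but that are no longer in the agent's ContainerState.
  process::Future<Nothing> recover(
      const std::list<mesos::slave::ContainerState>& states,
      const hashset<mesos::ContainerID>& orphans) override;

  // (Other Isolator callbacks such as prepare(), isolate() and cleanup()
  //  are omitted from this sketch.)
};
```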
A container is called an orphan if it is not in 'ContainerState' but the launcher still detects it (e.g., through cgroups). This is common when the operator wants to do an upgrade that requires killing all existing containers: the operator usually does this by stopping the agent, wiping the metadata directory, and restarting the slave. In that case, 'inUseMounts' does not contain the mounts for that orphan container. Since you checkpointed it, it will be considered part of 'legacyMounts', and umount will be called on it even though the launcher hasn't cleaned it up yet.
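Restated as a sketch (using the thread's terms 'legacyMounts' and 'inUseMounts'; the helper names here are placeholders, not the module's actual code), the recover-time behavior being described is roughly:

```cpp
// Sketch of the recover-time rule under discussion; placeholder names only.
#include <set>
#include <string>

// Placeholder for the volume driver call that removes a mount.
void unmountVolume(const std::string& mountpoint);

void recoverMounts(
    const std::set<std::string>& legacyMounts,  // mounts read back from the checkpoint
    const std::set<std::string>& inUseMounts)   // mounts of containers in ContainerState
{
  for (const std::string& mountpoint : legacyMounts) {
    if (inUseMounts.count(mountpoint) == 0) {
      // A known orphan is not in ContainerState, so its mount ends up here
      // and is unmounted even though the launcher has not destroyed the
      // container yet -- the problem described in this issue.
      unmountVolume(mountpoint);
    }
  }
}
```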
OK, so I understand you are saying the mount will be present, and ContainerState will NOT contain a record for the container. The current recover implementation is: unmount is called for any mount that is in legacyMounts but not in inUseMounts. So the change you propose would result in no unmount calls in recover() under any circumstances. This might be OK for the scenario you outlined of stopping the agent, wiping the metadata directory, and restarting, assuming cleanup will always be called for every container that is not running at restart. But what about other scenarios:
For the 1b case, the container will still be in the ContainerState list because the agent believes it is still running. The containerizer will reap the checkpointed pid and find out that the container has terminated. It will subsequently call 'destroy', resulting in 'cleanup' being called.
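A rough illustration of that flow, assuming placeholder names ('awaitTerminationAndDestroy' and the destroy callback are not Mesos internals): reap the checkpointed pid and, once the process is gone, destroy the container, which drives the isolator's cleanup().

```cpp
// Hypothetical sketch of the reap-then-destroy flow described above; the
// function and callback names are placeholders, not actual Mesos code.
#include <sys/types.h>

#include <functional>

#include <process/future.hpp>
#include <process/reap.hpp>

#include <stout/option.hpp>

void awaitTerminationAndDestroy(
    pid_t checkpointedPid,
    const std::function<void()>& destroyContainer)
{
  // process::reap() completes when the pid exits (or right away if it is
  // already gone). Destroying the container then results in the isolator's
  // cleanup() being invoked, which is where the unmount belongs.
  process::reap(checkpointedPid)
    .onAny([destroyContainer](const process::Future<Option<int>>&) {
      destroyContainer();
    });
}
```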
The isolator determines the 'legacy mounts' (mounts that no active container is using) during slave recovery by looking only at the checkpointed state, and it umounts those legacy mounts immediately. However, known orphans might not have been killed yet (known orphans are those containers that are known by the launcher). The Mesos agent will do an async cleanup on those known orphan containers.
Unmounting the volumes while those orphan containers are still using them might be problematic. The correct way is to wait for 'cleanup' to be called for those known orphan containers (i.e., still create an Info struct for those orphan containers, and do the proper unmount in the 'cleanup' function).
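A minimal sketch of that proposal, assuming placeholder helpers (checkpointedMounts(), ownerOf(), unmountVolume()) and a simplified Info map; this is not the module's actual code:

```cpp
// Sketch of the proposed recovery policy: keep tracking containers that are
// either checkpointed by the agent or known orphans, and unmount only mounts
// owned by neither; orphan mounts are released later in cleanup().
// All helper names below are placeholders.
#include <list>
#include <map>
#include <set>
#include <string>

#include <mesos/mesos.hpp>
#include <mesos/slave/isolator.hpp>

#include <process/future.hpp>

#include <stout/hashset.hpp>
#include <stout/nothing.hpp>

struct Info { std::set<std::string> mounts; };        // per-container state

std::map<std::string, Info> infos;                    // containerId -> Info

std::set<std::string> checkpointedMounts();           // placeholder: mounts from the checkpoint
std::string ownerOf(const std::string& mountpoint);   // placeholder: containerId that owns a mount
void unmountVolume(const std::string& mountpoint);    // placeholder: driver unmount

process::Future<Nothing> recover(
    const std::list<mesos::slave::ContainerState>& states,
    const hashset<mesos::ContainerID>& orphans)
{
  std::set<std::string> tracked;

  // Containers the agent still tracks.
  for (const mesos::slave::ContainerState& state : states) {
    tracked.insert(state.container_id().value());
  }

  // Known orphans: still alive as far as the launcher is concerned, so do
  // NOT unmount their volumes here; the agent destroys them asynchronously
  // and cleanup() unmounts at that point.
  for (const mesos::ContainerID& orphan : orphans) {
    tracked.insert(orphan.value());
  }

  for (const std::string& mountpoint : checkpointedMounts()) {
    const std::string owner = ownerOf(mountpoint);

    if (tracked.count(owner) == 0) {
      unmountVolume(mountpoint);               // a true legacy mount: safe to drop now
    } else {
      infos[owner].mounts.insert(mountpoint);  // recovered; cleanup() will handle it
    }
  }

  return Nothing();
}
```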