Isolator recovery is problematic with legacy mounts. #95

Open
jieyu opened this issue Apr 14, 2016 · 4 comments

jieyu commented Apr 14, 2016

During slave recovery, the isolator determines the 'legacy mounts' (mounts that no active container is using) by looking only at the checkpointed state, and it unmounts those legacy mounts immediately. However, known orphans (containers that are known to the launcher) might not have been killed yet; the Mesos agent will do an async cleanup of those known orphan containers.

Unmounting the volumes while those orphan containers are still using them might be problematic. The correct way is to wait for 'cleanup' to be called for those known orphan containers (i.e., still create an Info struct for each orphan container, and do the proper cleanup in the 'cleanup' function).
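
Roughly, something like the sketch below, using plain C++ standard-library types rather than the actual Mesos isolator API; Info, infos, checkpointedMounts(), and unmount() are made-up stand-ins for the module's real bookkeeping, checkpoint reading, and umount path:

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

// Illustrative stand-in for the isolator's per-container bookkeeping.
struct Info {
  std::set<std::string> mounts;  // Volume mount points used by the container.
};

// containerId -> Info, populated during recover() and drained by cleanup().
std::map<std::string, Info> infos;

// Stub standing in for reading the isolator's checkpointed mount state.
std::map<std::string, std::set<std::string>> checkpointedMounts() {
  return {};
}

// Stub standing in for umount(2) / asking the volume driver to unmount.
void unmount(const std::string& target) {
  std::cout << "umount " << target << std::endl;
}

void recover(const std::set<std::string>& aliveContainers,
             const std::set<std::string>& knownOrphans) {
  for (const auto& entry : checkpointedMounts()) {
    const std::string& containerId = entry.first;

    if (aliveContainers.count(containerId) > 0 ||
        knownOrphans.count(containerId) > 0) {
      // Keep tracking these mounts. For known orphans the agent will call
      // cleanup() asynchronously, and that is where we unmount.
      infos[containerId].mounts = entry.second;
    } else {
      // Truly legacy: no live or orphaned container owns these mounts,
      // so it is safe to unmount right away.
      for (const std::string& target : entry.second) {
        unmount(target);
      }
    }
  }
}

void cleanup(const std::string& containerId) {
  auto it = infos.find(containerId);
  if (it != infos.end()) {
    for (const std::string& target : it->second.mounts) {
      unmount(target);
    }
    infos.erase(it);
  }
}

int main() {
  recover({"alive-1"}, {"orphan-1"});
  cleanup("orphan-1");  // Orphan mounts are released here, not in recover().
  return 0;
}
```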

cantbewong (Contributor) commented

When this was developed, I did a test:

  1. Start a task with an external mount.
  2. Kill the Mesos slave process to simulate a crash.
  3. The task terminates.
  4. The Mesos slave service is restarted.

I did not observe a cleanup call related to the task, resulting in a mount leak.

What are the guarantees in both a slave service crash and a full cluster node restart?
Should cleanup always be called for every container that was running before the crash or restart?

The current implementation is based on the assumption that if the container is still running, it will be in the orphans list on the recover call. If it is on the orphans list, unmount is NOT called. Are you saying that there is a scenario where a container can remain running and not be in the orphans list?


jieyu commented Apr 14, 2016

A container is called an orphan if it's not in 'ContainerState' but the launcher detects it (e.g., through cgroups). This is common when the operator wants to do an upgrade that requires killing all existing containers: the operator usually does this by stopping the agent, wiping the metadata directory, and restarting the slave. In that case, 'inUseMounts' does not contain the mounts for that orphan container. Since you checkpointed them, they'll be considered 'legacyMounts', and umount will be called on them even though the launcher hasn't cleaned up the container yet.

cantbewong (Contributor) commented

OK, so I understand you are saying the mount will be present, and ContainerState will NOT contain a record for the container.

With the current recover implementation:

  - the mount will be in legacyMounts
  - the mount will not be in inUseMounts

Unmount is called for any mount that is in legacyMounts but not in inUseMounts. There is no other unmount processing.
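
In code terms it is roughly the following (illustrative names only, not the actual module source):

```cpp
#include <iostream>
#include <set>
#include <string>

// Stub standing in for umount(2) / the volume driver call.
void unmount(const std::string& target) {
  std::cout << "umount " << target << std::endl;
}

// The only unmount path in the current recover(): any checkpointed mount
// (legacyMounts) not claimed by a container in ContainerState (inUseMounts)
// is unmounted immediately.
void recoverMounts(const std::set<std::string>& legacyMounts,
                   const std::set<std::string>& inUseMounts) {
  for (const std::string& target : legacyMounts) {
    if (inUseMounts.count(target) == 0) {
      unmount(target);
    }
  }
}
```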

So the change you propose would result in no unmount calls in recover() under any circumstances.

This might be OK for the scenario you outlined of stopping the agent, wiping the metadata directory and restarting, assuming cleanup will always be called for every container not running at restart.

But what about other scenarios:

  1. Kill -9 the agent (to simulate a crash) and restart it, with no metadata directory wipe.
    1a. Container still running when the agent restarts: this should be OK. I expect the container will be on the ContainerState list and no unmount will be called, which is what you want.
    1b. Container not still running when the agent restarts: it seems like you need an unmount, unless the agent will do a cleanup() call for the container. Will this happen?


jieyu commented Apr 15, 2016

For the 1b case, the container will still be in the ContainerState list because the agent believes it's still running. The containerizer will reap the checkpointed pid and find out that the container has terminated. It'll subsequently call 'destroy', resulting in 'cleanup' being called.
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L573
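
As a rough illustration of that path (not the containerizer's actual code, which is at the link above; destroy(), cleanup(), and the polling loop are simplified stand-ins, and the real reaper is more careful about pid reuse):

```cpp
#include <cerrno>
#include <chrono>
#include <iostream>
#include <thread>

#include <signal.h>
#include <sys/types.h>

// Stand-in for the isolator cleanup() that eventually releases the mounts.
void cleanup(pid_t containerPid) {
  std::cout << "cleanup() for container pid " << containerPid << std::endl;
}

// Stand-in for containerizer destroy(): tear the container down, then cleanup.
void destroy(pid_t containerPid) {
  cleanup(containerPid);
}

// Poll a checkpointed pid that is not our child. kill(pid, 0) sends no signal
// but reports whether the process still exists (ESRCH means it is gone).
void reapCheckpointedPid(pid_t checkpointedPid) {
  while (kill(checkpointedPid, 0) == 0 || errno == EPERM) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  destroy(checkpointedPid);  // Reached once the container process has exited.
}

int main() {
  // Example: reap a (hypothetical) pid that no longer exists, triggering destroy().
  reapCheckpointedPid(99999999);
  return 0;
}
```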

@cantbewong cantbewong added this to the 0.4.3 milestone Apr 18, 2016
@cantbewong cantbewong added the bug label Apr 18, 2016
@cantbewong cantbewong self-assigned this Apr 19, 2016
@davidvonthenen davidvonthenen modified the milestones: 0.4.4, 0.4.3 Jun 10, 2016