The isolator should checkpoint mounts BEFORE the actual mount operation. #93
Comments
Your statement on the order of checkpointing occurring after the mount is correct. The order is:
Note that the checkpoint record includes the mount location, which is NOT deterministic and can only be recorded after the mount. The mountpoint is there to handle cases where multiple containers on the agent request the identical volume. When this happens, the isolator needs to look up the OS mountpoint for the bind mount. The design accepted the risk of a mount leak on the premise that a pre-emptive mount would remedy this under the rare circumstance of a slave service crash. If we go with a checkpoint-first model, the checkpoint would need to be a two-phase approach: first recording only provider + volume name as a "tentative mount", and then recording the full details. The "tentative mount" info would be enough to do an unmount during recover().
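For what it's worth, here is a minimal sketch of what such a two-phase checkpoint could look like, written in plain standard C++ rather than the isolator's actual types; the record layout, file handling, and function names are assumptions for illustration only:

```cpp
// Sketch of a two-phase checkpoint: phase 1 records only provider +
// volume name ("tentative mount") before the mount call; phase 2
// rewrites the record with the OS mount point once it is known.
// All names and the on-disk format here are hypothetical.
#include <fstream>
#include <string>

struct MountRecord {
  std::string provider;    // e.g. a volume driver name (illustrative)
  std::string volumeName;
  std::string mountpoint;  // empty while the mount is still tentative
};

// Phase 1: checkpoint before mounting. If the agent crashes after the
// mount but before phase 2, recover() still sees provider + volume and
// can issue an unmount to avoid a leak.
void checkpointTentative(const std::string& path, const MountRecord& r) {
  std::ofstream out(path, std::ios::trunc);
  out << r.provider << "\n" << r.volumeName << "\n";  // no mountpoint yet
}

// Phase 2: after the mount returns, record the full details, including
// the non-deterministic mount point needed for later bind mounts.
void checkpointFinal(const std::string& path, const MountRecord& r) {
  std::ofstream out(path, std::ios::trunc);
  out << r.provider << "\n" << r.volumeName << "\n" << r.mountpoint << "\n";
}
```

On recover(), a record whose mountpoint field is still empty would be treated as a tentative mount, which is enough information to attempt the pre-emptive unmount described above.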
Do you need to checkpoint the mount point location? Or do you just need provider + volume? In other words, if your recovery only needs provider + volume, why are you checkpointing the mount point (and other information)?
The bind mount needs the mount point location. Provider + volume is usually a unique identifier, but it is not sufficient to make a bind call by itself. It might be possible to query the provider, but this takes time and may result in no response, adding the complexity that you would want the query to be asynchronous. And some providers, such as Amazon, allow duplicate volume names, meaning that the query is only usually reliable as opposed to 100% reliable. Other information (mount options) was checkpointed for troubleshooting and for the possibility of generating a warning should two containers request a mount of the same volume with different options. This warning generation is not implemented at this time.
So to make it clear: when a second container arrives, requesting a mount of a volume that is already mounted and in use by another container, we need a way to determine the mount point location returned by the mount performed for the first container. See the sketch below.
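To make that lookup concrete, here is a rough sketch with a plain std::map standing in for the checkpointed state; the key shape and helper name are hypothetical, not the isolator's real code:

```cpp
// When a second container requests a volume that is already mounted,
// the isolator looks up the OS mount point recorded for the first
// container and reuses it as the source of the bind mount.
#include <map>
#include <string>
#include <utility>

// Keyed by (provider, volume name); value is the recorded mount point.
using MountTable = std::map<std::pair<std::string, std::string>, std::string>;

// Returns the existing mount point, or an empty string if this is the
// first request for the volume and a real mount is still needed.
std::string lookupMountpoint(const MountTable& mounts,
                             const std::string& provider,
                             const std::string& volume) {
  auto it = mounts.find({provider, volume});
  return it == mounts.end() ? "" : it->second;
}
```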
Just to clarify, those duplicated volume names sound weird to me. What are the semantics? Say you mount the same volume name (same driver) twice: will there be two mounts in the host mount table? What happens if I do a subsequent unmount? Will that unmount both, or just one of them? If just one, which one will be unmounted? Now I see why you need to checkpoint the mount points. Thanks for the explanation.
I looked at the Docker volume specification (/VolumeDriver.Mount). It looks to me that the ref counting should be handled by the plugin (e.g., rexray), rather than the container runtime. The container runtime would simply call 'Mount' during container startup and 'Unmount' during container teardown. Given that, I am wondering why we are doing mount tracking in the isolator?
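To illustrate the ref-counting idea being described here (the runtime just pairs Mount/Unmount calls and the plugin tracks usage), a plugin-agnostic C++ sketch follows; the class, method names, and path are made up for illustration and are not the Docker plugin API or rexray's implementation:

```cpp
// Sketch of per-volume reference counting as a volume plugin might do
// it: each Mount increments a counter, each Unmount decrements it, and
// the real OS unmount only happens when the count reaches zero.
#include <map>
#include <string>

class RefCountingPlugin {
 public:
  // Called when a container starts; returns a (hypothetical) mount point.
  std::string mount(const std::string& volume) {
    if (refs_[volume]++ == 0) {
      // First user: perform the real mount here (omitted).
    }
    return "/var/lib/plugin/mounts/" + volume;  // illustrative path
  }

  // Called when a container is torn down; real unmount only at zero.
  void unmount(const std::string& volume) {
    if (refs_.count(volume) && --refs_[volume] == 0) {
      refs_.erase(volume);
      // Last user gone: perform the real unmount here (omitted).
    }
  }

 private:
  std::map<std::string, int> refs_;  // volume name -> active mounts
};
```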
The Docker engine does not keep track of state or mounted volume/container. For example, what if the Docker engine crashes and along with it the […]. The container runtime, from our view, should be the sole authority behind […]. So we didn't necessarily do anything different with the spec. We […]
@clintonskitson that makes sense. Thanks for the explanation.
Currently, the isolator checkpoints the mount AFTER doing the actual mount. The consequence is that if the slave crashes after doing the mount but before the checkpointing, it loses track of that mount during recovery. One obvious issue is mount leaking. There might be other issues.
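As a simplified sketch of that ordering and the crash window it creates (placeholder function names, not the isolator's actual code):

```cpp
// Placeholder stubs; the real isolator calls a volume provider and
// writes a checkpoint file, both omitted here.
void doMount() { /* mount via the volume provider */ }
void checkpointMount() { /* write the checkpoint record */ }

// Current order described in this issue: mount first, checkpoint
// second. If the agent crashes inside the marked window, the mount
// exists on the host but no record of it does, so recover() cannot
// find it and the mount leaks.
void prepareVolume() {
  doMount();          // 1. actual mount happens
                      //    <-- crash here: mount exists, no checkpoint
  checkpointMount();  // 2. checkpoint written only after the mount
}
```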