
Unable to unmount/detach volume after vmgroup membership change #1189

Closed
shuklanirdesh82 opened this issue Apr 25, 2017 · 16 comments

@shuklanirdesh82
Contributor

shuklanirdesh82 commented Apr 25, 2017

volume stays as attached even after detaching from container

Test bed setup:

  • 1 ESX host (E1) with access to a shared datastore SD1
  • 2 VMs (VM1 and VM2), both created on SD1 and registered on E1

Steps to reproduce:

  1. VM1: create a volume (it is created under the _DEFAULT vmgroup)
    docker volume create -d vsphere vol1
  2. E1: verify the volume is created and the Admin CLI shows its entry correctly
    vmdkops_admin.py volume ls
  3. VM1/VM2: run docker volume ls and docker volume inspect to verify vol1's properties
  4. VM1: start a container using the volume and leave it attached
    docker run --rm -it -v vol1@sharedVmfs-0:/vol1 --name cnt_abc11 busybox
  5. E1: verify the volume is in attached state and docker volume inspect shows the same
  6. E1: create a new named vmgroup and add VM1 (the Docker host where the container still has the volume attached)
  7. E1: run Admin CLI volume ls to make sure the volume is still in attached state
  8. VM2: run docker volume inspect vol1
  9. Confirm that docker volume inspect and the Admin CLI still show the volume attached to VM1
  10. VM1: exit the container so the volume is detached (remove unused containers)
  11. VM2: run docker volume inspect vol1 <<< volume is still in attached state
root@photon-xVmYMbyTn [ ~ ]# docker volume inspect vol1@sharedVmfs-0
[
    {
        "Driver": "vsphere:latest",
        "Labels": null,
        "Mountpoint": "/mnt/vmdk/vol1@sharedVmfs-0",
        "Name": "vol1@sharedVmfs-0",
        "Options": {},
        "Scope": "global",
        "Status": {
            "access": "read-write",
            "attach-as": "independent_persistent",
            "attached to VM": "VM_17_SVMFS",
            "capacity": {
                "allocated": "15MB",
                "size": "100MB"
            },
            "clone-from": "None",
            "created": "Tue Apr 25 06:33:46 2017",
            "created by VM": "VM_17_SVMFS",
            "datastore": "sharedVmfs-0",
            "diskformat": "thin",
            "fstype": "ext4",
            "status": "attached"
        }
    }
]

12. E1: vmdkops_admin.py volume ls still shows the volume in attached state

[root@sc2-rdops-vm05-dhcp-160-17:~] /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py volume ls
Volume  Datastore      VMGroup   Capacity  Used  Filesystem  Policy          Disk Format  Attached-to  Access      Attach-as               Created By   Created Date              
------  -------------  --------  --------  ----  ----------  --------------  -----------  -----------  ----------  ----------------------  -----------  ------------------------                        
vol1    sharedVmfs-0   _DEFAULT  100MB     15MB  ext4        N/A             thin         VM_17_SVMFS  read-write  independent_persistent  VM_17_SVMFS  Tue Apr 25 06:33:46 2017  

Expectation: steps 11 and 12 should not show vol1 as attached after the container is detached.

Note: the vmgroup created at step 6 cannot be removed; VM1 is now part of the named tenant, and removal fails as shown below.

vmdkops_admin.py vmgroup rm --name=T2
Cannot complete vmgroup rm. VM 'VM_17_SVMFS' has volumes mounted.
@shuklanirdesh82 shuklanirdesh82 added this to the 0.14 milestone Apr 25, 2017
@govint
Contributor

govint commented Apr 25, 2017

@shuklanirdesh82, can you upload the ESX and plugin logs, plus the output of the admin CLI vmgroup ls command for each vmgroup, and list the volumes in each?

Please also confirm the default datastore of the _DEFAULT vmgroup; I expect it is the VM's datastore.

@govint govint self-assigned this Apr 25, 2017
@govint
Contributor

govint commented Apr 25, 2017

If a VM is in a vmgroup and has volumes attached, then those volumes are on datastores assigned to that vmgroup.

If a VM in a vmgroup has volumes attached to it, don't allow the VM to be migrated out of the vmgroup. Otherwise the VM can end up in a vmgroup with a different default datastore, vmdk_ops.py:executeRequest() won't resolve the VMDK path, and the disk detach won't happen.

Per the documentation, a VM can access only the volumes on datastores assigned to its vmgroup. Moving a VM with volumes attached across groups is therefore incorrect.

When creating a vmgroup, check whether the VMs in the list have any volumes mounted (a VM can be in the default tenant with a default DS of VM_DS) and disallow placing such a VM; a sketch of that check follows.
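A minimal sketch of such a guard, assuming a hypothetical helper that maps a VM name to its currently attached docker volumes (the real service tracks this in its own metadata; the names below are illustrative, not the actual vmdkops code):

# Hypothetical guard for vmgroup create / vm add; helper names are illustrative.
def check_vms_detached(vm_names, get_attached_volumes):
    """Return an error string if any VM still has docker volumes attached,
    otherwise None. get_attached_volumes maps a VM name to a list of
    attached volume names."""
    for vm in vm_names:
        attached = get_attached_volumes(vm)
        if attached:
            return ("Cannot add VM '{0}' to vmgroup: volumes {1} are still "
                    "attached. Detach them first.".format(vm, attached))
    return None

# Example: VM1 still has vol1 attached, so the vmgroup operation is refused.
lookup = {"VM1": ["vol1@sharedVmfs-0"], "VM2": []}
print(check_vms_detached(["VM1", "VM2"], lookup.get))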

@shuklanirdesh82 shuklanirdesh82 changed the title volume stays as attached even after detaching from container VM shouldn't be allowed to add to new vmgroup when volumes are mounted on the VM Apr 25, 2017
@pdhamdhere
Contributor

@govint Looking at the code, detach should have succeeded.

@shuklanirdesh82 Logs please.

@govint
Contributor

govint commented Apr 30, 2017

Once the VM is attached to the new vmgroup, the next command will use the default datastore of the new vmgroup and try to resolve the VMDK path with that. That won't work, because the VMDK lives on the datastore of the earlier vmgroup the VM belonged to.

@pdhamdhere
Contributor

For detach, why do we check the tenant and DS info at all? Get the VMDK info from the VM config and detach. Wasn't #1045 supposed to fix this? @pshahzeb

@pdhamdhere pdhamdhere changed the title VM shouldn't be allowed to add to new vmgroup when volumes are mounted on the VM Unable to unmount/detach volume after vmgroup membership change Apr 30, 2017
@govint
Contributor

govint commented Apr 30, 2017

In vmdk_ops.py:executeRequest() (line 906), the path is the dockvols path on the datastore, which is determined as either the default datastore of the vmgroup or the datastore named in the volume name. We then try to resolve the VMDK path relative to that datastore's dockvols path (line 911). In this case, however, the VM had a volume attached from its earlier vmgroup, whose default DS was VM_DS. The detach request passes the VMDK name, but that name can't be resolved against the new dockvols path; the sketch below illustrates the mismatch.

Aside from this, the documentation clearly says that a VM must not be in a vmgroup while using volumes on a DS not assigned to that vmgroup. In this case there is a mismatch between what was allowed (the VM was moved to a new vmgroup with a volume attached) and what the documentation says.
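A rough sketch of why the lookup breaks, assuming an illustrative resolver that mirrors the dockvols layout described above (the paths and helper are assumptions for illustration, not the actual executeRequest() code):

import os

# Illustrative only: a short volume name ("vol1") is resolved against the
# requesting VM's *current* vmgroup default datastore; "vol1@sharedVmfs-0"
# would pin the datastore explicitly instead.
DOCKVOLS = "/vmfs/volumes/{ds}/dockvols/{tenant}"

def resolve_vmdk(vol_name, tenant, default_ds):
    name, _, ds = vol_name.partition("@")
    return os.path.join(DOCKVOLS.format(ds=ds or default_ds, tenant=tenant),
                        name + ".vmdk")

# Created while the VM was in _DEFAULT (default DS sharedVmfs-0) ...
created_at = resolve_vmdk("vol1", "_DEFAULT", "sharedVmfs-0")
# ... but after the move, a detach of the short name resolves against the new DS.
detach_path = resolve_vmdk("vol1", "T2", "VM_DS")
print(created_at != detach_path)  # True: the resolved VMDK path no longer matches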

@pdhamdhere
Contributor

Any reason why we wouldn't do the following?

For detach, why do we check the tenant and DS info at all? Get the VMDK info from the VM config and detach.

The disk is already attached, and when the container stops I don't see a reason detach should fail.

I can foresee issues where two volumes with the same name are created on two different datastores, before and after the vmgroup membership change, and detach would have to figure out which volume should be detached (see the sketch below).
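For reference, a detach driven purely by the VM config would enumerate the attached disks roughly as below (a sketch assuming pyVmomi and an already-obtained vim.VirtualMachine object, not the plugin's actual code). It also shows where the ambiguity bites, since a short name can match disks on two datastores:

from pyVmomi import vim  # assumes a connected pyVmomi session established elsewhere

def attached_disks(vm, short_name=None):
    """List the VMDK backing paths of virtual disks attached to vm.
    If short_name is given (e.g. "vol1"), return only matching disks,
    which can still be ambiguous when same-named volumes exist on two
    datastores."""
    paths = []
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualDisk):
            # e.g. "[sharedVmfs-0] dockvols/<tenant>/vol1.vmdk" (illustrative)
            path = dev.backing.fileName
            if short_name is None or path.endswith("/" + short_name + ".vmdk"):
                paths.append(path)
    return paths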

@pshahzeb
Contributor

@pdhamdhere

I can foresee issues where two volumes with the same name are created on two different datastores, before and after the vmgroup membership change, and detach would have to figure out which volume should be detached.

Yes, that's the reason we can't detach a VMDK from a VM based on the VM config alone.
To prevent this, #1060 introduced checks to disallow membership changes when VMs have volumes attached.
Hence, in the test above, adding the VM to a new vmgroup should have failed.
Checking whether VMs have any volumes attached is done in tenant rm, vm rm, and replace.

But when VMs are moved from the default tenant to a named tenant, we need to do this check in tenant create and tenant vm add as well. That's the bug.

@govint
Contributor

govint commented May 1, 2017

Docker won't allow two volumes to be created with the same name, and if the user provides the @datastore format then those are separate volumes as far as Docker is concerned. When the ESX service gets the volume name as name@datastore, we use that datastore to figure out the VMDK path; otherwise it is always the default datastore of the vmgroup that is used to build the volume path. As mentioned in an earlier update, once the VM is assigned to a new vmgroup its docker volume disks are assumed to be on the default DS of that vmgroup.

@shuklanirdesh82, the vmdk_ops logs have not been uploaded; can you attach them?

@shuklanirdesh82
Contributor Author

@govint I have attached the logs (vmdk_ops.txt)

@govint
Contributor

govint commented May 1, 2017

From the plugin and service logs, the user used the vol-name@DS format, which means the service will use the DS named in the volume name.

  1. Once the VM is moved to the new vmgroup, the new group has no access rights set for the DS used earlier.
  2. When the container exited and unmount was called (as part of detach), the ESX service failed the request with an access error for the DS hosting the volume, sharedVmfs-0.

Plugin:
2017-05-01 02:15:04.662495101 +0000 UTC [INFO] Unmounting Volume name="vol1@sharedVmfs-0"
2017-05-01 02:15:04.796778118 +0000 UTC [ERROR] Failed to unmount error="No access privilege exists" name="vol1@sharedVmfs-0"

Service:
2017-05-01 02:15:04.662495101 +0000 UTC [INFO] Unmounting Volume name="vol1@sharedVmfs-0"
2017-05-01 02:15:04.796778118 +0000 UTC [ERROR] Failed to unmount error="No access privilege exists" name="vol1@sharedVmfs-0".

Overall, this issue won't happen if the new vmgroup has access rights set for the old DS. When that's not the case, the service code handles the condition correctly and flags the error; a sketch of that kind of check follows.

The proposed fix still applies as the correct approach: disallow adding VMs to a vmgroup (at vmgroup create) if volumes are attached to those VMs.
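A minimal sketch of the kind of privilege lookup that produces the "No access privilege exists" error above, assuming a per-vmgroup table of datastore privileges (the table and names are illustrative, not the service's actual auth code):

# Illustrative privilege table: (vmgroup, datastore) -> allowed operations.
PRIVILEGES = {
    ("_DEFAULT", "sharedVmfs-0"): {"create", "mount", "delete"},
    ("T2", "VM_DS"): {"create", "mount", "delete"},
    # Note: no entry for ("T2", "sharedVmfs-0") after VM1 moved to T2.
}

def check_access(vmgroup, datastore, op):
    allowed = PRIVILEGES.get((vmgroup, datastore))
    if allowed is None:
        return "No access privilege exists"
    if op not in allowed:
        return "No '%s' privilege on datastore %s" % (op, datastore)
    return None

# The unmount of vol1@sharedVmfs-0 is now issued on behalf of T2:
print(check_access("T2", "sharedVmfs-0", "mount"))  # -> "No access privilege exists"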

@pshahzeb
Contributor

pshahzeb commented May 1, 2017

@govint I agree with your proposed fix. Here is why we can't simply look at the VM config and detach the VMDK:

  1. We can have volumes with the same name on the same DS across different tenants, created through different VMs. One of those VMs may migrate from one tenant to another and try to attach another volume with the same name.
  2. Since we can't always get a fully qualified volume name in the detach flow on the service side, it is not correct to assume the default DS for short volume names, because of the possibility of a membership change.
    Hence the fix. I think we should also have it in tenant vm add.

@govint
Contributor

govint commented May 1, 2017

I'll check; I thought this was the only place where the check was missing.

@govint
Contributor

govint commented May 1, 2017

Updated diff and pushed.

@govint
Contributor

govint commented May 3, 2017

Fixed via #1190.

@govint govint closed this as completed May 3, 2017