Wrong PCI to VM association removed when migrating VM with PCI pass-through #3230

Closed
atodorov-storpool opened this issue Apr 12, 2019 · 2 comments

atodorov-storpool commented Apr 12, 2019

Description
After a power-off migration of a VM, the wrong PCI-to-VM mapping is cleared on the source host.

To Reproduce

  • Run several VMs with PCI NIC pass-through and note which VF is assigned to which VM on each host. (Below is a YAML dump keyed by host ID, with each PCI address mapped to a VM ID.)
'2':
  0000:d8:00:2: '26'
  0000:d8:00:3: '28'
  0000:d8:00:4: '6'
  0000:d8:00:7: '45'
  0000:d8:01:0: '45'
'1':
  0000:d8:00:2: '22'
  0000:d8:00:3: '11'
  0000:d8:00:4: '18'
  0000:d8:00:5: '44'
  0000:d8:00:6: '44'
  0000:d8:00:7: '31'
  0000:d8:01:0: '5'
  0000:d8:01:1: '48'
  0000:d8:01:2: '46'
  0000:d8:01:3: '47'
'0':
  0000:d8:00:2: '21'
  0000:d8:00:3: '20'
  0000:d8:00:4: '43'
  0000:d8:00:5: '4'
  0000:d8:00:6: '43'
  0000:d8:00:7: '38'
  0000:d8:01:0: '38'
  • Migrate some VMs. In this example, VMs 46, 47 and 48 are migrated from hostid 1 to hostid 2:
'2':
  0000:d8:00:2: '26'
  0000:d8:00:3: '28'
  0000:d8:00:4: '6'
  0000:d8:00:5: '48'
  0000:d8:00:6: '46'
  0000:d8:00:7: '45'
  0000:d8:01:0: '45'
  0000:d8:01:1: '47'
'1':
  0000:d8:00:2: '22'
  0000:d8:00:3: '11'
  0000:d8:00:4: '18'
  0000:d8:00:7: '31'
  0000:d8:01:0: '5'
  0000:d8:01:2: 46
  0000:d8:01:3: 47
'0':
  0000:d8:00:2: '21'
  0000:d8:00:3: '20'
  0000:d8:00:4: '43'
  0000:d8:00:5: '4'
  0000:d8:00:6: '43'
  0000:d8:00:7: '38'
  0000:d8:01:0: '38'

From the host dumps above it is clear that:

  1. the PCI addresses of VMs 46 and 47 are not deleted from hostid 1,
  2. but the PCI addresses of VM 44 are gone from hostid 1.
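
The two observations above can be checked mechanically. The following is a minimal sketch, not part of oned: it takes the hostid 1 mappings from the two dumps (VM IDs normalized to strings) and reports entries that still point at a migrated VM, as well as entries that disappeared although their VM was never migrated. The function name `audit` and the variable names are made up for illustration.

```python
# Host 1 mappings (PCI address -> VM ID), copied from the dumps above.
before = {
    "0000:d8:00:2": "22", "0000:d8:00:3": "11", "0000:d8:00:4": "18",
    "0000:d8:00:5": "44", "0000:d8:00:6": "44", "0000:d8:00:7": "31",
    "0000:d8:01:0": "5",  "0000:d8:01:1": "48", "0000:d8:01:2": "46",
    "0000:d8:01:3": "47",
}
after = {
    "0000:d8:00:2": "22", "0000:d8:00:3": "11", "0000:d8:00:4": "18",
    "0000:d8:00:7": "31", "0000:d8:01:0": "5",  "0000:d8:01:2": "46",
    "0000:d8:01:3": "47",
}
migrated = {"46", "47", "48"}  # VMs moved from hostid 1 to hostid 2


def audit(before, after, migrated):
    """Return (stale, lost) mappings for the source host.

    stale: still present after migration, but owned by a migrated VM.
    lost:  removed after migration, although the owner was not migrated.
    """
    stale = {addr: vm for addr, vm in after.items() if vm in migrated}
    lost = {
        addr: vm
        for addr, vm in before.items()
        if addr not in after and vm not in migrated
    }
    return stale, lost


stale, lost = audit(before, after, migrated)
print("stale:", stale)
print("lost: ", lost)
```

With the data from this report, `stale` holds the two leftover entries of VMs 46 and 47, and `lost` holds the two entries of VM 44.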

There is an fsck routine that fixes (1), but it requires stopping the service, running onedb fsck, and starting the service again. It does not fix issue (2), which breaks the instantiation of new VMs with pass-through, because the VF that is handed out (and that oned believes is free) is already in use by another running VM.

Expected behavior
oned should free the old PCI addresses from the source host.
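
To illustrate the difference between the observed and the expected behavior, here is a toy model, assuming the host's bookkeeping is a plain address-to-VM map. It is not oned code and not necessarily how the problem was fixed upstream; `release_by_address` and `release_by_vmid` are hypothetical helpers. The sample data is a subset of the hostid 1 dump, and `0000:d8:00:5` is the address VM 48 received on hostid 2.

```python
def release_by_address(table, address):
    """Observed behavior: free whatever device sits at `address`."""
    return table.pop(address, None)


def release_by_vmid(table, vmid):
    """Expected behavior: free only the devices recorded for `vmid`."""
    freed = [addr for addr, owner in table.items() if owner == vmid]
    for addr in freed:
        del table[addr]
    return freed


# Subset of hostid 1 before the migration (PCI address -> VM ID).
host1 = {"0000:d8:00:5": "44", "0000:d8:00:6": "44", "0000:d8:01:1": "48"}

# Releasing VM 48 with its *destination* address evicts VM 44's device
# and leaves VM 48's own entry behind.
buggy = dict(host1)
evicted = release_by_address(buggy, "0000:d8:00:5")

# Releasing by owner frees exactly the device VM 48 held on this host.
fixed = dict(host1)
freed = release_by_vmid(fixed, "48")

print(evicted, buggy)
print(freed, fixed)
```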

Details

  • Affected Component: [Core]
  • Hypervisor: [KVM]
  • Version: [5.8.1]

Additional context
To investigate the issue I patched oned to log host_share.add, host_share.del, HostSharePCI::add and HostSharePCI::del. The following logs of the migration session clearly show the issue:

VMs 46, 47 and 48 are started on hostid 1:

Fri Apr 12 15:28:29 2019 [Z0][ONE][D]: ZDBG host_share.add(48,10,2097152,0,0000:d8:00:5 ) host:1
Fri Apr 12 15:28:29 2019 [Z0][ONE][D]: ZDBG HostSharePCI::add(0000:d8:00:5,48) dev->address:0000:d8:01:1 dev->vmid:-1 dev->attrs->VMID:-1
Fri Apr 12 15:28:29 2019 [Z0][ONE][D]: ZDBG host_share.add(46,10,2097152,0,0000:d8:00:6 ) host:1
Fri Apr 12 15:28:29 2019 [Z0][ONE][D]: ZDBG HostSharePCI::add(0000:d8:00:6,46) dev->address:0000:d8:01:2 dev->vmid:-1 dev->attrs->VMID:-1
Fri Apr 12 15:28:29 2019 [Z0][ONE][D]: ZDBG host_share.add(47,10,2097152,0,0000:d8:01:1 ) host:1
Fri Apr 12 15:28:29 2019 [Z0][ONE][D]: ZDBG HostSharePCI::add(0000:d8:01:1,47) dev->address:0000:d8:01:3 dev->vmid:-1 dev->attrs->VMID:-1

Next, the VMs were migrated to hostid 2. They get new PCI assignments on the new host (hostid 2):

Fri Apr 12 15:30:08 2019 [Z0][ONE][D]: ZDBG host_share.add(48,10,2097152,0,0000:d8:01:1 ) host:2
Fri Apr 12 15:30:08 2019 [Z0][ONE][D]: ZDBG HostSharePCI::add(0000:d8:01:1,48) dev->address:0000:d8:00:5 dev->vmid:-1 dev->attrs->VMID:-1
Fri Apr 12 15:30:08 2019 [Z0][ONE][D]: ZDBG host_share.add(46,10,2097152,0,0000:d8:01:2 ) host:2
Fri Apr 12 15:30:08 2019 [Z0][ONE][D]: ZDBG HostSharePCI::add(0000:d8:01:2,46) dev->address:0000:d8:00:6 dev->vmid:-1 dev->attrs->VMID:-1
Fri Apr 12 15:30:08 2019 [Z0][ONE][D]: ZDBG host_share.add(47,10,2097152,0,0000:d8:01:3 ) host:2
Fri Apr 12 15:30:08 2019 [Z0][ONE][D]: ZDBG HostSharePCI::add(0000:d8:01:3,47) dev->address:0000:d8:01:1 dev->vmid:-1 dev->attrs->VMID:-1

But on the source host, the new PCI address assignments (from the destination host) are removed instead of the old ones:

Fri Apr 12 15:30:15 2019 [Z0][ONE][D]: ZDBG host_share.del(47,10,2097152,0,0000:d8:01:1 ) host:1
Fri Apr 12 15:30:15 2019 [Z0][ONE][D]: ZDBG HostSharePCI::del(0000:d8:01:1) second->address:0000:d8:01:1 second->vmid:48
Fri Apr 12 15:30:15 2019 [Z0][ONE][D]: ZDBG host_share.del(48,10,2097152,0,0000:d8:00:5 ) host:1
Fri Apr 12 15:30:15 2019 [Z0][ONE][D]: ZDBG HostSharePCI::del(0000:d8:00:5) second->address:0000:d8:00:5 second->vmid:44
Fri Apr 12 15:30:16 2019 [Z0][ONE][D]: ZDBG host_share.del(46,10,2097152,0,0000:d8:00:6 ) host:1
Fri Apr 12 15:30:16 2019 [Z0][ONE][D]: ZDBG HostSharePCI::del(0000:d8:00:6) second->address:0000:d8:00:6 second->vmid:44
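
The mismatch in the del calls above can also be extracted automatically. The sketch below assumes the ZDBG line format produced by the debugging patch shown in this report (it is not a stock oned log format) and pairs each host_share.del line with the HostSharePCI::del line that follows it. The regular expressions and the function name `find_wrong_releases` are illustrative only.

```python
import re

LOG = """\
Fri Apr 12 15:30:15 2019 [Z0][ONE][D]: ZDBG host_share.del(47,10,2097152,0,0000:d8:01:1 ) host:1
Fri Apr 12 15:30:15 2019 [Z0][ONE][D]: ZDBG HostSharePCI::del(0000:d8:01:1) second->address:0000:d8:01:1 second->vmid:48
Fri Apr 12 15:30:15 2019 [Z0][ONE][D]: ZDBG host_share.del(48,10,2097152,0,0000:d8:00:5 ) host:1
Fri Apr 12 15:30:15 2019 [Z0][ONE][D]: ZDBG HostSharePCI::del(0000:d8:00:5) second->address:0000:d8:00:5 second->vmid:44
Fri Apr 12 15:30:16 2019 [Z0][ONE][D]: ZDBG host_share.del(46,10,2097152,0,0000:d8:00:6 ) host:1
Fri Apr 12 15:30:16 2019 [Z0][ONE][D]: ZDBG HostSharePCI::del(0000:d8:00:6) second->address:0000:d8:00:6 second->vmid:44
"""

# VM that requests the release, and the host it is released from.
SHARE_DEL = re.compile(
    r"host_share\.del\((\d+),\d+,\d+,\d+,([0-9a-f:]+)\s*\)\s+host:(\d+)"
)
# Device actually freed, and the VM recorded as its owner.
PCI_DEL = re.compile(
    r"HostSharePCI::del\(([0-9a-f:]+)\)\s+second->address:([0-9a-f:]+)"
    r"\s+second->vmid:(-?\d+)"
)


def find_wrong_releases(log_text):
    """Return (requesting_vm, address, owner_vm) where the two VMs differ."""
    wrong = []
    requester = None
    for line in log_text.splitlines():
        m = SHARE_DEL.search(line)
        if m:
            requester = int(m.group(1))
            continue
        m = PCI_DEL.search(line)
        if m and requester is not None:
            owner = int(m.group(3))
            if owner != requester:
                wrong.append((requester, m.group(2), owner))
    return wrong


mismatches = find_wrong_releases(LOG)
for requester, address, owner in mismatches:
    print(f"VM {requester} released {address}, which belongs to VM {owner}")
```

For the three del calls logged here, every release hits a device recorded for a different VM.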

Progress Status

  • Branch created
  • Code committed to development branch
  • Testing - QA
  • Documentation
  • Release notes - resolved issues, compatibility, known issues
  • Code committed to upstream release/hotfix branches
  • Documentation committed to upstream release/hotfix branches
@christian7007

Before executing the power-off migrate action, is the VM in running state?

@atodorov-storpool

Yes, the VM is running.

christian7007 pushed a commit to christian7007/one that referenced this issue Apr 17, 2019
christian7007 pushed a commit to christian7007/one that referenced this issue Apr 17, 2019
christian7007 pushed a commit to christian7007/one that referenced this issue Apr 22, 2019
christian7007 pushed a commit to christian7007/one that referenced this issue Apr 23, 2019
…devices without poweroff flag"

This reverts commit 3c7446d.
christian7007 pushed a commit to christian7007/one that referenced this issue Apr 23, 2019
christian7007 pushed a commit to christian7007/one that referenced this issue Apr 23, 2019
christian7007 pushed a commit to christian7007/one that referenced this issue Apr 23, 2019
rsmontero pushed a commit that referenced this issue Apr 26, 2019
rsmontero pushed a commit that referenced this issue Apr 26, 2019
rsmontero pushed a commit that referenced this issue Oct 4, 2024
Signed-off-by: dcarracedo <dcarracedo@opennebula.io>
Co-authored-by: Tino Vázquez <cvazquez@opennebula.io>