Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VMX: Stale VM's blocking the driver usage #62

Open
sparshev opened this issue May 2, 2024 · 1 comment
Open

VMX: Stale VM's blocking the driver usage #62

sparshev opened this issue May 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@sparshev
Copy link
Collaborator

sparshev commented May 2, 2024

There happened multiple times during usage of the fish node - ~every 2-3 months of active utilization VMX just starts to behave weirdly and fails to allocate the VM net interface. This have a negative impact on the Fish node - because the allocation stops in vmrun start - even the timeout not helping and since the state of the Application is ELECTED - jenkins net plugin also don't see any issues.

To fix that - OS reboot usually helps with cleanup of the VM's via manual deallocation...

Steps to Reproduce

Unfortunately - reproducing regularly but no specific steps. Most probably some issue with VMWare Fusion, so hard to find out what's causing it.

Platform and Version

OS: macOS 11.6.5 20G527
VM's: Ubuntu/Windows, possible Mac too
Fish: Aquarium Fish v0.7.2 (240223.211700)
VMware Fusion: 13.0.1

Sample Code that illustrates the problem

Logs taken while reproducing problem

  • vmrun list:

    Total running VMs: 7
    /opt/aquarium-fish/ws/fish_vmx_workspace/c60dcc2b2fbc/c60dcc2b2fbc.vmx
    /opt/aquarium-fish/ws/fish_vmx_workspace/daef49119743/daef49119743.vmx
    /opt/aquarium-fish/ws/fish_vmx_workspace/9a86a7e02f3b/9a86a7e02f3b.vmx
    /opt/aquarium-fish/ws/fish_vmx_workspace/cee9b0838ff9/cee9b0838ff9.vmx
    /opt/aquarium-fish/ws/fish_vmx_workspace/aa6b6648625e/aa6b6648625e.vmx
    /opt/aquarium-fish/ws/fish_vmx_workspace/ba6cdd90f859/ba6cdd90f859.vmx
    /opt/aquarium-fish/ws/fish_vmx_workspace/8658822ac4ac/8658822ac4ac.vmx
    
  • Related Fish logs:

    2024/04/30 21:24:39 INFO:       Fish: NEW Application with no vote: f706a1af-c683-4ef1-8467-00221f264b0f 2024-04-30 21:24:37.469644 -0700 -0700
    2024/04/30 21:24:39 INFO:       Fish: NEW Application with no vote: f706a1af-c683-4525-89d8-5b4632d28b2b 2024-04-30 21:24:37.488213 -0700 -0700
    2024/04/30 21:24:39 INFO:       Fish: Starting Application f706a1af-c683-4ef1-8467-00221f264b0f election round 0
    2024/04/30 21:24:39 INFO:       Fish: Starting Application f706a1af-c683-4525-89d8-5b4632d28b2b election round 0
    2024/04/30 21:24:39 INFO:       Fish: I won the election for Application f706a1af-c683-4ef1-8467-00221f264b0f
    2024/04/30 21:24:39 INFO:       Fish: I won the election for Application f706a1af-c683-4525-89d8-5b4632d28b2b
    2024/04/30 21:24:44 INFO:       Fish: Start executing Application f706a1af-c683-4ef1-8467-00221f264b0f NEW
    2024/04/30 21:24:44 INFO:       Fish: Allocate the resource using the driver vmx
    2024/04/30 21:24:44 INFO:       Fish: Start executing Application f706a1af-c683-4525-89d8-5b4632d28b2b NEW
    2024/04/30 21:24:44 INFO:       VMX: Loading the required image: ubuntu2004 220517.042747_9aa1206d https://artifactory.corp.adobe.com/artifactory/maven-ci-tools-snapshot/com/adobe/ci/image/vmx/ubuntu2004/ubuntu2004-220517.042747_9aa1206d.tar.xz
    2024/04/30 21:24:44 INFO:       VMX: Loading the required image: ubuntu2004-ci 221026.005337_de1f82e3 https://artifactory.corp.adobe.com/artifactory/maven-ci-tools-snapshot/com/adobe/ci/image/vmx/ubuntu2004-ci/ubuntu2004-ci-221026.005337_de1f82e3.tar.xz
    2024/04/30 21:24:44 INFO:       Fish: Allocate the resource using the driver vmx
    2024/04/30 21:24:44 INFO:       VMX: Loading the required image: winserver2019 221022.032951_80e227c2 https://artifactory.corp.adobe.com/artifactory/maven-ci-tools-snapshot/com/adobe/ci/image/vmx/winserver2019/winserver2019-221022.032951_80e227c2.tar.xz
    2024/04/30 21:24:44 INFO:       VMX: Loading the required image: winserver2019-ci 230821.223539_85c70d8a https://artifactory.corp.adobe.com/artifactory/maven-ci-tools-snapshot/com/adobe/ci/image/vmx/winserver2019-ci/winserver2019-ci-230821.223539_85c70d8a.tar.xz
    2024/04/30 21:24:44 INFO:       VMX: The images are processed.
    2024/04/30 21:24:58 Waiting for /opt/aquarium-fish/ws/fish_vmx_workspace/8658822ac4ac/vmware.log to appear...
    2024/04/30 21:24:58 Waiting for /opt/aquarium-fish/ws/fish_vmx_workspace/aa6b6648625e/vmware.log to appear...
    

    after the last line - no signs of operation in the fish VMX allocation thread

  • 8658822ac4ac/vmware.log:

    ...
    2024-05-01T04:24:58.765Z No(00) vcpu-0 ConfigDB: Setting RemoteDisplay.vnc.key = <not printed>
    2024-05-01T04:24:58.770Z In(05) vcpu-0 memoryHotplug: Node 0: Present: 4095 MB (100 %) Size:4095 MB (100 %)
    2024-05-01T04:24:58.770Z In(05) vcpu-0 PIIX4: PM Resuming from suspend type 0x0, chipset.onlineStandby 0
    2024-05-01T04:24:58.770Z In(05) vcpu-0 VNET: 'ethernet0' enable link state propagation, lsp.state = 5
    2024-05-01T04:24:58.770Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: Starting isolated virtual interface, vmnet=vmnet8
    2024-05-01T04:24:58.771Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: uuid=8E22BD0D-8F8D-41EF-B1DF-CD9B296A13CF
    2024-05-01T04:24:58.771Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: connect adapter to host: connect=yes
    2024-05-01T04:24:58.771Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: host adapter address: ip=192.168.248.1
    2024-05-01T04:24:58.771Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: host adapter subnet: subnet=255.255.255.0
    2024-05-01T04:24:58.771Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: MTU size is not configured, vmnet=vmnet8
    2024-05-01T04:25:03.911Z Wa(03) vcpu-0 VNET: MacosVmnetVirtApiWaitSem: Ethernet0: Timeout while waiting for semaphore
    2024-05-01T04:25:03.911Z Er(02) vcpu-0 VNET: MacosVmnetVirtApiStartInterface: Ethernet0: Failed to wait for semaphore on interface start
    

    after the last line it stops.

  • 8658822ac4ac_vmware-vix-65140.log - have weird snapshot errors, but should not affect

  • 8658822ac4ac_vmware-vix-65216.log - looks fine

  • aa6b6648625e/vmware.log

    ...
    2024-05-01T04:24:59.568Z In(05) vcpu-0 USB: Disconnecting device 0x2000000400000000
    2024-05-01T04:24:59.568Z In(05) vcpu-0 USB: Connecting device desc:name:VMware\ Virtual\ USB\ Mouse vid:0e0f pid:0003 speed:full family:hid deviceType:virtual-hid info:0000005 version:5 id:0x200000050e0f0003
    2024-05-01T04:24:59.568Z No(00) vcpu-0 ConfigDB: Setting usb_xhci:4.present = "TRUE"
    2024-05-01T04:24:59.568Z No(00) vcpu-0 ConfigDB: Setting usb_xhci:4.deviceType = "hid"
    2024-05-01T04:24:59.568Z No(00) vcpu-0 ConfigDB: Setting usb_xhci:4.port = "4"
    2024-05-01T04:24:59.568Z No(00) vcpu-0 ConfigDB: Setting usb_xhci:4.parent = "-1"
    2024-05-01T04:24:59.570Z No(00) vcpu-0 ConfigDB: Setting RemoteDisplay.vnc.key = <not printed>
    2024-05-01T04:24:59.575Z In(05) vcpu-0 memoryHotplug: Node 0: Present: 4095 MB (100 %) Size:4095 MB (100 %)
    2024-05-01T04:24:59.575Z In(05) vcpu-0 PIIX4: PM Resuming from suspend type 0x0, chipset.onlineStandby 0
    2024-05-01T04:24:59.575Z In(05) vcpu-0 VNET: 'ethernet0' enable link state propagation, lsp.state = 5
    2024-05-01T04:24:59.575Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: Starting isolated virtual interface, vmnet=vmnet8
    2024-05-01T04:24:59.575Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: uuid=8E22BD0D-8F8D-41EF-B1DF-CD9B296A13CF
    2024-05-01T04:24:59.576Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: connect adapter to host: connect=yes
    2024-05-01T04:24:59.576Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: host adapter address: ip=192.168.248.1
    2024-05-01T04:24:59.576Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: host adapter subnet: subnet=255.255.255.0
    2024-05-01T04:24:59.576Z In(05) vcpu-0 VNET: MacosVmnetVirtApiStartIsolatedInterface: Ethernet0: MTU size is not configured, vmnet=vmnet8
    2024-05-01T04:25:04.719Z Wa(03) vcpu-0 VNET: MacosVmnetVirtApiWaitSem: Ethernet0: Timeout while waiting for semaphore
    2024-05-01T04:25:04.719Z Er(02) vcpu-0 VNET: MacosVmnetVirtApiStartInterface: Ethernet0: Failed to wait for semaphore on interface start
    

    after the last line it stops.

  • aa6b6648625e_vmware-vix-65141.log - have weird snapshot errors, but overall should not affect

  • aa6b6648625e_vmware-vix-65221.log - looks fine

  • nothing significant in the macos log...

@sparshev sparshev added the bug Something isn't working label May 2, 2024
@sparshev
Copy link
Collaborator Author

sparshev commented May 3, 2024

The issue with timeout looks related to those issues:

Shortly: if subcommand will run another process (and vmrun could do that in theory) - even if vmrun is killed by timeout, it doesn't mean the child process will be killed and since we're attaching stderr/stdout to listen - they will wait for closure, which will not happen until all child processes will be completed.

It seems like the proposed resolutions are relying on Lin only, so need to check if this solution will work for Mac as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant