qemu-guest-agent monitoring is unusable #6765

Closed
jimis opened this issue Oct 29, 2024 · 0 comments

jimis commented Oct 29, 2024

I have been trying to enable monitoring in OpenNebula 6.10.0 based on these instructions:
https://docs.opennebula.io/6.10/open_cluster_deployment/kvm_node/kvm_driver.html#enabling-qemu-guest-agent

I am running into plenty of issues, and monitoring is practically unusable in its current state. I have done a lot of debugging; please read my findings below.

The examples in the documentation don't work.

Only the :vm_qemu_ping command works. The :guest_info command mentioned later fails. This happens because...

The guest-agent returns JSON replies and OpenNebula throws syntax errors.

Meaningless errors like the following are visible in monitor.log:

 , error: syntax error, unexpected COMMA, expecting $end at line 1095005315, columns 195:197

This happens because OpenNebula expects a single "return" JSON key whose value is written in VM_TEMPLATE syntax.

However, the guest agent returns a JSON object whose "return" key holds a deep JSON hierarchy that varies with the command.

Even if we pass the guest agent response through jq, the result is still JSON, so it does not work and the same meaningless errors are logged.

The workaround is to change the command so that it returns a single string under the "return" JSON element. See the :vm_ip_address command in this pull request that I opened: #6762

    :vm_ip_address:     >
                        one-$vm_id '{"execute":"guest-network-get-interfaces"}' --timeout 5 |
                        jq '{"return" : [ .return[]."ip-addresses"[]|select(."ip-address-type"=="ipv4" and (."ip-address"|startswith("127.")|not))."ip-address" ][0]}'

This returns { "return": "10.218.100.2" }, which OpenNebula accepts because a single string is both valid JSON and valid VM_TEMPLATE syntax.
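
For readers who don't use jq much, here is a rough Python sketch of what the filter above does. It is only an illustration: the function name `first_ipv4` and the sample reply are made up for this example, and the sample is trimmed to the fields the filter touches. One difference from the jq version: this sketch quietly skips interfaces that have no "ip-addresses" key.

```python
import json


def first_ipv4(reply_text):
    """Reduce a guest-network-get-interfaces reply to one string under "return".

    Mirrors the jq filter: collect every IPv4 address that is not in
    127.0.0.0/8 and keep only the first one (None if there is none).
    """
    reply = json.loads(reply_text)
    addresses = [
        addr["ip-address"]
        for iface in reply["return"]
        for addr in iface.get("ip-addresses", [])
        if addr.get("ip-address-type") == "ipv4"
        and not addr["ip-address"].startswith("127.")
    ]
    return {"return": addresses[0] if addresses else None}


# Hypothetical, trimmed guest agent reply used only for this illustration.
sample = json.dumps({
    "return": [
        {"name": "lo", "ip-addresses": [
            {"ip-address-type": "ipv4", "ip-address": "127.0.0.1"},
            {"ip-address-type": "ipv6", "ip-address": "::1"},
        ]},
        {"name": "eth0", "ip-addresses": [
            {"ip-address-type": "ipv4", "ip-address": "10.218.100.2"},
            {"ip-address-type": "ipv6", "ip-address": "fe80::1"},
        ]},
    ]
})

print(json.dumps(first_ipv4(sample)))  # {"return": "10.218.100.2"}
```

The point is the shape of the result: the nested hierarchy is collapsed to one scalar, which is the only form that got through in my tests.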

This processing is cumbersome and cannot support any kind of complex monitoring, such as getting the list of all IP addresses (demonstrated by the second command in the same pull request, which does not work because its output is JSON).

On a hypervisor host with many VMs, if a single VM is frozen, monitoring fails for all VMs on the host

Again, useless errors are logged:

    Received STATE_VM message from host 1:
    Error executing state.rb: unexpected token at ''
    Tue Oct 29 15:30:12 2024 [Z0][MDP][W]: Failed to monitor VM state for host 1: Error executing state.rb: unexpected token at ''

Like the previous errors, these are misleading too. What actually happened is that a single VM on the host timed out like this:

    # virsh qemu-agent-command one-4 --cmd '{"execute":"guest-ping"}' --timeout 5
    error: Guest agent is not responding: Guest agent not available for now

This can happen either because a VM is frozen or because it is still booting.
Once I recover this single VM, monitoring suddenly works again for all VMs on the host.
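
To illustrate the behaviour I would expect instead, here is a minimal Python sketch of a per-VM probe loop in which one unresponsive guest agent does not take down the result for the whole host. This is not how state.rb is written; it is a hypothetical sketch. I am assuming (from the `unexpected token at ''` message, but without having confirmed it in the code) that the failing VM's empty output reaches a JSON parser. The names `probe_vm`, `probe_host` and `fake_runner` are invented for this example, and `fake_runner` stands in for the real `virsh qemu-agent-command` call.

```python
import json


def probe_vm(vm_name, run_agent_command):
    """Ping one VM's guest agent; return the parsed reply, or None on failure."""
    exit_code, stdout = run_agent_command(vm_name, '{"execute":"guest-ping"}')
    # A frozen or still-booting VM yields a non-zero exit code and empty stdout.
    if exit_code != 0 or not stdout.strip():
        return None
    try:
        return json.loads(stdout)
    except json.JSONDecodeError:
        # Malformed output is treated like a failure for this VM only.
        return None


def probe_host(vm_names, run_agent_command):
    """Probe every VM independently so one failure cannot affect the others."""
    return {name: probe_vm(name, run_agent_command) for name in vm_names}


def fake_runner(vm_name, command):
    """Stand-in for virsh; "one-4" mimics the unresponsive VM from this report."""
    if vm_name == "one-4":
        return 1, ""
    return 0, '{"return":{}}'


results = probe_host(["one-3", "one-4", "one-5"], fake_runner)
for name, reply in results.items():
    print(name, "ok" if reply is not None else "guest agent not responding")
```

With this shape, the frozen VM is reported as not responding while the other VMs on the host are still monitored.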
