@changlei-li
Contributor

No description provided.

snwoods and others added 25 commits September 1, 2025 10:25
Adds a new span.depth key to the trace context baggage, and a
configurable max_span_depth. This defaults to 100 and so will not limit
traces, but is useful when analysing large traces, which can often
become slow if all the spans are recorded.

Signed-off-by: Steven Woods <steven.woods@cloud.com>
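
A minimal sketch of how a depth limit carried in baggage could work. This is not the xapi tracing code: the association-list baggage and the names `depth_of_baggage` and `child_baggage` are assumptions made for illustration only.

```ocaml
(* Hypothetical sketch, not the xapi tracing API. *)
let max_span_depth = ref 100

(* Baggage is modelled as an association list of string pairs. *)
let depth_of_baggage baggage =
  match List.assoc_opt "span.depth" baggage with
  | Some d -> Option.value (int_of_string_opt d) ~default:0
  | None -> 0

(* Return the baggage for a child span, or None when the child would
   exceed the configured maximum depth and should not be recorded. *)
let child_baggage baggage =
  let depth = depth_of_baggage baggage + 1 in
  if depth > !max_span_depth then
    None
  else
    Some
      (("span.depth", string_of_int depth)
      :: List.remove_assoc "span.depth" baggage)
```

Because the depth travels with the baggage, a component that drops the key would restart counting from zero, which is consistent with depth being lost across process boundaries.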
HTTP exporting appears to get overwhelmed when too many spans are
exported at the same time. This adds the option to export a smaller
chunk of spans at a time. This also reduces the size of the file exports,
as previously we only checked the max file size after exporting all of
the finished spans.

Signed-off-by: Steven Woods <steven.woods@cloud.com>
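
A rough sketch of chunked exporting, under the assumption that finished spans are held in a list. `chunks` and `export_in_chunks` are hypothetical names, and `export` stands in for whatever exporter (HTTP or file) is in use.

```ocaml
(* Split a list into consecutive chunks of at most [size] elements. *)
let chunks size xs =
  if size <= 0 then invalid_arg "chunks: size must be positive";
  let rec go acc current n = function
    | [] -> (
      match current with
      | [] -> List.rev acc
      | _ -> List.rev (List.rev current :: acc))
    | x :: rest ->
        if n = size then
          go (List.rev current :: acc) [x] 1 rest
        else
          go acc (x :: current) (n + 1) rest
  in
  go [] [] 0 xs

(* Hypothetical: call [export] once per chunk, so that any size check
   inside [export] runs after every chunk rather than once at the end. *)
let export_in_chunks ~export_chunk_size ~export spans =
  List.iter export (chunks export_chunk_size spans)
```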
The other Observer components, e.g. xenops, need a forwarder to notify
them of any changes to export_chunk_size and max_depth, as they do
not have access to xapi_globs/xapi.conf.

Signed-off-by: Steven Woods <steven.woods@cloud.com>
  See the cgroups v2 "no internal processes" rule.

  If cgroup.subtree_control is not empty and we attach a pid
  to cgroup.procs, the kernel returns EBUSY.

Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
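
As an illustration of the rule only (not the change made in this commit), a caller could inspect the contents of cgroup.subtree_control before writing to cgroup.procs. The function names below are hypothetical, and the input is assumed to be the file contents already read into a string.

```ocaml
(* cgroup.subtree_control holds a space-separated list of controller
   names; it is empty (possibly just a newline) when none are enabled. *)
let controllers_enabled subtree_control =
  String.split_on_char ' ' (String.trim subtree_control)
  |> List.filter (fun c -> c <> "")

(* Hypothetical helper: attaching a pid to this cgroup's cgroup.procs is
   only expected to succeed when no controllers are enabled for its
   children; otherwise the kernel is expected to return EBUSY. *)
let can_attach_pid ~subtree_control =
  controllers_enabled subtree_control = []
```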
Currently a manually disabled host will be re-enabled on toolstack restarts
and host reboots, which will provoke VM migrations in an HA cluster. If
maintenance requires many restarts, that could be painful.

To allow for keeping a host persistently disabled across toolstack
restarts and host reboots, add a new localdb flag "host_auto_enable"
(set through the parameter on Host.disable). This coexists with the internal
flag of host_disabled_until_reboot, which is only set on host poweroff
internally and cannot be controlled by the user directly.

With host_auto_enable set to false, xapi will not re-enable a host on
its own no matter what: toolstack restarts, host reboots, calls to
consider_enabling_host (triggered by PBD plugs etc.) will have no effect.
Only a manual call to Host.enable will re-enable the host.

Expose the new parameter in the CLI. Also fix up the comment in
xapi_host_helpers.mli.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
* Use the decoder from the OCaml standard library instead of
  our own implementation, which this patch removes.
* Validate UTF-8/XML conformance for maps and sets, in addition to
  strings.

This is XSA-474 / CVE-2025-58146.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Reviewed-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Currently `Host.disable` only disables the host for the current xapi's
runtime - it will be re-enabled on toolstack restarts and host reboots
(if other conditions are met). This greatly complicates manual
maintenance in an HA cluster when reboots are needed, since VMs will
start moving to the host as soon as it's enabled.

To allow for keeping a host persistently disabled across toolstack
restarts and host reboots, add a new localdb flag `host_auto_enable`
(set through the parameter on Host.disable). This coexists with the
internal flag of `host_disabled_until_reboot`, which is only set on host
poweroff internally and cannot be controlled by the user directly.

With `host_auto_enable` set to false, xapi will not re-enable a host on
its own no matter what: toolstack restarts, host reboots, calls to
consider_enabling_host (triggered by PBD plugs etc.) will have no
effect. Only a manual call to `Host.enable` will re-enable the host.

Expose the new parameter in the CLI. Also fix up the comment in
`xapi_host_helpers.mli`.

I've verified the new functionality manually.
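
The decision described above can be sketched as a small pure function. This is a simplified model and not the xapi implementation: the `trigger` type, `may_enable`, and the `other_conditions_met` parameter are assumptions, and the sketch ignores any checks that a manual `Host.enable` itself performs.

```ocaml
(* Hypothetical model of what can cause a host to be re-enabled. *)
type trigger =
  | Manual_host_enable (* explicit Host.enable call *)
  | Toolstack_restart
  | Host_reboot
  | Consider_enabling_host (* e.g. triggered by a PBD plug *)

(* With host_auto_enable = false, only a manual call re-enables the host.
   [other_conditions_met] stands in for the pre-existing checks. *)
let may_enable ~host_auto_enable ~other_conditions_met = function
  | Manual_host_enable ->
      true
  | Toolstack_restart | Host_reboot | Consider_enabling_host ->
      host_auto_enable && other_conditions_met
```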
Because Autoconf is not DHCP, networkd uses the dns value to write to
resolv.conf. This is done in ocaml/networkd/bin/network_server.ml at line 745.

This allows a non-empty resolv.conf when using IPv6 autoconf.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
* Use the decoder from the OCaml standard library instead of our own
implementation, which this patch removes.
* Validate UTF-8/XML conformance for maps and sets, in addition to
strings.

This is XSA-474 / CVE-2025-58146.

Reviewed-by: Edwin Török <edwin.torok@cloud.com>

Patch from: https://xenbits.xen.org/xsa/advisory-474.html
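
For illustration, validation along these lines can be written with the standard library decoder (`String.get_utf_8_uchar`, available from OCaml 4.14). This is a sketch and not the patch itself: the function names are hypothetical, and sets and maps are modelled as plain lists.

```ocaml
(* XML 1.0 "Char" production: the code points allowed in a document. *)
let is_xml_char u =
  let c = Uchar.to_int u in
  c = 0x9 || c = 0xA || c = 0xD
  || (c >= 0x20 && c <= 0xD7FF)
  || (c >= 0xE000 && c <= 0xFFFD)
  || (c >= 0x10000 && c <= 0x10FFFF)

(* A string is accepted when it decodes as UTF-8 from start to end and
   every decoded character is allowed in XML. *)
let is_valid_string s =
  let len = String.length s in
  let rec go i =
    if i >= len then
      true
    else
      let d = String.get_utf_8_uchar s i in
      Uchar.utf_decode_is_valid d
      && is_xml_char (Uchar.utf_decode_uchar d)
      && go (i + Uchar.utf_decode_length d)
  in
  go 0

(* Hypothetical models: a set as a string list, a map as a pair list. *)
let is_valid_set = List.for_all is_valid_string

let is_valid_map =
  List.for_all (fun (k, v) -> is_valid_string k && is_valid_string v)
```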
This adds the monitor service required for the SSH auto-mode, as
described in `doc/content/toolstack/features/SSH`.

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
Adds a new span.depth key to the trace context baggage, and a
configurable max_span_depth. This defaults to 100 and so will not limit
traces (the traces I've seen with the most depth are ~40 depth e.g.
https://jaeger.kfd.eng.citrite.net/trace/ea5ddca5509b3ae1102bc7279092652d),
but is useful when wanting to analyse large traces which can often
become slow if all the spans are recorded in a trace.

This isn't perfect: the span.depth seems to get lost sometimes between
xapi and xenops, resulting in a greater depth than that listed. I have
created ticket CP-308999 for this, and the change works well enough to
greatly reduce the number of spans in a trace when needed, which is the
intention. As an example, a host evacuate trace with max_span_depth 10
goes down to ~1000 spans rather than the 34k+ without a depth limit.
Previously both xapi and networkd had to inspect the IP configuration to decide
whether the DNS values should be persisted into /etc/resolv.conf. This
actually led to a mismatch between them. Instead use an option value for DNS that
simply means that if there's a value, it must be persisted.

Now xapi decides the instances where these values are written.

Treat a couple of empty lists as a lack of value to avoid writing empty
resolv.conf files. This can happen when updating a host from previous
versions, which use empty lists when using DHCP.

Tested manually by installing a version with this change and restarting
the toolstack. The file is kept intact, unlike the previous version of
the change that did not take into account the update behaviour.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
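
A minimal sketch of the option-based approach, assuming DNS settings are a pair of nameservers and search domains. The type and function names are hypothetical and do not come from the xapi-idl interface.

```ocaml
(* Hypothetical model: nameservers and search domains. *)
type dns = string list * string list

(* A pair of empty lists (as sent by older versions when using DHCP) is
   treated as "no value", so nothing gets written. *)
let dns_to_persist ((nameservers, domains) : dns) : dns option =
  match (nameservers, domains) with
  | [], [] -> None
  | _ -> Some (nameservers, domains)

(* None means leave resolv.conf untouched; Some means write it. *)
let resolv_conf_contents = function
  | None ->
      None
  | Some (nameservers, domains) ->
      let search =
        match domains with
        | [] -> []
        | ds -> ["search " ^ String.concat " " ds]
      in
      let servers = List.map (fun n -> "nameserver " ^ n) nameservers in
      Some (String.concat "\n" (search @ servers) ^ "\n")
```

With this shape the receiver no longer needs to inspect the IP configuration: the presence of a value is the whole decision.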
…lugging

When unplugging a PBD, enabling a host, or adding a VBD, the shared SR
constraint could be violated, but the error used in these cases
was that the operation blocked the failover planning. This was confusing
because the main reason was not mentioned in the error. Instead use the
SR constraint violation error, and log a more descriptive message
at info level, because these can happen during normal operation and
there's nothing dodgy going on.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
…d unplugging

When unplugging a PIF, enabling a host, or adding a VIF, the shared
network constraint could be violated, but the error used in
these cases was that the operation blocked the failover planning. This
was confusing because the main reason was not mentioned in the error.
Instead use the network constraint violation error, and log a more
descriptive message at info level, because these can happen during
normal operation and there's nothing dodgy going on.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
It didn't add any useful information to the error. Also cleaned up the
formatting of some comments found during the patch series.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
…#6666)

Now the HA shared SR and network constraint violations are used when
plugging and unplugging.

When unplugging a PBD or PIF, enabling a host, or adding a VBD or VIF,
the shared SR or network constraints could be violated, but
the error used in these cases was that the operation blocked the
failover planning. This was confusing because the main reason was not
mentioned in the error. Instead use the shared constraint violation
error, and log a more descriptive message at info level, because these
can happen during normal operation and there's nothing dodgy going on.

I previously wanted to know Tina's opinion on how we change the reason
in a way that can be better treated by clients and internationalized,
but saw that the error used was simply not the right one.
Because Autoconf is not DHCP, networkd uses the dns value to write to
resolv.conf. This is done in ocaml/networkd/bin/network_server.ml at line
745.

This allows a non-empty resolv.conf when using IPv6 autoconf.

xapi-idl/network: Remove code duplication for DNS persistence decisions:

Previously both xapi and networkd had to inspect the IP configuration to
decide whether the DNS values should be persisted into /etc/resolv.conf.
This actually led to a mismatch between them. Instead use an option value
for DNS that simply means that if there's a value, it must be persisted.

Now xapi decides the instances where these values are written.

Treat a couple of empty lists as a lack of value to avoid writing empty
resolv.conf files. This can happen when updating a host from previous
versions, which use empty lists when using DHCP.

Tested manually by installing a version with this change and restarting
the toolstack. The file is kept intact, unlike the previous version of
the change that did not take into account the update behaviour.

This PR is a fixed version of #6586
Some of these were passed through several layers of functions only to be unused
in the end. Drop them, improving the legibility of the code.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
When performing the changes described in add-function.md for adding a
host-price-of function to xapi, a type error would arise from the
message-forwarding.ml file - this is fixed by explicitly giving the remote_fn
named argument.

Signed-off-by: Christian Pardillo Laursen <christian.pardillolaursen@cloud.com>
@changlei-li changlei-li merged commit ea80a89 into feature/host-network-device-ordering Sep 18, 2025
41 checks passed