Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge master to feature/non-cdn-update #5824

Conversation

BengangY
Copy link
Contributor

@BengangY BengangY commented Jul 13, 2024

As the module record_util.ml is refactored on master which conflicts with PR #5764, I need to merge master to feature branch feature/non-cdn-update.

edwintorok and others added 30 commits June 1, 2022 18:39
It is in /usr/sbin not /usr/bin

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Run xentrace in continuous tracing mode to a circular buffer
(of fixed size).
When a CPU spike is detected xentrace is sent a SIGTERM with `killall`
to get it to dump the xentrace buffer to disk.
The trace is compressed and another xentrace started.

This repeats until we run out of the space on the VDI set up for
tracing.

CPU spikes are detected by using `rrd2csv` and watching the host
`cpu_avg` which should be the average of the CPU usage of all the host's
pCPUs.

New flags:
* -c to enable circular tracing mode (default is a -T time one-off trace)
* -p <cpu-usage> e.g. -p 98 the detection point for CPU spike
* -r <n> repeat count - how many high cpu usage measurements in a row
  before triggerring the xentrace dump
* -M <n> to set the size of the circular buffer in MiB (default: 400MiB)

There is also a (hardcoded for now) sleep 60 between the traces just
so we don't continuously measure a very long spike.

To stop circular tracing mode you have to Ctrl-C on the running
xe-xentrace (e.g. under `screen`), or `killall xe-xentrace`
(but this seems to prevent automatic cleanup)

Signed-off-by: Edwin Török <edwin@etorok.net>
The C SDK build was failing with a recent GCC on Fedora39 like this:
```
src/xen_common.c: In function ‘xen_session_logout’:
src/xen_common.c:298:5: error: ‘xen_call_’ accessing 16 bytes in a region of size 0 [-Werror=stringop-overflow=]
  298 |     xen_call_(session, "session.logout", params, 0, NULL, NULL);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Use `NULL` instead of a VLA of size 0.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Requires libxml2-devel installed.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This will be useful for testing record_util.ml

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…ed case

Other enums here accept mixed case, not just lower case.

Fixes: 638f58e ("CP-38020: add HOST.set_numa_affinity_policy")

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
There are some inconsistencies between enum string values and their
declarations, e.g. `balance-slb` (which has special handling in the API
generator), or `depth-first` (which does not).

We want to automatically generate record_util.ml, but we must ensure backwards
compatibility, and the only way to do that is to exhaustively test all the old
values in a unit test.

This test won't need to be updated when new enum values are introduced: we'll
use the automatically generated ones for those already.

The `old*` files were created using the following command:
```
cp ocaml/xapi-cli-server/record_util.ml ocaml/tests/record_util/old_record_util.ml && dune build @check --profile=release && grep 'let all_' _build/o\default/ocaml/xapi-types/aPI.ml >|ocaml/tests/record_util/old_enum_all.ml && dune fmt --auto-promote
```

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…ecord_util_test

Add record_util backwards compatibility test
XAPI was just retrying endlessly in a loop saying "Could not find block device",
but didn't raise any alerts that XenRT could detect.

Report the redo log broken the first time we fail due to the lack of a bloc k device (which would indicate something going wrong in SM).

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Nested virt is not a supported feature yet, but when moving the setting
from Xen to toolstack as part of the Xen-4.17 update I typoed the name of the setting.

It was platform/nested-virt in xenguest.patch before

Fixes: 664de76 ("Xen-4.15+: CDF_NESTED_VIRT")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
For regular replies we look at the request and reply with a matching version.
However when we fail to parse the JsonRPC request itself then we don't know what version to use.

XenCenter uses JsonRPC v2 by default, and JsonRPC v2 has been supported in XAPI for a long time.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Fixes this pytype report:
```
pytype_reporter:          Expected: (self, uri, transport, encoding, verbose: bool = ..., ...)
pytype_reporter:   Actually passed: (self, uri, transport, encoding, verbose: int, ...)
pytype_reporter:          Expected: (self, uri, transport, encoding, verbose, allow_none: bool = ..., ...)
pytype_reporter:   Actually passed: (self, uri, transport, encoding, verbose, allow_none: int)
```

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
pytype claims that there is no 'errno' in the socket module, but there is:
```
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.errno.ETIMEDOUT
110
```

Also `session.local_logout` is a remote method that is proxied by `__getattr__`,
I don't know why `pytype` complains about that one, and not about `logout`, but suppress it.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…ication

This is already the case for HTTP(S) connections, but XenAPI overrides the xmlrpclib implementation
to provide a Unix socket transport, and it lacks this optimization.

Otherwise we've noticed that SM makes at least 4 separate connections to XAPI to query the version number for example.
On a busy system each of those connections can get delayed by 100ms:

* OCaml's tick thread switches threads every 50ms, and you might need 2 switches when XAPI is busy with CPU-bound workloads:
  * one to accept the connection
  * another to handle the request on the newly spawned thread

See:
https://github.com/python/cpython/blob/v2.7.5/Lib/xmlrpclib.py#L1361-L1371
https://github.com/python/cpython/blob/v3.6.8/Lib/xmlrpc/client.py#L1237-L1245

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Old version of XenAPI.py used to look at the API version to implement a compatibility layer,
but that code got dropped and the API version is now completely unused:

df9b539 ("CA-35286: Remove forward compatability code from the python XenAPI module in favour of the compatability layer in xapi")

Remove the unused argument, this will allow us to avoid making 4 additional API calls after every login.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
The API version is not used anymore, so remove the 4 additional queries it'd make.
For backward compatibility retain the field as a cached property: should a client of this code want to access the API version,
then it'll make the 4 queries at that time, and cache the result for the duration of the API object.

Cached properties are writable, so the fallback that writes the API version to 1.1 should keep working too

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…enapi

CP-48623: reduce XenAPI.py connection rate and drop 4 useless API calls
…edo_report

redo_log: report redo log as broken if we cannot find the block device
This forces us to fully declare the dependencies of our code,
and not rely on libraries that are brought in only as transitive dependencies
of other libraries we happen to link to.

E.g. if our module A depends on library X, which itself depends on library Y,
then currently by linking X we also get Y linked and accessible from A
directly.
If code in module A uses both module X and Y *directly* then it needs to
declare a dependency on both when implicit transitive deps are off or it gets a
link failure (typically an error about a module or type being abstract).
If the code in module A only uses module X then no change is needed (X can
still use Y and the final executable will link both, it is just a question of
what is visible and callable from A directly).

This is especially useful when writing new code to get dependencies correct
from the beginning.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
(cherry picked from commit 7203430)
…check

Bytecode builds for `http_lib` are disabled due to '(modes best)',
and that means that anything that depends on it must have it disabled too to avoid this warning.

Avoids these kinds of warnings:
```
File "_none_", line 1:
Error: Module `Buf_io' is unavailable (required by `Http_svr')
```

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This will be a new library that will provide a more type-safe interface to file descriptor operations.
Useful on its own, but also for testing stdext.

Minimal dependencies, only Unix (and Alcotest for testing).

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This will be a test framework providing QCheck generators and properties for
testing file descriptor operations.
It will try to generate:
* different kinds of file descriptors
* actual data written/read on the other end of pipes and socket pairs
* different speeds and delays on the other end to find buffering bugs
* file descriptors that are >1024 to find bugs with select

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
We are going to use type-level constraints a lot.
Try to future proof it by using the recommended compiler flag.
`ocamlc` says this about `-principal`:
> When using labelled arguments and/or polymorphic methods, this flag is required to
> ensure future versions of the compiler will be able to infer types correctly, even if internal algorithms change

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
edwintorok and others added 26 commits July 1, 2024 16:34
This code snippet was unchanged since the beginning,
but pylint complained:
```
R1720: Unnecessary "else" after "raise", remove the "else" and de-indent the code inside it (no-else-raise)
```

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Fixes: 500a1f7 ("XenAPI: suppress pytype false positives")

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Use the Atomic module to track whether a watcher has been created.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
This allows one to force sync the state of xapi db with the cluster
stack, useful for cluster API methods change the state of the cluster.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Previously there were two ways an alert for a cluster host join/leave
can be raised: 1. through the cluster change watcher; 2. through the api
call. These two can generate duplicate alerts as an API call can cause
the cluster change watcher to notice the change as well.

The idea of the fix here is still to let API and watcher raise alerts
separately, but now add synchronous API calls to allow API call
(cluster-host-join, etc) to call
the cluster change update code at the right time so that the cluster
change watcher won't see the change again, hence not generating duplicate
alerts.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
…ess-alert

CA-394109: Reduce number of alerts
The new fingerprint_sha256 and fingerprint_sha1 fields will be empty
when upgrading from a version without the fields. This commit checks for
this and fills them in, stopping the certificate from being needlessly
reinstalled.

Signed-off-by: Steven Woods <steven.woods@citrix.com>
In nm.ml; helpers.ml.
Expands test cases, keeps the existent exception behaviour.

Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Otherwise the tests are run as part of every package when installing them using
opam, leading to failures

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Use a function accepting printf-style arguments to raise an error and
simplify all the call sites.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Simplify the implementation.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
@BengangY BengangY marked this pull request as ready for review July 14, 2024 01:45
@minglumlu minglumlu merged commit 591916d into xapi-project:feature/non-cdn-update Jul 15, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants