Update feature/vm-anti-affinity by merging from master #5658

gangj
Contributor

@gangj gangj commented May 30, 2024

No description provided.

freddy77 and others added 30 commits April 27, 2024 08:10
fail is always used in combination with Printf.sprintf so combine the two.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
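
A minimal sketch of the idea in this commit, assuming `fail` simply raises `Failure` (the real helper may differ). `failf` and `check_opened` are hypothetical names used only for illustration:

```ocaml
(* Hypothetical helper: [failf] formats its message and then raises
   [Failure], so call sites no longer need an explicit [Printf.sprintf]. *)
let failf fmt = Printf.ksprintf failwith fmt

(* Before: failwith (Printf.sprintf "cannot open %s: error %d" path code) *)
let check_opened path code =
  if code <> 0 then failf "cannot open %s: error %d" path code
```

`Printf.ksprintf` builds the string and hands it to the continuation, which is why the format arguments can be passed directly to `failf`.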
More compact code.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
The "exe" variable is computed in multiple functions; it is not very expensive to
compute it once instead.
Reduce code.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Make sure we release the "sock" file descriptors in all cases.
Add a test that tries to reproduce the issue by passing an invalid file descriptor; this is not
a perfect reproduction, but a failure in this function can happen, for instance, if
the daemon is restarted.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
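
The "release in all cases" pattern can be sketched with `Fun.protect` from the standard library. This is an illustration only: `with_resource`, `fake_open` and `fake_close` are hypothetical stand-ins, and a counter plays the role of the socket so that the sketch needs no `Unix` dependency.

```ocaml
(* Generic acquire/use/release helper: [release] runs whether [f]
   returns normally or raises. *)
let with_resource ~acquire ~release f =
  let r = acquire () in
  Fun.protect ~finally:(fun () -> release r) (fun () -> f r)

(* Stand-in for a socket: a counter of currently open "descriptors". *)
let open_count = ref 0

let fake_open () = incr open_count ; !open_count

let fake_close _fd = decr open_count

(* The body fails, as it might when the daemon has been restarted, yet
   the descriptor is still released before the exception propagates. *)
let () =
  try
    with_resource ~acquire:fake_open ~release:fake_close (fun _fd ->
        failwith "daemon restarted")
  with Failure _ -> Printf.printf "open descriptors: %d\n" !open_count
```

In real code `release` would be something like `Unix.close`.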
We just want information about the current process.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Instrument:

- `forkhelpers.ml`,
- `fecomms.ml`

to create spans around functions when a parent span is supplied as
`tracing`.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Currently the `observe` mode of the `tracing` library does not work
correctly, resulting in the logs being spammed by this warning.

Comment it out so that the logs do not become too big (for the time
being).

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
…/CP-48195-instrument-forkexecd-client

CP-48195: Instrument client side of `forkexecd`
- Use `Types.checkError` instead of throwing a generic `XenAPIException`. This ensures the `Types.XYZ` family of exceptions is used.
- Use `@JsonValue` to ensure base class objects are deserialised as a simple opaque_ref string, as opposed to a mapping of each field. This ensures the API's behaviour is unchanged.
- Parse the results of `task.getResult` calls. The jsonrpc method returns value payloads of the form `"value" : "<value>OpaqueRef:XYZ</value>"`, so we need to strip the surrounding XML.

Signed-off-by: Danilo Del Busso <danilo.delbusso@cloud.com>
…e build @check

Bytecode builds for `http_lib` are disabled due to '(modes best)',
and that means that anything that depends on it must have it disabled too to avoid this warning.

Avoids these kinds of warnings:
```
File "_none_", line 1:
Error: Module `Buf_io' is unavailable (required by `Http_svr')
```

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
…ries

There were 3 modules whose names conflicted with compiler libraries: Watch,
Debuginfo and Stats. Debuginfo was renamed; the libraries of the other two were changed
to be wrapped.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Pinning the libraries runs dune subst, which needs a project name, so define it.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
It is not safe to access a global hashtable from multiple threads, even if the operations are read-only
(it may be concurrently changed by another thread, which then may result in errors in the racing thread).

This means we must always take the mutex, and because OCaml doesn't have a reader-writer mutex, we need to take the exclusive mutex.

Eventually we should use a better datastructure here (immutable maps, or lock-free datastructures), but for now
fix the datastructure that we currently use to be thread-safe.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
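
A rough sketch of the approach described above, where every access to the table takes the exclusive mutex, reads included. The names `table`, `with_lock`, `set` and `find_opt` are hypothetical, not the ones used in xapi. On OCaml 4 the `Mutex` module comes from the threads library, so this assumes the program is linked against it; on OCaml 5 it is part of the standard library.

```ocaml
(* Hypothetical global table guarded by a single exclusive mutex. *)
let table : (string, int) Hashtbl.t = Hashtbl.create 16

let table_mutex = Mutex.create ()

(* Run [f] with the mutex held; unlock even if [f] raises. *)
let with_lock f =
  Mutex.lock table_mutex ;
  Fun.protect ~finally:(fun () -> Mutex.unlock table_mutex) f

(* Writers and readers alike go through [with_lock]. *)
let set key value = with_lock (fun () -> Hashtbl.replace table key value)

let find_opt key = with_lock (fun () -> Hashtbl.find_opt table key)
```

An immutable map held in a single reference would remove the need to lock on reads, which is the longer-term direction the commit message mentions.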
In preparation for OCaml 5, on OCaml 4 they'd be equivalent.

Note that adding Atomic doesn't make operations on these values always atomic: that is
the responsibility of surrounding code.
E.g. Atomic.get + Atomic.set is not atomic, because another domain might've raced and changed the value in between
(so in that case Atomic.compare_and_set should be used).

However for global flags that are read multiple times, but set from a central place this isn't a problem.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
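
The distinction drawn above can be sketched as follows, using only the standard `Atomic` module. The names `counter`, `racy_incr`, `safe_incr` and `enabled` are made up for this example:

```ocaml
let counter = Atomic.make 0

(* Racy on OCaml 5: another domain may update [counter] between the
   [get] and the [set], and that update would then be lost. *)
let racy_incr () = Atomic.set counter (Atomic.get counter + 1)

(* Retry until no other writer intervened between the read and the write. *)
let rec safe_incr () =
  let seen = Atomic.get counter in
  if not (Atomic.compare_and_set counter seen (seen + 1)) then safe_incr ()

(* A flag that is set from one central place and only read elsewhere
   needs no such loop. *)
let enabled = Atomic.make false

let enable () = Atomic.set enabled true

let is_enabled () = Atomic.get enabled
```

For plain integer counters, `Atomic.incr` and `Atomic.fetch_and_add` already provide an atomic read-modify-write; the explicit loop is shown only to make the race visible.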
Sets `observe` to `false` until at least one tracer provider is enabled.
If all tracer providers are disabled we set it back to `false`.

Prior to this `observe` seemed to be always `true` (apart from unit
tests). This would cause `with_tracing` to spam the logs with warnings
`No provider found...` until at least one tracer provider is enabled.

By setting `observe` to `false` as default and updating it depending on
the state of the tracer providers, `with_tracing` should now execute no
extra operations. Therefore, we avoid spamming the logs unnecessarily.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
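
As an illustration of the behaviour described in this commit, here is a simplified model in which `observe` is recomputed whenever a provider changes state. The `provider` type and the functions `add_provider`, `set_enabled` and `refresh_observe` are assumptions for this sketch and do not mirror the actual `tracing` library API; only `observe` and `with_tracing` are names taken from the message above.

```ocaml
(* Hypothetical model: a provider is just a name plus an enabled flag. *)
type provider = {name: string; mutable enabled: bool}

let providers : provider list ref = ref []

(* Off by default, until some provider is enabled. *)
let observe = Atomic.make false

(* Recompute [observe] from the providers' state after every change. *)
let refresh_observe () =
  Atomic.set observe (List.exists (fun p -> p.enabled) !providers)

let add_provider name =
  let p = {name; enabled= false} in
  providers := p :: !providers ;
  refresh_observe () ;
  p

let set_enabled p enabled =
  p.enabled <- enabled ;
  refresh_observe ()

(* With [observe] false, [f] runs with no span work and no warning. *)
let with_tracing ~name f =
  if Atomic.get observe then (
    Printf.printf "span start: %s\n" name ;
    Fun.protect ~finally:(fun () -> Printf.printf "span end: %s\n" name) f
  ) else
    f ()
```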
Do not output log lines that are part of normal operation. Use debug for
them: they are not usually logged, but can be enabled if need be by changing
the log level.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
New version got released

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
The query on HTTP endpoint /updates will return the available updates in
JSON format. Prior to the changes in this commit, if a query arrives
when another query is being handled, a "GET_UPDATES_IN_PROGRESS" error
will be returned immediately. This behaviour is not friendly to the GUI
client XenCenter.

In this commit, the behaviour is changed to wait and retry when handling
the query in xapi, since the "*_IN_PROGRESS" error is a transient
failure. Tolerating it on the xapi (server) side avoids error handling on
the client side.

With the change, the "GET_UPDATES_IN_PROGRESS" will not be an error
exposed to users any more. Therefore it is removed.

Signed-off-by: Ming Lu <ming.lu@cloud.com>
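
The wait-and-retry behaviour could look roughly like the sketch below. It is not the xapi implementation: the `outcome` type, the `retry` function and its parameters are hypothetical, and the sleep function is passed in so that the example depends only on the standard library.

```ocaml
(* Result of a single attempt at handling the query. *)
type 'a outcome = Done of 'a | In_progress

(* [attempt] performs the query once. While another query is in progress,
   wait [delay] and try again, up to [attempts] times in total. *)
let rec retry ~attempts ~delay ~sleep attempt =
  match attempt () with
  | Done v ->
      Ok v
  | In_progress when attempts > 1 ->
      sleep delay ;
      retry ~attempts:(attempts - 1) ~delay ~sleep attempt
  | In_progress ->
      Error `Still_in_progress
```

A caller on a real server would pass something like `Thread.delay` or `Unix.sleepf` as `sleep`.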
…389319

CA-389319: Wait and retry for GET_UPDATES_IN_PROGRESS
As part of a start, resources are allocated for a VM in "scheduled_to.."
fields. These need to be cleared if the start fails. It turned out that
this was incomplete for PCI slots and those were leaking. This patch
tries to be more systematic about it.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
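
One way to picture the "systematic" clean-up is a single function that clears every scheduled allocation and is called on any start failure. The record and its field names below are hypothetical placeholders, not the real datamodel fields:

```ocaml
(* Hypothetical stand-in for the resources reserved during a VM start. *)
type vm = {
    mutable scheduled_host: string option
  ; mutable scheduled_pcis: int list
}

(* Single place that clears every scheduled allocation. *)
let clear_scheduled vm =
  vm.scheduled_host <- None ;
  vm.scheduled_pcis <- []

(* Reserve resources, then run [boot]; on any failure release all of
   them, PCI slots included, before re-raising. *)
let start vm ~host ~pcis ~boot =
  vm.scheduled_host <- Some host ;
  vm.scheduled_pcis <- pcis ;
  try boot vm
  with e ->
    clear_scheduled vm ;
    raise e
```

Keeping the clean-up in one function means a newly added scheduled field only has to be handled in one place.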
Create a new unit test file `test_tracing.ml` for the `tracing` library.
Add tests for `create`/`set`/`destroy` tracer providers.

Now that we have changed the library to have `observe` modes based on whether
or not we have `tracer providers` enabled, we want to make sure that
functions applied on tracer providers set the correct mode for the
library.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Removes code duplication in `storage_mux.ml` by using the
already existing `with_dbg` implementation from `debuginfo` module.

This should lower the chances of unintentionally introducing bugs by
having to maintain two versions of the same function, e.g. not using the
no-op when tracing is disabled and generating unwanted warning messages.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Moves the following:

- `create` under `TracerProvider`;
- `set` under `TracerProvider`;
- `destroy` under `TracerProvider`;
- `get_tracer_providers` under `TracerProvider`;
- `get_tracer` under `Tracer`.

Adds documentation for `TracerProvider` module.

It remained confusing what `Tracing.set` does unless I went through
the implementation again and again. Therefore, I moved some of the
functions so that their functionality becomes (hopefully) more intuitive.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
This is mostly copied from the jekyll version at xapi-project.github.io and
amended for Hugo. The main difference is that all XenAPI pages have now been
integrated with the main menu in the left side bar, including the menu of the
XenAPI classes.

Some automation is left to do in order to take the `doc/data` files from a
build of `ocaml/idl` and update the class pages using the script
`doc/make-class-pages.py`. Also the overview of xapi releases is still missing,
as well as some of the API guides.

Finally, there is some overlap with markdown files in `ocaml/doc`, which needs
to be sorted out.

Signed-off-by: Rob Hoes <rob.hoes@citrix.com>
From Xen 4.19 onwards the legacy paths disappeared and the only valid path for
pygrub is /usr/libexec/xen/bin/pygrub. This path has always been preferred, but
now it's mandatory.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Add a new field `cluster_stack_version` to the cluster datamodel to
track the version of corosync currently in use. This version will
always be set to 3. Also add logic to switch corosync binary and
associated library versions when a cluster is created, if needed.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
freddy77 and others added 26 commits May 16, 2024 09:52
When getting the list of files, we called Unix.readlink twice for every file
descriptor.
When we wanted just the count, we computed the list anyway.
Factor out code to avoid calling the function twice, and do not compute the list
if it is not needed.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
The VM power state should be preserved on a suspended VM import.

In commit ebb58a8, the power state and suspend VDI in `vm_record` were
reset on importing a suspended VM. This was meant to facilitate the
subsequent `Client.VM.create_from_record`. But it broke the restore of the
power state and suspend VDI, which requires the data in
`vm_record`.

This commit preserves the data in `vm_record` and resets it just before
it is passed to `Client.VM.create_from_record`.

Signed-off-by: Ming Lu <ming.lu@cloud.com>
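
The shape of this fix can be sketched as follows. The types are simplified stand-ins for this example, and `create_from_record` here is a local placeholder for `Client.VM.create_from_record`, not the real call:

```ocaml
type power_state = Halted | Suspended

(* Simplified stand-in for the imported VM record. *)
type vm_record = {
    name: string
  ; power_state: power_state
  ; suspend_vdi: string option
}

(* Placeholder for Client.VM.create_from_record; returns a VM handle. *)
let create_from_record (r : vm_record) = "vm:" ^ r.name

let import_vm (vm_record : vm_record) =
  (* Reset only the copy handed to creation; [vm_record] stays intact. *)
  let for_create = {vm_record with power_state= Halted; suspend_vdi= None} in
  let vm = create_from_record for_create in
  (* The original data is still available for restoring the state later. *)
  (vm, vm_record.power_state, vm_record.suspend_vdi)
```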
CA-392930: Fixed exception handling which prevents connection
Instruments `xapi_session.ml` to create span around session logins.

Adds new attribute for spans:

- `xs.xapi.session.originator`

where available.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
…orosync3-basic

CP-48027: Corosync upgrade add `cluster_stack_version` datamodel change
…392836

CA-392836,CA-392847: Lost the power state on suspended VM import
This forces the failure on a host that is trying to perform corosync
upgrade. There are ways to recover: if the failure happens early, before
the cluster is created in the DB, then a recreate ought to fix the
problem. This happens when the corosync upgrade fails on the
coordinator.

If the failure happens after the cluster is created on a pool member,
then a `pool-resync` should help retry this upgrade.

Hopefully this can simulate some of the failure paths, but is by no
means exhaustive. Other more complicated failures are not easily
recoverable and therefore not simulated for now.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Refactor the representation of Kerberos Domain Controller (KDC) to better
support IPv6. It became apparent that the current implementation assumes
standard port 88 being used by a KDC - which we should relax in the
future.

Capture a KDC configuration as KDC.t and provide functions to create,
serialise, log a value. Represent the IP address as Ipaddr.t (which can
be IPv6 or IPv4).

The existing implementation does not properly use a domain controller's
port number that it obtains from a `net lookup kdc`:

* the port number is not passed to other net commands that query the
  domain controller

* the port number is not stored in the xapi database (alongside the IP
  address).

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
We want to store the KDC address as IP/port in xapi using the existing
database field. We use a new custom URI. We support the existing
(simpler legacy) scheme by trying it first when reading the field.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Remove possible file descriptor leak if safe_close_and_exec fails
…orosync3-upgrade-fist

CP-49635: Add FIST point for corosync upgrade
This unblocks the Storage: BST to use distributed tracing as the sr_uuid
parameter is sometimes not given.

Signed-off-by: Steven Woods <steven.woods@citrix.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
It's unused outside of the repository

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
…-391381

CA-391381: Avoid errors for Partial Callables in observer.py
Remove CVM and relevant test cases

CVM was once supported, but has not been for a long time.

This commit cleans up the remnant code and test cases.

Signed-off-by: Ming Lu <ming.lu@cloud.com>
@gangj
Contributor Author

gangj commented May 30, 2024

$ git show 5a8614fc51
commit 5a8614fc5160fcb2f8ec666c615e24b43de7ccf1 (HEAD -> feature/vm-anti-affinity.8-24.05.30-master.merged, my_gh/private/gangj/merge_from_master)
Merge: 6529c0b6d c0e5dc454
Author: Gang Ji <gang.ji@citrix.com>
Date:   Thu May 30 10:20:21 2024 +0800

    Merge branch 'master' into feature/vm-anti-affinity

diff --cc ocaml/idl/schematest.ml
index 2d839624f,def14e581..b7d9d8ed0
--- a/ocaml/idl/schematest.ml
+++ b/ocaml/idl/schematest.ml
@@@ -3,7 -3,7 +3,7 @@@ let hash x = Digest.string x |> Digest.
  (* BEWARE: if this changes, check that schema has been bumped accordingly in
     ocaml/idl/datamodel_common.ml, usually schema_minor_vsn *)

- let last_known_schema_hash = "024b6ce32246d7549898310675631d51"
 -let last_known_schema_hash = "2b8b5b107eb465e97d35a68274ac18ef"
++let last_known_schema_hash = "242a81f4f769291456431be3f75ef783"

  let current_schema_hash : string =
    let open Datamodel_types in
diff --cc ocaml/xapi/xapi_globs.ml
index ca913c572,1ddc98250..15a5664ed
--- a/ocaml/xapi/xapi_globs.ml
+++ b/ocaml/xapi/xapi_globs.ml
@@@ -1023,8 -1025,8 +1025,10 @@@ let python3_path = ref "/usr/bin/python
  let observer_experimental_components =
    ref (StringSet.singleton Constants.observer_component_smapi)

 +let pool_recommendations_dir = ref "/etc/xapi.pool-recommendations.d"
 +
+ let disable_webserver = ref false
+
  let xapi_globs_spec =
    [
      ( "master_connection_reset_timeout"

@robhoes robhoes merged commit 3dc2b9e into xapi-project:feature/vm-anti-affinity May 30, 2024
14 checks passed