-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CP-39894: Move most of the daemon management code to the service module and make pid locations explicit #4720
CP-39894: Move most of the daemon management code to the service module and make pid locations explicit #4720
Conversation
let path x = "unix:" ^ x | ||
|
||
let rm x = | ||
let dbg = debug "error cleaning unix socket %s: %s" x in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use "removing" as cleaning implies something like flushing.
| None -> | ||
false | ||
| Some p -> ( | ||
try Unix.kill p 0 ; (* This checks the existence of pid p *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this guaranteed behavior? I understand it works with the kill(1) command but does this work with the system call as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The man page for kill(2) is missing from centos7 hosts (and CH), but on my dev machine seems to confirm this behaviour:
If sig is 0, then no signal is sent, but existence and permission checks are still performed; this can be used to check for the existence of a process ID or process group ID that the caller is permitted to signal.
Sorry, I missed a lot of commits here |
47a1c7a
to
d269958
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me, and a useful exercise to consolidate the code for all of these mechanisms. A lot of churn though, so it needs careful testing, including upgrade scenarios with running VMs on the host.
d269958
to
a5c2ad0
Compare
a78ddb8
to
5306b89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing this PR on github is next to impossible: github won't show whether all the moved code is the same or not.
Locally with git I think it is possible with these git config settings:
[diff]
algorithm = histogram
colorMoved = dimmed_zebra
renames = copies
Then identically moved lines will be kind of greyish (i.e. easy to visually filter out on a black background as they're barely visible, but still there for context if needed).
And then use something like this (after having added paus
as a remote and fetched it), ignoring whitespace is a must otherwise indentation changes won't detect the moved code as identical:
git log --reverse --color-moved-ws=ignore-all-space -w -p origin/feature/vtpm..psafont/private/paus/pid-wait
I think some of the code moves got mixed together with also reformatting the moved code, e.g. git shows the vnc port path and pid path lines as having changed even if they're functionally the same. Not sure whether we want all the arg building code moved out of device, that seems to belong there (it tells you how to launch the device, and is specific to it, e.g. varstored and qemu) and results in a lot of code churn. I think we may be overcomplicating service launch detection here: the usual way a Unix process signals that it is "ready" is to open all sockets and everything it needs, fork off a child to run the actual daemon and then exit the parent, and this exit is detected by systemd to transition it from I haven't checked whether all our processes fork off a child, but perhaps that'd be a lot simpler way: no need to wait for xenstore or a pidfile, just wait for the forked process from forkexecd to exit successfully, which already has a function and would be a single mechanism instead of two. We could also delete quite a lot of this code then. If not all processes would support this "forking" detection method of launch we could perhaps add a few small patches implementing that to the daemons, which overall would be less complex code to maintain. (There might be a performance justification for avoiding the extra fork, especially in a PV Dom0 forking is expensive as it triggers a lot page map copies, but I don't think the added complexity here would justify the tiny perf advantage we might have by avoiding the fork, setting up the inotify is probably not exactly free either.) |
I can't spot anything particularly wrong with the changes here though, but it'll be difficult to merge this to master without also merging the vTPM code itself (since it depends on it). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably merge the xenopsd/ changes of feature/vtpm to master fairly soon, we've diverged quite a lot and if xenopsd changes on master we'll have lots of conflicts to solve. The swtpm
daemon is not yet part of rt-next and if we don't merge the xapi
side of the code this code should be mostly inactive, so we should just do some regression testing on a branch with ocaml/xenopsd commits cherry-picked.
This way it's harder to for code elsewhere to get a reference to inactive / non-existent sandboxes. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
This needed some bindings to move to xenops_utils or the PV_Vnc module in service. The change helps set up a baseline for the more complex changes around moving the arg building and reducing the surface of the service submodules. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
This allows to remove the waiting function from the interface Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
This forces to move the wait_path function there as well as the efivar variables. They all need to be exposed in the interface so they can be used from device.ml. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
This allows to hide the arg-building into the module, as well as removing wait_path from the interface. Not all starting code can be moved, as this creates a module dependency loop because of the use of the PCI module. This means that the usage of the xenstore key as a cancellable watch has to leak through the interface. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
This forces the daemon modules to define which of the two paths they use to advertise availability. The translation is as follows: use_pidfile = false means that only xenstore is used use_pidfile = true means that both xenstore and the filesystem are used, depending on situation. Note that this exposes a current issue in swtpm: it is forced to define a xenstore path when it is not used at all and the filepath is hardcoded in the waiting function. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
This allows for clearer meaning in the paths offered by the module, as well as removing the xenstore path by integrating the stop code in it and simplify building of the arguments, which hasn't been pulled into the module due to complexity. The Vusb code has been simplified, it was checking twice the pid of Qemu, this removes a race condition that could lead to a spurious error. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
This allows for clearer logic regarding which conditions trigger each cleanup. Also quite a bit of code from the functions can be removed as the client modules take control of the code. Prevent error messages when trying to shutdown swtpm and varstored for a domain that does not need them so they are not running. Instead print an informational message about not bothering to shut it down. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
This means the code will not try to read or write to xenstore for any operation regarding swtpm and the filesystem will be cleaned up as needed as well. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
Also replaces the code to read the pidfile to use the common function in Unixext. Signed-off-by: Pau Ruiz Safont <pau.safont@citrix.com>
5306b89
to
6e4a3ca
Compare
We'll probably need some retouches around the interface for this, or be conservative about the changes brought into |
The PR is better reviewed commit-by-commit. The biggest change (10c0291) should only move around the code and only modify the callers' side.
Xenops and the daemons until now have used 2 ways of communicating the pid to the former: a xenstore key and optionally a file. We want to move to a model where we can use only a pidfile for some daemons (SWTPM, mainly).
Previously the path pidfile was implicit and the code to fetch it was shared among all the daemons meaning that even if the daemon always provided a pidfile, their clients always had to take into account the case where there was no file to process.
These changes force all the damon modules to declare the locations where the pidfile is expected to be, and only expose the pidfile in the way the daemon actually uses it.
Having all this logic in a separate module also help us to maintain an interface and not allow clients to access all the internal workings, facilitating future refactors (on how to wait for a daemon, for example)
vTPM suite: SR 170394 green
ring3 BVT+BST: SR170395 green