PVF: separate worker binaries and build with musl #7147
Conversation
The worker binaries must be built first so that the host could embed them into `polkadot`. Therefore `pvf/worker` could not depend on `pvf`, so to remove the dependency, common functionality was extracted into `pvf/common`. (NOTE: We already needed to do this host/worker/common separation as part of https://github.com/paritytech/polkadot/issues/7116, it's just unfortunate that it had to be done here and complicate this PR.) Integration tests were moved from `pvf/worker/tests` to `pvf/tests` because they need the PVF host.
Why do we need separate binaries instead of a single one, given the code is pretty compact?
node/core/pvf/src/host.rs (Outdated)

```rust
// TODO: Should we purge unneeded binaries?
// TODO: Test on windows.
```
We're using the POSIX sockets API for node-worker communication; Windows hasn't been supported for a long time.
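(For context, the node-worker channel is a Unix domain socket, so something like the following toy sketch captures the shape of it. This is purely illustrative and not the actual host code; the socket path is made up.)

```rust
// Toy sketch only: the host listens on a Unix domain socket and a spawned
// worker connects back to it. Not the actual host code; the path is made up.
use std::io::{Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};

fn main() -> std::io::Result<()> {
    let path = "/tmp/pvf-demo.sock";
    let _ = std::fs::remove_file(path);
    let listener = UnixListener::bind(path)?;

    // In reality this would be the worker process connecting after being spawned.
    let mut worker_side = UnixStream::connect(path)?;
    worker_side.write_all(b"ping")?;

    let (mut host_side, _) = listener.accept()?;
    let mut buf = [0u8; 4];
    host_side.read_exact(&mut buf)?;
    assert_eq!(&buf, b"ping");
    Ok(())
}
```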
Wow, you decided to just implement it instead of discussing infinitely on the forum? I love it! 😁
And if they are already there, how do you know if they are of the same version as the node? And if they are worker binaries at all, not malicious binaries? Probably it's a good idea to either check hashes or always overwrite them?
Are we guaranteed to have a writeable location? For sure we do for storage purposes, but is that location guaranteed to allow executable files to be written and run (it could have execution restricted)?
CC @wpank (affects validator UX significantly)
One binary may have functionality/dependencies the other one does not, etc. With distinct binaries we can apply an appropriate level of sandboxing to each one separately. There are other benefits I liked, such as having a separate LOG_TARGET for each one, and I also think code-wise it's a nicer way to organize it.
We do include the version in the file name, but checking expected hashes is a good idea. I'll do it in a follow-up.
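Something along these lines, roughly (just a sketch of the idea, not the planned follow-up; `worker_matches_embedded` and the overwrite-on-mismatch behaviour are assumptions):

```rust
// Sketch only: compare the worker found on disk against the bytes embedded in
// `polkadot`, so a stale or foreign binary can be detected and overwritten.
use sha2::{Digest, Sha256};
use std::{fs, io, path::Path};

fn worker_matches_embedded(on_disk: &Path, embedded: &[u8]) -> io::Result<bool> {
    let found = fs::read(on_disk)?;
    // Hashing isn't strictly needed (a plain byte comparison would do), but an
    // expected-hash constant could also be checked without having the full
    // embedded binary available at the call site.
    Ok(Sha256::digest(&found) == Sha256::digest(embedded))
}
```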
Really good questions. Maybe we should first try the database dir, then try the directory we are executing from.
I don't see how it does. It's still a single binary from their perspective. I think it's very likely that we have to split out the binaries for our security work, and this is a nice solution that doesn't require changes from the validators or our release process.
Why can't the sandboxing level be set dynamically based on which functionality is invoked from a single binary? Also, separate log targets can be achieved without separate binaries.
We could, but @koute's syscall script analyzes the entire binary, so it would pick up things common to both workers, which is more than strictly necessary. And he mentioned ROP gadgets: smaller binaries mean a smaller attack surface.
Oh, when I tried with a configurable LOG_TARGET it didn't seem to work.

Stepping back, why would we want to have both workers in the same binary?
I've been thinking about this and I'm now unsure about extracting executables.
It actually does not seem possible on macOS because there is no equivalent of what's available on Linux. On Linux there are no issues.
But currently we only want to achieve that we can run something and have logging enabled? I just realized that instead of separating the workers etc. right now, we could just compile the entire node for musl. Then we can start running with sandboxing and log the syscalls being done. In parallel we (you) can then look into KVM. Should speed up the entire process quite a bit.
@bkchr I thought based on this comment that splitting the binaries should be done before KVM. So my plan was to do this, enable the logging, and then switch to KVM. Anyway, with this PR being close to finished I don't think it should be abandoned now.
It was just about ensuring that we can have this running everywhere without needing any changes by operators right now.
Update: some issues were discovered when working on "3. worker binary extraction". After discussing with @s0me0ne-unkn0wn and @eskimor, it seems better not to do this and to instead properly distribute multiple binary files; I opened https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854. In the meantime, I'll proceed along these lines (#7147 (comment)):
I'll close this PR if there are no objections. I'll raise another PR with just the "1. Worker refactor" part.
The issues with "3. worker binary extraction" are:
Maybe the above issues can be worked around too, but it's turning into multiple layers of hacks. Would be much better to use standard mechanisms that work on both platforms.

This pull request has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854/8
Should be solvable by: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#artifact-dependencies
AFAIK that should work even on macOS? What exactly was happening? Was the process failing to launch?
There will be OS-specific code anyway with the sandboxing, and to me this seems like a relatively easy thing to just abstract over in a single OS-specific module, no?
That's not really specific to extracting worker binaries though, just to the way we were building them. (: Even if we'll go with multiple binaries we'll still have to build them when the node is built.

Anyway, I'm not really convinced. To me having multiple separate binaries brings its own set of problems:
All in all I personally don't really like the idea of distributing multiple binaries, as it just complicates deployment and, more importantly, pushes extra complexity on both our users and ourselves instead of just isolating it into one part of the code (we embed the WASM runtimes into the binary for a reason, and those reasons also apply here). So I'd rather not do that unless there are some very convincing arguments, and so far I'm not really convinced. (:

Also, a side note: if the end game is to have the workers sandboxed using KVM then there's not much point in distributing the binaries separately because they won't have to be normally executed anyway. (But I guess you could argue that we'll still need a fallback anyway if KVM's not available.)
Thanks, that might be useful if we stick to this approach. Have you used it before? Would we need more hacks?
We discussed that we don't want to use the FS for this in secure mode. For example here I mentioned that we should encourage secure validators to clamp down on permissions, so spawning from disk wouldn't work in that case. About the additional option, ideally spawning the processes doesn't infect more code than it has to and especially not the UX.
The process was launching but there weren't any errors. I wasn't able to diagnose it, but my guess was that some bug or security feature was introduced since the

Actually, I just tried to poll the handle:

```rust
match poll!(&mut handle) {
    Poll::Ready(_) => println!("ready"),
    Poll::Pending => println!("pending"),
}
```

I guess this is enough to keep the process alive. If this is the last hack we need, I might be okay with this...
Well, with the sandboxing there are just two code paths, secure mode and non-secure mode. With additional Mac-specific code there would be secure mode, Linux non-secure mode, and Mac non-secure mode. But yeah, I am keeping everything isolated to modules where possible, but if we also needed some cleanup code, that would infect the startup code in addition to process spawning. Even with good isolation of the code, it's more that we have to document and test, and that readers have to understand when the abstraction inevitably leaks.
If we build the whole node with musl as @bkchr suggested then we wouldn't need the binary builder at all, the workers would just be separate binary crates that are built in the regular cargo build. Otherwise, I guess we can try to fix the binary builder.
For (1), we are having a discussion about how to distribute multiple files. I honestly don't see the issue with it; it's standard practice. (2), I think, is addressed if we build the binaries during the cargo build?
We have this. :) #6861
I mean honestly, the wasm-builder is extremely hacky, although I guess it's worked fine so far... But it's not guaranteed to work forever, and every time a new cargo release comes out something could break.

I don't think the single-binary approach is sustainable in the long-term. Are we going to keep piling hacks on top of each other for the next 20 years? We should do this properly, and then we will all sleep better at night. But that is just my opinion as a relative newcomer to the project.
One thing I forgot to mention is that the binary extraction breaks the PVF tests on Mac, particularly the ones that rely on specific timing information. We extract the binaries on-demand for each job, which adds 1-2s to each test (on my machine; it will differ depending on hardware). I guess we could add test-specific code to only do it once before all tests are run and clean up after, if that's possible, though that's even more complexity. I guess we could disable the timing-specific ones on Mac and just live with slow tests on this platform.
Did you look at it?

But you only run the workers in secure mode? The node itself needs access to write to storage. You first write out the binary and then execute it in "secure mode". Not sure what else you mean.
We don't use any undocumented features or whatever of cargo. Not sure why it should break? We only use what is available. You also can not do any "version" checks. Before the wasm builder we had wasm files for genesis checked into git and a bash script to compile them. People constantly forgot to recompile and were running into issues. Thus, I added the wasm builder. For sure it isn't perfect and there are often too many recompilations. However, there are no hacks or things failing or similar.
Why do we extract them per job? Why isn't this done on startup?
Yes, it doesn't help. There's a grand total of two sentences of documentation. And it's a nightly feature. What would help me is to know whether someone's used it before and can confirm it works, and isn't going to introduce more issues. It would save me a lot of time vs. experimenting with it myself.
We were planning to provide ways to run as a validator without secure mode. And for sure it needs to work on dev machines. And the node needs write access but not exec access, see the above comments about executable permissions.
Thanks for the context, that helps explain why we're wary of multiple files. I didn't mean to be dismissive of wasm-builder, it works well for what it does. But I think we wouldn't need it if we built the whole node with musl, assuming that idea is still on the table.
Each test is self-contained and has its own validation host.
Take a nightly compiler and let's go? The docs look rather easy. You import another crate, define the target, bin, etc., and then you can reference it using an env variable.
Yes for sure, but giving it
But that doesn't help with the points about compiling all the binaries.

Ahh sorry, I thought by test you meant when you run the node on your machine doing some "tests" and not the actual tests.
I'll try it if we decide to embed binaries after all. Nightly does mean the feature can technically be pulled if there are major issues with it, but it's very unlikely.
Yep. There were other issues mentioned but they can all be worked around. If we always did this we would at least have the same code for Mac and Linux.
If the binaries were subcrates wouldn't they inherit the same target? I don't know how to compile only the binary crates with musl - something like this forces the user to be on Linux:

```toml
[profile.release]
target = "x86_64-unknown-linux-musl"
```

On the other hand, this isn't possible AFAIK:

```toml
[profile.release.x86_64-unknown-linux-musl]
target = "x86_64-unknown-linux-musl"
```
Nightly probably more means that there can be breaking changes. I mean it could still be pulled, but I would say it is very unlikely.
I mean that isn't the problem we are speaking about. Currently when you want to work with Polkadot you only need to run `cargo build`.
We can just add the workers to the workspace, then.

Here's a summary of the problems so far and my proposal at the end.
Proposal:
I think this solves all our problems in a simple and clean way but let me know if I missed something.
This pull request has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854/27
In the docs of the feature. This sounds exactly like what we need? Otherwise I would not have brought this up...
I have said multiple times now that we need some guide, and you come up with some script again. I don't get it... Writing these scripts is NOT YOUR JOB. You need to provide all the requirements etc. in a clear and understandable way, so that people who write these packages can follow this guide.
BTW, if we bundle the workers in the polkadot binary, we could also add some CLI arg "export-workers" that exports the workers to the given path. This could be used in systemd scripts before starting the node binary, to put the workers into some directory and then "lock down" the node.
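A rough shape of what that could look like, purely as an illustration (the subcommand name comes from the comment above; everything else, including the clap wiring, helper names, and permissions, is an assumption, not an agreed design):

```rust
// Illustrative sketch of an `export-workers` subcommand. Not an agreed design:
// the embedded byte constants, helper names and permissions are placeholders.
use clap::{Parser, Subcommand};
use std::os::unix::fs::PermissionsExt;
use std::{fs, io, path::PathBuf};

// Placeholders; in reality these bytes would be embedded, e.g. via `include_bytes!`.
const PREPARE_WORKER: &[u8] = &[];
const EXECUTE_WORKER: &[u8] = &[];

#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Write the embedded worker binaries to the given directory, e.g. from a
    /// systemd unit, before starting and locking down the node itself.
    ExportWorkers { path: PathBuf },
}

fn export_workers(dir: &PathBuf) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    for (name, bytes) in [("prepare-worker", PREPARE_WORKER), ("execute-worker", EXECUTE_WORKER)] {
        let dest = dir.join(name);
        fs::write(&dest, bytes)?;
        // Extracted workers need to be executable.
        fs::set_permissions(&dest, fs::Permissions::from_mode(0o755))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    match Cli::parse().command {
        Command::ExportWorkers { path } => export_workers(&path),
    }
}
```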
I just forgot about the guide suggestion. My proposal for a guide would be:
It's only a proposal though. At least we're not blocked on anything technical anymore, just on figuring out our exact requirements for the guide.
Indeed, I totally missed that there were more docs. 🤦♂️ I still think it's not ideal to rely on an unstable feature, but it's good to have this as a backup option.
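For reference, my understanding of how that backup option would look, based only on my reading of the unstable docs (the manifest entry shown in the comments and the env var naming should be treated as unverified assumptions):

```rust
// build.rs sketch for the nightly "artifact dependencies" (bindeps) route.
// Assumption: the host crate's Cargo.toml would contain something roughly like
//
//   cargo-features = ["bindeps"]
//   [build-dependencies]
//   polkadot-prepare-worker = { path = "...", artifact = "bin", target = "x86_64-unknown-linux-musl" }
//
// and cargo would then expose the built artifact's path through an env var
// following the documented CARGO_BIN_FILE_<DEP> pattern (unverified here).
use std::env;

fn main() {
    let worker_path = env::var("CARGO_BIN_FILE_POLKADOT_PREPARE_WORKER")
        .expect("artifact dependency was not built");
    // Re-export the path so the crate itself can embed the binary, e.g. with
    // `include_bytes!(env!("PREPARE_WORKER_PATH"))`.
    println!("cargo:rustc-env=PREPARE_WORKER_PATH={worker_path}");
}
```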
That sounds like a decent backup option to have. It doesn't sound like a simple UX though. I think we were pushing so hard for a single binary because of the apparent simplicity on the user side.
After extensive discussion (see [here](paritytech/polkadot#7147) and [here](https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854)), it's been determined that we will be distributing multiple binaries along with `polkadot`. I'm updating the Validator Guide to reflect this. To make it easier for validators, this may be the basis for some script in the future if that is desired.
The CI pipeline was cancelled due to the failure of one of the required jobs.
Closing in favor of #7337
Validator Guide: add instructions for installing multiple binaries (#4860)

After extensive discussion (see [here](paritytech/polkadot#7147) and [here](https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854)), it's been determined that we will be distributing multiple binaries along with `polkadot`. I'm updating the Validator Guide to reflect this. To make it easier for validators, this may be the basis for some script in the future if that is desired.

* Add a helpful note
* Update verification instructions
* Update docs/maintain/maintain-guides-how-to-validate-polkadot.md

Co-authored-by: Will Pankiewicz <9498646+wpank@users.noreply.github.com>
PULL REQUEST

Overview

This PR does several related things:

1. Refactors the workers into separate binary crates (`pvf/prepare-worker` and `pvf/execute-worker`), which do not depend on the pvf host crate (`pvf`).
2. Adds a new crate, `pvf/binary-builder`, that builds the worker binaries as part of the cargo process.
3. Embeds the worker binaries into `polkadot` at compile time, and extracts them at runtime.

More info below.
1. Worker refactor
This PR does an additional refactor on top of #7101.
The worker binaries must be built first and embedded in `polkadot` so that the host could extract them. Therefore `pvf/prepare-worker` and `pvf/execute-worker` could not depend on `pvf`, so to remove the dependency, common functionality was extracted into `pvf/common`.

Integration tests were moved from `pvf/worker/tests` to `pvf/tests` because they need the PVF host.
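To give a flavour of what "common functionality" means here, think of helpers that both the host and the workers need, for example framed socket I/O. The following is only an illustrative sketch with made-up names, not the actual contents of `pvf/common`:

```rust
// Illustrative only: the sort of helper that both the host and the workers
// need, and hence would live in a shared crate rather than in `pvf` itself.
// Length-prefixed framing over a byte stream; names are made up for this sketch.
use std::io::{self, Read, Write};

pub fn framed_send(stream: &mut impl Write, msg: &[u8]) -> io::Result<()> {
    // Prefix every message with its length so the receiver knows how much to read.
    stream.write_all(&(msg.len() as u32).to_le_bytes())?;
    stream.write_all(msg)
}

pub fn framed_recv(stream: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 4];
    stream.read_exact(&mut len_bytes)?;
    let mut buf = vec![0u8; u32::from_le_bytes(len_bytes) as usize];
    stream.read_exact(&mut buf)?;
    Ok(buf)
}
```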
2. binary-builder
I initially tried some hacks to build the binaries here, but eventually realized that in order to be able to access the built binaries at compile time (through the `OUT_DIR` env var) we needed something like Substrate's `wasm-builder`. So I copy/pasted the `wasm-builder` code, removed references to "wasm", and re-jigged it to make it work with non-wasm targets.

It will use the current target to build, unless we're on a "secure-mode" platform (atm just Linux x86_64), in which case it tries to build with musl.
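Conceptually, the builder boils down to a build script along these lines (an illustrative sketch only: the crate name, paths, and the separate `CARGO_TARGET_DIR` are assumptions, not the actual `binary-builder` code):

```rust
// Illustrative build.rs sketch: build a worker crate (here assumed to be named
// "polkadot-prepare-worker") for a musl target and copy the result into
// OUT_DIR so the host crate can embed it with `include_bytes!`.
use std::{env, fs, path::PathBuf, process::Command};

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR is set by cargo"));
    // In the real builder the target is chosen per platform; hard-coded here.
    let target = "x86_64-unknown-linux-musl";

    let worker_target_dir = out_dir.join("worker-target");
    let status = Command::new("cargo")
        .args(["build", "--release", "-p", "polkadot-prepare-worker", "--target", target])
        // Use a separate target dir so the nested build doesn't fight the
        // outer build for the target-directory lock.
        .env("CARGO_TARGET_DIR", &worker_target_dir)
        .status()
        .expect("failed to run cargo");
    assert!(status.success(), "building the worker failed");

    let built = worker_target_dir
        .join(target)
        .join("release")
        .join("polkadot-prepare-worker");
    fs::copy(&built, out_dir.join("polkadot-prepare-worker")).expect("copying the worker failed");
}
```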
Reviewers may find it helpful to diff `binary-builder` with Substrate's `wasm-builder`.

3. worker binary extraction
This is fairly straightforward. If the binaries are not already present, we decompress with `zstd` and write them to ~~a tmp~~ the database location (similar to artifacts), with executable permissions.
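In outline, the extraction step is something like the following (a minimal sketch under stated assumptions; the function name and permission bits are illustrative, not the code in this PR):

```rust
// Minimal sketch: decompress an embedded, zstd-compressed worker and write it
// to `dest` with executable permissions. Names and permissions are illustrative.
use std::os::unix::fs::PermissionsExt;
use std::{fs, io, path::Path};

fn extract_worker(compressed: &[u8], dest: &Path) -> io::Result<()> {
    if dest.exists() {
        // Assumed behaviour: skip if already extracted (see the discussion
        // above about also verifying the version/hash of what's there).
        return Ok(());
    }
    let bytes = zstd::decode_all(compressed)?;
    fs::write(dest, bytes)?;
    // The extracted worker must be executable by the node.
    fs::set_permissions(dest, fs::Permissions::from_mode(0o744))?;
    Ok(())
}
```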
The only gotcha here is that we still have to support the old `program_path` parameter which is still used for puppet binaries in tests. This is now an `Option`. (Now that I think about it, we can make it `#[cfg(test)]`.)

TODO

- Put the `program_path` parameter behind `#[cfg(test)]`?

Related issues
Closes paritytech/polkadot-sdk#650