PVF: Research the use of CPU virtualization for PVF execution #652

sandreim · 2023-04-18T16:42:44Z

For an improved security posture, we should consider running the PVF on a KVM-virtualized CPU. As I/O is not required and communication with the host can happen via a memory-mapped region, there shouldn't be any performance degradation, even in the case of nested virtualization.
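A minimal sketch of what "communication via a memory-mapped region" could look like, assuming a hypothetical mailbox layout (status word, length field, payload). `Mailbox` and the `STATUS_*` values are illustrative names, not an existing API. A plain `Vec<u8>` stands in for the guest memory that the host would map into the VM.

```rust
// Hypothetical layout of a shared "mailbox" region (byte offsets):
//   [0..4)  status word (little-endian u32)
//   [4..8)  payload length (little-endian u32)
//   [8..)   payload bytes
const STATUS_OFFSET: usize = 0;
const LEN_OFFSET: usize = 4;
const PAYLOAD_OFFSET: usize = 8;

const STATUS_EMPTY: u32 = 0;
const STATUS_REQUEST: u32 = 1; // host -> guest: input is ready
const STATUS_RESPONSE: u32 = 2; // guest -> host: result is ready

/// Stand-in for the guest memory region both sides would map.
struct Mailbox {
    mem: Vec<u8>,
}

impl Mailbox {
    fn new(size: usize) -> Self {
        assert!(size > PAYLOAD_OFFSET, "region too small for the header");
        Mailbox { mem: vec![0; size] }
    }

    fn read_u32(&self, off: usize) -> u32 {
        let mut b = [0u8; 4];
        b.copy_from_slice(&self.mem[off..off + 4]);
        u32::from_le_bytes(b)
    }

    fn write_u32(&mut self, off: usize, v: u32) {
        self.mem[off..off + 4].copy_from_slice(&v.to_le_bytes());
    }

    fn status(&self) -> u32 {
        self.read_u32(STATUS_OFFSET)
    }

    fn capacity(&self) -> usize {
        self.mem.len() - PAYLOAD_OFFSET
    }

    /// Writes payload and length first, then publishes them via the status word.
    fn post(&mut self, status: u32, payload: &[u8]) -> Result<(), &'static str> {
        if payload.len() > self.capacity() {
            return Err("payload too large for mailbox");
        }
        self.mem[PAYLOAD_OFFSET..PAYLOAD_OFFSET + payload.len()].copy_from_slice(payload);
        self.write_u32(LEN_OFFSET, payload.len() as u32);
        self.write_u32(STATUS_OFFSET, status);
        Ok(())
    }

    /// Reads the payload, never trusting the length the other side wrote.
    fn take(&mut self) -> Result<Vec<u8>, &'static str> {
        let len = self.read_u32(LEN_OFFSET) as usize;
        if len > self.capacity() {
            return Err("corrupt length field");
        }
        let out = self.mem[PAYLOAD_OFFSET..PAYLOAD_OFFSET + len].to_vec();
        self.write_u32(STATUS_OFFSET, STATUS_EMPTY);
        Ok(out)
    }
}

fn main() {
    let mut mb = Mailbox::new(4096);
    // Host side: hand the input to the guest.
    mb.post(STATUS_REQUEST, b"pvf params").unwrap();
    // Guest side: consume the input and reply with a made-up one-byte result.
    assert_eq!(mb.status(), STATUS_REQUEST);
    let input = mb.take().unwrap();
    mb.post(STATUS_RESPONSE, &[input.len() as u8]).unwrap();
    // Host side: read the result.
    assert_eq!(mb.status(), STATUS_RESPONSE);
    println!("host received {:?}", mb.take().unwrap());
}
```

In a real VMM the host and the vCPU run concurrently, so the status word would need atomic accesses with acquire/release ordering; this single-threaded model deliberately sidesteps that.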

bkchr · 2023-04-19T09:06:56Z

Sounds like this could be easier than using seccomp?

sandreim · 2023-04-19T09:11:53Z

No, it is not. Hardware virtualization provides far better security than any other software sandboxing technology. AWS, GCP and other cloud providers use KVM to virtualize their compute and memory.

sandreim · 2023-04-19T09:14:40Z

Building a PoC would help us understand any shortcomings in the context of our very specific use case. We don't even need device emulation or a real VM, just a single virtualized CPU with a few memory regions. We could easily fork https://github.com/firecracker-microvm/firecracker and strip it down to our basic needs.
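To make "a single virtualized CPU with a few memory regions" a bit more concrete, here is a sketch of planning the guest memory slots before registering them. The struct loosely mirrors the fields of Linux's `struct kvm_userspace_memory_region` (`userspace_addr` is left out, since it only exists once the host has mmap'ed the backing memory), and `KVM_MEM_READONLY` uses the same value as in `<linux/kvm.h>`. The region layout, `Region` and `validate` are hypothetical; the checks mirror the page-alignment and non-overlap rules that `KVM_SET_USER_MEMORY_REGION` enforces.

```rust
const PAGE_SIZE: u64 = 4096;
/// Same value as KVM_MEM_READONLY in <linux/kvm.h>.
const KVM_MEM_READONLY: u32 = 1 << 1;

/// Plain-Rust mirror of the interesting fields of kvm_userspace_memory_region.
#[derive(Debug, Clone, Copy)]
struct Region {
    slot: u32,
    flags: u32,
    guest_phys_addr: u64,
    memory_size: u64,
}

/// Rejects unaligned, empty or overlapping guest-physical ranges.
fn validate(regions: &[Region]) -> Result<(), String> {
    for r in regions {
        if r.memory_size == 0
            || r.guest_phys_addr % PAGE_SIZE != 0
            || r.memory_size % PAGE_SIZE != 0
        {
            return Err(format!("slot {}: empty or not page-aligned", r.slot));
        }
    }
    let mut sorted = regions.to_vec();
    sorted.sort_by_key(|r| r.guest_phys_addr);
    for w in sorted.windows(2) {
        if w[0].guest_phys_addr + w[0].memory_size > w[1].guest_phys_addr {
            return Err(format!("slots {} and {} overlap", w[0].slot, w[1].slot));
        }
    }
    Ok(())
}

fn main() {
    // Hypothetical layout: read-only code + PVF blob, writable scratch, shared mailbox.
    let layout = [
        Region { slot: 0, flags: KVM_MEM_READONLY, guest_phys_addr: 0x0000_0000, memory_size: 16 * PAGE_SIZE },
        Region { slot: 1, flags: 0, guest_phys_addr: 0x0010_0000, memory_size: 256 * PAGE_SIZE },
        Region { slot: 2, flags: 0, guest_phys_addr: 0x1000_0000, memory_size: PAGE_SIZE },
    ];
    let read_only = layout.iter().filter(|r| r.flags & KVM_MEM_READONLY != 0).count();
    match validate(&layout) {
        Ok(()) => println!("layout ok ({} read-only slot(s))", read_only),
        Err(e) => println!("invalid layout: {}", e),
    }
}
```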

bkchr · 2023-04-19T09:28:23Z

> No, it is not. Hardware virtualization provides far better security than any other software sandboxing technology

Yeah for sure, I know :D What I meant is that doing this is much better than trying to predict which syscalls we make and prohibiting the others. I'm still afraid that this will fail at some point when we overlook some syscall.

sandreim · 2023-04-19T09:31:39Z

Yes, I am also afraid of stalling the chain because we did not add a syscall to the whitelist. seccomp is a defense-in-depth measure for when all else has failed and you want to reduce the blast radius of the incident.

mrcnski · 2023-04-19T10:20:28Z

> > No, it is not. Hardware virtualization provides far better security than any other software sandboxing technology
>
> Yeah for sure, I know :D What I meant is that doing this is much better than trying to predict which syscalls we make and prohibiting the others. I'm still afraid that this will fail at some point when we overlook some syscall.

That's definitely a concern. I've been putting in a lot of work to mitigate it. Using @koute's script at build time should mostly prevent us from even building a binary that contains disallowed syscalls. (In practice, though, the worker threads use very few syscalls, about a dozen.)
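The build-time idea boils down to a set difference between the syscalls found in the worker binary and the allowlist. How @koute's script actually extracts syscalls is not described here, so `detected` is simply taken as input; the function name and the syscall names below are illustrative, not the real allowlist.

```rust
use std::collections::BTreeSet;

/// Returns the syscalls that appear in the binary but are not allowlisted.
fn disallowed<'a>(detected: &[&'a str], allowlist: &[&'a str]) -> BTreeSet<&'a str> {
    let allowed: BTreeSet<&'a str> = allowlist.iter().copied().collect();
    detected
        .iter()
        .copied()
        .filter(|s| !allowed.contains(s))
        .collect()
}

fn main() {
    // Both lists are made up for illustration.
    let allowlist = ["read", "write", "mmap", "munmap", "futex", "exit_group"];
    let detected = ["read", "write", "mmap", "futex", "openat"];
    let bad = disallowed(&detected, &allowlist);
    if bad.is_empty() {
        println!("worker binary only uses allowlisted syscalls");
    } else {
        // A real build script would fail the build at this point.
        println!("worker binary uses non-allowlisted syscalls: {:?}", bad);
    }
}
```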

But yeah, we want additional measures because just blocking syscalls is not enough: if an attacker can break out of the WASM sandbox, they can probably also get out-of-bounds memory access, which essentially gives them a source of randomness. Say they make the worker vote against a candidate with 50% probability; that would stall the chain.
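Some quick arithmetic on why a 50% coin flip is so damaging, under a simplified model that is an assumption here: each of `n` checkers of the same candidate independently votes "invalid" with probability `p`, and a single "invalid" vote is enough to escalate the candidate into a dispute. The chance that at least one checker dissents is then 1 - (1 - p)^n, which approaches 1 very quickly. The `n` values are illustrative, not Polkadot parameters.

```rust
/// Probability that at least one of `n` independent checkers votes "invalid"
/// when each does so with probability `p`: 1 - (1 - p)^n.
fn p_any_invalid(n: u32, p: f64) -> f64 {
    1.0 - (1.0 - p).powi(n as i32)
}

fn main() {
    // Illustrative checker counts only.
    for n in [1u32, 5, 10, 30] {
        println!(
            "n = {:2}: P(at least one invalid vote) = {:.10}",
            n,
            p_any_invalid(n, 0.5)
        );
    }
}
```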

koute · 2023-04-19T10:57:42Z

Note that AFAIK seccomp should essentially work out of the box everywhere, while KVM might require some extra setup from users (e.g. some distros disallow access to KVM and require the user to be added to a special kvm group), or might not work at all on machines without hardware virtualization support (not sure how common that is nowadays).

But you're right that hardware virtualization would be technically more secure.

It should be worth it to make a proof of concept and test it out in practice.

> if an attacker can break out of the WASM sandbox, they can probably also get out-of-bounds memory access, which essentially gives them a source of randomness

Hmm... preventing the attacker from acquiring a source of randomness is going to be tricky. In the presence of remote code execution, it's not possible to disallow access to a source of randomness. Even if you disallow things like creating threads or measuring time through seccomp (although IIRC getting the time on amd64 goes through the vDSO shim, so it might not even be possible to sandbox that with seccomp, as no syscalls are involved), you still have e.g. the RDRAND hardware instruction, which the attacker could execute to ask the CPU directly for some random bytes. AFAIK the only way to prevent that is to use virtualization and set the appropriate VMX bit to make the VM abort when that instruction is executed. There might be more corner cases here.

sandreim · 2023-04-19T11:25:30Z

> Note that AFAIK seccomp should essentially work out of the box everywhere, while KVM might require some extra setup from users (e.g. some distros disallow access to KVM and require the user to be added to a special kvm group), or might not work at all on machines without hardware virtualization support (not sure how common that is nowadays).

AFAIK it is widely supported, and if it is not the validator can run without it. Indeed it requires some extra setup, but this is a small price to pay for the increased security. It could easily be part of the validator setup guide.
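The "run without it" path could look roughly like the sketch below: probe `/dev/kvm` at startup and pick a backend, distinguishing "no KVM at all" from "KVM present but not accessible", so that the setup guide can tell operators what to fix. `Backend` and `pick_backend` are hypothetical names. Opening the device only shows that it is accessible; a fuller check would also issue the `KVM_GET_API_VERSION` ioctl (which returns 12 on current kernels) and try to create a VM.

```rust
use std::fs::OpenOptions;
use std::io::ErrorKind;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Backend {
    /// The KVM device can be opened read-write.
    Kvm,
    /// Device missing: no hardware virtualization, module not loaded, or not exposed.
    FallbackNoKvm,
    /// Device exists but the user lacks permission (e.g. not in the `kvm` group).
    FallbackNoPermission,
    /// Any other error while opening the device.
    FallbackOther(ErrorKind),
}

fn pick_backend(dev: &Path) -> Backend {
    match OpenOptions::new().read(true).write(true).open(dev) {
        Ok(_) => Backend::Kvm,
        Err(e) => match e.kind() {
            ErrorKind::NotFound => Backend::FallbackNoKvm,
            ErrorKind::PermissionDenied => Backend::FallbackNoPermission,
            other => Backend::FallbackOther(other),
        },
    }
}

fn main() {
    match pick_backend(Path::new("/dev/kvm")) {
        Backend::Kvm => println!("using KVM-based PVF execution"),
        other => println!("falling back to process-level sandboxing: {:?}", other),
    }
}
```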

> But you're right that hardware virtualization would be technically more secure.
>
> It should be worth it to make a proof of concept and test it out in practice.

Yeah, that is something we should pursue!

> > if an attacker can break out of the WASM sandbox, they can probably also get out-of-bounds memory access, which essentially gives them a source of randomness
>
> Hmm... preventing the attacker from acquiring a source of randomness is going to be tricky. In the presence of remote code execution, it's not possible to disallow access to a source of randomness. Even if you disallow things like creating threads or measuring time through seccomp (although IIRC getting the time on amd64 goes through the vDSO shim, so it might not even be possible to sandbox that with seccomp, as no syscalls are involved), you still have e.g. the RDRAND hardware instruction, which the attacker could execute to ask the CPU directly for some random bytes. AFAIK the only way to prevent that is to use virtualization and set the appropriate VMX bit to make the VM abort when that instruction is executed. There might be more corner cases here.

What we would need is to develop and maintain a CPU template that customizes what we expose in CPUID and what we allow. An example for general-purpose VMs can be seen here: https://github.com/firecracker-microvm/firecracker/tree/main/src/vmm/src/cpuid. AFAIK this was not really supported on aarch64, but things might have changed. In our case we would have something very restrictive, or maybe we would want to enable some CPU instruction set extensions that WASM can use for increased performance.
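A minimal sketch of a restrictive CPUID template, assuming the usual KVM flow of fetching entries with `KVM_GET_SUPPORTED_CPUID`, editing them, and handing them back per vCPU with `KVM_SET_CPUID2`. The struct loosely mirrors `struct kvm_cpuid_entry2` (its `flags` and padding fields are omitted). The bit positions are the architectural ones (RDRAND is CPUID.1:ECX bit 30, RDSEED is CPUID.(EAX=7,ECX=0):EBX bit 18); which bits a real template should clear is an open design question. Note that masking only changes what the guest is told, not what it can execute.

```rust
/// One CPUID leaf/subleaf result as the guest would see it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CpuidEntry {
    function: u32,
    index: u32,
    eax: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
}

/// CPUID.(EAX=1):ECX bit 30 advertises RDRAND.
const LEAF1_ECX_RDRAND: u32 = 1 << 30;
/// CPUID.(EAX=7,ECX=0):EBX bit 18 advertises RDSEED.
const LEAF7_EBX_RDSEED: u32 = 1 << 18;

/// Hypothetical restrictive template: stop advertising hardware RNG instructions.
fn apply_template(entries: &mut [CpuidEntry]) {
    for e in entries.iter_mut() {
        match (e.function, e.index) {
            (1, _) => e.ecx &= !LEAF1_ECX_RDRAND,
            (7, 0) => e.ebx &= !LEAF7_EBX_RDSEED,
            _ => {}
        }
    }
}

fn main() {
    // Made-up "host supported" values with both feature bits set.
    let mut entries = [
        CpuidEntry { function: 1, index: 0, eax: 0, ebx: 0, ecx: LEAF1_ECX_RDRAND | 1, edx: 0 },
        CpuidEntry { function: 7, index: 0, eax: 0, ebx: LEAF7_EBX_RDSEED, ecx: 0, edx: 0 },
    ];
    apply_template(&mut entries);
    for e in &entries {
        println!("{:?}", e);
    }
}
```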

bkchr · 2023-04-23T21:56:09Z

> It should be worth it to make a proof of concept and test it out in practice.

@mrcnski I would highly recommend that this is done before moving forward with the seccomp implementation.

koute · 2023-04-24T00:41:45Z

> @mrcnski I would highly recommend that this is done before moving forward with the seccomp implementation.

Yep. But I think we still can first do the work of splitting the worker into a separate binary (and stripping it as much as possible) without necessarily sandboxing it yet, as that will be necessary regardless of which approach we pick.

mrcnski · 2023-04-24T07:20:48Z

> @mrcnski I would highly recommend that this is done before moving forward with the seccomp implementation.

Agreed. I already have much of the seccomp logging implemented, so IMO it makes sense to finish that before a big context switch. And then the logging can run on validators for a while while I work on virtualization. And yeah, I will first split out the worker binaries (without musl-builder for now).¹

¹ Not having musl may make the syscalls less deterministic in theory, but in practice very few are triggered, and the logging may show us that musl is not strictly needed.
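A toy model of why running seccomp in logging mode first is low-risk: with a log action as the default, a syscall missing from the allowlist is only recorded, while with a kill action it takes the worker down. The action names follow the kernel's `SECCOMP_RET_ALLOW`, `SECCOMP_RET_LOG` and `SECCOMP_RET_KILL_PROCESS`, but this is a plain-Rust simulation, not a BPF filter; `Policy`, `decide` and the syscall lists are hypothetical.

```rust
use std::collections::BTreeSet;

/// Subset of seccomp return actions (modelled, not the real constants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Allow,
    Log,
    KillProcess,
}

struct Policy {
    allowed: BTreeSet<&'static str>,
    /// What happens to any syscall not on the allowlist.
    default_action: Action,
}

impl Policy {
    fn decide(&self, syscall: &str) -> Action {
        if self.allowed.contains(syscall) {
            Action::Allow
        } else {
            self.default_action
        }
    }
}

fn main() {
    let allowed: BTreeSet<&'static str> = ["read", "write", "mmap", "munmap", "futex", "exit_group"]
        .iter()
        .copied()
        .collect();
    // Made-up syscall trace containing one syscall we "forgot" to allow.
    let trace = ["mmap", "read", "clock_nanosleep", "write"];
    for mode in [Action::Log, Action::KillProcess] {
        let policy = Policy { allowed: allowed.clone(), default_action: mode };
        for s in trace {
            match policy.decide(s) {
                Action::Allow => {}
                Action::Log => println!("[log mode] would have blocked `{}`", s),
                Action::KillProcess => {
                    println!("[enforce mode] `{}` kills the worker", s);
                    break;
                }
            }
        }
    }
}
```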

bkchr · 2023-04-24T07:39:22Z

Okay ty!

sandreim · 2023-04-24T10:25:51Z

> > @mrcnski I would highly recommend that this is done before moving forward with the seccomp implementation.
>
> Agreed. I already have much of the seccomp logging implemented, so IMO it makes sense to finish that before a big context switch. And then the logging can run on validators for a while while I work on virtualization. And yeah, I will first split out the worker binaries (without musl-builder for now).¹
>
> ¹ Not having musl may make the syscalls less deterministic in theory, but in practice very few are triggered, and the logging may show us that musl is not strictly needed.

IMO the virtualization PoC is a big rock to push uphill. We should first enable the logging to at least collect some data while we work on the PoC.

Polkadot-Forum · 2023-05-14T16:54:25Z

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854/1

alindima · 2023-09-08T07:21:18Z

I like this idea, and I think it would indeed provide far better isolation than any process-level sandboxing 👍🏻 but, as Andrei said, it'll be challenging.

> What we would need is to develop and maintain a CPU template that customizes what we expose in CPUID and what we allow. An example for general-purpose VMs can be seen here: https://github.com/firecracker-microvm/firecracker/tree/main/src/vmm/src/cpuid. AFAIK this was not really supported on aarch64, but things might have changed. In our case we would have something very restrictive, or maybe we would want to enable some CPU instruction set extensions that WASM can use for increased performance.

Actually, even if a given CPUID template is set (one that doesn't advertise a certain instruction), a guest program can still try to execute that instruction, and it will work. CPUID only helps with non-malicious guests, by providing a common view of the CPU features to all guests.
We'd still need to set the right VMX bits to trap instructions like RDRAND, on top of CPUID masking.
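A sketch of the "right VMX bits" on Intel hardware, modelling hypervisor-side logic only. Per the Intel SDM, "RDRAND exiting" and "RDSEED exiting" are bits 11 and 16 of the secondary processor-based VM-execution controls (which only take effect when bit 31 of the primary controls, "activate secondary controls", is set), and "RDTSC exiting" is bit 12 of the primary controls; RDTSC is included because the thread also mentions time measurement as a randomness source. A control may only be set to 1 if the allowed-1 half of the matching `IA32_VMX_*_CTLS` capability MSR permits it. Whether stock KVM lets userspace request these exits is not established here, and what the host does on such an exit (e.g. abort the job) is a policy choice; `enable_exiting`, `on_vm_exit` and `ExitDecision` are hypothetical.

```rust
// Primary processor-based VM-execution controls.
const PRIMARY_RDTSC_EXITING: u32 = 1 << 12;
const PRIMARY_ACTIVATE_SECONDARY: u32 = 1 << 31;
// Secondary processor-based VM-execution controls.
const SECONDARY_RDRAND_EXITING: u32 = 1 << 11;
const SECONDARY_RDSEED_EXITING: u32 = 1 << 16;

// Basic VM-exit reasons for the corresponding instructions.
const EXIT_REASON_RDTSC: u32 = 16;
const EXIT_REASON_RDRAND: u32 = 57;
const EXIT_REASON_RDSEED: u32 = 61;

#[derive(Debug, PartialEq)]
enum ExitDecision {
    /// Guest touched a randomness/time source: stop the job.
    AbortJob(&'static str),
    /// Some other exit, handled elsewhere.
    NotHandledHere(u32),
}

/// Sets the requested bits if the CPU allows them to be 1; otherwise
/// returns the unsupported bits. `allowed1` is the allowed-1 mask from
/// the high 32 bits of the matching IA32_VMX_*_CTLS MSR.
fn enable_exiting(current: u32, want: u32, allowed1: u32) -> Result<u32, u32> {
    let unsupported = want & !allowed1;
    if unsupported != 0 {
        Err(unsupported)
    } else {
        Ok(current | want)
    }
}

fn on_vm_exit(basic_reason: u32) -> ExitDecision {
    match basic_reason {
        EXIT_REASON_RDTSC => ExitDecision::AbortJob("RDTSC"),
        EXIT_REASON_RDRAND => ExitDecision::AbortJob("RDRAND"),
        EXIT_REASON_RDSEED => ExitDecision::AbortJob("RDSEED"),
        other => ExitDecision::NotHandledHere(other),
    }
}

fn main() {
    // Made-up capability values: pretend every control may be set to 1.
    let primary = enable_exiting(0, PRIMARY_RDTSC_EXITING | PRIMARY_ACTIVATE_SECONDARY, u32::MAX)
        .expect("primary controls unsupported");
    let secondary = enable_exiting(0, SECONDARY_RDRAND_EXITING | SECONDARY_RDSEED_EXITING, u32::MAX)
        .expect("RDRAND/RDSEED exiting unsupported");
    println!("primary: {:#010x}, secondary: {:#010x}", primary, secondary);
    println!("guest executed RDRAND -> {:?}", on_vm_exit(EXIT_REASON_RDRAND));
}
```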


sandreim added the T4-parachains_engineering label Apr 18, 2023

This was referenced Apr 25, 2023

PVF worker: separate worker binaries and build with musl #650 (Closed)

PVF: separate worker binaries and build with musl paritytech/polkadot#7147 (Closed)

mrcnski mentioned this issue May 15, 2023

PVF workers: consider zeroing all process memory #634 (Open)

s0me0ne-unkn0wn mentioned this issue Jun 19, 2023

98.6% OF DEVELOPERS CANNOT REVIEW THIS PR! [read more...] paritytech/polkadot#7337 (Merged)

mrcnski mentioned this issue Aug 6, 2023

PVF: Move landlock out of thread into process; add landlock exceptions paritytech/polkadot#7580 (Draft)

Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023

the-right-joyce added T8-parachains_engineering and removed T4-parachains_engineering labels Aug 25, 2023

the-right-joyce removed the T8-parachains_engineering label Oct 23, 2023

bkchr pushed a commit that referenced this issue Apr 10, 2024

Fix updated clippy grumbles (#733), commit 9011330

* Revert "Pin Rust Nightly to 2020-12-17 (#652)". This reverts commit e54e6f7.
* fix clippy
* clippy again
* more clippy in test code
* and new cargo fmt
* another try
