PVF worker: restrict network access #619

mrcnski · 2023-06-02T20:30:19Z

ISSUE

Overview

We're already working on sandboxing by blocking all unneeded syscalls. However, due to the wide scope it will take a while longer. I suggest starting with a much smaller scope and just blocking network access until the above is ready.

Note that we already have filesystem restriction coming soon. With network access also blocked, we will have decent sandboxing in place until the larger work gets in.

Proposal

birdcage simply blocks socket creation, maybe we can do the same.

The main risk with this, as with our other seccomp work, is missing syscalls on some OS/architecture or new syscalls being introduced. Currently there are only socket and socketpair, and since we are now targeting Linux x86-64, this should be tractable. For example, we can set up a regular job that downloads the syscall table from the latest kernel version and makes sure no new syscalls containing "socket" were added. However, the risk of a new syscall appearing, and libc libraries using it, in the immediate-term is very low, so I think it shouldn't block this development.

The text was updated successfully, but these errors were encountered:

We're already working on sandboxing by [blocking all unneeded syscalls](#882). However, due to the wide scope it will take a while longer. This PR starts with a much smaller scope, only blocking network-related syscalls until the above is ready. For security we block the following with `seccomp`: - creation of new sockets - these are unneeded in PVF jobs, and we can safely block them without affecting consensus. - `io_uring` - as discussed [here](paritytech/polkadot#7334 (comment)), io_uring allows for networking and needs to be blocked. See below for a discussion on the safety of doing this. - `connect`ing to sockets - the above two points are enough for networking and is what birdcage does (or [used to do](phylum-dev/birdcage#47)) to restrict networking. However, it is possible to [connect to abstract unix sockets](https://lore.kernel.org/landlock/20231023.ahphah4Wii4v@digikod.net/T/#u) to do some kinds of sandbox escapes, so we also block the `connect` syscall. (Intentionally left out of implementer's guide because it felt like too much detail.) `io_uring` is just a way of issuing system calls in an async manner, and there is nothing stopping wasmtime from legitimately using it. Fortunately, at the moment it does not. Generally, not many applications use `io_uring` in production yet, because of the numerous kernel CVEs discovered. It's still under a lot of development. Android outright banned `io_uring` for these reasons. Considering `io_uring`'s status, and that it very likely would get detected either by our [recently-added static analysis](#1663) or by testing, I think it is fairly safe to block it. If execution hits an edge case code path unique to a given machine, it's already taken a non-deterministic branch anyway. After all, we just care that the majority of validators reach the same result and preserve consensus. So worst-case scenario, there's a dispute, and we can always admit fault and refund the wrong validator. On the other hand, if all validators take the code path that results in a seccomp violation, then they would all vote against the current candidate, which is also fine. The violation would get logged (in big scary letters) and hopefully some validator reports it to us. Actually, a worst-worse-case scenario is that 50% of validators vote against, so that there is no consensus. But so many things would have to go wrong for that to happen: 1. An update to `wasmtime` is introduced that uses io_uring (unlikely as io_uring is mainly for IO-heavy applications) 2. The new syscall is not detected by our static analysis 3. It is never triggered in any of our tests 4. It then gets triggered on some super edge case in production on 50% of validators causing a stall (bad but very unlikely) 5. Or, it triggers on only a few validators causing a dispute (more likely but not as bad?) Considering how many things would have to go wrong here, we believe it's safe to block `io_uring`. Closes #619 Original PR in Polkadot repo: paritytech/polkadot#7334

mrcnski · 2023-10-31T11:24:39Z

Still need to enable the restrictions after one release.

mrcnski · 2023-11-05T13:14:51Z

Still need to enable the restrictions after one release.

Raised #2163.

mrcnski added the T4-parachains_engineering label Jun 2, 2023

This was referenced Jun 2, 2023

PVF worker: restrict access to env vars paritytech/polkadot#7326

Closed

PVF worker: restrict networking paritytech/polkadot#7334

Draft

s0me0ne-unkn0wn mentioned this issue Jun 19, 2023

98.6% OF DEVELOPERS CANNOT REVIEW THIS PR! [read more...] paritytech/polkadot#7337

Merged

Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023

the-right-joyce added T8-parachains_engineering and removed T4-parachains_engineering labels Aug 25, 2023

the-right-joyce removed the T8-parachains_engineering label Oct 23, 2023

mrcnski mentioned this issue Oct 24, 2023

PVF worker: Add seccomp restrictions (restrict networking) #2009

Merged

mrcnski closed this as completed in #2009 Oct 31, 2023

mrcnski reopened this Oct 31, 2023

mrcnski closed this as completed Nov 5, 2023

mrcnski self-assigned this Nov 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PVF worker: restrict network access #619

PVF worker: restrict network access #619

mrcnski commented Jun 2, 2023 •

edited

Loading

mrcnski commented Oct 31, 2023

mrcnski commented Nov 5, 2023

PVF worker: restrict network access #619

PVF worker: restrict network access #619

Comments

mrcnski commented Jun 2, 2023 • edited Loading

ISSUE

Overview

Proposal

mrcnski commented Oct 31, 2023

mrcnski commented Nov 5, 2023

mrcnski commented Jun 2, 2023 •

edited

Loading