Update impl guide
mrcnski committed Nov 10, 2023
1 parent 8070074 commit 05555e7
Showing 1 changed file with 37 additions and 10 deletions.
@@ -1,7 +1,11 @@
# PVF Host and Workers

The PVF host is responsible for handling requests to prepare and execute PVF
-code blobs, which it sends to PVF workers running in their own child processes.
+code blobs, which it sends to PVF **workers** running in their own child
+processes.
+
+While the workers are generally long-living, they also spawn one-off secure
+**job processes** that perform the jobs. See "Job Processes" section below.

This system has two high-level goals that we will touch on here: *determinism*
and *security*.
@@ -36,8 +40,11 @@ execution request:
not successful.
2. **Artifact missing:** The prepared artifact might have been deleted due to
operator error or some bug in the system.
-3. **Panic:** The worker thread panicked for some indeterminate reason, which
-may or may not be independent of the candidate or PVF.
+3. **Job errors:** For example, the worker thread panicked for some
+indeterminate reason, which may or may not be independent of the candidate or
+PVF.
+4. **Internal errors:** See "Internal Errors" section. In this case, after the
+retry we abstain from voting.

### Preparation timeouts

@@ -62,10 +69,16 @@ more than the CPU time.

### Internal errors

+An internal, or local, error is one that we treat as independent of the PVF
+and/or candidate, i.e. local to the running machine. If this happens, then we
+will first retry the job and if the error persists, then we simply do not vote.
+This prevents slashes, since otherwise our vote may not agree with that of the
+other validators.

In general, for errors not raising a dispute we have to be very careful. This is
-only sound, if we either:
+only sound, if either:

-1. Ruled out that error in pre-checking. If something is not checked in
+1. We ruled out that error in pre-checking. If something is not checked in
pre-checking, even if independent of the candidate and PVF, we must raise a
dispute.
2. We are 100% confident that it is a hardware/local issue: Like corrupted file,
@@ -75,11 +88,11 @@ Reasoning: Otherwise it would be possible to register a PVF where candidates can
not be checked, but we don't get a dispute - so nobody gets punished. Second, we
end up with a finality stall that is not going to resolve!

-There are some error conditions where we can't be sure whether the candidate is
-really invalid or some internal glitch occurred, e.g. panics. Whenever we are
-unsure, we can never treat an error as internal as we would abstain from voting.
-So we will first retry the candidate, and if the issue persists we are forced to
-vote invalid.
+Note that any error from the job process we cannot treat as internal. The job
+runs untrusted code and an attacker can therefore return arbitrary errors. If
+they were to return errors that we treat as internal, they could make us abstain
+from voting. Since we are unsure if such errors are legitimate, we will first
+retry the candidate, and if the issue persists we are forced to vote invalid.

## Security

@@ -119,6 +132,20 @@ So what are we actually worried about? Things that come to mind:
6. **Intercepting and manipulating packages** - Effect very similar to the
above, hard to do without also being able to do 4 or 5.

+### Job Processes
+
+As mentioned above, our architecture includes long-living **worker processes**
+and one-off **job processes**. This separation is important so that the handling
+of untrusted code can be limited to the job processes. A hijacked job process
+can therefore not interfere with other jobs running in separate processes.
+
+Furthermore, if an unexpected execution error occurred in the worker and not the
+job, we generally can be confident that it has nothing to do with the candidate,
+so we can abstain from voting. On the other hand, a hijacked job can send back
+erroneous responses for candidates, so we know that we should not abstain from
+voting on such errors from jobs. Otherwise, an attacker could trigger a finality
+stall. (See "Internal Errors" section above.)
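
Editor's illustration (not part of the commit): the voting policy described in the "Internal errors" and "Job Processes" text can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the actual PVF host code. The names `ErrorOrigin`, `Action`, and `next_action` are hypothetical and were invented for this example, and the sketch assumes exactly one retry, as the guide text suggests.

```rust
/// Where an unexpected execution error was observed.
/// Hypothetical type for this sketch; not taken from the real codebase.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorOrigin {
    /// Raised in the long-living worker process, outside the untrusted job.
    /// The guide treats such errors as internal (local to the machine).
    Worker,
    /// Reported by the one-off job process, which runs untrusted code and
    /// could therefore return arbitrary errors.
    Job,
}

/// What the host does next. Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Run the candidate once more before deciding.
    Retry,
    /// Do not vote at all (avoids a slash for a purely local problem).
    Abstain,
    /// Vote that the candidate is invalid, which raises a dispute.
    VoteInvalid,
}

/// Decide what to do after an unexpected error.
///
/// Assumption: a single retry is attempted first in both cases. If the
/// error persists, worker-side errors lead to abstaining, while job-side
/// errors are never treated as internal and lead to an invalid vote.
pub fn next_action(origin: ErrorOrigin, already_retried: bool) -> Action {
    if !already_retried {
        return Action::Retry;
    }
    match origin {
        ErrorOrigin::Worker => Action::Abstain,
        ErrorOrigin::Job => Action::VoteInvalid,
    }
}
```

With this sketch, `next_action(ErrorOrigin::Job, true)` yields `Action::VoteInvalid`, so a hijacked job cannot make the validator abstain simply by returning an error that looks internal.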

### Restricting file-system access

A basic security mechanism is to make sure that any process directly interfacing