Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC][SGX] Use Fortanix EDP instead of rust-sgx-sdk #2887

Closed
nhynes opened this issue Mar 24, 2019 · 13 comments
Closed

[RFC][SGX] Use Fortanix EDP instead of rust-sgx-sdk #2887

nhynes opened this issue Mar 24, 2019 · 13 comments

Comments

@nhynes
Copy link
Member

nhynes commented Mar 24, 2019

(WIP PR: #2885)

tl;dr: The Rust compiler now has support for an SGX target, so users can make their own enclaves using the unmodified TVM Rust runtime. In other words, TVM no longer needs to explicitly support SGX.

The current TVM SGX infrastructure is built on baidu/rust-sgx-sdk. The proposal is to replace r-s-s with fortanix/rust-sgx. Indeed, removing the rust-sgx-sdk completely obviates the need for TVM to explicitly support SGX.

Rationale

The Fortanix EDP is now a tier 3 target for Rust which vastly simplifies the build process compared to r-s-s which requires compiling a custom sysroot. Among other things, this

  • eliminates the need for adding SGX-specific codes to TVM (in fact, we can remove any mention of SGX from the TVM C++ and Rust codebases)
  • eliminates the need for adding xargo and a patched version of rust-sgx-sdk to the TVM Dockerfile
  • allows the use of more packages from the Rust ecosystem (e.g., the rand crate now has SGX support)
  • allows the use of the real Rust standard library (instead of the custom one designed by rust-sgx-sdk) and newer rustc nightlies. This benefits both correctness and security (more maintainers + reviewers).

Additionally, the EDP is a pure-Rust implementation of SGX enclaves. Compared to the Intel C++ implementation (which includes an entire C++ standard library), there is a smaller surface area of attack. That the EDP is part of the Rust compiler and is used by the Fortanix company means that the code is more actively maintained.

As shown in the updated SGX example, the EDP allows users to compile TVM modules into enclaves using nothing more than the unmodified TVM Rust runtime. Indeed, as the EDP allows running TCP servers in enclaves, all that must be done to provide a high-quality user experience for enclaves is to add TVM RPC support to the Rust runtime.

To address @tqchen's comment in #2885:

Dependency and license issue(The Fortanix one is MPL while the original one is BSD)

It actually doesn't matter because TVM under the new proposal doesn't even know that SGX exists. All we need to do is write a high-quality Rust runtime using the usual Rust toolchain.

Would the original C++ example makes it easier to port to other enclave based runtime?

Strictly speaking, the answer is "yes," but using the C++ SGX libraries is incredibly painful. For now, this does require users to write model harnesses in Rust, but we can automatically generate TVM RPC enclaves that can be called from Python.

cc @dmlc/tvm-team

@dingelish
Copy link

No offense.

The difference between rust-sgx-sdk and Fortanix EDP roots from basic assumptions of security. Fortanix EDP is not designed for security, so it has assumptions of trusting the OS. For example, the atomicity of EDP's SGX RwLock is untrusted. Technically, Fortanix EDP maintains an event queue in untrusted space and RwLock's atomicity roots from it -- resulting in untrusted atomicity. For other input/output, they are all interacting with untrusted input/output sources.

Are you sure you want it? Every dependency crate would potentially depends on untrusted I/O and weaken your trustworthyness.

@nhynes
Copy link
Member Author

nhynes commented Apr 5, 2019

Hey @dingelish cool to see you here! Thanks a lot for the input. One point of note for the TVM use-case is that the TVM modules' thread pool already requires cooperation from the OS. In almost all (or maybe actually all?) cases, the TVM threads operate on mutually exclusive portions of the output; the operation also doesn't return until all threads have finished their tasks. Accordingly, it's not so important that the thread pool be trusted.

TVM also doesn't make use of fs nor net, so really it's just a matter of threads.

With due respect to you and your fantastic std facade (and, trust me, there's plenty of that from my end :), the Fortanix programming model is so much simpler and more and flexible. That it is integrated as a Tier 3 target in rustc is no small factor in this proposal.

@dingelish
Copy link

dingelish commented Apr 5, 2019

Yeah I 100% trust you :)

Correct me if I'm wrong: SGX_QUEUE relies on the implementation of Mutex inside SGX. In Fortanix's solution, it's relies on WaitQueue which depends on usercall to provide its atomicity. An attacker could easily hi-jack the enclave-runner thus modify the behavior of WaitQueue then gain access to the behavior of that SGX_QUEUE and further ruins the SGX environment.

A short gif to show how an attacker cheats on Fortanix's SGX app:
https://dingelish.com/record.gif

@nhynes
Copy link
Member Author

nhynes commented Apr 5, 2019

Well, if it's any consolation, the rustc sgx thread implementation does the exact same thing as we're already doing. Might as well reduce code duplication, right?

@dingelish
Copy link

The code you referred to is exactly Fortanix's code. The JoinHandle is implemented using untrusted Mutex. I don't think their implementation could provide any trustworthiness. They bring too much uncertainty to the SGX environment. Personally, I strongly disagree with their implementation because they provide a LibOS-like Rust-SGX environment without any ability to control/audit the usercalls in compile time. It sounds like pushing the programmers to the edge of a cliff and say: you have the choice to not step forward. As an experienced researcher, you can hardly got away from the falling down because too much stuffs depends on their usercalls. The bad design is not desired by Fortanix, but a result of combining libstd to an environment without thread/fs/time/env/process/net. Similar runtimes such as webassembly are facing the same problem. I think you must know pwasm-std. Parity create this to provide a real runtime for wasm instead of using the default one -- you can open a file in a .rs file and compile it to webassembly, which would triggers a runtime panic. "if it builds, it works" is not true today, due to a bad design of libstd.

@dingelish
Copy link

The current code of SGX_QUEUE (with rust-sgx-sdk) is not depending on untrusted Mutex -- it relies on sgx_spin to provide atomicity which keeps everything inside.

@nhynes
Copy link
Member Author

nhynes commented Apr 5, 2019

sgx_spin [...] which keeps everything inside

Except for the ocall that's buried several calls deep and lives in Intel's SGX libc. I think that it's actually easier to audit the fortanix implementation since it's all in Rust and not spread across two completely separate codebases. Indeed, the usercalls are exactly equivalent to ocalls but are more observable (moreso since a user can't create their own usercalls as they could ocalls).

depending on untrusted Mutex

Right, but all the untrusted OS can do is not provide threads. That only compromises availability, and the untrusted system is always able to harm availability simply by not running the enclave. Correctness is not affected.

LibOS-like Rust-SGX environment without any ability to control/audit the usercalls

This is a fair point. Allowing a supposedly secure enclave to trivially access untrusted functions like fs, net, and time is not a good model for users who don't understand the security implications of doing so. The approach of r-s-s which makes these modules private is certainly the more explicit approach. For experienced users and library authors, however, usercalls offer greater usability.

pwasm-std

FWIW, without wasi, fs doesn't compile under wasm32-unknown-*. (sys|user)calls aside, pwasm-std unnecessarily limits itself by not including the standard library. Their focus is not security, but rather ensuring that consensus succeeds. Disallowing structures like HashMap is an oversight since serialization is, in fact, canonical. Similarly with their disallowing floats: it's totally possible to do flops if one "simply" canonicalizes the Wasm NaN representation. Of course, pwasm is unrelated to the matter at hand :)

As another point in favor of the fortanix edp, there's substantially lower overhead from ecalls/ocalls since it transparently implements switchless.

Overall, if the TVM runtime using more than just std::thread and those threads weren't embarrassingly parallel, I'd be more concerned about security. The main boons of switching are maintainability and usability.

@nhynes
Copy link
Member Author

nhynes commented Apr 10, 2019

Okay, it's been two weeks on this RFC, so I'm going to prepare a pros/cons summary of the discussion so far in preparation for further action.

Regarding the proposal to replace SGX support in TVM with Fortanix:

Pros

  • tightly integrated into Rust ecosystem, which gives support for more crates as well as a significantly simpler build process
  • better performance due to low overhead enclave entry/exit
  • better auditing of usercalls than [eo]calls; better auditing of Rust std
  • easier to build new functionality like RPCs
  • easier to debug enclaves (e.g., working backtraces, println!, panic!)

Cons

  • malicious operator could deny availability by blocking network access
  • end user needs to know that data sent outside of the enclave must go through a secure channel

I'd be glad to leave our current SGX infrastructure in place except that it adds maintenance burden as we upgrade our tvm crate. Even if the toolchain were robust enough to be tested in CI, we would still want to use Rust's own toolchain as the primary means for SGX support. Thus, unless anyone has any strong objections, we will deprecate rust-sgx-sdk in favor of x86_64-fortanix-unknown-sgx as proposed in #2885.

@dingelish
Copy link

I'm starting a next major version of rust-sgx-sdk which could be merged into Rust's std, along with several compiler changes, new features, and lint tools. One of Pre-RFCs is here.

Would you switch back to rust-sgx-sdk v2 later? If not, we'd possibly add a similar runtime "mesatee" (maybe) to tvm to support tvm in both trustzone and sgx using our SDKs.

@nhynes
Copy link
Member Author

nhynes commented Apr 22, 2019

Out of curiosity, what will be the difference between upstreamed r-s-s and fortanix sgx? The Rust libstd implementation is already preferable to the hugely complex and large TCB Intel C/C++ libs. Are you innovating on the runner?

@jethrogb
Copy link

EDP author here. Feel free to ping me any time with any security/support questions regarding Fortanix EDP. Either on GitHub, or on Slack.

Fortanix EDP is not designed for security, so it has assumptions of trusting the OS.

Both these things are completely, entirely, 100% incorrect. (@dingelish I'd be interested to know how you came to this understanding. I'd also appreciate it if you take security concerns directly to Fortanix instead of claiming incorrect things in a public forum.)

In fact, the Fortanix EDP is designed from the ground up to be the most secure and easy to use enclave interface and platform out there.

  • Unlike every single other enclave platform out there, there is no C code whatsoever in the enclave.† This protects you from dangerous memory safety vulnerabilities exposing your enclave secrets.
  • There is only a small interface with the untrusted environment that is the same for all enclaves, such that its security can be evaluated and implementation audited once instead of for each enclave separately. The security implications of every part of this interface have been carefully designed, evaluated, and tested.
  • Even if you want to use raw user memory for communication, there are user reference primitives for dealing with pointers to user space. These primitives use Rust's borrowing and ownership mechanism to avoid data races and TOCTTOU issues, and also prevent creating dangerous Rust references to user memory. However, we strongly recommend you use byte streams instead, which is why Rust's core byte stream primitive (std::net::TcpStream) is made available to developers.
  • All code that you're required to bring into the enclave (say, to get “hello world” running) is from official Rust repositories, which receive far more scrutiny from 3rd parties than external Rust forks.

For example, the atomicity of EDP's SGX RwLock is untrusted.
...
Technically, Fortanix EDP maintains an event queue in untrusted space and RwLock's atomicity roots from it -- resulting in untrusted atomicity.
...
The JoinHandle is implemented using untrusted Mutex

Wrong. All locks are protected using in-enclave state.

Yes, there is an event queue in untrusted space. But since enclave state is the source of truth for locks, if userspace decides to ignore the event queue and wake up the wrong thread, nothing bad can happen. Since the OS is already free to schedule/deschedule enclave threads at any time, this gives you the exact same security as using spinlocks, except without the CPU overhead in case locks are held for a long time.

For other input/output, they are all interacting with untrusted input/output sources.

I don't understand your point here, SGX doesn't have trusted I/O. Everyone has to do this. We recommend you use TLS to protect all enclave communications.

they provide a LibOS-like Rust-SGX environment without any ability to control/audit the usercalls in compile time. It sounds like pushing the programmers to the edge of a cliff and say: you have the choice to not step forward. As an experienced researcher, you can hardly got away from the falling down because too much stuffs depends on their usercalls. The bad design is not desired by Fortanix, but a result of combining libstd to an environment without thread/fs/time/env/process/net.

Developing for SGX (or for any environment, really) requires the programmer to have certain background knowledge. However, there is only so much the developer can actively keep track off. This is the complexity budget. Everything else that doesn't fit in this budget needs to be reviewed and audited by others.

We've clearly chosen a different trade-off here than you. I'd say reviewing your enclave codebase for usage of std::net and letting the programmer use familiar interfaces such as HTTP REST and gRPC scores far better in terms of complexity than requiring developers to learn a new language and paradigm (EDL) for describing their enclave interface.

Con: malicious operator could deny availability by blocking network access

This is not a drawback specifically of Fortanix EDP. A malicious operator can always do this for any SGX application.

Again, feel free to ping me any time with questions.

† When using panic=abort.

@jethrogb
Copy link

NB. The target is at tier 2, not tier 3.

@nhynes
Copy link
Member Author

nhynes commented May 16, 2019

Looks like we've reached consensus. Thanks @dingelish and @jethrogb for the feedback. Work will now proceed on #2885.

@nhynes nhynes closed this as completed May 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants