[WIP] Make a table of trait object type_ids and vtable pointers available to programs #66113
r? @estebank (rust_highfive has picked a reviewer for you, use r? to override) |
@rust-lang/compiler @rust-lang/lang |
Thanks for moving the idea forward! I'm curious about the safety of this approach: What would happen if a binary tries to deserialize a trait object sent by a binary of a different version? (Or built with different flags?) What kind of safeguards are in place? The semantics of type IDs themselves are a bit underspecified for this. Is it possible to get false positives where one binary shares the same type ID as another binary yet are actually different types? |
Thanks for your feedback @Rufflewind! I made a correction to the example deserialization in my first post. As long as a vtable pointer is known to belong to The intent of embedding a mapping of type_id to vtable pointers is to provide this knowledge to programs; such that they can look up for a given The approach I envisage for the first time a particular
If a program is deserializing The probability of at least one collision, and thus the probability of a vulnerable binary, can be calculated with a generalisation of the birthday problem. Still, this can and should be fixed, and the approaches sounded out in #10389 do fix this. Regarding inter-binary concrete |
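As a rough illustration of the birthday-problem estimate mentioned above: if `n` types each receive an id drawn uniformly from `d` possible values, the classic approximation for the probability of at least one collision is `1 - exp(-n(n-1) / (2d))`. The sketch below evaluates that approximation. The 64-bit id width, the uniformity assumption, and the function name `collision_probability` are assumptions made for illustration; this is the standard birthday bound, not necessarily the exact generalisation the comment had in mind.

```rust
/// Approximate probability that at least two of `n` types share an id,
/// assuming (for illustration only) ids drawn uniformly from 2^bits values.
fn collision_probability(n: u64, bits: u32) -> f64 {
    let n = n as f64;
    let space = 2f64.powi(bits as i32);
    // Exponent of the birthday approximation: n(n-1) / (2d).
    let x = n * (n - 1.0) / (2.0 * space);
    // 1 - e^(-x), computed via exp_m1 to keep precision when x is tiny.
    -(-x).exp_m1()
}

fn main() {
    // Hypothetical type counts; real programs will differ.
    for &n in &[1_000u64, 100_000, 10_000_000] {
        println!("{:>10} types: p = {:e}", n, collision_probability(n, 64));
    }
}
```

Under these assumptions the probability stays very small for realistic type counts, but it is never zero, which is consistent with the point that the issue should still be fixed.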
☔ The latest upstream changes (presumably #66175) made this pull request unmergeable. Please resolve the merge conflicts. |
This is not feature gated. |
Some initial thoughts: I think that this change is too large to land in an informal PR like this. This is specifically creating groundwork that crates will come to rely on -- nightly crates, but still. That means we'll have to support this mechanism in some form going forward, and I wouldn't want to create that dependency without a plan to "complete" the work. I also think that there are a lot of moving parts here, and this design is too complex to move forward without a proper discussion. Ultimately I think the right venue would be something more like the ffi-unwind project group that we are working on. Basically, get some people together to explore the problem and its complications and propose solutions. But such an effort needs to be shepherded in conjunction with the lang team, I think, and that will require some bandwidth -- I'm not sure if we have it right now or not. (This is also one of those issues that falls a bit outside the "core focus areas" of most of the lang team, but I think that can be addressed by incorporating people like @alecmocatta or others as the shepherds and key members.) |
Okay thanks @nikomatsakis that sounds like a good plan of action. I've gone ahead and created a place to collaborate on an RFC: https://github.com/alecmocatta/trait-object-deserialization. I invite anyone with an interest or any insight to feel free to contribute to it! I'll be adding to it myself over the coming weeks. I think the areas that currently need to be explored are:
Please feel free to nudge me in the right direction if I'm getting the wrong end of the stick! |
(removing nomination label; the discussion required here is too large for the triage meeting.) ((update: oops, sorry, I shouldn't have removed nomination without waiting for lang team meeting first...)) |
Link is broken? |
@Rufflewind Thanks, fixed! |
This prior art may be of relevance (see Section 5): http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf |
Marking this as blocked on the RFC |
Since this is blocked on an RFC which will take some time to go through the process, I'm closing this PR. We can re-open it or preferably start a new PR once that's done :) Thanks for contributing |
This PR aims to enable safe and sound serialization and deserialization of trait objects on tier 1 platforms.
Some context: The ability to serialize and deserialize trait objects, or similarly closures, has been requested by various members of the community for some time: rust-lang/rfcs#668 rust-lang/rfcs#1022. The two main use cases are IPC – for example sending trait objects between forks of a process; and distributed computing – for example sending trait objects to processes distributed across a cluster. While some message kinds can be wrapped in an enum instead of a trait object, this can be inconvenient, and isn't viable for un-nameable types like closures. The goal of this PR is to lay the groundwork for safe and sound serialization and deserialization of trait objects in a low-impact manner that allows user crates to safely experiment and iterate on this functionality.
Deserializing trait objects is possible today, but only in a rather hairy manner. The crate `serde_traitobject`, for example, works and sees usage, but it has two unsoundness vectors:
The latter requires a very odd linker script and is unlikely to occur in practice; the former is concerning and rules out many use cases. It is, however, sufficient for other use cases. In fact, a really cool one using it was published recently: `native_spark`.
The approach I've taken in this PR is to create what is essentially a global array with appending linkage that stores structs comprising the `type_id` of the trait object and a pointer to the vtable, for all materialized vtables. Unfortunately, appending linkage isn't really implemented by LLVM yet besides for a couple of special variables, so I've used the old-school approach of emitting static variables into an orphan section. The start and end of this section can be reliably retrieved by user programs with help from the linker.
This array increases the size of libraries by a small amount; for example, build/x86_64-apple-darwin/stage2 grows by ~1.3MiB, i.e. ~0.2%. Thanks to `--gc-sections` on Linux and `/OPT:REF` on Windows, it's removed entirely from binaries that don't use it on those platforms. macOS's `-dead_strip` works a little differently, and the best solution I've found adds 16 bytes per used vtable to the resulting binary. This increases the size of binaries by a small amount: hello world grows by 76 bytes, i.e. ~0.03%. With a bit more work I think this could probably be made zero-cost, similar to Linux and Windows. Android and iOS behave the same as Linux and macOS, and on other platforms this PR is a no-op.
This array allows programs to retrieve a list of all materializable vtable pointers for a given `dyn Trait`. From this, a candidate concrete type for serialization can be selected and safely invoked. Here's a rough sketch of how serialization and deserialization can work:
`fn type_tag()` here could be `type_id()`, which would work for unnameable types like closures but doesn't work across different builds where the `type_id` has changed due to potentially unrelated changes. It could be provided explicitly by the program (i.e. like `typetag`), which would not work for unnameable types but would work across different builds. It could also be a combination of both: explicit tags where provided, falling back to `type_id`.
I believe this is a step in the right direction to enabling sound trait object serialization and deserialization. It resolves the aforementioned security issue that exists in applications being used today, it enables user crates to safely iterate on the user interface for trait object deserialization, and it can hopefully co-evolve with the unsafe code guidelines and other RFCs like rust-lang/rfcs#2580 to increase the strength of its guarantees.
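To make the lookup idea concrete, here is a minimal, safe model of tag-based trait object deserialization. It is a sketch under stated assumptions, not the mechanism in this PR: the table is a hand-written `static` array rather than a linker-collected section, each entry holds a constructor function pointer standing in for a raw vtable pointer, and the tags are explicit constants (the "provided explicitly by the program" option). The names `Shape`, `Entry`, `TABLE`, `serialize` and `deserialize` are all hypothetical.

```rust
// Hypothetical trait; `type_tag` plays the role of the tag discussed above.
trait Shape {
    fn type_tag(&self) -> u64;
    fn area(&self) -> f64;
    fn payload(&self) -> Vec<u8>;
}

struct Square(f64);
struct Circle(f64);

// Explicit tags chosen by the program (an assumption of this sketch).
const SQUARE_TAG: u64 = 1;
const CIRCLE_TAG: u64 = 2;

impl Shape for Square {
    fn type_tag(&self) -> u64 { SQUARE_TAG }
    fn area(&self) -> f64 { self.0 * self.0 }
    fn payload(&self) -> Vec<u8> { self.0.to_le_bytes().to_vec() }
}

impl Shape for Circle {
    fn type_tag(&self) -> u64 { CIRCLE_TAG }
    fn area(&self) -> f64 { std::f64::consts::PI * self.0 * self.0 }
    fn payload(&self) -> Vec<u8> { self.0.to_le_bytes().to_vec() }
}

// Decode exactly eight little-endian bytes into an f64.
fn read_f64(bytes: &[u8]) -> Option<f64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Some(f64::from_le_bytes(buf))
}

fn square_from(bytes: &[u8]) -> Option<Box<dyn Shape>> {
    Some(Box::new(Square(read_f64(bytes)?)))
}

fn circle_from(bytes: &[u8]) -> Option<Box<dyn Shape>> {
    Some(Box::new(Circle(read_f64(bytes)?)))
}

// One table entry: a tag plus a constructor standing in for a vtable pointer.
struct Entry {
    type_tag: u64,
    construct: fn(&[u8]) -> Option<Box<dyn Shape>>,
}

// Hand-written stand-in for the compiler-emitted table.
static TABLE: [Entry; 2] = [
    Entry { type_tag: SQUARE_TAG, construct: square_from },
    Entry { type_tag: CIRCLE_TAG, construct: circle_from },
];

// Wire format of this sketch: 8-byte tag followed by the payload.
fn serialize(shape: &dyn Shape) -> Vec<u8> {
    let mut out = shape.type_tag().to_le_bytes().to_vec();
    out.extend(shape.payload());
    out
}

// Only tags present in the table are accepted; anything else is rejected.
fn deserialize(bytes: &[u8]) -> Option<Box<dyn Shape>> {
    if bytes.len() < 8 {
        return None;
    }
    let (tag_bytes, payload) = bytes.split_at(8);
    let mut buf = [0u8; 8];
    buf.copy_from_slice(tag_bytes);
    let tag = u64::from_le_bytes(buf);
    let entry = TABLE.iter().find(|e| e.type_tag == tag)?;
    (entry.construct)(payload)
}

fn main() {
    let bytes = serialize(&Square(3.0));
    let shape = deserialize(&bytes).expect("tag should be in the table");
    println!("area = {}", shape.area());
}
```

The point of the design is that the receiver never trusts a pointer from the wire: it only ever dispatches through entries it already holds, so an unknown or forged tag yields `None` rather than a call through an arbitrary address.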
An upside of this PR is safe and sound serialization and deserialization of closures in conjunction with `serde_closure`, which is a proc macro that extracts captured variables from the AST and puts them in a struct that implements Serialize, Deserialize, Debug and other convenient traits. This is the major use case for distributed computing frameworks like `constellation` and `native_spark`.
I'd like to add a test that checks the array is removed by the linker on Linux and Windows if it's not used, but I'm not familiar with how to do this.
Let me know how best to feature gate it. Or perhaps it doesn't need it: doing anything useful with the array is already unstable (i.e. transmuting `dyn Trait` <-> `std::raw::TraitObject` to get/set the vtable). Per the RFC guidelines, as this should be "invisible to users-of-rust" I've gone ahead with this PR; let me know if it does indeed need an RFC. And lastly, it's probably worth confirming it is indeed "invisible to users-of-rust" with a crater and/or perf run?