- Start Date: 2014-03-14
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)
Extending the DST proposal, add the concept of "fat objects" to the language as an alternative to fat pointers. These are DST objects that carry their extra information (i.e. vtable or size) right before their content, just like C++ objects do. Despite being dynamically sized, because they carry this extra info with them they can be pointed to by plain old "thin" pointers. Combining this with any of the proposals for structural inheritance would solve for the requirements that led to the virtual methods proposal, without having to introduce any additional parallel dynamic method dispatch systems.
The following requirements were given in the virtual methods RFC:
Supporting efficient, heterogeneous data structures such as the DOM. Precisely we need a form of code sharing which satisfies the following constraints:
- Cheap field access from internal methods;
- Cheap dynamic dispatch of methods;
- Cheap downcasting;
- Thin pointers;
- Sharing of fields and methods between definitions;
- Safe, i.e., doesn't require a bunch of transmutes or other unsafe code to be usable.
The goal is to meet all of the requirements listed above, while still respecting the following constraints that the addition of virtual
into the language would violate:
- Keep only one dynamic method dispatch system in the language, rather than two parallel ones
- Continue to allow for the logical separation of data layout in memory (structs) from the dynamic method interfaces (traits), even in situations where thin pointers are needed
This RFC assumes that dynamically sized types, as described in the RFC-as-blog-post DST, Take 5 by @nikomatsakis, will be added to the language. Refer to that post for details about this, as it will not be re-described here. The remainder of this RFC uses "DST, Take 5" as its baseline.
The following features are then added on top of this:
- Fat Objects
- Single Inheritance
- Downcasting
Each of them are fairly orthoganal to each other (so in theory each should be in their own RFC) but only together do they meet the listed requirements.
The focus of this RFC is on the fat objects. The others are variations on ideas that have been well discussed elsewhere. But all of them are needed to get to the end goal, so they are incorporated here for illustrative purposes.
The DST proposal defines a conceptual type-level function called Fat
that defines how to create a dynamically sized object type from a concrete type, based on the relation Fat(T as U) = v
, where v
is the additional data needed to be able to make use of a pointer to U
constructed from a T
. It shows how to create fat pointers (*U, v)
given that relation. In the descriptions below, Fat(T as U) = v
has the same meaning it does in the DST proposal.
For any T
, U
and v
such that Fat(T as U) = v
, it is possible to create a "fat object" which carries the extra word along with it: (v,T)
. For traits, this layout is the same as that of a C++ object that starts with its vtable.
Syntactically, this is expressed as if it were a generic struct that can take either one or two type parameters:
Fat<U,T>
denotes a fat concrete object type. In memory it is laid out as if it were the tuple(v,T)
for a specificT
andU
whereFat(T as U) = v
.T
must be a concrete type, andU
must be an existential variant of that type. So eitherT
is[Elem, ..n]
andU
is[Elem]
, orU
is a trait implemented by the concrete typeT
.Fat<U,T>
is not dynamically sized, so it can be used anywhere a normal statically sized type can--as a variable or parameter, inside an array, etc. It implementsDeref<T>
andDerefMut<T>
, which obtain borrowed pointers to the plainT
object. It also has a methodunwrap() -> T
that discards the extra word and leaves a plainT
. There is no automatic implementation added such thatFat<U,T>
implements traitU
, but auto-deref should allow you to call methods of any trait implemented byT
(including but not limited toU
).Fat<U>
is a dynamically sized fat existential type.U
must be a dynamically sized type. Just like a plainU
is a stand-in for anyT
whereT as U
,Fat<U>
stands in for anyFat<U,T>
whereT as U
. Since it is dynamically sized,Fat<U>
is only usable in the places where other DSTs can be used. But unlike other DSTs, pointers toFat<U>
are plain old thin pointers. When the object is used, the required extra word can be retrieved from its place right before the object itself.
A fat object can only be fat for one trait, no matter how many traits the type actually implements.
An intrinsic function fat
is added to std
to create fat objects:
fn fat<unsized U,T>(t:T) -> Fat<U,T>
It is allowed when U
is a trait implemented by T
, and when T
is a fixed length array and U
is the corresponding unknown-length array of the same type.
The same pointer conversion rules laid out in the DST proposal between T
and U
also apply to Fat<U,T>
and Fat<U>
. The only difference is that the pointers to Fat<U>
are still thin pointers, despite Fat<U>
being dynamically sized. So a &Fat<U,T>
can be automatically converted to a &Fat<U>
, and these pointers have the exact same representation.
For structs that are themselves dynamically sized because they end with a DST, "fatness" propagates up the chain. Using the RcData
example from the DST propsal, RcData<Fat<U>>
would be considered a fat DST, and therefore the pointer inside a Rc<Fat<U>>
would be a thin pointer.
struct MyStruct {
a:int
}
impl MyStruct {
fn new() -> MyStruct { MyStruct }
fn bar(&self) { }
}
trait MyTrait {
fn foo(&self) -> int
}
impl MyTrait for MyStruct {
fn foo(&self) -> int { self.a }
}
// The size of this struct is the size of 2 pointers, not 4.
struct HasABunchOfPointers<'a>(&'a Fat<MyTrait>, ~Fat<MyTrait>);
struct DynamicallySizedWrapper<unsized U> {
some_stuff: uint,
data:U
}
fn myfunc() {
let thin_owned_trait_pointer = box fat(MyStruct::new()) as ~Fat<MyTrait>;
let thin_borrowed_pointer = &*thin_owned_pointer; // inferred type is &Fat<MyTrait>
thin_borrowed_pointer.foo(); // Dynamically dispatched, uses the vtable found through the pointer like C++
let fat_stack_object = fat::<MyTrait>(MyStruct::new()); // inferred type is &Fat< MyTrait,MyStruct>
fat_stack_object.foo(); // Sugar for (*fat_stack_object).foo() via autoderef
let plain_object = fat_stack_object.unwrap();
let also_a_fat_object = DynamicallySizedWrapper{some_stuff:0, data: fat::<MyTrait>(plain_object)};
let also_a_thin_pointer = &also_a_fat_object as &DynamicallySizedWrapper<Fat<MyTrait>>;
}
From the motivating requirements, we now have thin pointers, cheap dynamic dispatch of methods, and cheap field access from internal methods. This has the same number of pointer indirections as the virtual fn
proposal, but keeps dynamic dispatch unified in one place: traits.
Single inheritance is necessary to meet all the listed requirements, but is not the focus of this RFC. Another RFC should focus on choosing the exact form of it. For the purposes of the examples below I'll assume the version of single inheritance proposed in the @nikomatsakis blog post from October 2013. This RFC is not advocating for this over any of the other proposed forms of single inheritance--others, such as traits with fields or just so I can use it to show that combining it with fat objects meets the requirements that inspired the virtual struct
proposal.
An intrinsic function downcast<Derived,Base>
is added to std
. It looks like this:
fn downcast<Derived,Base>(b:Base) -> Option<Derived>;
It has some special compiler checks to ensure that at each call site, the type system allows for the possibility that the downcast could succeed. A call to downcast
is allowed at compile time if and only if Derived
is coercible to Base
. It succeeds at run time and returns Some
if the actual run-time type of b
is coercible to Derived
. Otherwise it returns None
.
This should work for both the fat and thin variants of DST pointers. The vtable or size can be accessed at run time, either as part of the fat pointer or by dereferencing the thin pointer.
The DOM is how the requirements list came about, so here's a simplified DOM-like tree using the features above. The numbers in brackets show a place where each requirement from the motivation section is demonstrated:
struct NodeFields<'a> {
// Three thin pointers and a vector of thin pointers: [4]
parent:Option<&'a Fat<Node>>,
left_sibling:Option<&'a Fat<Node>>,
right_sibling:Option<&'a Fat<Node>>,
children: Vec<&'a Fat<Node>>
}
impl NodeFields {
// Methods defined here are available to anything that extends NodeFields, are statically
// dispatched, and cannot be overridden
fn get_grandparent(&self) -> Option<&'a Fat<Node>> {
self.parent.and_then(|p| p.parent)
}
}
trait Node<'a>: NodeFields<'a> {
// Methods defined here have access to the fields and methods of NodeFields via `self` [5]
fn frobify_children(&self) {
// .children is a static offset field access--no extra pointer indirection [1]
for node in self.children.iter() {
node.frobify(); // dynamic method dispatch [2]
}
}
fn frobify(&self);
}
struct Element<'a>: NodeFields<'a> {
name: ~str,
}
impl<'a> Node<'a> for Element<'a> {
fn frobify(&self) {
println!("Frobifying a <{}>", self.name);
}
}
// Yes, I know in the modern actual DOM an attribute is not a Node...but this makes for a
// nifty simple example
struct Attribute<'a>: NodeFields<'a> {
name: ~str,
value: ~str
}
impl<'a> Node<'a> for Attribute<'a> {
fn frobify(&self) {
// We know only elements have attributes as children, so we can downcast [3]
let parent_element:&Element = downcast(parent.unwrap()).unwrap();
println("Frobifying the attribute of {}: {}={}", parent_element.name, self.name, self.value);
}
}
// No `unsafe` code was needed for any of the above [6]
Here is jdm's servo example translated into rust+this proposal:
struct NodeFields {
parent: Rc<Fat<Node>>,
first_child: Rc<Fat<Node>>
}
impl NodeFields {
fn as_text_node<'a>(&'a self) -> &'a TextNode {
assert!(/*sensible check*/);
downcast(self)
}
}
trait Node:NodeFields {
fn as_element<'a>(&'a self) -> Option<&'a Element> {
None
}
}
struct TextNodeFields: NodeFields;
trait TextNode: Node {
// ...
}
impl TextNode for TextNodeFields {
// ...
}
struct ElementFields: NodeFields {
attrs: Vec<(~str,~str)>
}
trait ElementStaticMethods:ElementFields {
fn set_attribute(&mut self, key:&str, value:&str);
}
impl<T:Element> ElementStaticMethods for T {
fn set_attribute(&mut self, key:&str, value:&str) {
self.before_set_attr(key,value);
// ...update attrs...
self.after_set_attr(key,value);
}
}
trait Element: ElementFields Node {
fn before_set_attr(&mut self, key:&str, value:&str) {
// Implementation goes here--jdm's gist does not give an implementation
// but implies it exists (the method is not pure virtual)
}
fn after_set_attr(&mut self, key:&str, value:&str) {
// Implementation goes here--jdm's gist does not give an implementation
// but implies it exists (the method is not pure virtual)
}
}
impl<T:Element> Node for T {
fn as_element<'a>(&'a self) -> Option<&'a Element> {
Some(self as &Element)
}
}
struct HTMLImageElement: ElementFields;
impl Element for HTMLImageElement {
fn before_set_attr(&mut self, key:&str, value:&str) {
if key == "src" {
//..remove cached image with url /value/...
}
// Should specify that this works...wouldn't in current rust, but would
// it under the UFCS proposal?
Element::before_set_attr(self,key, value);
}
}
struct HTMLVideoElement: ElementFields {
cross_origin:bool
}
impl Element for HTMLVideoElement {
fn after_set_attr(&mut self, key:&str, value:&str) {
if key == "crossOrigin" {
self.cross_origin = (value == "true");
}
Element::after_set_attr(self,key,value);
}
}
fn process_any_element(element:&Element) {
//...
}
// Code outside of a function in jdm's example:
fn more_code() {
// We assume we need videoElement to be a fat object for the trait Element, e.g.
// because it would need to be stored in another element.
let videoElement:Rc<Fat<Element, HTMLVideoElement>> = Rc::new(fat(/*...*/));
process_any_element(&**videoElement);
// We are cloning the Rc, not the node itself
let node = videoElement.first_child.clone();
match node.as_element() {
Some(element) => {
// do things with element...
}
None => {
let text = node.as_text_node();
// ...
}
}
}
Some comments on this:
Note that for functions that take borrowed pointers as parameters, we still use fat pointers. This is more flexible for the caller, as it allows the method to be called using both fat and thin pointers. The cost of the virtual call will be roughly the same either way. If the method needed a fat object, it would have to take a thin pointer to it as a parameter instead (e.g. fn process_any_element(element:&Fat<Element>)
). I would expect this would mostly appear methods that construct data structures that have thin pointers inside them.
The annoying part is having to create extra types that don't mean anything. You don't just have Element
, you have ElementFields
and ElementStaticMethods
, even though the second and third of these aren't really useful. (ElementStaticMethods
is here so that a call to set_attribtue
is not itself virtual, but can make virtual calls to before_set_attribute
and after_set_attribute
.) Choosing a different arrangement for struct inheritance could alleviate this--traits with fields may come out as a better complement here. (Possibly rewrite this RFC and choose that alternative instead?) With this example, at least the leaves of the inheritance tree (HTMLImageElement
and HTMLVideoElement
) can simply be structs and are expressed in a more straightforward way.
The main alternative to this is virtual struct
and virtual fn
. Mere mention of this in the work week meeting notes generated a lengthy email thread about that idea. The motivation for this RFC was to present an alternative to this.
After this, @bill-myers proposed a completely different alternative involving extending the notion of an enum
to turn it into a syntax to describe an object hierarchy. @nick29581 filed what he described as an extension/variant of this, which keeps the extension of enums, but then adds back virtual
on top of it.
Some brief thoughts on reddit (with a link to some discussion on irc) by @eddyb talk about something very similar to the fat objects presented here. Like this proposal, it allows for either thin or fat objects, but instead of choosing between thin and fat for each object, it would be chosen for each struct once at the struct's declaration. This could happen either through a special Vtable<Trait>
type (added explicitly to the struct), or via an attribute on the struct. He says in the comment that he may be working on an RFC, so we may find more details about this soon.
There are multiple existing proposals floating around for how to implement single inheritance. Alternatives to it include letting traits specify fields, and Coercible and HasPrefix traits. Any of these would, in combination with the fat objects and downcast proposals in this RFC, fulfill the noted requirements. The HasPrefix
/Coercible
trait proposal in particular may be quite appealing when combined with this--it would allow you to write syntactically the constraints on downcast
, for instance (fn downcast<Derived,Base:Coercible<Derived>>(b:Base) -> Derived
). But for the purposes of this RFC I chose a syntactically simpler version instead because I didn't want to focus on it.
Intrinsics are added to std
but the exact path is not specified.
I'm sure there are other implementation concerns and edge case questions I haven't thought about.
This RFC has not addressed exactly how, in an overridden method, a parent method can be invoked from an overridden one. The RFC for unified function call syntax could hopefully address this in a general way, for all traits rather than just structural inheritance. (It would be good to be able to call the default impl from any trait when overriding it.)