Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extern types #1861

Merged
merged 19 commits into from
Jul 25, 2017
Merged

extern types #1861

merged 19 commits into from
Jul 25, 2017

Conversation

canndrew
Copy link
Contributor

@canndrew canndrew commented Jan 21, 2017

Add extern type declarations for declaring types from external libraries which have an unknown size/layout.

Rendered

@glaebhoerl
Copy link
Contributor

Given that this is declaring a new type and not an alias to an existing type (presumably you should be allowed to write trait impls for it, and so on), I think it should be extern struct rather than extern type.


Add a new kind of type declaration, an `extern type`:

extern type Foo;
Copy link
Member

@nagisa nagisa Jan 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diverges somewhat from extern fn where its a definition and not declaration. For consistency IMHO declarations should stay constrained to extern blocks, like below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, of course. They're two different things.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jan 21, 2017

What do size_of/align_of return for extern types?

@Ericson2314
Copy link
Contributor

@petrochenkov it's a DST, you mean {size,align}_of_val? Those shouldn't work either, see #1524 (comment).

@petrochenkov
Copy link
Contributor

Oh, right, size_of_val.
What do you mean "shouldn't work", report an error during monomorphization?
size_of_val is usable unconditionally, so it can be used with type parameters that can become extern types later.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jan 21, 2017

@petrochenkov I think will need to change #1524 (comment). I had a bunch of things to add to this RFC so I'm making a PR-on-PR; I propose making it panic for stop-gap.

@arielb1
Copy link
Contributor

arielb1 commented Jan 21, 2017

@glaebhoerl

It's sort of like a "truely" abstract type, so I think extern type is OK.

@glaebhoerl
Copy link
Contributor

But it's a nominal type that's knowably disjoint from all other types, isn't it?

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jan 21, 2017

I think the fact that it is, after all, called a "struct" in C tips me over the fence into @glaebhoerl's camp. Yes, this is used as a hack around not having real abstract types in C [and there should be a way to make the size of a concrete struct with private fields abstract in Rust!], but I'm OK with skeuomorphisms for FFI.

@petrochenkov
Copy link
Contributor

I may be a struct in C, or it may be a union, or it may be just a malloc-ated storage.

presumably you should be allowed to write trait impls for it, and so on

I wonder why would you need to do such a thing.
Extern type refers to some type on C side, you don't know anything about it, what traits can you possibly implement?

@canndrew
Copy link
Contributor Author

Thanks @Ericson2314 for some much-needed extra material.

@canndrew
Copy link
Contributor Author

Given that this is declaring a new type and not an alias to an existing type (presumably you should be allowed to write trait impls for it, and so on), I think it should be extern struct rather than extern type.

The fact they can only appear in extern blocks (where things already look wierd), and that they don't have = Something after them should help prevent them from being confused with type aliases. Besides, in some sense they are a type alias - they're an alias to some type in an external library.

I'd feel a little weird calling them structs when the type might not actually be a struct. That feels like a bit of a C-anachronism.

@canndrew
Copy link
Contributor Author

Extern type refers to some type on C side, you don't know anything about it, what traits can you possibly implement?

There's a lot of object-oriented C APIs that work with pointers to some opaque type. You could wrap these APIs and expose safe versions of the C functions as methods on the extern type.

@glaebhoerl
Copy link
Contributor

glaebhoerl commented Jan 21, 2017

I'd feel a little weird calling them structs when the type might not actually be a struct. That feels like a bit of a C-anachronism.

I think it'd be fine to just allow struct, enum, and union interchangeably fwiw.

I feel that the difference between whether it's an alias or a fresh type is much more significant than what variety of fresh type it is (especially given that the representation is hidden so that the difference is academic). type is pretty well entrenched as denoting an alias in Rust. Yes, this is an "alias to an external type" in some sense, but it's not [presumably going to be] an alias from the perspective of Rust's actual in-language semantics which is what really matters, given that it's what the Rust programmer will be dealing with.

@Stebalien
Copy link
Contributor

I really want a feature like this but have a couple of concerns about the semantics of this proposal:

  1. Personally, I wouldn't call these DSTs (dynamically sized). I'd call them unsized types. That's why we have to be careful about size_of_val and friends.
  2. They're not really extern. Unlike every other extern item, the name doesn't matter: they don't necessarily refer to some existing external item.
  3. They're useful beyond externally defined structures. For example, CStr should be defined as an unsized type (&CStr should be a thin pointer) but is currently a DST.

@Stebalien
Copy link
Contributor

An alternative is to drop the extern part and allow type MyType;:

  • type A = B; means there exists a type A that is equivalent to type B.
  • type A; means there is a type A that isn't equivalent to anything. It's not a struct, enum, union, etc. Just a type, no defined structure.

Posted separately to allow separate reactions (I like this alternative more then the proposal but I'm not entirely sure about it).

@briansmith
Copy link

The main issue I see with type A; is that it doesn't indicate explicitly that it is a FFI-compatible type or not. #[repr(C)] type A; or extern { type A; } makes it clear that it is FFI-compatible. I guess #[repr(C)] type A; is more consistent with #[repr(C)] struct A { ... } so I suggest going with that.

type A; makes the most sense of all alternatives proposed above. Using struct A; to refer to a type defined as a non-struct type in C would be confusing to newcomers, and distracting to the pedantic (e.g. me). Allowing all of enum, union, and struct means that we'd potentially need to add more such options for non-C languages (class for C++, for example, or a Haskell tuple type) or be at risk of the same kind of confusion when interfacing with non-C languages.

The RFC should be updated to specifically mention that references to such types are allowed. It should also mention that moves from these types are not allowed.

struct FfiStruct {
data: u8,
more_data: u32,
tail: OpaqueTail,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear whether you are proposing that this should be accepted by the compiler or not. I think that this is not as essential as the rest of the proposal and I'd suggest that you remove this example and specify that such types can only be used as pointer and reference types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying it should be accepted. This isn't really hard to implement - just make extern types effect the struct pointer the same way that slices and trait objects do. It's also consistent with allowing other DSTs at the end of structs. I think it should be accepted because it's a pattern I've seen a lot in C code.

Copy link
Member

@nagisa nagisa Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure seems useless to me. Consider 3 cases where FfiStruct could be used:

fn by_return() -> FfiStruct;
fn by_sret(retptr: *mut FfiStruct);
fn by_argument(arg: FfiStruct);

Now, all of these are invalid or can’t be made work:

  1. For by_return it may under covers be either a return by value or by sret pointer (analysed in next point); If its returned by value its all right. However you cannot really know if its by value or by sret without knowing the full defn' of structure;
  2. For by_sret compiler simply cannot how much of stack space to allocate for the retptr slot;
  3. For by_argument compiler cannot properly how to correctly pass such a structure to the C side (i.e. if the FfiStruct was supposed to be passed to C via registers, how many registers does the C side expect to be used?).

EDIT: only case where this could work is

fn double_indirection(retptr: *mut *mut FfiStruct)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

fn return_reference(&T) -> &FfiStruct {} 
fn consume_reference(&FfiStruct) {}

One of the main uses of this type would be to define opaque types with headers of known layouts such that they can be handled and have lifetimes managed as though they are Rust types, despite being defined in C++ land.

As an example, in the gecko codebase there is a type nsAString, which represents an abstract string. This type is an abstract base class which has (approximately) the following layout:

struct nsAString {
  const uint16_t* data;
  uint32_t length;
  uint32_t flags;
};

And has multiple subclasses, which may or may not add extra data after the above data, such as nsFixedString which has a layout like:

struct nsFixedString : public nsAString {
  uint32_t capacity;
  uint16_t* buffer;
};

In rust we can then define nsAString as:

#[repr(C)]
struct nsAString {
    data: *const u16,
    length: u32,
    flags: u32,
    _rest: OpaqueTail
}

And then we could take *mut nsAString, *const nsAString, *const nsFixedString etc. and cast them (through unsafe code) into &'a nsAString, working with them as though they were a rust object, able to directly access members from rust like length and flags, without having to worry about accidentally moving the data inside and breaking C++-defined invariants.

Currently in our rust bindings we're working around this limitation by defining #[repr(C)] struct nsAString([u8;0]); and doing casts to extract the fields from the header. You can see this here: http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/xpcom/rust/nsstring/src/lib.rs#160-163

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fn return_reference(&T) -> &FfiStruct {} 

This can work, but you’ll need to either know the full type of T or get it on Rust side with *mut *mut T in the first place, basically moving the responsibility from FfiStruct directly to T.

Where’s the point of having a reference to

#[repr(C)]
struct nsAString {
    data: *const u16,
    length: u32,
    flags: u32,
    _rest: OpaqueTail
}

over a reference to

#[repr(C)]
struct nsAString {
    data: *const u16,
    length: u32,
    flags: u32,
}

So still, I’m not seeing the point of allowing such a thing, given in how many cases this cannot reasonably work in a FFI context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the current hacks look weird, and poorly approximate the true semantics to boot as @mystor shows.

@nagisa have you looked at the custom DST RFC? IMO thinking about then together helps if this seems too narrow on its own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagisa
by_sret is intended to work. extern type pointers are "fat" pointers in the sense that they point to a DST but they're still pointer-sized (the extra metadata on the pointer is just a ()). So *mut FfiStruct is pointer-sized and ffi-safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@canndrew How? Namely, please describe how much memory would the caller need to allocate on the stack so it could produce a valid pointer to pass to the function?

@Ericson2314 yes, I’ve seen it.

Copy link

@eternaleye eternaleye Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagisa: The entire point of this RFC is that no Rust code can possibly create such a value on the stack - it can only obtain pointers to such values, and only from foreign (or unsafe pointer-casting) code.

It is meant almost exactly to mimic the C/C++ notion of an "incomplete type" - which cannot be allocated on the stack in C/C++ either, but can be pointed/referred to.

Copy link
Contributor Author

@canndrew canndrew Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagisa 32 bits on a 32 bit system, 64 bits on a 64 bit system? I'm not sure I understand the question.

Edit: Oh I get it. Yes, what @eternaleye said.

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Jul 12, 2017
@rfcbot
Copy link
Collaborator

rfcbot commented Jul 22, 2017

The final comment period is now complete.

@liigo
Copy link
Contributor

liigo commented Jul 22, 2017

In Rust, pointers to DSTs carry metadata about the object being pointed to. For strings and slices this is the length of the buffer, for trait objects this is the object's vtable. For extern types the metadata is simply (). This means that a pointer to an extern type is identical to a raw pointer.

Saying that it is not 'fat pointer' is more clear for me.

@canndrew
Copy link
Contributor Author

@aturon So we're sweet to merge this baby yeah?

@@ -64,7 +64,7 @@ These types are FFI-safe. They are also DSTs, meaning that they do not implement
In Rust, pointers to DSTs carry metadata about the object being pointed to.
For strings and slices this is the length of the buffer, for trait objects this is the object's vtable.
For extern types the metadata is simply `()`.
This means that a pointer to an extern type is identical to a raw pointer.
This means that a pointer to an extern type is identical to a raw pointer (ie. it is not a "fat pointer").
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to just say it is not a fat pointer and not even bother bringing up raw pointers because raw pointers can be fat too! *const [T] is fat for example despite being a raw pointer.

@aturon
Copy link
Member

aturon commented Jul 25, 2017

Huzzah! This RFC has been merged! Tracking issue.

One is, we define the type as an uninhabited type. eg.

```rust
enum MyFfiType {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still in the RFC but wrong as it causes UB. See https://doc.rust-lang.org/nomicon/ffi.html#representing-opaque-structs and https://www.reddit.com/r/rust/comments/a3bf2p/patterns_of_refactoring_c_to_rust_the_case_of/eb5wxtw

This should probably also be fixed in the RFC text for people who look here because they want extern type but can't use it yet because it's not stable and need a workaround.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the UB in *mut MyFfiType?

Copy link
Member

@RalfJung RalfJung Dec 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ever doing *ptr is UB. This includes &*ptr as *const MyFfiType.

See https://doc.rust-lang.org/nightly/nomicon/ffi.html#representing-opaque-structs for what is currently recommended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-dst Proposals re. DSTs A-repr #[repr(...)] related proposals & ideas A-typesystem Type system related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.