crABI v1 #3470

Open
wants to merge 5 commits into
base: master

joshtriplett
Member

@joshtriplett joshtriplett commented Aug 11, 2023

Note that the most eminently bikesheddable portion of this proposal is the
handling of niches, and the crABI Option and Result types built around
that. There are multiple open questions specifically about that.

I've also listed an open question about how to represent owned crABI
pointers as Rust types: Box<T> versus Box<T, NoDeallocate> versus
Box<T, FFIDeallocate<obj_free>>.

Rendered

@joshtriplett joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Aug 11, 2023
@Lokathor

This comment was marked as resolved.

@programmerjake
Member

programmerjake commented Aug 11, 2023

for Box<T, A>, we could introduce a new trait:

// in std
pub trait BoxDrop<T: ?Sized>: Sized {
    fn box_drop(v: Pin<Box<T, Self>>);
}

impl<T: ?Sized, A: Allocator> BoxDrop<T> for A {
    #[inline]
    fn box_drop(v: Pin<Box<T, Self>>) {
        struct DropPtr<T: ?Sized, A: Allocator>(*mut T, A, Layout);
        impl<T: ?Sized, A: Allocator> Drop for DropPtr<T, A> {
            #[inline]
            fn drop(&mut self) {
                if self.2.size() != 0 {
                    unsafe { self.1.deallocate(NonNull::new_unchecked(self.0).cast(), self.2) }
                }
            }
        }
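        // Capture the layout before the box is taken apart.
        let l = Layout::for_value::<T>(&v);
        let (p, a) = Box::into_raw_with_allocator(unsafe { Pin::into_inner_unchecked(v) });
        // The guard frees the allocation on scope exit, even if T's destructor panics.
        let v = DropPtr(p, a, l);
        // Run T's destructor through the raw pointer held in field 0.
        unsafe { v.0.drop_in_place() }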
    }
}

// the standard Box type
pub struct Box<T: ?Sized, A: BoxDrop<T> = Global>(...);

// replacement version of Drop for Box
impl<T: ?Sized, A: BoxDrop<T>> Drop for Box<T, A> {
    #[inline]
    fn drop(&mut self) {
        // `Box::into_pin` is an associated function, not a method.
        A::box_drop(Box::into_pin(unsafe { ptr::read(self) }))
    }
}

usage demo:

pub struct FooDropper;

impl BoxDrop<Foo> for FooDropper {
    fn box_drop(v: Pin<Box<Foo, FooDropper>>) {
        // Functions declared in an extern block are unsafe to call.
        unsafe { drop_foo(v) }
    }
}

extern "crabi" {
    pub type Foo;
    pub fn make_foo() -> Pin<Box<Foo, FooDropper>>;
    pub fn drop_foo(v: Pin<Box<Foo, FooDropper>>);
}

@joshtriplett
Member Author

@programmerjake That's an interesting alternative!

Is there a way, rather than having to implement a trait, to instead have a single type parameterized with a function type?

@programmerjake
Member

> Is there a way, rather than having to implement a trait, to instead have a single type parameterized with a function type?

I thought about it, but it's very annoying to give function types a name, since you currently have to use TAIT:

struct FnDropper<T: ?Sized, F: Fn(Pin<Box<T, FnDropper<T, F>>>)>(F);

type FooDropFn = impl Fn(Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>);
extern "crabi" {
    type Foo;
    fn make_foo() -> Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>;
    fn drop_foo(v: Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>);
}
#[defining(FooDropFn)]
fn _f() -> FooDropFn {
    drop_foo
}

@programmerjake
Member

programmerjake commented Aug 11, 2023

maybe better usage demo:

// std API
pub struct FFIDropper;
// like C++ unique_ptr but where deleter is defined by T
pub type FFIBox<T: ?Sized> = Box<T, FFIDropper>;

// user API, this impl could easily just be a
// #[box_drop = drop_foo] proc-macro annotation on Foo
impl BoxDrop<Foo> for FFIDropper {
    fn box_drop(v: Pin<FFIBox<Foo>>) {
        // Functions declared in an extern block are unsafe to call.
        unsafe { drop_foo(v) }
    }
}

extern "crabi" {
    pub type Foo;
    pub fn make_foo() -> Pin<FFIBox<Foo>>;
    pub fn drop_foo(v: Pin<FFIBox<Foo>>);
}

@programmerjake
Member

programmerjake commented Aug 11, 2023

lots more discussion about BoxDrop and FFIBox and stuff here: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/BoxDrop.20proposal/near/383840871

@joshtriplett
Member Author

@programmerjake I attempted to partially summarize that proposal in the alternatives section. I do agree that if we accepted that general proposal, it makes sense to use it for the specific case of crABI's handling of Box.

@EdorianDark

> Provide the initial version of a new ABI and in-memory representation supporting interoperability between high-level programming languages that have safe data types.

Are there other languages interested in this proposal? Or is the target to enable an ABI between Rust code?

@Araq

Araq commented Aug 12, 2023

There is interest from Nim (disclaimer: I'm Nim's BDFL). But for Nim it would be really nice if "bit flags" which Nim maps to its set construct could become part of the spec. (Rust can only do the terrible low level bitwise operations here.)

Also quite discouraging is the lack of Swift support, IMO. A common ABI for subsets of Nim, Rust, Swift and C++ seems quite feasible.

@joshtriplett
Member Author

Part of the goal here is to have a baseline level of support from any language that speaks C FFI, which then means that any language with a C FFI immediately has the ability to interoperate with crABI. Everything beyond that is then about the convenience of native support (whether language or library), rather than about whether it's supported or not. So, anything supported would need to map to an underlying C data type that can be passed through the C ABI.

(I do expect that eventually we'll want to support a full object/trait protocol, but I'm trying to get there incrementally rather than trying to do it all at once. The initial round of crABI support is optimizing for ease of initial support/adoption, rather than completeness.)

@Araq This mechanism: https://nim-lang.org/docs/manual.html#set-type-bit-fields ? (Verifying: does Nim support bitfields wider than a base integral data type, or do they have to fit in a base integral data type?) That seems like a reasonable data type to support cross-language. At a minimum, that seems like an ideal candidate for crABI v1.1 (which I expect to follow closely on the heels of crABI v1.0).

@Araq

Araq commented Aug 13, 2023

> @Araq This mechanism: https://nim-lang.org/docs/manual.html#set-type-bit-fields ?

Correct.

> (Verifying: does Nim support bitfields wider than a base integral data type, or do they have to fit in a base integral data type?)

It supports bitsets wider than any integral data type indeed. But an ABI could limit it to an integral type.

- `Option<bool>` is passed using a single `u8`, where 0 is `Some(false)`, 1 is
`Some(true)`, and 2 is `None`.
- `Option<char>` is passed using a single `u32`, where 0 through `0xD7FF` and
`0xE000` through `0x10FFFF` are possible `char` values, and `0x110000` is


Is there any possibility that Unicode could expand the range of valid code points to include 0x110000? If so, would u32::MAX or one of the surrogate code points be a better niche?


> Is there any possibility that Unicode could expand the range of valid code points to include 0x110000?

Rust assumes it will never happen. That is not the choice I would make, but it is the choice that Rust makes. The Unicode Consortium supposedly will never do this, and it will cause breakage if they do, but of course it is not literally impossible that they might reverse their commitment not to do this.

Member


With the way UTF-16 is designed, it's literally impossible to encode Unicode scalar values >= 0x110000. UTF-8 and UTF-32 have ways to encode those out-of-range values, but Unicode almost certainly never will use them, because they want UTF-16 to keep working: it's used in so many places (Win32, Java, JavaScript, C#, VB.NET, etc.).

Contributor


char is a Unicode Scalar Value:
https://www.unicode.org/glossary/#unicode_scalar_value

Unicode scalar values outside the defined range are specifically forbidden by the Unicode Standard:

> Ill-formed: A Unicode code unit sequence that purports to be in a Unicode encoding form is called ill-formed if and only if it does not follow the specification of that Unicode encoding form.
>
> Any code unit sequence that would correspond to a code point outside the defined range of Unicode scalar values would, for example, be ill-formed.

https://www.unicode.org/glossary/#ill_formed_code_unit_sequence

(So expanding the range would be a breaking change for a lot of existing Unicode software.)
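
To make the niche under discussion concrete, here is a minimal sketch of how a `u32` carrying an `Option<char>` could be encoded and decoded, following the values in the RFC text quoted above. The constant and function names are hypothetical and not part of the RFC; the only standard-library call relied on is `char::from_u32`, which rejects surrogates and anything above `0x10FFFF`.

```rust
/// Niche value the quoted RFC text uses for `None`: one past `char::MAX`.
/// (Hypothetical name, for illustration only.)
pub const OPTION_CHAR_NONE: u32 = 0x11_0000;

/// Encode an `Option<char>` as a single `u32`.
pub fn encode_option_char(c: Option<char>) -> u32 {
    match c {
        Some(c) => c as u32,
        None => OPTION_CHAR_NONE,
    }
}

/// Decode a `u32` received across the boundary.
/// `Err(raw)` means the value is neither a Unicode scalar value nor the niche.
pub fn decode_option_char(raw: u32) -> Result<Option<char>, u32> {
    if raw == OPTION_CHAR_NONE {
        return Ok(None);
    }
    // `char::from_u32` returns `None` for surrogates and out-of-range values.
    char::from_u32(raw).map(Some).ok_or(raw)
}
```

Under this sketch, a surrogate such as `0xD800` or a value such as `u32::MAX` is reported as invalid rather than silently treated as `None`.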

Contributor

@tgross35 tgross35 Mar 13, 2024


What is the reason 0x110000 is used for the niche rather than u32::MAX for Rust in general?

- Once `extern "C"` supports C-compatible handling of `u128` and `i128`,
`extern "crabi"` should do the same.

- Extensible enums. To define types that allow for extension, crABI would

@tmccombs tmccombs Aug 13, 2023


Related to this, there are probably situations where it would be useful to declare a size and/or alignment of an enum to be larger than necessary, for future compatibility to allow adding new data that would otherwise change the ABI.

For a struct this can be done with a padding field, but that is somewhat difficult to do for an enum. I suppose you could make an unused, hidden variant with a value of the right size and alignment.
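
A rough sketch of the hidden-variant idea, using today's `repr(C)` rather than the proposed `repr(crabi)`: the type `Message` and its `Reserved` variant are hypothetical, and the reserved payload size is an arbitrary choice for illustration.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical message type that reserves room for future variants.
#[repr(C)]
pub enum Message {
    Ping,
    Data(u16),
    /// Never constructed; exists only to pin the size and alignment
    /// so that larger variants can be added later without changing the layout.
    #[doc(hidden)]
    Reserved([u64; 4]),
}

/// Size of the enum, which must cover the reserved payload.
pub fn message_size() -> usize {
    size_of::<Message>()
}

/// Alignment of the enum, driven by the reserved payload.
pub fn message_align() -> usize {
    align_of::<Message>()
}
```

The drawback raised for padding fields applies here as well: the reserved variant is still visible to anything that enumerates variants, such as derived serialization.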


Padding fields are not ideal either as they can end up in JSON or toString representations too easily where they are noise at best and a bug at worst.


@daira daira Mar 27, 2024


Maybe the same issue: we have found it difficult to safely define a Rust type corresponding to a C or C++ enum, because an out-of-range representation of a Rust enum is immediate UB, unlike in C/C++. We had to resort to using a repr(C) or repr(transparent) struct on the Rust side in order to be able to gracefully handle errors on the C/C++ side.

Example: https://github.com/zcash/zcash/blob/2112e467ee31ea95cf81904a6aae397fa3d031ae/src/rust/src/zip339_ffi.rs#L11-L37

and the corresponding C/C++ type:
https://github.com/zcash/zcash/blob/2112e467ee31ea95cf81904a6aae397fa3d031ae/src/rust/include/rust/zip339.h#L17-L38

(Rust here is stricter than C++17. In the latter, casting an integer outside the range of the enumeration values to the enum type is UB, but it is not UB to cast a value to the enum type that is within range but not one of the defined values.)
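
The workaround described above can be sketched roughly as follows. This is not the zcash code linked above; `RawColor` and `Color` are hypothetical names, and the assumption is that the C side uses a plain `int`-sized enum.

```rust
use std::convert::TryFrom;
use std::os::raw::c_int;

/// FFI-safe stand-in for a C `enum color { RED, GREEN, BLUE }`.
/// Every bit pattern of `c_int` is valid here, so an unexpected value
/// coming from C is an ordinary error instead of immediate UB.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawColor(pub c_int);

/// The checked, Rust-side view of the same values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Color {
    Red,
    Green,
    Blue,
}

impl TryFrom<RawColor> for Color {
    type Error = RawColor;

    /// Validate at the boundary; unknown values are handed back to the caller.
    fn try_from(raw: RawColor) -> Result<Self, Self::Error> {
        match raw.0 {
            0 => Ok(Color::Red),
            1 => Ok(Color::Green),
            2 => Ok(Color::Blue),
            _ => Err(raw),
        }
    }
}
```

Only `RawColor` appears in `extern` signatures in this approach; `Color` is produced after validation.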


- Is there a better way we can handle tuple types? Having to use a distinct
syntax like `cr#()` is onerous; one of the primary values of tuples is
brevity. In the future, if we have variadic generics, we could potentially


Or, conversely, variadic generics could build on the crabi tuples. There is definitely some overlap here, since a tuple with a well-defined structure, with fields in declaration order, is also useful for recursively processing the first element of a tuple and passing the rest of the tuple to the next iteration.


As you note, there is some overlap, in that both crabi tuples and 'a tuple where you can borrow the tail by reference' (which could potentially be used for variadic generics) need the fields to be stored in order.

However, in other ways they have opposite goals. 'A tuple where you can borrow the tail by reference' would likely need to store, e.g., cr#(u8, u8, u32) like cr#(u8, (u8, u32)). First, this would be less efficient due to additional padding introduced. While crABI does already sacrifice layout efficiency to some extent, by giving up reordering and some niche optimizations, that doesn't mean it doesn't care about performance, and this would be an unnecessary extra loss. Second, given crABI's goal of being easy to bind to other languages, it's beneficial for cr#(u8, u8, u32) to translate directly to the obvious C struct equivalent (struct { uint8_t a; uint8_t b; uint32_t c; }), rather than to some other structure.
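
The padding difference can be checked with ordinary `repr(C)` structs standing in for the two layouts. This is only a sketch: `Flat`, `Tail`, and `Nested` are hypothetical names, and the sizes noted in the comments assume a typical target where `u32` has 4-byte alignment.

```rust
use std::mem::size_of;

/// Stand-in for `cr#(u8, u8, u32)` laid out like the obvious C struct.
#[repr(C)]
pub struct Flat {
    pub a: u8,
    pub b: u8,
    pub c: u32,
}

/// Stand-in for the borrowable tail `(u8, u32)`.
#[repr(C)]
pub struct Tail {
    pub b: u8,
    pub c: u32,
}

/// Stand-in for `cr#(u8, (u8, u32))`, where the tail is a complete object.
#[repr(C)]
pub struct Nested {
    pub a: u8,
    pub tail: Tail,
}

/// Returns (flat size, nested size); (8, 12) where `u32` is 4-byte aligned.
pub fn sizes() -> (usize, usize) {
    (size_of::<Flat>(), size_of::<Nested>())
}
```

The nested form pays for padding twice: once before `tail` and once inside it.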

@tmccombs

Given that this is intended for cross-language interfaces, there should probably be a formal, mostly language-agnostic specification of the ABI. Should this RFC discuss where that specification should go, and how it will be created? Will maintainers from other languages (such as Nim) be involved in that process?

@comex

comex commented Aug 13, 2023

I suggest adding to the RFC that crABI will not initially define a stable LLVM CFI mangling (see also #3296, ping @rcvalle).

For an example of where this would be an issue, it's one thing to say that (quoting the RFC):

extern "crabi" fn func(buf: &mut [u8]);

is equivalent to:

struct u8_slice {
    uint8_t *data;
    size_t len;
};
extern void func(struct u8_slice buf);

But under LLVM CFI, if func ends up being turned into a function pointer, it will get a hash based on the Itanium C++ name mangling of the function signature, which has to match between caller and callee ends. On the C side, it would be mangled based on the actual name of the struct (in this case, u8_slice). On the Rust side, slices are currently mangled as vendor extended types, which can't be expressed in C or C++.

One potential solution is to define a standard C++ equivalent name for CFI purposes, e.g. [u8] could be mangled as if it were a C++ type named ::rust::slice<u8>. That would help when binding to C++, but not when binding to C, let alone other languages.

Another potential solution is to define a standard C struct name (say, _rust_slice_u8), but that's just really gross (how would it deal with more complex types? manually writing out the mangling in the struct name?), and also wouldn't work well with C++.

I think a better approach is to add an attribute to Clang to customize the CFI mangling for a C struct definition. After all, I'm not aware of any other implementations of LLVM CFI besides rustc and Clang, so it's not critical for us to fit into the existing constraints. We could then either keep using vendor extended types or go with the C++ equivalent name; which approach works better would depend on the design of said attribute.

But I'm not volunteering to send a patch to Clang; I think that onus is on anyone who wants to use crABI in the context of CFI-protected (C/C++)-Rust interop.

If that doesn't happen soon, though, then what? Clearly there's no need to block anything about crABI not related to CFI. The question is whether it should block stabilizing the LLVM CFI mangling. There's an argument that it shouldn't: we could stabilize the CFI mangling as-is and it would still be useful to protect Rust-to-Rust calls. But given the desire to interoperate well with other languages, I think it would be better to wait on stabilizing the CFI mangling until we have a full answer for how it will interop with Clang.

@Araq

Araq commented Aug 14, 2023

Please leave name mangling unspecified until interop between different languages has been established in some prototypes. It also does not have to be part of crABI at all and could be a different spec altogether.

text/3470-crabi-v1.md (outdated)
- Should we provide *fewer* niche optimizations? Those for `NonZero` and
reference types provide obvious value; are those for `bool` and `char` really
useful enough to justify the special case in languages that will have to
handle them explicitly rather than automatically?
Contributor


I haven't seen Option<bool> in many APIs, but some string APIs use Option<char>. Is there a way to count how popular these types are in libraries?


Comment on lines +319 to +347
## `[T; N]` - fixed-size array by value

crABI supports passing a fixed-size array by value (as opposed to by
reference). To represent this via the C ABI, crABI treats this equivalently to
a C array of the same type passed by value.

Note that this means C code can use an array directly in a context where it
will be interpreted by-value (such as in a struct field), but needs to use a
structure with a single array field in contexts where it would otherwise be
interpreted as a pointer (such as in a function argument or return value).

For instance:

```rust
extern "crabi" fn func(rgb: [u16; 3]);
```

is equivalent to:

```c
struct func_rgb_arg {
    uint16_t array[3];
};
extern void func(struct func_rgb_arg rgb);
```

Note that crABI does *not* pass the length, since it's a compile-time constant;
the recipient must also know the correct size. (Use one of the slice-based
types for a type with a runtime-determined length.)
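
As a rough illustration of the wrapper-struct equivalence described above, the Rust side of the C declaration could be mirrored today with `extern "C"` and a `repr(C)` struct. The Rust names here are hypothetical, and the function body and return value exist only to make the sketch checkable.

```rust
/// Rust-side mirror of the C wrapper struct `func_rgb_arg`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FuncRgbArg {
    pub array: [u16; 3],
}

/// Stand-in for `func`, using the C ABI available today.
/// No length is passed: both sides agree on `3` at compile time.
pub extern "C" fn func(rgb: FuncRgbArg) -> u32 {
    rgb.array.iter().map(|&x| u32::from(x)).sum()
}
```

The struct has the same size as the bare array, so wrapping it adds no bytes of its own.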
Contributor


I seem to remember issues about zero-sized types in C; what's the expected behaviour if N is 0? I'd expect it to act like the other zero-sized types, i.e. the function parameter in its entirety would just not be there in the C ABI, but that's not entirely clear from the example.

Contributor


I think the [T; 0] case might depend on whether the array is passed by reference (address and length) or value (just address). It seems like passing by value is ambiguous here.

Also, what's the expected behaviour for arrays containing zero-sized types?
Is it different for [(); N] and [(); 0]?
Is it different passing by value and by reference?

Member

@bjorn3 bjorn3 Aug 19, 2023


For the Rust ABI, all ZSTs are ignored. Most C ABIs follow this as well, but on a couple of architectures ZSTs still consume a register: https://github.com/rust-lang/rust/blob/6ef7d16be0fb9d6ecf300c27990f4bff49d22d46/compiler/rustc_ty_utils/src/abi.rs#L419-L421 I think we should ignore them on all architectures. The C side can simply always omit the args as necessary, but it feels wrong for x86_64-pc-windows-gnu and x86_64-pc-windows-msvc to have an incompatible crABI, given that it is possible to call MSVC libraries from MinGW just fine as long as you have the right import library; in fact, MinGW depends on this to call system libraries.
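
For reference, the array types asked about in this thread are all zero-sized on the Rust side, which a small sketch can confirm. The helper name is hypothetical. This only shows the in-memory size; whether such an argument occupies a register in a given C ABI is the target-dependent question discussed above, and the sketch does not settle it.

```rust
use std::mem::size_of;

/// Sizes of the zero-sized array types mentioned in the thread.
pub fn zst_sizes() -> [usize; 4] {
    [
        size_of::<[u8; 0]>(), // empty array of a sized element
        size_of::<[(); 0]>(), // empty array of a zero-sized element
        size_of::<[(); 5]>(), // non-empty array of a zero-sized element
        size_of::<()>(),      // the unit type itself
    ]
}
```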

structures, unless the newer side restricts itself to features understood by an
older version.

This RFC defines crABI 1.0.
Contributor


I know that we feel pretty confident in most of the decisions in this RFC, but since crABI has not seen actual usage, especially regarding interop with other languages, it feels prudent to me if we started at 0.1.0 and allowed ourselves breaking changes, and only decided to move to 1.0 once enough (different) eyes have been laid on the problem.

@nortti0

nortti0 commented Aug 20, 2023

One way to reduce the complexity of the niche optimizations would be to only use two values for the niche value:

  • Option<&T>, Option<&mut T>, Option<NonNull<T>>, Option<Box<T>>, and
    Option of any function pointer type are all passed using a null pointer to
    represent None.
  • Option of any of the NonZero* types is passed using a value of the
    underlying numeric type with 0 as None.

These can continue to use the zero value for None.

  • Option<bool> is passed using a single u8, where 0 is Some(false), 1 is
    Some(true), and 2 is None.
  • Option<char> is passed using a single u32, where 0 through 0xD7FF and
    0xE000 through 0x10FFFF are possible char values, and 0x110000 is
    None.
  • Option<OwnedFd> and Option<BorrowedFd> are passed using -1 to represent
    None

These could all use the all bits set (i.e. 0.wrapping_sub(1)) value for None, as is already used for the *Fd types.

A second, further way to simplify the specification here and allow users to enable the niche optimization for their own types would be to define a NonNegativeI* (other spellings are possible, e.g. u31) series of types. Types that are currently called out here explicitly would instead be defined to be passed using those types, and the niche optimizations section could be changed to read:

  • Option<&T>, Option<&mut T>, Option<NonNull<T>>, Option<Box<T>>, and Option of any function pointer type are all passed using a null pointer to represent None.
  • Option of any of the NonZero* types is passed using a value of the underlying numeric type with 0 as None.
  • Option of any of the NonNegative* types is passed using a value of the underlying numeric type with all bits set as None.
  • Option of a repr(transparent) type containing one of the above as its only non-zero-sized field will use the same representation.

Comment on lines +260 to +261
Slices translate to a by-value struct containing two fields: a pointer to the
element type, and a `size_t` number of elements, in that order.
Member


Does this properly translate to the same limitation as rust slices? I.e. at most isize::MAX bytes but at most usize::MAX items (for ZSTs)?
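
The limit being asked about can be written down as a check. This is a sketch under the assumption that a receiver wants to validate a pointer/length pair before building a Rust slice from it; `RawSlice` and `len_is_valid` are hypothetical names, and the rule encoded is the documented requirement of `std::slice::from_raw_parts` that the total size not exceed `isize::MAX` bytes.

```rust
use std::mem::size_of;

/// Hypothetical by-value pointer/length pair, in the field order the RFC text gives.
#[repr(C)]
pub struct RawSlice<T> {
    pub data: *const T,
    pub len: usize,
}

impl<T> RawSlice<T> {
    /// Split a Rust slice into its pointer and element count.
    pub fn from_slice(s: &[T]) -> Self {
        RawSlice { data: s.as_ptr(), len: s.len() }
    }
}

/// True if `len` elements of `T` occupy at most `isize::MAX` bytes.
/// For zero-sized `T` the byte count is 0, so any `len` up to `usize::MAX` passes.
pub fn len_is_valid<T>(len: usize) -> bool {
    match len.checked_mul(size_of::<T>()) {
        Some(bytes) => bytes <= isize::MAX as usize,
        None => false,
    }
}
```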

Comment on lines +380 to +382
As a special case, if an `enum` using `repr(crabi)` has exactly two variants,
one of which has no fields and the other of which has a single field, and the
single field type has a specific type (defined below) with a "niche" value,
Member


What about non_exhaustive?


Is any non_exhaustive type eligible for a stable ABI like this? Or would it somehow be limited to extensions in ways that don't break the ABI? Then again, do we have restrictions on adding fields or other layout-affecting changes being made to extant ABI-declared types?


I believe non_exhaustive is orthogonal with "stable ABI".

For example, a linux syscall struct must have a stable layout, but should be planned with later extension in mind, as it may gain additional variants/options in future kernel versions, but won't change the layout of existing syscalls.
e.g. by including some reserved fields or leaving space in a bitset for to-be-added flags.

non_exhaustive would then remind consumers to defensively program against "timetravelers from the future" (e.g. return -EINVAL; in the linux syscall example)

Another example would be the io_uring, which keeps adding new opcodes as the kernel implementation grows (i.e. additionally-accepted values for the opcode discriminant)

Member


#[non_exhaustive] doesn't have any effect for the crate that defines the type. Instead it has an effect on users. This means that if the user tried to use a newer version of the type while using the older version of the code that defined it, that would be UB, without said code being able to report an error for the invalid discriminant. Also, extensions to #[non_exhaustive] types can only possibly work for the memory layout. The calling convention can drastically change even for adding a single field or enum variant.

@programmerjake
Member

> A second, further way to simplify the specification here and allow users to enable the niche optimization for their own types would be to define a NonNegativeI* (other spellings are possible, e.g. u31) series of types. Types that are currently called out here explicitly would instead be defined to be passed using those types, and the niche optimizations section could be changed to read:

that wouldn't work for (some of) the Win32 handle types, because iirc negative handles are perfectly valid, it's just -1 that's invalid.

@ChrisDenton
Member

> that wouldn't work for (some of) the Win32 handle types, because iirc negative handles are perfectly valid, it's just -1 that's invalid.

-1 is perfectly valid. It's a pseudo handle for the current process. But yes, negative handles are valid; they're static values with special meaning. The only common niche between real handles and pseudo handles is 0.

@programmerjake
Member

> > that wouldn't work for (some of) the Win32 handle types, because iirc negative handles are perfectly valid, it's just -1 that's invalid.
>
> -1 is perfectly valid. It's a pseudo handle for the current process. But yes, negative handles are valid; they're static values with special meaning. The only common niche between real handles and pseudo handles is 0.

I'm referring to OwnedSocket which has a niche for INVALID_SOCKET aka. -1 because that is not a valid socket handle.

*nightly-only* implementations of that version of crABI. Versions of crABI
should not be considered stable until available in stable Rust.

Future versions of crABI may also establish allow-by-default lints for the use


As a (wildly excited) outsider, this raises a question for me: will the deny-activation of such lints be enough info for the compiler to restrict itself to a lower subset of crABI?

Or must we reserve some kind of "version" specifier in the repr attribute? Something like repr(crABI, u8, v=2.1). perhaps a global annotation on the module level, to avoid repetition?

I'm thinking of cases like an enum that would have different possibilities of niche-optimisation under crABI v1, v2, v3..
Does the compiler have access to the active lint selection in the enum layout code?
Are lints even supposed to act like compiler flags to this extent?

@matklad
Member

matklad commented Aug 24, 2023

Hm, I am quite a bit surprised that the RFC doesn’t talk about aliasing at all. Consider this crabi function declaration:

struct u8_slice {
    uint8_t *data;
    size_t len;
};
extern void copy(struct u8_slice dst, struct u8_slice src);

is the code calling the function required to ensure that src and dst do not overlap? Is the code implementing the function allowed to assume that the slices do not overlap?

Do we just use Type Based Alias Analysis rules here by virtue of just deferring to C ABI?

@bjorn3
Member

bjorn3 commented Aug 24, 2023

> is the code calling the function required to ensure that src and dst do not overlap?

If the Rust side is &mut [u8] then yes. If it is *mut [u8] then no. In general I'd expect the regular Rust memory model rules to apply.

> Do we just use Type Based Alias Analysis rules here by virtue of just deferring to C ABI?

TBAA is entirely incompatible with rust.
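
The reference-versus-raw-pointer distinction can be sketched in plain Rust. Both functions are hypothetical illustrations, not anything the RFC defines: with references the caller cannot pass overlapping buffers at all, while with raw pointers overlap is permitted and the callee has to cope with it, here by using `std::ptr::copy`, which has `memmove` semantics.

```rust
use std::ptr;

/// Reference version: `&mut [u8]` and `&[u8]` cannot overlap,
/// so the implementation may assume disjoint buffers.
pub fn copy_refs(dst: &mut [u8], src: &[u8]) {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
}

/// Raw-pointer version: the regions may overlap.
///
/// # Safety
/// `src` must be valid for reads of `len` bytes and `dst` valid for writes
/// of `len` bytes.
pub unsafe fn copy_raw(dst: *mut u8, src: *const u8, len: usize) {
    // `ptr::copy` tolerates overlap, unlike `ptr::copy_nonoverlapping`.
    unsafe { ptr::copy(src, dst, len) }
}
```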

@matklad
Member

matklad commented Aug 24, 2023

> TBAA is entirely incompatible with rust.

Yes, but, if I understand this right, that's what the current RFC implicitly proposes to use, by saying that "we lower to C ABI".

Taking example from https://stefansf.de/post/type-based-alias-analysis/, the following crabi declaration:

extern void foo(int *x, short *y)

also carries implicit constraint that x and y do not alias (level of confidence: 0.7).

We definitely don't want to have that in crabi, because that's not how Rust works, and not how an ideal ABI would work. But that means we need to explicitly define aliasing rules for crabi, as otherwise we are inheriting those from C.

@jonathanpallant

jonathanpallant commented Aug 24, 2023

If I have some crabi function which takes, say, core::crabi::Option<u32> as an argument, can I pass Some(42) and have it auto-convert (or auto-infer), or do I have to say core::crabi::Option::Some(42) (or Some(42).into())?

I found the manual conversions were one of the annoying parts about using the C ABI in https://github.com/Neotron-Compute/Neotron-FFI/blob/develop/src/option.rs (e.g. here or here)

But also, yay, I could basically delete the neotron-ffi crate.
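
For context, the manual-conversion pattern in question looks roughly like the sketch below. `FfiOption` is a hypothetical stand-in written against today's `repr(C)`; it is not the RFC's `core::crabi::Option`, and whether the real type would get `From` impls or some automatic conversion is exactly the open question here.

```rust
/// Hypothetical FFI-safe option type, for illustration only.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FfiOption<T> {
    None,
    Some(T),
}

impl<T> From<Option<T>> for FfiOption<T> {
    fn from(o: Option<T>) -> Self {
        match o {
            Some(v) => FfiOption::Some(v),
            None => FfiOption::None,
        }
    }
}

impl<T> From<FfiOption<T>> for Option<T> {
    fn from(o: FfiOption<T>) -> Self {
        match o {
            FfiOption::Some(v) => Some(v),
            FfiOption::None => None,
        }
    }
}

/// Stand-in for a function taking the FFI option by value.
pub extern "C" fn takes_option(v: FfiOption<u32>) -> u32 {
    match v {
        FfiOption::Some(x) => x,
        FfiOption::None => 0,
    }
}
```

With impls like these, the call site is `takes_option(Some(42).into())`: shorter than spelling out the variant path, but still an explicit conversion.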

@comex

comex commented Aug 24, 2023

> extern void foo(int *x, short *y)
>
> also carries implicit constraint that x and y do not alias (level of confidence: 0.7).

You don't get UB in C just by having pointers that alias, only if you actually dereference them with incompatible types.

So the full example from the post you linked has UB if x == y:

void foo(int *x, short *y) {
    *x  = 40;
    *y  = 0;
    *x += 2;
}

But this version would not, as long as the pointee was originally either a dynamic allocation or a variable of declared type int:

void foo(int *x, short *y) {
    *x  = 40;
    memset(y, 0, sizeof(short));
    *x += 2;
}

As a result, there's no need for crABI to, say, munge the types to void * at the boundary. As such, I think the issue is largely out of scope for crABI.

A potential exception is that if a pointer to a Rust local or global variable is sent to C, the C side might want to know what the "declared type" is for the purpose of C's aliasing model. The right answer should be that the Rust variable is like a dynamic allocation and doesn't have a declared type. But I'm not sure if that works in all cases in the current implementation, or if it should be guaranteed for all Rust implementations. Still, it seems somewhat out of scope…

@Maix0

Maix0 commented Nov 23, 2023

I would actually like to see a deny-by-default lint and compiler flag (which is a new thing, I think) that would, when allowed or set to warn, enable transparent conversion between crabi::Option and std::Option (and the Result types as well).

As this would be deny-by-default, you wouldn't end up relying on it by accident, but it would still let someone quickly prototype or use an FFI API.

I do feel strongly against having this in everyday programs, as it is basically a deep memory copy, but this could allow some leeway when trying stuff.

It is true that having to duplicate those types is sad, but as said in the RFC, there seems to be no other way.

@jeffparsons
Contributor

> It is true that having to duplicate those types is sad, but as said in the RFC, there seems to be no other way.

I know I'm just one data point, but as a Rust user who is excited about crABI: I get it. I'll live. Better a bit of minor friction than lurking dragons.

And it seems to me that it's pretty easy to leave the door open to creative solutions in future, so at least to me it doesn't seem like solving it now should be considered a blocker.

Comment on lines +497 to +504
The translation of slices and similar uses structs containing pointer/length
pairs, rather than inlining the pointer and length as separate arguments.
[As noted above][types], this is typically passed and returned in an efficient
fashion on major targets. However, in some languages, such as C, this will
require separately defining a structure and then using that structure. This
still seems preferable, though, as combining the two into one struct allows for
uniform handling between arguments, return values, and fields, as well as
keeping the pointer and length more strongly associated.
Copy link
Contributor

Passing slices as two arguments (ptr then len) would allow Rust bindings to existing C code that does this to use slices directly. For example, POSIX ssize_t send(int sockfd, const void *buf, size_t len, int flags) could be exposed in Rust as

extern "crABI" { // or perhaps even "C" ?
    fn send(socket: c_int, buf: *const [u8], flags: c_int) -> ssize_t;
}

Copy link
Member

passing a structure of two fields is generally different than passing two arguments though, no?

Copy link
Contributor

Yes, it is generally different. My suggestion is to make them the same for slices with extern "crABI"/extern "C", as a special case only for that specific combination.
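
To make the two shapes under discussion concrete, here is a minimal Rust sketch. It is not taken from the RFC: `ByteSlice`, `checksum_struct`, and `checksum_pair` are hypothetical names, and plain `extern "C"` stands in for `extern "crABI"`, which does not exist yet.

```rust
/// Hypothetical pointer/length pair, mirroring the struct form the RFC describes.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ByteSlice {
    pub data: *const u8,
    pub len: usize,
}

impl ByteSlice {
    /// Build the pair from a borrowed Rust slice.
    pub fn from_slice(s: &[u8]) -> Self {
        ByteSlice { data: s.as_ptr(), len: s.len() }
    }
}

/// Struct form: one argument carries both pointer and length.
pub extern "C" fn checksum_struct(s: ByteSlice) -> u32 {
    // SAFETY: the caller promises that `data` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(s.data, s.len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}

/// Two-argument form, as used by existing C APIs such as `send`.
pub extern "C" fn checksum_pair(data: *const u8, len: usize) -> u32 {
    checksum_struct(ByteSlice { data, len })
}
```

Whether the two forms end up in the same registers depends on the platform's C ABI, which is why treating them as interchangeable would need to be a special case.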

Comment on lines +370 to +375
If an `enum` specifies `repr(crabi)` but does not specify a discriminant type,
the `enum` is guaranteed to use the smallest discriminant type that holds the
maximum discriminant value used by a variant in the `enum`.

If the `enum` has no fields, or no fields with a non-zero size, crABI will
represent the `enum` as only its discriminant.
Copy link
Contributor

Suggested change
If an `enum` specifies `repr(crabi)` but does not specify a discriminant type,
the `enum` is guaranteed to use the smallest discriminant type that holds the
maximum discriminant value used by a variant in the `enum`.
If the `enum` has no fields, or no fields with a non-zero size, crABI will
represent the `enum` as only its discriminant.
If an `enum` specifies `repr(crabi)` but does not specify a discriminant type,
the `enum` is guaranteed to use the smallest discriminant type that holds the
maximum discriminant value used by a variant in the `enum`. Enums with zero
or one variants have a zero-sized discriminant.
If the `enum` has no fields with a non-zero size, crABI will represent the
`enum` as only its discriminant.

Clarify how zero-sized enums work
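
As a rough illustration of the sizing rule in the suggested wording, the sketch below picks the smallest unsigned width for a given maximum discriminant. `smallest_discriminant_bytes` is a hypothetical helper, signed discriminants are ignored, and explicit `repr(u8)`/`repr(u16)` stand in for what `repr(crabi)` would choose automatically under this reading.

```rust
use std::mem::size_of;

/// Width in bytes of the smallest unsigned integer that can hold `max`.
/// Enums with zero or one variants get a zero-sized discriminant,
/// following the suggested wording above.
/// (Assumption: signed discriminants are ignored in this sketch.)
pub const fn smallest_discriminant_bytes(variant_count: usize, max: u64) -> usize {
    if variant_count <= 1 {
        0
    } else if max <= u8::MAX as u64 {
        1
    } else if max <= u16::MAX as u64 {
        2
    } else if max <= u32::MAX as u64 {
        4
    } else {
        8
    }
}

/// Largest discriminant is 255, so one byte is enough.
#[allow(dead_code)]
#[repr(u8)]
pub enum Small { A = 0, B = 1, C = 255 }

/// Largest discriminant is 256, so two bytes are needed.
#[allow(dead_code)]
#[repr(u16)]
pub enum Medium { A = 0, B = 256 }

// Compile-time checks that the explicit reprs match the rule.
const _: () = assert!(size_of::<Small>() == smallest_discriminant_bytes(3, 255));
const _: () = assert!(size_of::<Medium>() == smallest_discriminant_bytes(2, 256));
```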

Comment on lines +362 to +365
crABI supports arbitrary `enum` types, if declared with `repr(crabi)`. These
are always passed using the same layout that Rust uses for enums with `repr(C)`
and a specified discriminant type:
<https://doc.rust-lang.org/reference/type-layout.html#combining-primitive-representations-of-enums-with-fields-and-reprc>
Copy link
Contributor

Suggested change
crABI supports arbitrary `enum` types, if declared with `repr(crabi)`. These
are always passed using the same layout that Rust uses for enums with `repr(C)`
and a specified discriminant type:
<https://doc.rust-lang.org/reference/type-layout.html#combining-primitive-representations-of-enums-with-fields-and-reprc>
crABI supports arbitrary `enum` types, representing discriminated unions, if
declared with `repr(crabi)`. These are always passed using the same layout that Rust
uses for enums with `repr(C)` and a specified discriminant type:
<https://doc.rust-lang.org/reference/type-layout.html#combining-primitive-representations-of-enums-with-fields-and-reprc>

Clarify what enum as a type is
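
For readers unfamiliar with the linked layout, the following sketch spells out what "discriminated union" means for a `repr(C, u8)` enum, per the Rust reference section cited above. `Shape` and the `*Payload`/`ShapeRepr` types are illustrative names only, and `repr(C, u8)` is used because `repr(crabi)` is not implemented.

```rust
use std::mem::{align_of, size_of};

/// An enum with fields, using the existing `repr(C)` + discriminant-type layout.
#[allow(dead_code)]
#[repr(C, u8)]
pub enum Shape {
    Circle { radius: f32 },
    Rect { w: f32, h: f32 },
}

/// The tag half of the equivalent discriminated union.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy)]
pub enum ShapeTag { Circle, Rect }

#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
pub struct CirclePayload { radius: f32 }

#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RectPayload { w: f32, h: f32 }

/// The payload half: a C union of one struct per variant.
#[allow(dead_code)]
#[repr(C)]
pub union ShapePayload {
    circle: CirclePayload,
    rect: RectPayload,
}

/// Tag followed by payload: what a C caller would declare.
#[allow(dead_code)]
#[repr(C)]
pub struct ShapeRepr {
    tag: ShapeTag,
    payload: ShapePayload,
}

// The enum and the hand-written tag + union agree on size and alignment.
const _: () = assert!(size_of::<Shape>() == size_of::<ShapeRepr>());
const _: () = assert!(align_of::<Shape>() == align_of::<ShapeRepr>());

/// Read the tag byte directly, as a C consumer would.
pub fn tag_of(shape: &Shape) -> u8 {
    // SAFETY: with `repr(C, u8)` the tag is a `u8` stored at offset 0.
    unsafe { *(shape as *const Shape as *const u8) }
}
```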

Comment on lines +9 to +11
Provide the initial version of a new ABI and in-memory representation
supporting interoperability between high-level programming languages that have
safe data types.
Copy link
Contributor

Suggested change
Provide the initial version of a new ABI and in-memory representation
supporting interoperability between high-level programming languages that have
safe data types.
Provide the initial version of a new C-Rust ABI (`crABI`) and in-memory
representation supporting interoperability between high-level programming languages that have
safe data types.

Introduce the origin of the term somewhere in the summary

Comment on lines +86 to +87
- A repr for laying out data structures (`struct`, `union`, `enum`) compatible
with crABI: `repr(crabi)`.
Copy link
Contributor

union is mentioned here but its ABI/layout is not mentioned in the doc, it could use a short section


An implementation of crABI should document which version of crABI it
implements, which compactly conveys supported and unsupported functionality.

Copy link
Contributor

I can imagine some cases where ABI version winds up in the binary as a way to verify compatibility. It may not hurt to specify a preferred representation:

Suggested change
If ABI version needs to be encoded in a binary for any reason, it should be
stored as a struct representing major and minor versions as 8-bit integers. If
applicable, the symbol name should be `__crabi_version`.
```c
#include <stdint.h>

struct crabi_version {
    uint8_t major;
    uint8_t minor;
};
struct crabi_version __crabi_version = { .major = 1, .minor = 0 };
```

Copy link
Member

Having a global __crabi_version would cause conflicts if multiple static libraries that export a crABI interface are linked together. And using COMDAT to deduplicate wouldn't work either if both static libraries use a different crABI version.

outside that range are passed or returned, and in particular the compiler may
generate code that does not check this assumption, or may optionally include
validation assertions when debugging.

Copy link
Contributor

Maybe add a section `## T - owned values` and just specify that anything `#[repr(C)]` or `#[repr(crabi)]` can be passed by value. I know this is mentioned elsewhere, but a specific section would make this more in line with the other types listed.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Guaranteed niche optimization is the most uncertain part of the proposed crABI
Copy link
Contributor

"Rationale and alternatives" is big enough that it could use subsections e.g. ## Guaranteed niche optimization

Comment on lines +506 to +514
@programmerjake made a
[proposal](https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674249638)
([sample usage](https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674265515))
to modify the standard impl of `Drop` for `Box` to allow plugging in an
arbitrary function (via a `BoxDrop` trait), to drop the `Box` as a whole. This
would be generally useful (e.g. for object pooling), and would then permit
crABI to define a `box_drop` function that calls an FFI function to free the
object. If we accepted that proposal, it would make sense to use it to
represent crABI boxes.
Copy link
Contributor

Could this go in "Future possibilities"?

Comment on lines +522 to +531
- Swift's stable ABI
- The `abi_stable` crate (which aims for Rust-to-Rust stability, not
cross-language interoperation, but it still serves as a useful reference)
- `stabby`
- UniFFI
- Diplomat
- C++'s various ABIs (and the history of its ABI changes). crABI does not,
however, aim for compatibility with or supersetting of any particular C++
ABI.
- Many, many interface description languages (IDLs).
Copy link
Contributor

Do Swift, C++ or any others define an ABI for a slice?

Copy link
Member

The wasm component model has a list type, but currently only allows owned values. C++ has string_view as &str counterpart and span<T> as &[T] counterpart.

Copy link
Contributor

The slice type []T is a sequence of a *[cap]T pointer to the slice backing store, an int giving the len of the slice, and an int giving the cap of the slice.

That sounds like it may be storage-compatible with the struct slice { uint8_t *data; size_t len; } defined here, which is kind of nice. It seems like C compatibility isn't necessarily a goal of Go's ABI though, so that might be as far as it goes.
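
A quick way to check the "storage-compatible" observation is to compare field offsets. The sketch below is only an illustration under stated assumptions: `CrabiSlice` and `GoSliceHeader` are hypothetical `repr(C)` models of the two layouts quoted above, and Go's `int` is modelled as `isize` (pointer-sized, as on mainstream Go targets).

```rust
use std::mem::{offset_of, size_of};

/// Model of the pointer/length struct described in this RFC.
#[repr(C)]
pub struct CrabiSlice {
    pub data: *const u8,
    pub len: usize,
}

/// Model of the quoted Go slice: backing-store pointer, len, cap.
/// (Assumption: Go's `int` is pointer-sized on the target.)
#[repr(C)]
pub struct GoSliceHeader {
    pub data: *const u8,
    pub len: isize,
    pub cap: isize,
}

// The first two fields sit at the same offsets, so the crABI struct is a
// prefix of the Go header in this model.
const _: () = assert!(offset_of!(CrabiSlice, data) == offset_of!(GoSliceHeader, data));
const _: () = assert!(offset_of!(CrabiSlice, len) == offset_of!(GoSliceHeader, len));
const _: () = assert!(size_of::<GoSliceHeader>() == size_of::<CrabiSlice>() + size_of::<isize>());

/// Drop the capacity to obtain the two-field form.
pub fn prefix_of(go: &GoSliceHeader) -> CrabiSlice {
    // A Go slice length is never negative, so this cast is lossless.
    CrabiSlice { data: go.data, len: go.len as usize }
}
```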

Copy link
Contributor

@tgross35 tgross35 Mar 27, 2024

For what it's worth, string_view seems to use the opposite order, (len, pointer), on both GCC and Clang

(thanks Miguel for the correction)

Copy link

I guess you meant (size, pointer), but while libstdc++ uses that one, libc++ (i.e. LLVM's) and Microsoft's use (pointer, size) instead.

For span (dynamic), all three use (pointer, size).

Well, at least the versions I looked at.

- Should we provide *fewer* niche optimizations? Those for `NonZero` and
reference types provide obvious value; are those for `bool` and `char` really
useful enough to justify the special case in languages that will have to
handle them explicitly rather than automatically?
Copy link
Contributor

Co-authored-by: Jeff Parsons <jeff@parsons.io>

Today, developers building projects incorporating multiple languages, or
calling a library written in one language from another, often have to use the C
ABI as a lowest-common-denominator for cross-language function calls. As a
Copy link

Suggested change
ABI as a lowest-common-denominator for cross-language function calls. As a
ABI as a lowest common denominator for cross-language function calls. As a

("Lowest common denominator" doesn't have any compound adjectives.)

both languages have a safe type for counted UTF-8 strings.

For popular pairs of languages, developers sometimes create higher-level
binding layers for combining those languages. However, the creation of such
Copy link

Suggested change
binding layers for combining those languages. However, the creation of such
binding layers to support communication between those languages. However, the creation of such

Any type whose ABI is already defined by C will be passed through crABI
identically. Types defined by crABI that the C ABI does not support will be
translated into a representation using types the C ABI supports (potentially
indirectly via other crABI-supported types).
Copy link

@daira daira Mar 27, 2024

In that case, has it been considered to extend the C ABI (as supported by Rust) rather than defining a new ABI? After all they will coincide for anything that is currently valid.

I can think of the following potential reasons not to:

  1. If the platform's "official" C ABI were to newly define something differently from Rust's anticipation of it (say, defining a different way of passing [u]int128_t parameters), that would introduce an incompatibility.
     • Note that if that happened, the guarantee in the above paragraph couldn't hold either. A new ABI version would probably need to be created to restore it.
  2. A new ABI might be able to make weaker stability guarantees initially.

(1) seems not convincing, but (2) could be.

specify a discriminant type, the enum is guaranteed to use the smallest
discriminant type that holds the maximum discriminant value used by a variant
in the enum. (This differs from the behavior of `repr(C)` enums without a
discriminant type.)
Copy link

@daira daira Mar 27, 2024

I don't think this should differ. For a start, it contradicts the paragraph about C ABI compatibility at lines 69-72 above. The representation of an enum in C is implementation-defined as far as the standard is concerned, but in practice you can assume it is a well-defined function, for each platform, of the minimum and maximum enumeration values. (Typically, it has the same representation as the smallest standard integer type that can represent the elements, potentially subject to some minimum width.)

I don't think users will expect repr(crabi) to differ from repr(C) here. It is better to write repr(crabi, u8), for example, if that's what you mean. If it does differ, then that must be explicitly called out in the paragraph at lines 69-72.

We do need to specify what the niche value is for Option of an enum type, if we are supporting niche-value optimization for enums (and I think we should). The obvious answer is "0 if possible, otherwise the all-ones value of the chosen integer type if possible, otherwise there is no niche-value optimization for this type".
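
The "obvious answer" above can be written down as a small selection function. This is only a sketch of the rule as worded in this comment, restricted to a `u8` discriminant; `proposed_niche_u8` is a hypothetical name, and nothing here is specified by the RFC. The compile-time check shows the existing Rust guarantee for `Option<NonZeroU32>` that the rule generalizes.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Pick the niche for `Option<Enum>` given the discriminant values in use:
/// 0 if free, otherwise the all-ones value if free, otherwise no niche.
pub fn proposed_niche_u8(used: &[u8]) -> Option<u8> {
    if !used.contains(&0) {
        Some(0)
    } else if !used.contains(&u8::MAX) {
        Some(u8::MAX)
    } else {
        None
    }
}

// Existing Rust guarantee of the same flavour: `None` reuses the value 0,
// so the Option adds no storage.
const _: () = assert!(size_of::<Option<NonZeroU32>>() == size_of::<u32>());
```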

larger than 0x10FFFF, or a value in the range 0xD800 to 0xDFFF inclusive.

Note that there is no special handling for an array of values of this type,
which is not equivalent to a string (unless using a UCS-4 encoding).
Copy link

It's not equivalent to a NUL-terminated or a counted UCS-4 encoding either.

Suggested change
which is not equivalent to a string (unless using a UCS-4 encoding).
which is not equivalent to a string.
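
For reference, the validity rule quoted above (no value larger than 0x10FFFF, nothing in 0xD800 to 0xDFFF) matches what Rust's `char::from_u32` accepts. The sketch below uses hypothetical helper names and also shows why an array of `char` is not the same storage as a UTF-8 string.

```rust
/// True if `v` is a Unicode scalar value, i.e. a valid `char` bit pattern.
pub fn is_valid_char_value(v: u32) -> bool {
    v <= 0x10FFFF && !(0xD800..=0xDFFF).contains(&v)
}

/// Byte length of `s` as UTF-8 versus as an array of 4-byte `char` values.
pub fn utf8_len_vs_char_array_len(s: &str) -> (usize, usize) {
    let chars: Vec<char> = s.chars().collect();
    (s.len(), chars.len() * std::mem::size_of::<char>())
}
```

A receiver that does not perform this check on an incoming 32-bit value would be relying on the sender, which is the assumption the surrounding RFC text describes.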

# crABI versioning and evolution
[crabi-versioning-and-evolution]: #crabi-versioning-and-evolution

crABI has has a major and minor version number, similar to semver.

Just a small typo here, has has

that already implies passing by pointer (such as a function argument or return
value), or translate it explicitly to a pointer (e.g `uint16_t (*rgb)[3]`) in
contexts where just writing the array would imply by-value (such as a struct
field).
Copy link

The C types uint16_t * and uint16_t (*)[3] are distinct types. The wording here clearly specifies that the former be used, yet an example uses the latter.

It should be made clear which one to use. And I believe &[T; N] should always be lowered to T (*)[N].

C can represent this as an array (e.g. uint16_t rgb[3]) in contexts where
that already implies passing by pointer (such as a function argument or return
value)

This can simply be removed. It is true that function parameters declared as arrays are pointers and that returning arrays is illegal, but that doesn’t concern crABI.
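
On the Rust side, `&[T; N]` is a thin pointer whose address equals the address of the first element, which is why both C spellings can describe the same value even though they are distinct C types. A minimal sketch, with hypothetical function names and `extern "C"` standing in for `extern "crABI"`:

```rust
use std::mem::size_of;

/// Takes a reference to a fixed-size array; a C caller would see a single
/// pointer argument (declared as `const uint16_t (*)[3]` if lowered to `T (*)[N]`).
pub extern "C" fn sum_rgb(rgb: &[u16; 3]) -> u32 {
    rgb.iter().map(|&c| u32::from(c)).sum()
}

// A reference to an array carries no length: it is the size of one pointer.
const _: () = assert!(size_of::<&[u16; 3]>() == size_of::<*const u16>());

/// The pointer-to-array and pointer-to-first-element views share an address.
pub fn same_address(rgb: &[u16; 3]) -> bool {
    let as_array_ptr = rgb as *const [u16; 3];
    let as_elem_ptr = rgb.as_ptr();
    as_array_ptr as usize == as_elem_ptr as usize
}
```

Lowering to `T (*)[N]` keeps the element count in the C type, which `T *` loses.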
