ABI freedom #506

Open
arnetheduck opened this issue Jan 20, 2023 · 10 comments

@arnetheduck

arnetheduck commented Jan 20, 2023

Abstract

Nim ABI should be documented as undefined by default, allowing the compiler to make free optimization choices, with the possibility to define compatibility options.

Motivation

No response

Description

Currently, the ABI of Nim is not well-defined, except that it's loosely based on whatever the backend decides to emit.

There exist some backend-specific pragmas to control some aspects of the ABI - for example {.packed.}, {.align.}, {.bycopy.} etc, but these are spotty and live in a vacuum of otherwise undefined behavior - for example, how parameters are passed depends on undocumented and arbitrary features like the size of the object.
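
As a small illustration of how one of these pragmas changes layout today, the sketch below compares a plain object with a `{.packed.}` one. The type names `Plain` and `Packed` are hypothetical, and the sizes assume a mainstream target where `uint32` is 4-byte aligned.

```nim
type
  Plain = object            # default layout: padding inserted before `length`
    tag: uint8
    length: uint32
  Packed {.packed.} = object  # no padding between fields
    tag: uint8
    length: uint32

# Assumes `uint32` has 4-byte alignment on the target.
doAssert sizeof(Plain) == 8
doAssert sizeof(Packed) == 5
doAssert offsetOf(Plain, length) == 4
doAssert offsetOf(Packed, length) == 1
```

Nothing comparable pins down how either type is passed to a proc, which is the gap described above.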

With this proposal, the idea would be two-fold:

  • enshrine the undefinedness in the specification, explicitly pointing out, for example, that the parameter-passing distinction between pointer and value may change and that the order of fields in an object may change or be reorganised by the compiler as it sees fit
  • document the ABI in more detail when the code is annotated with exportc - this means defining behaviors and disallowing the use of features with undefined behavior in such functions
    • example: if a function is tagged exportc, var and "ordinary" parameter passing should be well-defined and documented - alternatively, it should be disallowed unless further, more specific annotations (such as bycopy) are present.
    • exportc for object would force the compiler to generate fields in C order and rules
    • etc
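
A rough sketch of what the explicitly annotated style could look like, using only pragmas that exist today. The type `Vec2` and the exported names `vec2_dot` and `vec2_scale` are hypothetical and chosen purely for illustration; this is not a statement of what the proposal would require.

```nim
type
  Vec2 {.bycopy.} = object   # intended to match `struct { int x; int y; }`
    x, y: cint

# Value parameters of a bycopy type, C calling convention, fixed C name.
proc vec2Dot(a, b: Vec2): cint {.exportc: "vec2_dot", cdecl.} =
  a.x * b.x + a.y * b.y

# `var Vec2` is expected to appear as `Vec2*` on the C side.
proc vec2Scale(v: var Vec2, k: cint) {.exportc: "vec2_scale", cdecl.} =
  v.x *= k
  v.y *= k

var v = Vec2(x: 1, y: 2)
vec2Scale(v, 3)
doAssert v.x == 3 and v.y == 6
doAssert vec2Dot(Vec2(x: 1, y: 2), Vec2(x: 3, y: 4)) == 11
```

Every ABI-relevant choice here is spelled out in the signature, so a native Nim proc without these annotations could be left free to change.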

Allowing the compiler ABI-freedom allows the implementation of significant optimizations - one such optimization is field reordering for alignment purposes: this allows the compiler to order fields according to an optimal arrangement for the target platform, taking into account alignment requirements etc.

Code Examples

type
  SomeObject = object
    f0: char
    f1: int
    f2: char
    f3: int

proc f(v: SomeObject) =
  # This prints 32 on a 64-bit platform today. The optimal size on x86_64 is
  # 24, however, which is achieved by ordering the fields in decreasing size order.
  # Applying the optimizations made possible by ABI freedom also means that
  # `v` could be passed to `f` by value instead of by pointer, assuming the
  # cutoff is `3*sizeof(int)`, thus making `f` more amenable to further optimizations
  echo sizeof(SomeObject)


type
  SomeExported {.exportc.} =
    # this type would use ABI rules matching `C` as closely as possible
    ...

Most benefits of ABI freedom can be realized using the C backend. nlvm can take further advantage by guiding the optimizer with LLVM-specific metadata, which is already used to achieve good performance in other languages that have features like this, such as Rust and Swift.
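
The size difference described in the example can be checked today by reordering the fields by hand. This is only a sketch of the effect: `Reordered` is a hypothetical type that mimics what a reordering compiler might produce, not something the compiler generates.

```nim
type
  SomeObject = object   # declaration order, as in the example
    f0: char
    f1: int
    f2: char
    f3: int
  Reordered = object    # same fields, largest first
    f1: int
    f3: int
    f0: char
    f2: char

echo sizeof(SomeObject)   # 32 on x86_64, as stated in the example
echo sizeof(Reordered)    # 24 on x86_64: two ints, two chars, tail padding
doAssert sizeof(Reordered) < sizeof(SomeObject)
```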

Backwards Compatibility

Backwards compatibility can be achieved by adding various degrees of strictness to the language in phases, starting with warnings (akin to deprecations) and finally introducing compile-time errors for things that previously were undefined and may become invalid under such optimizations.

Some backwards incompatibility is expected when encountering code that relies on the current implicit undefined behavior - for example, code might assume that just because previous versions used a pointer to pass >24-byte objects to a function, this will remain so. Such code is arguably already broken, but the edge can be taken off the upgrade by simply highlighting such code as invalid, either via warning or error - it's viable because it's constrained to exportc / importc functions which see limited use.

@Araq
Member

Araq commented Jan 26, 2023

Yes and no. Large parts of the ABI are undefined and changing with the various mm and exceptions implementations but there is a "stable" subset:

  • enums must be 1, 2, 4 or 8 bytes and the size must be minimal unless they have fields that map to negative values.
  • char must be 1 byte and unsigned.
  • openArrays are always expanded (ptr, len) pairs.
  • .bycopy objects must match C structs.
  • .union objects must match C unions.
  • cstring, cint etc must match C.
  • var T must be mapped to "pointer to T".
  • inheritance is compatible with C++ if compilation to C++ is used.
  • The calling conventions like cdecl, stdcall must match the platform's calling conventions though that is largely obsolete on any target except for windows, x86, 32 bits.

It's unwise to leave these things unspecified, every C wrapper depends on my outlined informal spec. But we can allow for object field reorderings unless .bycopy is used.
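
A few items from this informal list can be spot-checked with `sizeof`. The sketch below is only a sanity check under current compiler behavior, not a specification; the names `Small`, `Wide`, `Pair` and `bump` are hypothetical.

```nim
type
  Small = enum sA, sB, sC           # three values: fits in 1 byte
  Wide = enum wA = 0, wB = 70_000   # largest value needs 4 bytes
  Pair {.bycopy.} = object          # meant to match `struct { int a; int b; }`
    a, b: cint

proc bump(x: var cint) =            # `var T` maps to "pointer to T"
  inc x

var n: cint = 41
bump(n)
doAssert n == 42

doAssert sizeof(Small) == 1
doAssert sizeof(Wide) == 4
doAssert sizeof(char) == 1 and ord(high(char)) == 255   # 1 byte, unsigned
doAssert sizeof(Pair) == 2 * sizeof(cint)
```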

@arnetheduck
Author

enums must be 1, 2, 4 or 8 bytes and the size must be minimal unless they have fields that map to negative values.

does the compiler actually generate C code that enforces this? in C, it's arbitrarily sized.

https://stackoverflow.com/questions/366017/what-is-the-size-of-an-enum-in-c

@Araq
Member

Araq commented Feb 12, 2023

does the compiler actually generate C code that enforces this?

Yes, it doesn't map enum to C's enum but to e.g. unsigned char etc.
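
When a Nim enum has to line up with a field declared as a C `enum`, the existing `size` pragma can widen it. The sketch below assumes the C compiler represents that enum as an `int`, which is common but not guaranteed; `Compact` and `Color` are hypothetical names.

```nim
type
  Compact = enum                          # default: smallest size that fits
    kA, kB, kC
  Color {.size: sizeof(cint).} = enum     # forced to the width of C `int`
    cRed, cGreen, cBlue

doAssert sizeof(Compact) == 1
doAssert sizeof(Color) == sizeof(cint)
```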

@juancarlospaco
Contributor

there is a "stable" subset:

  • enums must be 1, 2, 4 or 8 bytes and the size must be minimal unless they have fields that map to negative values.
  • char must be 1 byte and unsigned.
  • openArrays are always expanded (ptr, len) pairs.
  • .bycopy objects must match C structs.
  • .union objects must match C unions.
  • cstring, cint etc must match C.
  • var T must be mapped to "pointer to T".
  • inheritance is compatible with C++ if compilation to C++ is used.
  • The calling conventions like cdecl, stdcall must match the platform's calling conventions though that is largely obsolete on any target except for windows, x86, 32 bits.

That should be documented somewhere,
even literal copypaste of that is better than nothing IMHO.

@treeform

I think it's fine that Nim's ABI is largely undefined. Nim does not map well to plain C. I think my project Genny solves this issue in a better way. Instead of hoping to get the correct Nim ABI, just guide it to generate the best ABI you can. Nim can do a ton of meta programming. This way you control exactly what happens: who frees the objects, how polymorphism and generics are handled. You even get a .h file you can use in your C project with correct layout and functions.

My goal with Genny is to generate the best wrapper not just for C but for other languages as well: Python, JS, C++, Ruby, and so on.

@beef331

beef331 commented Mar 24, 2023

As nice as genny is, I think something akin to cbindgen or nbindgen makes a bit more sense, especially since the Nim compiler can be used mostly as a library. To this end I did toy with this premise: the present state takes something like https://gitlab.com/beef331/seeya/-/blob/main/tests/basicproc.nim and emits https://gitlab.com/beef331/seeya/-/blob/main/tests/testlib.h. It's not the prettiest or best, but it was just a toy after all.

@treeform

I think @beef331's library reinforces my point. A library should generate the ABI, the .h files and similar files for other languages, instead of the Nim compiler trying to do it.

@omentic

omentic commented Jun 23, 2023

This seems good to me. I don't see too much point in half-defining an ABI outside of what is strictly necessary for existing C interop. Speaking of which, is the structure of enums, chars, openarrays, and var T defined above necessary for anything or just best practice?

However, note that the Rust language team has begun work on a language-independent "crabi" that aims to be an ABI for languages with proper type systems, and a superset of the C ABI. This seems to be a very long-term project, but as I understand it, it will provide significant enough improvements to dynamic linking and cross-language interop that it would be well worth conforming to when it rolls around.

@omentic

omentic commented Aug 12, 2023

crabi is moving faster than expected. An initial draft of the ABI is available here: rust-lang/rfcs#3470
https://github.com/rust-lang/rfcs/pull/3470/files

It is written with Rust in mind: but as Nim shares a similar type system, respecting it may be valuable.

@arnetheduck
Author

Rust language team has begun work on a language-independent "crabi"

More broadly, ABI freedom should apply to "native Nim" by default - on top of that, any constraints should be explicit, whether that is C, C++ or, in the future, crabi.

There are other interop examples, e.g. Haskell and Swift use specific error-passing and call/argument conventions, which also represent plausible ABI constraints that could be added.
