document endianness #449

Closed
vapier opened this issue Oct 9, 2021 · 9 comments

Comments

@vapier
Contributor

vapier commented Oct 9, 2021

i get that WASI is built on top of WASM and thus it can be easy to "just know" that WASI is obviously little endian, but for the sake of clarity, it might help to explicitly state this in the API docs. especially when one considers that there are cases of interop that always use one specific endianness (e.g. "network byte order" is always big endian).

@sunfishcode
Member

Is there a particular place or particular functions where it would be helpful to document this? WASI follows the endianness conventions that all little-endian platforms follow, so it's not immediately clear where we should document this.

Also, it's worth noting that interface types are endian-independent, so as WASI transitions to those, the API specifications will be endian-independent, and any endianness sensitivity will be a result of a specific binding layer.

@vapier
Contributor Author

vapier commented Oct 12, 2021

the word "endian" does not appear anywhere in the WASI spec. the first section is "types", so seems like putting a section on endian first would work.

unless WASI is aspiring to grow beyond WASM, it's always going to be little endian.

@sunfishcode
Member

Interface types don't expose the storage of values, so they don't expose endianness. This is the direction that WASI is evolving, and as such, it's convenient to avoid having documentation talk about endianness unless there's a specific need for it.

The x86_64 psabi document, for example, doesn't say the word "endian" anywhere either, except in the layout of __int128 which isn't a register type. Is there something in WASI's documentation that gives the impression that something might not be little-endian right now, that would be helpful to clarify?

@vapier
Contributor Author

vapier commented Oct 12, 2021

i'm not sure why you're resisting writing clear specifications. i filed this bug because i had people ask about it. they read the spec while reviewing code and couldn't find the answer.

referring to other specs that are ambiguous isn't really a good argument. i'll note that the AMD64 psABI states that it only uses ELFDATA2LSB for ELF objects, which is little endian encoding, and it says "These values use the same byte order as other word values in the AMD64 architecture" while failing to define that byte order in the data representation section.

anyone implementing WASI needs to know what endianness these interfaces are using. for values passed as immediate values (i.e. function arguments), it's not terribly relevant as it's probably reasonable to assume one doesn't have to do byte swapping on registers (or equiv), but WASI also defines pointers to data structures & multi-byte words in memory. anyone working on either side of the boundary needs to know what endianness those are supposed to be. a naive memcpy(memory_buffer, &integer, 4) isn't portable.
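To illustrate the memcpy point, here is a minimal C sketch of writing and reading a 32-bit value byte by byte so the result does not depend on the host CPU. It assumes the in-memory encoding is little endian, which is exactly what this issue asks the spec to state. The helper names `store_u32_le` and `load_u32_le` are hypothetical and not part of any WASI API.

```c
#include <stdint.h>

/* Hypothetical helper: write `value` to buf[0..3] in little-endian order.
 * Shifts operate on the numeric value, so the result is the same on
 * little-endian and big-endian hosts. */
static void store_u32_le(uint8_t *buf, uint32_t value) {
    buf[0] = (uint8_t)(value & 0xffu);          /* least significant byte first */
    buf[1] = (uint8_t)((value >> 8) & 0xffu);
    buf[2] = (uint8_t)((value >> 16) & 0xffu);
    buf[3] = (uint8_t)((value >> 24) & 0xffu);  /* most significant byte last */
}

/* Hypothetical helper: read a little-endian 32-bit value from buf[0..3]. */
static uint32_t load_u32_le(const uint8_t *buf) {
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

By contrast, `memcpy(memory_buffer, &integer, 4)` copies the bytes in whatever order the host stores them, so it only matches this layout on a little-endian host.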

> something in WASI's documentation that gives the impression that something might not be little-endian right now

where in the documentation is there any clue that it's little endian and not big endian ? or XOR endian or network endian or host endian or PDP endian or some other endian ?

i'll point out that network interfaces have a long history of always being big endian (i.e. "network endian") precisely so that peers don't have to negotiate if their CPUs are using different endianness.

@sunfishcode
Member

My thought was to try to uncover a possible root cause for confusion, rather than focus on what might turn out to be a symptom.

Also, as I mentioned above, a high-level direction for us is to move away from raw pointers and endianness, at the specification level. I'm happy to mention endianness if there are specific things that are confusing. And of course we'll mention endianness if we add APIs that expose network byte order (as other little-endian platforms do). However in absence of specific needs, it's convenient to treat endianness as a property of the bindings we're currently using, rather than something that the WASI APIs themselves need to document, so that we can more easily migrate to different kinds of bindings, including bindings that don't expose endianness at all.

@vapier
Contributor Author

vapier commented Oct 13, 2021

who do you see as the target audience of the WASI spec ? is it application programmers (i.e. people writing "hello world"), or language bindings implementers, or runtime implementers ?

if application programmers need to read this spec, then we have failed them. they should never need to peek under the hood here. the only thing they need is a POSIX compiler & environment. which is what wasi-sdk does now fairly well.

people working on language bindings & runtimes very much need to know these details. no level of abstraction at the API level changes that. the whole point of WASI is to connect completely unrelated runtimes and still have things Just Work. we're never going to get away from raw memory access (like we have with pointers now) which means these details need to be defined precisely.

if we do somehow manage to make details like endianness irrelevant years in the future, it's pretty trivial to just delete such sections & discussions from the spec. but i don't see how that aspiration is relevant now. the WASI API is steeped up to its eyes in multibyte integers with no explanation as to its encoding, and it's doing a disservice leaving things ambiguous. i still don't see why you think it's reasonable that everyone should naturally assume everything is little endian. there is nothing in the spec to suggest that. assuming host cpu endianness seems like a more natural default assumption.

@sbc100
Member

sbc100 commented Oct 13, 2021

I tend to agree that we should not avoid documenting how things work today (as in wasi_snapshot_preview1) because we have aspirations to side step certain issues in the future.

We do, after all, document the requirement to export the wasm memory, even though we hope to avoid that one day too.

@linclark
Member

> I tend to agree that we should not avoid documenting how things work today (as in wasi_snapshot_preview1) because we have aspirations to side step certain issues in the future.

We're very close to making the switch to using Interface Types based on Canonical ABI, as Alex has demonstrated in recent meetings.

With that, it feels like endianness should be documented at the Canonical ABI level, rather than in WASI itself. That could happen in the Interface Types repo as soon as WebAssembly/interface-types#132 lands.

@sunfishcode
Member

The endianness of the Canonical ABI is now documented as "little".
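As a rough illustration of what "little" means for a host that reads values out of guest linear memory, here is a small C sketch. It is not taken from the Canonical ABI documentation; the function name `read_u64_le`, its signature, and the bounds-check policy are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 64-bit little-endian value at `offset` in a
 * guest memory of `mem_len` bytes. Returns false (leaving *out untouched)
 * if the 8-byte read would run past the end of memory. */
static bool read_u64_le(const uint8_t *mem, size_t mem_len,
                        size_t offset, uint64_t *out) {
    if (offset > mem_len || mem_len - offset < 8) {
        return false;
    }
    uint64_t value = 0;
    /* Start from the highest address, which holds the most significant byte. */
    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | mem[offset + (size_t)i];
    }
    *out = value;
    return true;
}
```

Because the value is assembled with shifts rather than a pointer cast, the same code gives the same answer on little-endian and big-endian hosts.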

As seen in these links, the ABI documentation is already greatly improved for Preview2. In addition to endianness, it has full ABI documentation. Preview1's documentation isn't anywhere near this complete, and wouldn't be enough for someone to build an implementation on, even if we added endianness. So at this point, I think it makes sense to focus on Preview2 as the direction of the platform going forward.
