document endianness #449
Comments
Is there a particular place or particular functions where it would be helpful to document this? WASI follows the endianness conventions that all little-endian platforms follow, so it's not immediately clear where we should document this. Also, it's worth noting that interface types are endian-independent, so as WASI transitions to those, the API specifications will be endian-independent, and any endianness sensitivity will be a result of a specific binding layer.
the word "endian" does not appear anywhere in the WASI spec. the first section is "types", so seems like putting a section on endian first would work. unless WASI is aspiring to grow beyond WASM, it's always going to be little endian.
Interface types doesn't expose the storage of the values, so it doesn't expose endianness. This is the direction that WASI is evolving, and as such, it's convenient to avoid having documentation talk about endianness unless there's a specific need for it. The x86_64 psabi document, for example, doesn't say the word "endian" anywhere either, except in the layout of
i'm not sure why you're resisting writing clear specifications. i filed this bug because i had people ask about it. they read the spec while reviewing code and couldn't find the answer. referring to other specs that are ambiguous isn't really a good argument. i'll note that the AMD64 psABI states that it only uses anyone implementing WASI needs to know what endianness these interfaces are using. for values passed as immediate values (i.e. function arguments), it's not terribly relevant as it's probably reasonable to assume one doesn't have to do byte swapping on registers (or equiv), but WASI also defines pointers to data structures & multi-byte words in memory. anyone working on either side of the boundary needs to know what endianness those are supposed to be. a naive
where in the documentation is there any clue that it's little endian and not big endian ? or XOR endian or network endian or host endian or PDP endian or some other endian ? i'll point out that network interfaces have a long history of always being big endian (i.e. "network endian") precisely so that peers don't have to negotiate if their CPUs are using different endianness.
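To make the concern about multi-byte words in memory concrete, here is a minimal sketch of a host decoding one record from guest memory. It assumes a Preview1-style wasm32 `iovec` laid out as two 32-bit fields (`buf` pointer, then `buf_len`); the `memory` bytearray and the `read_iovec` helper are hypothetical names for illustration, not part of any runtime's API.

```python
import struct

# Hypothetical stand-in for a guest's linear memory (not a real runtime API).
memory = bytearray(64)

# Assumed layout: u32 `buf` at offset 0, u32 `buf_len` at offset 4.
# The "<" prefix forces little-endian decoding regardless of the host CPU.
IOVEC = struct.Struct("<II")


def read_iovec(mem, ptr):
    """Decode one iovec record stored at byte offset `ptr`."""
    buf, buf_len = IOVEC.unpack_from(mem, ptr)
    return buf, buf_len


# Guest side: store buf=0x20, buf_len=5 at offset 8, little-endian.
IOVEC.pack_into(memory, 8, 0x20, 5)

# Host side: an explicit little-endian read recovers the values.
buf, buf_len = read_iovec(memory, 8)

# The same bytes misread as big-endian yield different numbers, which is
# what a host-endian read would produce on a big-endian machine.
wrong_buf, wrong_len = struct.unpack_from(">II", memory, 8)
```

The point of the sketch is that the byte order is fixed by the format string rather than by whatever CPU the runtime happens to run on, so a written rule is needed to know which prefix is correct.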
My thought was to try to uncover a possible root cause for confusion, rather than focus on what might turn out to be a symptom. Also, as I mentioned above, a high-level direction for us is to move away from raw pointers and endianness, at the specification level. I'm happy to mention endianness if there are specific things that are confusing. And of course we'll mention endianness if we add APIs that expose network byte order (as other little-endian platforms do). However in absence of specific needs, it's convenient to treat endianness as a property of the bindings we're currently using, rather than something that the WASI APIs themselves need to document, so that we can more easily migrate to different kinds of bindings, including bindings that don't expose endianness at all.
who do you see as the target audience of the WASI spec ? is it application programmers (i.e. people writing "hello world"), or language bindings implementers, or runtime implementers ?

if application programmers need to read this spec, then we have failed them. they should never need to peek under the hood here. the only thing they need is a POSIX compiler & environment. which is what wasi-sdk does now fairly well.

people working on language bindings & runtimes very much need to know these details. no level of abstraction at the API level changes that. the whole point of WASI is to connect completely unrelated runtimes and still have things Just Work. we're never going to get away from raw memory access (like we have with pointers now) which means these details need to be defined precisely.

if we do somehow manage to make details like endianness irrelevant years in the future, it's pretty trivial to just delete such sections & discussions from the spec. but i don't see how that aspiration is relevant now. the WASI API is steeped up to its eyes in multibyte integers with no explanation as to its encoding, and it's doing a disservice leaving things ambiguous.

i still don't see why you think it's reasonable that everyone should naturally assume everything is little endian. there is nothing in the spec to suggest that. assuming host cpu endianness seems like a more natural default assumption.
I tend to agree that we should not avoid documenting how things work today (as in We do, after all, document the requirement to export the wasm memory, even though we hope to avoid that one day too.
We're very close to making the switch to using Interface Types based on Canonical ABI, as Alex has demonstrated in recent meetings. With that, it feels like endianness should be documented at the Canonical ABI level, rather than in WASI itself. That could happen in the Interface Types repo as soon as WebAssembly/interface-types#132 lands.
The endianness of the Canonical ABI is now documented as "little". As seen in these links, the ABI documentation is already greatly improved for Preview2. In addition to endianness, it has full ABI documentation. Preview1's documentation isn't anywhere near this complete, and wouldn't be enough for someone to build an implementation on, even if we added endianness. So at this point, I think it makes sense to focus on Preview2 as the direction of the platform going forward.
i get that WASI is built on top of WASM and thus it can be easy to "just know" that WASI is obviously little endian, but for the sake of clarity, it might help to explicitly state this in the API docs. especially when one considers that there have been cases of interop using a specific endianness all the time (e.g. "network byte order" is always big endian).
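For readers unfamiliar with the distinction, a small sketch of how the same 32-bit value is laid out under the two conventions mentioned here. The value `0x12345678` is an arbitrary example chosen for illustration.

```python
value = 0x12345678  # arbitrary example value

# Little-endian: least significant byte first (the WASM memory convention).
little = value.to_bytes(4, "little")

# Network byte order: big-endian, most significant byte first.
network = value.to_bytes(4, "big")

# Decoding with the matching byte order recovers the value;
# decoding with the other one does not.
round_trip = int.from_bytes(little, "little")
misread = int.from_bytes(little, "big")
```

Both encodings are four bytes long and equally valid on their own, so nothing in the bytes themselves tells a reader which convention was used.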