Custom sections in the text format #1153
Good point, there should be a way to specify these sections in the text format. It seems like this was probably discussed in the past, but I can't remember where that may have been. @rossberg, any thoughts on this?
Well, a couple of issues with expressing custom sections directly:
In general, the custom section format is more like a detail of the binary format. The assumption was that relevant custom sections are not written verbatim but rather synthesised from the text format, like the name section or the binding section. What we should think about, then, is a generic syntax for annotations that can be put anywhere in the syntax tree. That would already be needed by the binding proposal. My suggestion would be to use nodes of the form WDYT?
Since the core spec does have a defined notion of a custom section, I think it makes sense to give a fully-specified representation in the text format. While it's true that we'd have a hard time expressing the precise placement of the custom section, I expect it's fine to just say that (Honestly, I wonder about the utility of allowing custom sections anywhere but at the end; I bet we could remove that "feature" and nothing would break.)
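For reference, the binary format already fixes what a custom section contains: section id 0, the byte size of the section body as an unsigned LEB128 u32, the section name as a length-prefixed UTF-8 string, and then payload bytes that the core spec does not interpret. The Python sketch below encodes one; the function names are illustrative and not taken from any existing tool.

```python
def encode_u32_leb128(value: int) -> bytes:
    """Minimal unsigned LEB128 encoding of a u32, as used throughout the binary format."""
    if not 0 <= value < 2**32:
        raise ValueError("value out of u32 range")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_custom_section(name: str, payload: bytes) -> bytes:
    """Section id 0, then the body size, then the body (length-prefixed name + payload)."""
    name_bytes = name.encode("utf-8")
    body = encode_u32_leb128(len(name_bytes)) + name_bytes + payload
    return bytes([0]) + encode_u32_leb128(len(body)) + body


print(encode_custom_section("foo", b"\x01\x02\x03").hex())  # 000703666f6f010203
```

Whatever text-format representation is chosen (name, contents, placement) has to map onto exactly these pieces.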
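All known sections have to be ordered, so you could just use a number to specify which known section it comes after. Something like `(custom 0 ...)` would come before the type section (1). `(custom 3 ...)` would go after the function section (3) and before the table section (4).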
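I agree if we assume the purpose of the text format is just to generate tests for the spec. But we're already using the text format as a way to express the contents of the binary, and AFAICT it doesn't lose much information currently. The only thing I can think of right now is the length of varint values and custom section data. Are there others? Also, we could make it slightly nicer than raw bytes by having a structured data format. The name section and the reloc section follow the same basic structure of other sections, using varints, strings and vectors. If we provided those primitives we could make it pretty easy to generate. They wouldn't roundtrip very nicely of course.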
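To illustrate the varint point: integers in the binary format are LEB128-encoded, and the spec accepts non-minimal (padded) encodings as long as a u32 uses at most 5 bytes. A decoder therefore maps several different byte sequences to the same value, and a text format that records only the value cannot tell them apart. A minimal Python decoder (the names are illustrative):

```python
def decode_u32_leb128(data: bytes, pos: int = 0) -> tuple:
    """Decode an unsigned LEB128 u32 at `pos`; return (value, bytes_consumed)."""
    result = 0
    for i in range(5):  # a u32 may occupy at most ceil(32 / 7) = 5 bytes
        byte = data[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # high bit clear: this was the last byte
            if result >= 2**32:
                raise ValueError("value does not fit in a u32")
            return result, i + 1
    raise ValueError("u32 encoding is longer than 5 bytes")


minimal = bytes([0x05])
padded = bytes([0x85, 0x80, 0x80, 0x80, 0x00])  # same value, padded to 5 bytes
print(decode_u32_leb128(minimal))  # (5, 1)
print(decode_u32_leb128(padded))   # (5, 5)
```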
RE: annotations: Agreed, annotations would be useful. I believe @yurydelendik was suggesting something like this before, maybe he has some thoughts about it. And you're right, I think we could handle custom sections in a structured way doing this. But I'd like to see a way to handle a custom section that has unstructured data, or one that is unknown to the parser too.
> All known sections have to be ordered, so you could just use a number to
> specify which known section it comes after. Something like (custom 0 ...)
> would come before the type section (1). (custom 3 ...) would go after the
> function section (3) and before the table section (4).
That would be rather brittle and expose low-level details of the binary
encoding. In particular, we have assumed that we may insert new sections
anywhere in future extensions of the binary format, so a numeric scheme is
not future-proof.
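The data count section, introduced by the bulk memory operations proposal, is a concrete instance of this: it has id 12 but must appear between the element section (id 9) and the code section (id 10). The Python sketch below shows that ordering sections by id no longer matches the order the binary format requires; the ids and the required order are transcribed from the core spec's binary format, and the helper names are illustrative.

```python
# Known section ids, as assigned in the core spec's binary format.
SECTION_IDS = {
    "type": 1, "import": 2, "function": 3, "table": 4, "memory": 5,
    "global": 6, "export": 7, "start": 8, "elem": 9, "code": 10,
    "data": 11, "datacount": 12,
}

# The order in which those sections must appear in a module.
REQUIRED_ORDER = [
    "type", "import", "function", "table", "memory", "global",
    "export", "start", "elem", "datacount", "code", "data",
]


def ordered_by_id(names):
    return sorted(names, key=SECTION_IDS.__getitem__)


def ordered_as_required(names):
    return sorted(names, key=REQUIRED_ORDER.index)


present = ["code", "datacount", "elem", "data"]
print(ordered_by_id(present))        # ['elem', 'code', 'data', 'datacount']
print(ordered_as_required(present))  # ['elem', 'datacount', 'code', 'data']
```

A placement written as "after section id N" would have to be reinterpreted once such sections exist, which is the brittleness described here.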
If we could adopt @lukewagner's suggestion of eliminating free placement of
custom sections then I'd feel more comfortable, but I'm not sure how
realistic that is.
> The only thing I can think of right now is the
> length of varint values and custom section data. Are there others?
No, none that I'm aware of.
> Also, we could make it slightly nicer than raw bytes by having a structured
> data format. The name section and the reloc section follow the same basic
> structure of other sections, using varints, strings and vectors. If we
> provided those primitives we could make it pretty easy to generate. They
> wouldn't roundtrip very nicely of course. Something like this, maybe:
>
> (custom 12 "foo"
>   (string "hello")
>   (vector
>     (group (varuint32 1) (f32 3.4))
>     (group (varuint32 2) (bytes "12345"))
>   )
> )
That would be cute. But I immediately worry about this becoming an
open-ended DSL without ever eliminating bias towards known custom sections.
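As a sketch of what the structured form quoted above might lower to, the Python below encodes the same example payload. The encoding rules are assumptions, not a settled design: `varuint32` as unsigned LEB128, `f32` as little-endian IEEE 754, `string` as a length-prefixed UTF-8 string (like names in the binary format), `vector` as an element count followed by the elements, `group` as plain concatenation, and `bytes` as raw bytes with no length prefix. Nodes are modeled as hypothetical `(kind, argument)` tuples.

```python
import struct


def leb_u32(value):
    """Minimal unsigned LEB128 encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_node(node):
    kind, arg = node
    if kind == "varuint32":
        return leb_u32(arg)
    if kind == "f32":
        return struct.pack("<f", arg)  # IEEE 754 single precision, little-endian
    if kind == "bytes":
        return bytes(arg)  # assumption: raw bytes, no length prefix
    if kind == "string":
        data = arg.encode("utf-8")
        return leb_u32(len(data)) + data  # length-prefixed, like binary-format names
    if kind == "vector":
        return leb_u32(len(arg)) + b"".join(encode_node(n) for n in arg)
    if kind == "group":
        return b"".join(encode_node(n) for n in arg)
    raise ValueError(f"unknown node kind: {kind}")


payload = encode_node(("group", [
    ("string", "hello"),
    ("vector", [
        ("group", [("varuint32", 1), ("f32", 3.4)]),
        ("group", [("varuint32", 2), ("bytes", b"12345")]),
    ]),
]))
print(len(payload))  # 18
```

The sketch also shows the concern raised above: every new primitive is another case in `encode_node`, which is how such a DSL grows open-ended.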
> Agreed, annotations would be useful. I believe @yurydelendik
> was suggesting something like this before, maybe he has some thoughts
> about it. And you're right, I think we could handle custom sections in a
> structured way doing this. But I'd like to see a way to handle a custom
> section that has unstructured data, or one that is unknown to the parser too.
Sure thing, we can simply support `(@custom "name" "contents")` etc. as a
generic fallback. AFAICS, that could subsume the suggestion above.
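The generic fallback only needs a way to print arbitrary bytes as a text-format string. Text-format strings allow `\hh` hexadecimal escapes, so any payload can be written that way. The Python sketch below renders a custom section as `(@custom "name" "contents")`, escaping the quote, the backslash, and every byte outside printable ASCII; it emits no placement information, and the function names are illustrative.

```python
def wat_string(data: bytes) -> str:
    """Render raw bytes as a text-format string literal."""
    parts = ['"']
    for b in data:
        if b == 0x22:                # double quote
            parts.append('\\"')
        elif b == 0x5C:              # backslash
            parts.append("\\\\")
        elif 0x20 <= b < 0x7F:       # printable ASCII can appear literally
            parts.append(chr(b))
        else:                        # everything else as a \hh hex escape
            parts.append(f"\\{b:02x}")
    parts.append('"')
    return "".join(parts)


def custom_annotation(name: str, payload: bytes) -> str:
    return f"(@custom {wat_string(name.encode('utf-8'))} {wat_string(payload)})"


print(custom_annotation("foo", b'\x01hi"'))  # (@custom "foo" "\01hi\"")
```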
Right, I forgot that new known sections may not be ordered. I think it will still work, though. If we assume that all known sections can occur only 0 or 1 times, as is currently true, then it doesn't seem like this is a problem. The number can just mean which section the custom section is before in the given module. If the section doesn't occur in the module, we could say that the text for that section is invalid. If we decide later that a known section can occur more than once, we can extend the text format at the same time to indicate which section we mean. And if using a number is gross/ugly, we can always use the names given in the spec:
If we did this, we'd probably also want to require that you can't specify sections out of order. Not so sure about the before/after thing either, but it's easy to understand and allows all placements.
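One way to read the rule sketched above, in Python: known sections appear at most once and in the required order, a custom section is placed immediately before a named known section, and naming a section the module does not contain is an error. The module is modeled as a hypothetical list of section names, with `custom:<name>` marking a custom section; the function names are illustrative.

```python
REQUIRED_ORDER = [
    "type", "import", "function", "table", "memory", "global",
    "export", "start", "elem", "datacount", "code", "data",
]


def check_known_sections(sections):
    """Known sections must occur at most once and in the required order."""
    known = [s for s in sections if not s.startswith("custom:")]
    if len(known) != len(set(known)):
        raise ValueError("a known section occurs more than once")
    positions = [REQUIRED_ORDER.index(s) for s in known]
    if positions != sorted(positions):
        raise ValueError("known sections are out of order")


def place_before(sections, custom_name, target):
    """Insert a custom section immediately before the known section `target`."""
    check_known_sections(sections)
    if target not in REQUIRED_ORDER:
        raise ValueError(f"{target!r} is not a known section")
    if target not in sections:
        raise ValueError(f"no {target!r} section in this module; placement is invalid")
    i = sections.index(target)
    return sections[:i] + [f"custom:{custom_name}"] + sections[i:]


print(place_before(["type", "function", "code"], "hints", "code"))
# ['type', 'function', 'custom:hints', 'code']
```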
It probably isn't used much, but I would prefer not to break compatibility over it.
Right, this covers everything, it's just inconvenient.
In addition to text-format motivations, there's also the fact that if it's an infrequently used feature, it will be undertested and likely to have problems in practice. I know we've had specific bugs about custom sections in weird places. Maybe worth putting discussion/poll on CG agenda?
At the most recent CG meeting, we had some opposition to the idea of requiring custom sections to be at the end. The reason is that some use cases for custom sections involve informing later stages of the compilation pipeline. For example, tools might want to provide extra hints (which functions should be compiled first, which locals should get registers, etc.) that VMs could optionally consume. In this case, we'd want to read the hints before we start streaming compilation of the code.
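To make the streaming argument concrete: a consumer that reads sections in file order only sees a custom section before compilation starts if it precedes the code section (id 10). The Python sketch below walks a module's sections and collects the names of custom sections that appear before the code section; the section names `hints` and `late` and the tiny hand-built module are hypothetical, and the module is only parsed, not validated.

```python
def read_u32(data, pos):
    """Decode an unsigned LEB128 value at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_sections(module: bytes):
    """Yield (section_id, payload) pairs in file order."""
    if module[:8] != b"\x00asm\x01\x00\x00\x00":
        raise ValueError("not a wasm module (bad magic or version)")
    pos = 8
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_u32(module, pos + 1)
        yield section_id, module[pos:pos + size]
        pos += size


def custom_sections_before_code(module):
    names = []
    for section_id, payload in iter_sections(module):
        if section_id == 10:  # code section: streaming compilation would start here
            break
        if section_id == 0:   # custom section: payload starts with its name
            name_len, p = read_u32(payload, 0)
            names.append(payload[p:p + name_len].decode("utf-8"))
    return names


def custom(name, payload=b""):
    # Single-byte lengths: only valid while name and body stay under 128 bytes.
    body = bytes([len(name)]) + name.encode("utf-8") + payload
    return bytes([0, len(body)]) + body


module = (b"\x00asm\x01\x00\x00\x00" + custom("hints", b"\x00")
          + bytes([10, 1, 0])  # empty code section (zero function bodies)
          + custom("late"))
print(custom_sections_before_code(module))  # ['hints']
```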
First-pass proposal overview for custom sections in text format: https://gist.github.com/binji/d1cfff7faaebb2aa4f8b1c995234e5a0
I've updated the gist after some feedback. Sorry I didn't notice this earlier; it seems that gist comments don't show up in my notifications (or I missed them).
@binji GitHub doesn't send notifications for Gist comments; it's very annoying.
I prototyped something similar to @binji's proposed syntax, but an issue I ran into is that it can express more information about the section ordering than the binary format. For example, a binary module with no data segments cannot distinctly encode
@AndrewScheidecker, you might want to discuss this over at the annotations proposal, which contains a more up-to-date and complete definition of custom section annotations. To reply to your comment, though, I am not sure why you consider this a problem. There are many examples of the text format being able to express the same binary in multiple ways. How is this different? Providing a unique way of describing placement is not the goal of these multiple forms; the goal is being able to place something reliably, in a way that is agnostic to the actual presence or absence of particular sections. So this is working as intended. You pick the placement that is correct in the presence of all sections, but it will also work fine if a respective section happens to be absent. You don't have to worry about which case you're in.
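A minimal Python sketch of that idea, assuming an "after <section>" placement names a position in the required section order rather than an existing section: the custom section goes after every present known section that comes at or before the named one, so the same placement works whether or not the named section exists. The function name and the list-of-names model are illustrative.

```python
REQUIRED_ORDER = [
    "type", "import", "function", "table", "memory", "global",
    "export", "start", "elem", "datacount", "code", "data",
]


def insertion_index(present, after):
    """Index in `present` at which a custom section placed after `after` goes.

    `present` lists the known sections actually in the module, in required
    order; `after` may or may not be among them.
    """
    limit = REQUIRED_ORDER.index(after)
    return sum(1 for name in present if REQUIRED_ORDER.index(name) <= limit)


present = ["type", "function", "export", "code"]
print(insertion_index(present, "table"))   # 2 (between function and export)
print(insertion_index(present, "memory"))  # 2
```

Both placements resolve to the same position because neither a table nor a memory section is present, which is also why the binary cannot tell which of the two was written.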
If it is useful to express ordering constraints relative to virtual sections that may or may not be present in the binary module, then it must be worthwhile to encode those constraints in the binary module somehow. Imagine that some compiler produces a WASM object file with a custom section that needs to be ordered between the code and data sections, but that module does not contain a code section. If you want to link that object file with another that does have a code section, then you need some additional metadata (or knowledge of that particular custom section) to ensure that the custom section ends up after that code section and not before it in the linked WASM module. There's no text format involved here, but this scenario would benefit from being able to express the ordering constraints relative to virtual sections that are proposed here for the text format only. It's true that there's other information in the text format that is not present in the abstract syntax and binary format, but the stuff I can think of is all trivia: the interleaving of definitions of different kinds, function types that aren't explicitly declared up front, comments, whitespace, expression vs instruction syntax, etc.
I don't think that follows. You shouldn't think of placements as a restrictive mechanism but as a descriptive one. But more importantly, as you say, this has nothing to do with the text format. Your complaint is about the design of the binary format itself. But that is an inherent and unsolvable (and known) problem with the notion of custom data. It is true that a generic tool dealing with unfamiliar custom sections cannot know how to handle them correctly. But that is a much more general problem. To be correct, a linker might need to combine or modify certain custom sections, but by their nature of being custom, it generally has no way of knowing if or how. Their placement is probably the smallest problem such a tool faces. There is no solution to this.
Function type desugaring in particular is way more complicated. ;)
My complaint is not about only the binary format, or only the text format, but about a mismatch between them. :) What I'm doing for now is to restrict the text format to prohibit specifying ordering relative to virtual sections that are not present according to some predicate defined on the abstract syntax. When decoding a binary module, empty sections (or sections that may not be present according to the abstract syntax predicate) are ignored for purposes of inferring the custom section order. With those changes, I can round-trip custom sections ast->text->ast and ast->binary->ast.
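One way the decoding side of that approach could look, as a Python sketch: each custom section is anchored to the last non-empty known section that precedes it (or to the start of the module), empty known sections are skipped, and re-emitting from those anchors reproduces the same order of non-empty known sections and custom sections. The data model (known sections as name strings, custom sections as `("custom", name)` tuples) and the function names are hypothetical.

```python
def infer_placements(sections, empty=frozenset()):
    """Return (custom_name, anchor) pairs in file order.

    The anchor is the last non-empty known section before the custom
    section, or None if it precedes every non-empty known section.
    """
    placements = []
    anchor = None
    for entry in sections:
        if isinstance(entry, tuple):   # ("custom", name)
            placements.append((entry[1], anchor))
        elif entry not in empty:       # empty known sections are ignored
            anchor = entry
    return placements


def rebuild(known, placements):
    """Re-emit file order from the non-empty known sections and the anchors."""
    out = []

    def flush(anchor):
        out.extend(("custom", name) for name, a in placements if a == anchor)

    flush(None)
    for name in known:
        out.append(name)
        flush(name)
    return out


decoded = ["type", ("custom", "a"), "function", "table", ("custom", "b"), "code"]
placements = infer_placements(decoded, empty={"table"})
print(placements)  # [('a', 'type'), ('b', 'function')]
print(rebuild(["type", "function", "code"], placements))
```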
The desugaring is non-trivial, but the additional information in the text format is "trivia" in the sense that it doesn't affect the meaning of the program.
I think I see what you're saying @AndrewScheidecker. I agree it would be better to continue the discussion on the annotations proposal, however. Would you mind opening a new issue there instead? We haven't done much work on that recently, but if someone picked it up, I wouldn't want this concern to fall through the cracks.
The design document on the text format says:
However, the specification doesn't specify how to encode custom sections: https://webassembly.github.io/spec/text/modules.html#text-module and wasm2wat ignores custom sections.