Metadata in s-expression? atoms #258
To check my understanding: are you asking for a way to reliably generate unknown sections (with given names) in the current s-expr text language? That seems useful for testing, in addition to other things. |
I agree that a way to represent our arbitrary unknown sections in the s-expr format seems useful. This feature would want a way for these sections to contain references to arbitrary locations in the code. In the binary format one might imagine using byte offsets for this, but that's less practical in a text format. Possible approaches include:
|
These mark nodes should probably behave kind of like blocks that yield the last value, something like:
This way you could also refer to a region of the AST, the exact meaning of what you're referring to depends on the use. |
This will be awesome for unknown sections, e.g. for WebAssembly/design#208 . But I want to limit this issue only to annotating AST nodes with metadata in textual formats such as s-expression, like described by @sunfishcode's comment above. And to be more specific, for source-level debug information. |
I'm not sure it's worth designing this specific feature at the moment, since we haven't settled on an actual textual format. It may not be s-expressions. I agree that this issue is important, but too early until we figure out what the textual format is. |
We need this particular feature so we can make progress on source-level debugging tooling. A temporary solution might be okay, we could always adapt it to the actual textual format. Yury's prototype: http://people.mozilla.org/~mbebenita/wasm/wast-debugging.mp4 |
Rather than (mark $name) being a separate node, what about adding a @name
|
Sounds even better. Would @name subsume names of label targets as well? If so, then we would have to prevent duplicate labels and shadowing of labels in block scopes. To refer to a particular AST node, we could just use the syntax @functionName:@labelname, and might as well use the $ instead of @ since it's already how we define names. |
Personally I think the locations should be based on a property of the source code rather than requiring annotations to the source code. For example, the text source file character position, or the form number obtained by a depth first walk of the sexp, or a list of indexes to walk to the node, etc - all with different tradeoffs. Can your tool emitting the debug info track such a key while emitting the wasm binary or text? If people want to represent unknown sections in the wast then it might need to be a binary blob for now, but I hope the community can work together to use a common data layer even if it needs the flexibility to handle both pre and post order data encodings. |
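To make two of the keys above concrete, here is a rough Python sketch (not part of any existing tool) that computes, for each list node, its form number from a depth-first pre-order walk and the list of child indexes leading to it. The tokenizer is a deliberate simplification: it ignores comments and string literals, and the helper names `parse` and `location_keys` are made up for illustration.

```python
import re

def parse(text):
    """Parse s-expression text into nested lists of string atoms.

    Simplified assumption: no comments and no string literals.
    """
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = []
            while tokens[pos] != ")":
                node.append(read())
            pos += 1  # skip the closing ")"
            return node
        return tok

    return read()

def location_keys(tree):
    """Yield (form_number, index_path, head) for every list node, pre-order."""
    counter = 0

    def walk(node, path):
        nonlocal counter
        number = counter
        counter += 1
        yield number, path, (node[0] if node else None)
        for i, child in enumerate(node):
            if isinstance(child, list):
                yield from walk(child, path + (i,))

    yield from walk(tree, ())

source = "(func $f (param i32) (i32.add (get_local 0) (i32.const 1)))"
for number, path, head in location_keys(parse(source)):
    print(number, path, head)
```

The tradeoff shows up quickly: the form number is compact but shifts whenever a node is inserted earlier in the function, while the index path is longer but only changes when an ancestor or an earlier sibling changes.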
Would I be correct that these annotations are not visible in the binary encoding of the wasm sections, and that even a wasm debug section would not use them? If so then they appear to be purely a tooling issue and I'll stay out of this one :) |
I was assuming that the translation from s-expr to binary would store a
|
@titzer I presume then that the annotations table binary encoding will use some key, and probably something short like a pc offset, so will not be robust to transforms either. I think it was conceded some time ago that tools would be expected to transform functions on the function-level granularity, so the tools would be expected to decode the AST and the function annotations into an intermediate representation that supports tracking the annotations while transforming the code then to re-encode both. Seems like a tooling issue to me for now, and perhaps moving into a debug section definition to encode the location of the annotations. |
@yurydelendik Ah hah, I see now. So to check my understanding v.2: the root problem is that you need to get the byte offsets of various nodes (for debug info) and you don't want to have to duplicate all the logic in the .wast-to-.wasm just to get these offsets; you want to be able to reuse the existing .wast-to-.wasm tooling and extract this data. I was thinking that this offset info could just as well be a second file output, but I think it would be more convenient to use a section inside the .wasm. For example, it'd make it easier to add to SM's The "@functionName:@labelname" @mbebenita mentioned makes sense to use as the key since it has the stability property and obvious correspondence to what you wrote in the .wast. So concretely, this new "label-offsets" section could be a sequence of (function name string, label name string, offset) tuples. But, to save space (potentially hundreds of MB for big .wasts), we could also add one level of nesting and have the section contain a sequence of functions where each function started with the function name string followed by a sequence of (label name, offset) pairs. This would actually have nice symmetry with the optional "func names"/"local names" sections discussed earlier. |
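For a feel of the flat versus nested layout, here is a small Python sketch. The byte layout is purely an assumption for illustration (a count, then LEB128-length-prefixed UTF-8 names and LEB128 offsets); nothing here is a defined "label-offsets" section format, and the function names are hypothetical.

```python
def leb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def name(text):
    """Length-prefixed UTF-8 string (assumed encoding, for illustration)."""
    data = text.encode("utf-8")
    return leb128(len(data)) + data

def encode_flat(entries):
    """Sequence of (function name, label name, offset) tuples."""
    body = leb128(len(entries))
    for func, label, offset in entries:
        body += name(func) + name(label) + leb128(offset)
    return body

def encode_nested(entries):
    """Per function: the name once, then its (label name, offset) pairs."""
    grouped = {}
    for func, label, offset in entries:
        grouped.setdefault(func, []).append((label, offset))
    body = leb128(len(grouped))
    for func, labels in grouped.items():
        body += name(func) + leb128(len(labels))
        for label, offset in labels:
            body += name(label) + leb128(offset)
    return body

entries = [
    ("main", "entry", 0),
    ("main", "loop", 12),
    ("main", "exit", 300),
    ("helper", "top", 4),
]
print(len(encode_flat(entries)), len(encode_nested(entries)))
```

The nested form pays one extra count per function but stops repeating the function name for every label, so the saving grows with the number of labels per function and the length of the function names.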
Let's think of wast as a format that helps us with discovery of WebAssembly, e.g. understanding how source code maps to the wasm AST or inspecting what information is associated with specific operations. I think having this information expressed in some distinct syntax would be useful: source-to-wast visualization utilities could be created, wast round-trip tooling could be built (e.g. injecting extra diagnostics code while preserving the original source mapping), or people could just learn the platform basics, while tools that are not interested in this information could easily ignore or strip it. Currently, custom tooling can use special comments instead: they are easy to strip and do not interfere with the primary spec prototype or existing implementations, but having something that has the status of metadata or a pre-processor directive would be nice. A more simplified analog of the LLVM syntax I mentioned above would be |
There are three components to this:
The second component is, I think, the tricky part. I think it is an argument that the text format should be more like the current stack-machine assembly output from LLVM than the S-expression AST. Either way, I don't think the text format should interleave debug information with semantic code. However, I can see it being useful for the text format to include an inline declaration to associate an operation with an identifier that can be referenced by the metadata instead of using a naked index. Here's a sketch of what I'm imagining: Source:
WebAssembly text format:
Even if the text format supports metadata directly, it might be useful to add a way to declare sections containing a "raw" string, similar to static data, so binaries with unknown sections can be converted to text without losing information. |
This issue is old, but most of what's been discussed here is beyond the scope of the core Wasm spec and its S-expression format. => Closing. |
Most languages have some way of expressing code metadata at the source level. It's probably not an issue for the binary format, since it's easy to add unknown sections [1] that refer to encoded op-codes/data by offsets in e.g. the function sections. However, the wast/s-expression format is missing this capability, and currently this format is heavily used for prototyping. While it's possible to provide this information as a comment, e.g. with '!' added
(;!DILocation (line 2) (column 9) (scope (!4)));)
, it would be nice to define it as syntax and probably skip/ignore it for the MVP (as we do for comments). What we are trying to reach here is some verifiable structure for the nested metadata tags. The main use case is to include source-level debug information in the intermediate WebAssembly language. In the end we would like to see something like what LLVM has (see [2] and [3]). Perhaps something like:
The main idea is to associate metadata with s-expression list nodes while keeping it something we can validate and serialize to/deserialize from the binary format.
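For reference, the comment-based workaround with the '!' marker can already be handled by a small script. This is only a sketch: the regex assumes the metadata comment contains no nested block comments and no ';)' inside its payload, and the names `extract_metadata` and `strip_metadata` are made up for illustration.

```python
import re

# Matches block comments that start with '!', e.g. (;!DILocation ... ;)
# Assumption: no nested block comments inside the metadata comment.
META = re.compile(r"\(;!(\w+)(.*?);\)", re.DOTALL)

def extract_metadata(source):
    """Return (kind, payload, char_offset) for each '(;!Kind ... ;)' comment."""
    return [(m.group(1), m.group(2).strip(), m.start())
            for m in META.finditer(source)]

def strip_metadata(source):
    """Remove the metadata comments, leaving the rest of the text untouched."""
    return META.sub("", source)

wast = ("(i32.add (;!DILocation (line 2) (column 9) (scope (!4)));) "
        "(get_local 0) (i32.const 1))")
for kind, payload, offset in extract_metadata(wast):
    print(kind, offset, payload)
print(strip_metadata(wast))
```

The payload comes back as an opaque string, which is exactly the limitation described above: nothing checks that the nested tags inside it are well-formed.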
(Related issues found [4])
[1] https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#unknown-sections
[2] http://llvm.org/docs/LangRef.html#metadata
[3] http://llvm.org/docs/SourceLevelDebugging.html#object-lifetimes-and-scoping
[4] WebAssembly/design#208