This repository was archived by the owner on Jan 15, 2025. It is now read-only.

Commit 1a33be6

[spec/interpreter/test] Spec & implement proposal (PR #4)
2 parents c047072 + f11bb0b commit 1a33be6

5 files changed: +438 -10 lines changed

document/core/appendix/custom.rst

+200-6
@@ -1,13 +1,12 @@
1-
.. index:: custom section, section, binary format
1+
.. index:: custom section, section, binary format, annotation, text format
22

3-
Custom Sections
4-
---------------
3+
Custom Sections and Annotations
4+
-------------------------------
55

6-
This appendix defines dedicated :ref:`custom sections <binary-customsec>` for WebAssembly's :ref:`binary format <binary>`.
7-
Such sections do not contribute to, or otherwise affect, the WebAssembly semantics, and like any custom section they may be ignored by an implementation.
6+
This appendix defines dedicated :ref:`custom sections <binary-customsec>` for WebAssembly's :ref:`binary format <binary>` and :ref:`annotations <text-annot>` for the text format.
7+
Such sections or annotations do not contribute to, or otherwise affect, the WebAssembly semantics, and may be ignored by an implementation.
88
However, they provide useful metadata that implementations can use to improve user experience or take compilation hints.
99

10-
Currently, only one dedicated custom section is defined, the :ref:`name section<binary-namesec>`.
1110

1211

1312
.. index:: ! name section, name, Unicode UTF-8
@@ -138,3 +137,198 @@ It consists of an :ref:`indirect name map <binary-indirectnamemap>` assigning lo
138137
\production{local name subsection} & \Blocalnamesubsec &::=&
139138
\Bnamesubsection_2(\Bindirectnamemap) \\
140139
\end{array}
140+
141+
142+
.. index:: ! name annotation, name, Unicode UTF-8
143+
.. _text-nameannot:
144+
145+
Name Annotations
146+
~~~~~~~~~~~~~~~~
147+
148+
*Name annotations* are the textual analogue to the :ref:`name section <binary-namesec>` and provide a textual representation for it.
149+
Consequently, their id is :math:`\T{@name}`.
150+
151+
Analogous to the name section, name annotations are allowed on :ref:`modules <text-module>`, :ref:`functions <text-func>`, and :ref:`locals <text-local>` (including :ref:`parameters <text-param>`).
152+
They can be placed where the text format allows binding occurrences of respective :ref:`identifiers <text-id>`.
153+
If both an identifier and a name annotation are given, the annotation is expected *after* the identifier.
154+
In that case, the annotation takes precedence over the identifier as a textual representation of the binding's name.
155+
At most one name annotation may be given per binding.
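
The precedence rule above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name ``effective_name`` and its argument shapes are hypothetical, and stripping the leading ``$`` from an identifier is an assumption about how a tool might derive a name from it.

```python
from typing import Optional, Sequence


def effective_name(identifier: Optional[str],
                   annotations: Sequence[str]) -> Optional[str]:
    """Return the name a tool would record for one binding, or None.

    `identifier` is the symbolic identifier as written (e.g. "$f"), if any.
    `annotations` holds the decoded strings of the (@name ...) annotations
    attached to the same binding, in source order.
    """
    if len(annotations) > 1:
        # At most one name annotation may be given per binding.
        raise ValueError("multiple name annotations on one binding")
    if annotations:
        # The annotation takes precedence over the identifier.
        return annotations[0]
    if identifier is not None:
        # Assumption: the identifier's name is its text without the "$".
        return identifier[1:] if identifier.startswith("$") else identifier
    return None
```

With this sketch, a function written as ``(func $f (@name "my func") ...)`` would be recorded under the annotation's string, while ``(func $f ...)`` would fall back to the identifier.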
156+
157+
All name annotations have the following format:
158+
159+
.. math::
160+
\begin{array}{llclll}
161+
\production{name annotation} & \Tnameannot &::=&
162+
\text{(@name}~\Tstring~\text{)} \\
163+
\end{array}
164+
165+
166+
.. note::
167+
All name annotations can be arbitrary UTF-8 :ref:`strings <text-string>`.
168+
Names need not be unique.
169+
170+
171+
.. index:: module
172+
.. _text-modulenameannot:
173+
174+
Module Names
175+
............
176+
177+
A *module name annotation* must be placed on a :ref:`module <text-module>` definition,
178+
directly after the :math:`\text{module}` keyword, or if present, after the following module :ref:`identifier <text-id>`.
179+
180+
.. math::
181+
\begin{array}{llclll}
182+
\production{module name annotation} & \Tmodulenameannot &::=&
183+
\Tnameannot \\
184+
\end{array}
185+
186+
187+
.. index:: function
188+
.. _text-funcnameannot:
189+
190+
Function Names
191+
..............
192+
193+
A *function name annotation* must be placed on a :ref:`function <text-func>` definition or function :ref:`import <text-import>`,
194+
directly after the :math:`\text{func}` keyword, or if present, after the following function :ref:`identifier <text-id>`.
195+
196+
.. math::
197+
\begin{array}{llclll}
198+
\production{function name annotation} & \Tfuncnameannot &::=&
199+
\Tnameannot \\
200+
\end{array}
201+
202+
203+
.. index:: function, parameter
204+
.. _text-paramnameannot:
205+
206+
Parameter Names
207+
...............
208+
209+
A *parameter name annotation* must be placed on a :ref:`parameter <text-param>` declaration,
210+
directly after the :math:`\text{param}` keyword, or if present, after the following parameter :ref:`identifier <text-id>`.
211+
It may only be placed on a declaration that declares exactly one parameter.
212+
213+
.. math::
214+
\begin{array}{llclll}
215+
\production{parameter name annotation} & \Tparamnameannot &::=&
216+
\Tnameannot \\
217+
\end{array}
218+
219+
220+
.. index:: function, local
221+
.. _text-localnameannot:
222+
223+
Local Names
224+
...........
225+
226+
A *local name annotation* must be placed on a :ref:`local <text-local>` declaration,
227+
directly after the :math:`\text{local}` keyword, or if present, after the following local :ref:`identifier <text-id>`.
228+
It may only be placed on a declaration that declares exactly one local.
229+
230+
.. math::
231+
\begin{array}{llclll}
232+
\production{local name annotation} & \Tlocalnameannot &::=&
233+
\Tnameannot \\
234+
\end{array}
235+
236+
237+
.. index:: ! custom annotation, custom section
238+
.. _text-customannot:
239+
240+
Custom Annotations
241+
~~~~~~~~~~~~~~~~~~
242+
243+
*Custom annotations* are a generic textual representation for any :ref:`custom section <binary-customsec>`.
244+
Their id is :math:`\T{@custom}`.
245+
By generating custom annotations, tools converting between :ref:`binary format <binary>` and :ref:`text format <text>` can maintain and round-trip the content of custom sections even when they do not recognize them.
246+
247+
Custom annotations must be placed inside a :ref:`module <text-module>` definition.
248+
They may occur anywhere after the :math:`\text{module}` keyword, or if present, after the following module :ref:`identifier <text-id>`.
249+
They must not be nested into other constructs.
250+
251+
.. math::
252+
\begin{array}{llclll}
253+
\production{custom annotation} & \Tcustomannot &::=&
254+
\text{(@custom}~~\Tstring~~\Tcustomplace^?~~\Tdatastring~~\text{)} \\
255+
\production{custom placement} & \Tcustomplace &::=&
256+
\text{(}~\text{before}~~\text{first}~\text{)} \\ &&|&
257+
\text{(}~\text{before}~~\Tsec~\text{)} \\ &&|&
258+
\text{(}~\text{after}~~\Tsec~\text{)} \\ &&|&
259+
\text{(}~\text{after}~~\text{last}~\text{)} \\
260+
\production{section} & \Tsec &::=&
261+
\text{type} \\ &&|&
262+
\text{import} \\ &&|&
263+
\text{func} \\ &&|&
264+
\text{table} \\ &&|&
265+
\text{memory} \\ &&|&
266+
\text{global} \\ &&|&
267+
\text{export} \\ &&|&
268+
\text{start} \\ &&|&
269+
\text{elem} \\ &&|&
270+
\text{code} \\ &&|&
271+
\text{data} \\
272+
\end{array}
273+
274+
The first :ref:`string <text-string>` in a custom annotation denotes the name of the custom section it represents.
275+
The remaining strings collectively represent the section's payload data, written as a :ref:`data string <text-datastring>`, which can be split up into a possibly empty sequence of individual string literals (similar to :ref:`data segments <text-data>`).
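
As a rough illustration of how the payload strings combine, the following Python sketch decodes each string literal and concatenates the results. The helper names ``decode_string`` and ``custom_payload`` are hypothetical, the input is assumed to be the body of each literal without its surrounding quotes, and malformed escapes are not diagnosed.

```python
from typing import Iterable

# Single-character escapes of the text format's string literals.
_SIMPLE_ESCAPES = {
    "t": b"\t", "n": b"\n", "r": b"\r",
    '"': b'"', "'": b"'", "\\": b"\\",
}


def decode_string(body: str) -> bytes:
    """Decode the body of one string literal (quotes already removed)."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Ordinary characters contribute their UTF-8 encoding.
            out += ch.encode("utf-8")
            i += 1
        elif body[i + 1] in _SIMPLE_ESCAPES:
            out += _SIMPLE_ESCAPES[body[i + 1]]
            i += 2
        elif body[i + 1] == "u":
            # \u{hexnum}: a Unicode scalar value, emitted as UTF-8.
            end = body.index("}", i)
            code = int(body[i + 3:end].replace("_", ""), 16)
            out += chr(code).encode("utf-8")
            i = end + 1
        else:
            # \hh: one raw byte given by two hexadecimal digits.
            out.append(int(body[i + 1:i + 3], 16))
            i += 3
    return bytes(out)


def custom_payload(strings: Iterable[str]) -> bytes:
    """Concatenate the decoded literals; an empty sequence gives b""."""
    return b"".join(decode_string(s) for s in strings)
```

Under these assumptions, splitting a payload across several literals does not change the resulting bytes, which is what lets tools wrap long payloads freely.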
276+
277+
An arbitrary number of custom annotations (even of the same name) may occur in a module,
278+
each defining a separate custom section when converting to :ref:`binary format <binary>`.
279+
Placement of the sections in the binary can be customized via explicit *placement* directives that position them either directly before or directly after a known section.
280+
The placements :math:`\T{(before~first)}` and :math:`\T{(after~last)}` denote virtual sections before the first and after the last known section, respectively.
281+
When the placement directive is omitted, it defaults to :math:`\T{(after~last)}`.
282+
283+
If multiple placement directives appear for the same position, then the sections are all placed there, in order of their appearance in the text.
284+
For this purpose, the position :math:`\T{after}` a section is considered different from the position :math:`\T{before}` the consecutive section, and the former occurs before the latter.
285+
286+
.. note::
287+
Future versions of WebAssembly may introduce additional sections between others or at the beginning or end of a module.
288+
Using :math:`\T{first}` and :math:`\T{last}` guarantees that placement will still go before or after any future section, respectively.
289+
290+
If a custom section with a specific section id is given as well as annotations representing the same custom section (e.g., :math:`\T{@name}` :ref:`annotations <text-nameannot>` as well as a :math:`\T{@custom}` annotation for a :math:`\T{name}` :ref:`section <binary-namesec>`), then two sections are assumed to be created.
291+
Their relative placement will depend on the placement directive given for the :math:`\T{@custom}` annotation as well as the implicit placement requirements of the custom section, which are applied to the other annotation.
292+
293+
.. note::
294+
295+
For example, the following module,
296+
297+
.. code-block:: none
298+
299+
(module
300+
(@custom "A" "aaa")
301+
(type $t (func))
302+
(@custom "B" (after func) "bbb")
303+
(@custom "C" (before func) "ccc")
304+
(@custom "D" (after last) "ddd")
305+
(table 10 funcref)
306+
(func (type $t))
307+
(@custom "E" (after import) "eee")
308+
(@custom "F" (before type) "fff")
309+
(@custom "G" (after data) "ggg")
310+
(@custom "H" (after code) "hhh")
311+
(@custom "I" (after func) "iii")
312+
(@custom "J" (before func) "jjj")
313+
(@custom "K" (before first) "kkk")
314+
)
315+
316+
will result in the following section ordering:
317+
318+
.. code-block:: none
319+
320+
custom section "K"
321+
custom section "F"
322+
type section
323+
custom section "E"
324+
custom section "C"
325+
custom section "J"
326+
function section
327+
custom section "B"
328+
custom section "I"
329+
table section
330+
code section
331+
custom section "H"
332+
custom section "G"
333+
custom section "A"
334+
custom section "D"
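
The placement rules can be checked against this example with a small Python sketch. It is a simplified model, not the reference interpreter's algorithm: the function ``layout`` and its input shapes are hypothetical, known sections are identified by the keywords of the section grammar above, and the caller is assumed to know which known sections the module actually emits.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Known sections in binary order, named as in the placement grammar.
KNOWN_SECTIONS = ["type", "import", "func", "table", "memory", "global",
                  "export", "start", "elem", "code", "data"]

Placement = Tuple[str, str]  # e.g. ("before", "func") or ("after", "last")


def layout(customs: Sequence[Tuple[str, Optional[Placement]]],
           present: Set[str]) -> List[str]:
    """Order custom and known sections for the binary.

    `customs` lists (name, placement) pairs in order of appearance in the
    text; a placement of None means the directive was omitted.
    `present` is the set of known sections the module emits.
    """
    # Every distinct position, in binary order. "after X" is a different
    # position from "before Y" and comes first.
    slots: List[Placement] = [("before", "first")]
    for sec in KNOWN_SECTIONS:
        slots += [("before", sec), ("section", sec), ("after", sec)]
    slots.append(("after", "last"))

    by_slot = {}
    for name, placement in customs:
        slot = placement or ("after", "last")  # default placement
        if slot not in slots or slot[0] == "section":
            raise ValueError(f"invalid placement {slot!r}")
        by_slot.setdefault(slot, []).append(name)  # keeps text order

    result = []
    for kind, sec in slots:
        if kind == "section":
            if sec in present:
                result.append(f"{sec} section")
        else:
            result += [f'custom section "{n}"'
                       for n in by_slot.get((kind, sec), [])]
    return result
```

Feeding the eleven annotations of the module above, together with the emitted sections ``type``, ``func``, ``table``, and ``code``, reproduces the listed ordering. Note that a placement relative to an absent section (such as ``(after import)`` here) still occupies its position between the neighbouring sections.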

document/core/text/lexical.rst

+32-2
@@ -50,7 +50,7 @@ The character stream in the source text is divided, from left to right, into a s
5050
(\text{a} ~|~ \dots ~|~ \text{z})~\Tidchar^\ast
5151
\qquad (\mbox{if occurring as a literal terminal in the grammar}) \\
5252
\production{reserved} & \Treserved &::=&
53-
\Tidchar^+ \\
53+
\Tidchar^+ ~|~ \text{,} ~|~ \text{;} ~|~ \text{[} ~|~ \text{]} ~|~ \text{\{} ~|~ \text{\}} \\
5454
\end{array}
5555
5656
Tokens are formed from the input character stream according to the *longest match* rule.
@@ -77,7 +77,7 @@ Any token that does not fall into any of the other categories is considered *res
7777
White Space
7878
~~~~~~~~~~~
7979

80-
*White space* is any sequence of literal space characters, formatting characters, or :ref:`comments <text-comment>`.
80+
*White space* is any sequence of literal space characters, formatting characters, :ref:`comments <text-comment>`, or :ref:`annotations <text-annot>`.
8181
The allowed formatting characters correspond to a subset of the |ASCII|_ *format effectors*, namely, *horizontal tabulation* (:math:`\unicode{09}`), *line feed* (:math:`\unicode{0A}`), and *carriage return* (:math:`\unicode{0D}`).
8282

8383
.. math::
@@ -124,3 +124,33 @@ The *look-ahead* restrictions on the productions for |Tblockchar| disambiguate t
124124

125125
.. note::
126126
Any formatting and control characters are allowed inside comments.
127+
128+
129+
.. index:: ! annotation
130+
single: text format; annotation
131+
.. _text-annot:
132+
133+
Annotations
134+
~~~~~~~~~~~
135+
136+
An *annotation* is a bracketed token sequence headed by an *annotation id* of the form :math:`\T{@id}`.
137+
No :ref:`space <text-space>` is allowed between the opening parenthesis and this id.
138+
Annotations are intended to be used for third-party extensions;
139+
they can appear anywhere in a program but are ignored by the WebAssembly semantics itself, which treats them as :ref:`white space <text-space>`.
140+
141+
Annotations can contain other parenthesized token sequences (including nested annotations), as long as they are well-nested.
142+
:ref:`String literals <text-string>` and :ref:`comments <text-comment>` occurring in an annotation must also be properly nested and closed.
143+
144+
.. math::
145+
\begin{array}{llclll@{\qquad\qquad}l}
146+
\production{annot} & \Tannot &::=&
147+
\text{(@}~\Tidchar^+ ~(\Tspace ~|~ \Ttoken)^\ast~\text{)} \\
148+
\end{array}
149+
150+
.. note::
151+
The annotation id is meant to be an identifier categorising the extension, and plays a role similar to the name of a :ref:`custom section <binary-customsec>`.
152+
By convention, annotations corresponding to a custom section should use the custom section's name as an id.
153+
154+
Implementations are expected to ignore annotations with ids that they do not recognize.
155+
On the other hand, they may impose restrictions on annotations that they do recognize, e.g., requiring a specific structure by superimposing a more concrete grammar.
156+
It is up to an implementation how it deals with errors in such annotations.
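
To illustrate what treating an annotation as white space involves, here is a minimal Python sketch of a scanner that skips one annotation it does not recognize. The function names are hypothetical, and the sketch only tracks nesting of parentheses, strings, and comments; it does not validate the annotation id or the individual tokens as a real lexer would.

```python
def skip_block_comment(src: str, pos: int) -> int:
    """`pos` is at a "(;"; return the index just past the matching ";)"."""
    depth = 0
    i = pos
    while i < len(src):
        two = src[i:i + 2]
        if two == "(;":
            depth += 1
            i += 2
        elif two == ";)":
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise SyntaxError("unclosed comment")


def skip_annotation(src: str, pos: int) -> int:
    """`pos` is at a "(@"; return the index just past the closing ")"."""
    if not src.startswith("(@", pos):
        raise ValueError("not at the start of an annotation")
    depth = 0
    i = pos
    while i < len(src):
        ch = src[i]
        if src.startswith("(;", i):
            # Parentheses inside block comments do not count.
            i = skip_block_comment(src, i)
        elif src.startswith(";;", i):
            # Line comment: skip to the end of the line.
            nl = src.find("\n", i)
            i = len(src) if nl < 0 else nl + 1
        elif ch == '"':
            # String literal: skip to the closing quote, honoring escapes.
            i += 1
            while i < len(src) and src[i] != '"':
                if src[i] == "\n":
                    raise SyntaxError("unclosed string literal")
                i += 2 if src[i] == "\\" else 1
            if i >= len(src):
                raise SyntaxError("unclosed string literal")
            i += 1
        elif ch == "(":
            depth += 1
            i += 1
        elif ch == ")":
            depth -= 1
            i += 1
            if depth == 0:
                return i
        else:
            i += 1
    raise SyntaxError("unclosed annotation")
```

A parser built on such a helper could resume ordinary tokenization at the returned index, which mirrors how the annotation is ignored by the WebAssembly semantics.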

document/core/util/macros.def

+22
@@ -550,6 +550,9 @@
550550
.. |Tlinechar| mathdef:: \xref{text/lexical}{text-comment}{\T{linechar}}
551551
.. |Tblockchar| mathdef:: \xref{text/lexical}{text-comment}{\T{blockchar}}
552552

553+
.. |Tannot| mathdef:: \xref{text/lexical}{text-annot}{\T{annot}}
554+
.. |Tannottoken| mathdef:: \xref{text/lexical}{text-annot}{\T{annottoken}}
555+
553556

554557
.. Values, non-terminals
555558

@@ -1028,6 +1031,25 @@
10281031
.. |Blocalnamesubsec| mathdef:: \xref{appendix/custom}{binary-localnamesec}{\B{localnamesubsec}}
10291032

10301033

1034+
.. Annotations
1035+
.. -----------
1036+
1037+
.. Custom annotations, non-terminals
1038+
1039+
.. |Tcustomannot| mathdef:: \xref{appendix/custom}{text-customannot}{\T{customannot}}
1040+
.. |Tcustomplace| mathdef:: \xref{appendix/custom}{text-customannot}{\T{customplace}}
1041+
.. |Tsec| mathdef:: \xref{appendix/custom}{text-customannot}{\T{sec}}
1042+
1043+
1044+
.. Name annotations, non-terminals
1045+
1046+
.. |Tnameannot| mathdef:: \xref{appendix/custom}{text-nameannot}{\T{nameannot}}
1047+
.. |Tmodulenameannot| mathdef:: \xref{appendix/custom}{text-modulenameannot}{\T{modulenameannot}}
1048+
.. |Tfuncnameannot| mathdef:: \xref{appendix/custom}{text-funcnameannot}{\T{funcnameannot}}
1049+
.. |Tparamnameannot| mathdef:: \xref{appendix/custom}{text-paramnameannot}{\T{paramnameannot}}
1050+
.. |Tlocalnameannot| mathdef:: \xref{appendix/custom}{text-localnameannot}{\T{localnameannot}}
1051+
1052+
10311053
.. Embedding
10321054
.. ---------
10331055

interpreter/text/lexer.mll

+34-2
@@ -134,8 +134,11 @@ let float =
134134
| sign? "nan"
135135
| sign? "nan:" "0x" hexnum
136136
let string = '"' character* '"'
137-
let name = '$' (letter | digit | '_' | symbol)+
138-
let reserved = ([^'\"''('')'';'] # space)+ (* hack for table size *)
137+
138+
let id = (letter | digit | '_' | symbol)+
139+
let name = '$' id
140+
141+
let reserved = ';' | ([^'\"''('')'';'] # space)+ (* hack for table size *)
139142

140143
let ixx = "i" ("32" | "64")
141144
let fxx = "f" ("32" | "64")
@@ -353,6 +356,9 @@ rule token = parse
353356

354357
| name as s { VAR s }
355358

359+
| "(@"id { annot (Lexing.lexeme_start_p lexbuf) lexbuf; token lexbuf }
360+
| "(@" { error lexbuf "malformed annotation id" }
361+
356362
| ";;"utf8_no_nl*eof { EOF }
357363
| ";;"utf8_no_nl*'\n' { Lexing.new_line lexbuf; token lexbuf }
358364
| ";;"utf8_no_nl* { token lexbuf (* causes error on following position *) }
@@ -365,6 +371,32 @@ rule token = parse
365371
| utf8 { error lexbuf "malformed operator" }
366372
| _ { error lexbuf "malformed UTF-8 encoding" }
367373

374+
and annot start = parse
375+
| ")" { () }
376+
| "(" { annot (Lexing.lexeme_start_p lexbuf) lexbuf; annot start lexbuf }
377+
378+
| reserved { annot start lexbuf }
379+
| nat { annot start lexbuf }
380+
| int { annot start lexbuf }
381+
| float { annot start lexbuf }
382+
| id { annot start lexbuf }
383+
| name { annot start lexbuf }
384+
| string { annot start lexbuf }
385+
| '"'character*('\n'|eof) { error lexbuf "unclosed string literal" }
386+
| '"'character*['\x00'-'\x09''\x0b'-'\x1f''\x7f']
387+
{ error lexbuf "illegal control character in string literal" }
388+
| '"'character*'\\'_
389+
{ error_nest (Lexing.lexeme_end_p lexbuf) lexbuf "illegal escape" }
390+
391+
| (";;"utf8_no_nl*)? eof { error_nest start lexbuf "unclosed annotation" }
392+
| ";;"utf8_no_nl*'\n' { Lexing.new_line lexbuf; annot start lexbuf }
393+
| ";;"utf8_no_nl* { annot start lexbuf (* error on following position *) }
394+
| "(;" { comment (Lexing.lexeme_start_p lexbuf) lexbuf; annot start lexbuf }
395+
| space#'\n' { annot start lexbuf }
396+
| '\n' { Lexing.new_line lexbuf; annot start lexbuf }
397+
| eof { error_nest start lexbuf "unclosed annotation" }
398+
| _ { error lexbuf "malformed UTF-8 encoding" }
399+
368400
and comment start = parse
369401
| ";)" { () }
370402
| "(;" { comment (Lexing.lexeme_start_p lexbuf) lexbuf; comment start lexbuf }
