Test that invalid UTF-8 byte sequences are rejected. #450

sunfishcode · 2017-03-30T19:17:23Z

WebAssembly/design#1016 in the design repository requires implementations to validate that import/export names are UTF-8. This PR contains test coverage for this feature only.

The tests may need to be modified depending on whether UTF-8 validation is implemented as a syntactic constraint or not, or on other details; however, they should offer a good starting point.


rossberg · 2017-04-04T09:36:15Z

Looks good to me as far as the test cases go. However, on reflection it is most natural that these will be considered decoding errors, and that the AST itself represents names as vectors of code points. As you suspect, that implies that in the text format, errors will have to manifest themselves as immediate syntax errors. So to make the textual tests (the import/export ones) work you'll either need to turn them into individual .fail. files, or (preferably) express them in binary format like the custom section test, whose decoding happens separately from parsing.
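Expressing a case in binary format means building the module bytes by hand. The following is a minimal Python sketch, not code from this PR, and the helper names (`uleb128`, `module_with_custom_section`, `to_wast_binary`) are hypothetical. It emits the 8-byte module header (`\0asm` plus version 1), then a custom section (id 0) whose name is an arbitrary, possibly ill-formed, byte string, and renders the result as a `(module binary "...")` string for a .wast script.

```python
MAGIC = b"\x00asm"               # module magic number
VERSION = b"\x01\x00\x00\x00"    # binary format version 1, little-endian


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def module_with_custom_section(name: bytes, payload: bytes = b"") -> bytes:
    """Hypothetical helper: a module holding one custom section.

    Custom section layout: id 0, section size (LEB128), name length
    (LEB128), raw name bytes, then an uninterpreted payload.
    """
    body = uleb128(len(name)) + name + payload
    return MAGIC + VERSION + b"\x00" + uleb128(len(body)) + body


def to_wast_binary(module: bytes) -> str:
    """Render module bytes as a .wast (module binary ...) form, one \\hh escape per byte."""
    escaped = "".join(f"\\{b:02x}" for b in module)
    return f'(module binary "{escaped}")'
```

For example, `to_wast_binary(module_with_custom_section(b"\xc0\x80"))` produces a module whose custom-section name is an overlong encoding of U+0000. The test would then wrap that form in the suite's malformed-module assertion, with the expected error text taken from the interpreter rather than assumed here.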

But until the spec interpreter implements the UTF-8 restriction, these tests will break CI on Travis either way (and potentially break downstream users running the test suite on their waterfalls). For that reason alone, I would suggest holding off on landing this until the spec has caught up. I'll try to get to it later this week.

rossberg · 2017-04-05T12:42:26Z

See #454 for the interpreter implementation. After having implemented it I noticed that a few cases aren't tested in this PR:

- non-scalar code points (surrogates, U+D800-U+DFFF)
- out-of-range code points (larger than U+10FFFF)
- overlong encodings (using more bytes than necessary)

It would be great if you could include a few tests for those as well.
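A hand-written decoder makes the three rules above explicit. This is a minimal Python sketch of strict UTF-8 decoding, assuming names are decoded into a list of code points as suggested earlier in the thread; it is not the implementation from #454, and `decode_utf8` is a hypothetical name. Each multi-byte form carries a minimum code point, which catches overlong encodings; surrogates and values above U+10FFFF are rejected once the code point is assembled.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Strictly decode UTF-8 into code points; raise ValueError on ill-formed input."""
    code_points = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # The lead byte determines the sequence length, its payload bits,
        # and the smallest code point that form may legally encode.
        if b0 < 0x80:
            cp, length, min_cp = b0, 1, 0
        elif 0xC0 <= b0 < 0xE0:
            cp, length, min_cp = b0 & 0x1F, 2, 0x80
        elif 0xE0 <= b0 < 0xF0:
            cp, length, min_cp = b0 & 0x0F, 3, 0x800
        elif 0xF0 <= b0 < 0xF8:
            cp, length, min_cp = b0 & 0x07, 4, 0x10000
        else:
            # 0x80-0xBF are stray continuation bytes; 0xF8-0xFF never occur.
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1 : i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"missing continuation byte in sequence at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point U+{cp:04X} at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"code point U+{cp:X} out of range at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

For example, `b"\xc0\x80"` fails the minimum check, `b"\xed\xa0\x80"` assembles to U+D800 and fails the surrogate check, and `b"\xf4\x90\x80\x80"` assembles to U+110000 and fails the range check.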

kmiller68 · 2017-04-05T12:52:36Z

I may have missed it, but it looks like there are no tests ensuring that UTF-8 names with a BOM fail to parse. That seems like it would be a good test too.

As a nit, it might also be useful to note on each test roughly why its UTF-8 should fail to validate.
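The annotation suggested above can be as simple as pairing each byte sequence with the reason it is ill-formed. The sketch below is a hypothetical table, not the PR's actual test files, and checks each entry against Python's strict `utf-8` codec, which rejects surrogates, overlong forms, and code points above U+10FFFF. It also shows that a BOM (EF BB BF) is itself well-formed UTF-8 for U+FEFF, so rejecting BOM-prefixed names would need a rule beyond UTF-8 well-formedness.

```python
# Hypothetical test table: each invalid name carries the reason it must be rejected.
INVALID_NAMES = [
    (b"\x80", "stray continuation byte with no lead byte"),
    (b"\xc2", "2-byte sequence truncated after the lead byte"),
    (b"\xe2\x82", "3-byte sequence truncated after one continuation byte"),
    (b"\xc2\x41", "lead byte followed by a non-continuation byte"),
    (b"\xff", "0xFF never appears in UTF-8"),
    (b"\xc0\x80", "overlong 2-byte encoding of U+0000"),
    (b"\xe0\x80\xaf", "overlong 3-byte encoding of U+002F"),
    (b"\xed\xa0\x80", "encodes the surrogate U+D800"),
    (b"\xf4\x90\x80\x80", "encodes U+110000, above U+10FFFF"),
]


def rejected(name: bytes) -> bool:
    """True if Python's strict UTF-8 codec refuses the bytes."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


for name, reason in INVALID_NAMES:
    assert rejected(name), f"should be rejected: {reason}"

# A UTF-8 BOM is well-formed: strict decoding yields U+FEFF rather than an error.
BOM = b"\xef\xbb\xbf"
```

By contrast, Python's `utf-8-sig` codec silently strips a leading BOM, so the choice of decoder matters if BOM-prefixed names are meant to be rejected.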

sunfishcode · 2017-05-04T15:12:59Z

This is superseded by #468, which I believe addresses all the feedback here.

ee6a07f Test that invalid UTF-8 byte sequences are rejected.

sunfishcode force-pushed the validate-utf8 branch from a27b23f to ee6a07f on March 30, 2017 19:23.

4cc3010 Test that invalid UTF-8 byte sequences are rejected in custom section names too.

rossberg mentioned this pull request on Apr 5, 2017: Represent all names in UTF-8 #454 (Merged)

sunfishcode mentioned this pull request on May 4, 2017: Test UTF-8 identifier validation. #468 (Closed)

sunfishcode closed this on May 4, 2017.

sunfishcode deleted the validate-utf8 branch on May 5, 2017 16:26.
