html5gum does not detect lone surrogates #27

Open
untitaker opened this issue Jan 5, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@untitaker
Owner

untitaker commented Jan 5, 2022

Lone surrogates are invalid UTF-8, so html5gum 0.3.0, which takes `&str`/`String`, is not able to handle them: they cannot even be represented in its input type.
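For illustration (plain standard-library Rust, not html5gum code), here is why a `&str`-based API never sees a lone surrogate: Rust's `char`, and therefore `str`/`String`, excludes the range U+D800..=U+DFFF. `representable_as_char` is a hypothetical helper made up for this sketch.

```rust
// Sketch: Rust's `char` (and therefore `str`/`String`) excludes the
// surrogate range U+D800..=U+DFFF, so a `&str` can never carry a lone
// surrogate. Even the literal '\u{D800}' is rejected at compile time.

/// Hypothetical helper: true if `cp` can be stored in a Rust `char`.
fn representable_as_char(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

fn main() {
    // Last code point before the surrogate block: fine.
    assert!(representable_as_char(0xD7FF));
    // First high surrogate and last low surrogate: rejected.
    assert!(!representable_as_char(0xD800));
    assert!(!representable_as_char(0xDFFF));
    // First code point after the surrogate block: fine.
    assert!(representable_as_char(0xE000));
    println!("surrogates cannot be represented as char");
}
```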

After #25 is merged, html5gum will be able to read arbitrary bytes. At that point, the expectation might be that lone surrogates produce error tokens, but they do not.
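A minimal sketch of what detection on raw bytes could look like. `find_surrogates` is a hypothetical helper, not part of html5gum's API, and it assumes a surrogate shows up in the byte stream as its UTF-8 bit pattern, i.e. the 3-byte sequences `ED A0..=BF 80..=BF`. As far as I know, the WHATWG HTML spec calls the resulting parse error `surrogate-in-input-stream`; how html5gum would turn an offset into an error token is left open.

```rust
/// Hypothetical helper (not html5gum API): returns the byte offsets of every
/// surrogate code point (U+D800..=U+DFFF) written with the UTF-8 bit
/// pattern, i.e. the sequences ED A0..=BF 80..=BF.
fn find_surrogates(input: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut i = 0;
    // Stop once fewer than three bytes remain; a truncated sequence at the
    // end is not a complete surrogate encoding.
    while i + 2 < input.len() {
        if input[i] == 0xED
            && (0xA0..=0xBF).contains(&input[i + 1])
            && (0x80..=0xBF).contains(&input[i + 2])
        {
            offsets.push(i);
            i += 3; // skip the whole 3-byte sequence
        } else {
            i += 1;
        }
    }
    offsets
}

fn main() {
    let input = b"<p>a\xED\xA0\x80b</p>";
    for offset in find_surrogates(input) {
        // A tokenizer could emit an error token for each offset here.
        println!("surrogate-in-input-stream at byte {offset}");
    }
}
```

Scanning byte by byte is safe here because 0xED can never be a continuation byte, so it cannot misfire inside another multi-byte sequence. A surrogate pair written as two 3-byte sequences (as in CESU-8) would be reported twice.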

Note: lone surrogates have no impact on parsing behavior. Only some error tokens are missing from the token stream.

untitaker added the bug label on Jan 5, 2022
@Ygg01

Ygg01 commented May 25, 2022

Do you need to worry about lone surrogates? To use a `Vec<u8>`, you would first need to convert it to a `&str` via `std::str::from_utf8`, which already rejects them.

The only possible issue is if the stream is in some non-UTF-8 encoding.
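To illustrate the conversion step described here: a sketch using `std::str::from_utf8` on a `Vec<u8>`, where `decode` is a hypothetical wrapper made up for this example. Both an encoded surrogate and a Latin-1 byte are rejected, and `Utf8Error::valid_up_to()` points at the first offending byte.

```rust
/// Hypothetical wrapper: validate bytes as UTF-8, returning the index of the
/// first invalid byte on failure.
fn decode(bytes: &[u8]) -> Result<&str, usize> {
    std::str::from_utf8(bytes).map_err(|e| e.valid_up_to())
}

fn main() {
    // Valid UTF-8 is borrowed as a &str without copying.
    let ok: Vec<u8> = "héllo".as_bytes().to_vec();
    assert_eq!(decode(&ok), Ok("héllo"));

    // A surrogate written as ED A0 80 is rejected at its first byte.
    let surrogate: Vec<u8> = vec![b'a', 0xED, 0xA0, 0x80];
    assert_eq!(decode(&surrogate), Err(1));

    // "é" in ISO-8859-1 (Latin-1) is the single byte E9. It is rejected too,
    // because E9 starts a 3-byte UTF-8 sequence that 'l' does not continue.
    let latin1: Vec<u8> = vec![b'h', 0xE9, b'l'];
    assert_eq!(decode(&latin1), Err(1));
}
```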

@untitaker
Owner Author

html5gum assumes that the incoming byte stream is mostly valid UTF-8. The spec says that lone surrogates should produce an error, and currently they don't. I don't need to worry beyond that.


2 participants