gb18030 decoder: unwind from fourth byte when it's not a digit #111

Merged: 2 commits, May 15, 2017


annevk
Member

annevk commented May 11, 2017

Instead of always unwinding when consuming the fourth byte yields no code point, only unwind when the fourth byte is not an ASCII digit.
This does mean that ASCII digits can be masked, but since ASCII digits are not used as delimiters in any format, this is highly unlikely to be usable in attacks (it also better matches existing implementations).

Fixes #110.
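A minimal Python sketch of the fourth-byte step as described above, loosely following the shape of the gb18030 decoder in the WHATWG Encoding Standard. The names `Gb18030State`, `decode_fourth_byte` and `ranges_code_point` are hypothetical. The ranges lookup is deliberately partial: it only implements the parts that need no table (the unmapped pointer gaps and the linear supplementary-plane block), because BMP pointers require the full index gb18030 ranges table.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gb18030State:
    # Hypothetical holder for the decoder's first/second/third bytes.
    first: int = 0x00
    second: int = 0x00
    third: int = 0x00

    def reset(self) -> None:
        self.first = self.second = self.third = 0x00


def ranges_code_point(pointer: int) -> Optional[int]:
    # Partial lookup: only the table-free parts are implemented.
    if 39419 < pointer < 189000 or pointer > 1237575:
        return None  # no code point for this pointer
    if pointer >= 189000:
        return 0x10000 + (pointer - 189000)  # U+10000..U+10FFFF, linear
    # BMP pointers need the full ranges table, omitted in this sketch.
    raise NotImplementedError("BMP part of the ranges table omitted")


def decode_fourth_byte(state: Gb18030State, byte: int) -> Tuple[Optional[int], List[int]]:
    """Handle the fourth byte of a four-byte gb18030 sequence.

    Returns (code point, bytes to prepend to the stream); a code point
    of None means "return error".
    """
    # Check for an ASCII digit once, before any range decoding.
    if not 0x30 <= byte <= 0x39:
        # Not a digit: unwind by putting second, third and byte back.
        prepend = [state.second, state.third, byte]
        state.reset()
        return None, prepend
    pointer = ((state.first - 0x81) * 12600
               + (state.second - 0x30) * 1260
               + (state.third - 0x81) * 10
               + (byte - 0x30))
    # Reset state on every path before returning, including the error path.
    state.reset()
    # A digit fourth byte is consumed even when there is no code point.
    return ranges_code_point(pointer), []
```

For example, `0x84 0x31 0xA5 0x30` has a digit fourth byte but lands in an unmapped gap, so all four bytes are consumed as a single error; under the old "always unwind" rule the last three bytes would have been pushed back onto the stream. `0x81 0x30 0x81 0x41` still unwinds, returning `[0x30, 0x81, 0x41]` to be reprocessed.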


annevk force-pushed the annevk/gb18030-fourth-byte branch from 93e057b to a0c98be May 11, 2017 11:28
annevk changed the title from "gb18030 encoder: unwind from fourth byte when it's not a digit" to "gb18030 decoder: unwind from fourth byte when it's not a digit" May 11, 2017
@hsivonen
Member

Works for me, but editorially I'd still prefer checking for 0x30 to 0x39 only once, with an early return before the range decode if the byte isn't in 0x30 to 0x39 inclusive.
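For reference, the "range decode" here is, as far as I can tell from the Encoding Standard's gb18030 decoder, the computation of a pointer from the four bytes followed by the index gb18030 ranges lookup. Writing $b_1$ to $b_4$ for the four bytes (notation introduced here for illustration):

```latex
\text{pointer} = (b_1 - \texttt{0x81}) \times 12600
               + (b_2 - \texttt{0x30}) \times 1260
               + (b_3 - \texttt{0x81}) \times 10
               + (b_4 - \texttt{0x30})
```

For example, `0x90 0x30 0x81 0x30` gives $(\texttt{0x90} - \texttt{0x81}) \times 12600 = 15 \times 12600 = 189000$ with all other terms zero, and the ranges index maps pointer 189000 to U+10000. With the 0x30 to 0x39 check done once up front, $b_4 - \texttt{0x30}$ is always between 0 and 9 by the time this is evaluated.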

@annevk
Member Author

annevk commented May 11, 2017

Rewrote it.

@hsivonen
Member

LGTM. Thank you.

annevk merged commit eedd518 into master May 15, 2017
annevk deleted the annevk/gb18030-fourth-byte branch May 15, 2017 09:44
annevk added a commit that referenced this pull request Aug 30, 2018
Due to an oversight in #111 the gb18030 decoder didn't always reset state before returning after a four-byte sequence.

Fixes #146.