perf: update gumbo utf8 decode #2735
I don't think this should be merged. This moves from the slightly more optimized version to the slightly less optimized version.
- static inline uint32_t decode(uint32_t* state, uint32_t* codep, uint32_t byte) {
+ uint32_t static inline
+ decode(uint32_t* state, uint32_t* codep, uint32_t byte) {
It seems odd to put `static inline` between the return type and the function name. I don't think any of the other functions do that.
-   *codep =
-       (*state != UTF8_ACCEPT)
-           ? (byte & 0x3fu) | (*codep << 6)
-           : (0xff >> type) & (byte);
+   *codep = (*state != UTF8_ACCEPT) ?
+     (byte & 0x3fu) | (*codep << 6) :
+     (0xff >> type) & (byte);

-   *state = utf8d[256 + *state + type];
+   *state = utf8d[256 + *state*16 + type];
    return *state;
I believe this is the older, slower code. The `*16` is the shift that Rich Felker eliminated by pre-multiplying. From the decoder page:
> On 24th June 2010 Rich Felker pointed out that the state values in the transition table can be pre-multiplied with 16 which would save a shift instruction for every byte. D'oh! We actually just need 12 and can throw away the filler values previously in the table making the table 36 bytes shorter and save the shift in the code.
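To illustrate the pre-multiplication idea without reproducing the real `utf8d` table, here is a minimal sketch on a toy DFA. Everything in it is hypothetical and is not gumbo's code: the automaton accepts strings of the form `(ab)*`, and the names `plain_table`, `premul_table`, `step_plain`, and `step_premul` are made up for the example. The first variant stores state numbers and scales them on every byte; the second stores row offsets directly, so the lookup is a plain add.

```c
#include <assert.h>
#include <stdint.h>

/* Toy DFA (hypothetical, NOT gumbo's UTF-8 table): accepts (ab)*. */
enum { CLS_A = 0, CLS_B = 1, CLS_OTHER = 2, NUM_CLASSES = 3 };

/* Map an input byte to its character class. */
static uint8_t classify(unsigned char c) {
  return c == 'a' ? CLS_A : c == 'b' ? CLS_B : CLS_OTHER;
}

/* Variant 1: states are numbered 0, 1, 2, so each step must scale
 * the state by the row width before adding the class. */
static const uint8_t plain_table[] = {
  1, 2, 2, /* state 0 (accept): 'a' -> 1, anything else -> reject */
  2, 0, 2, /* state 1: 'b' -> 0, anything else -> reject */
  2, 2, 2, /* state 2 (reject): stays rejected */
};

static uint8_t step_plain(uint8_t state, unsigned char c) {
  return plain_table[state * NUM_CLASSES + classify(c)];
}

/* Variant 2: the same automaton with states stored as row offsets
 * (0, 3, 6), so the per-byte multiply (or shift) disappears. */
static const uint8_t premul_table[] = {
  3, 6, 6,
  6, 0, 6,
  6, 6, 6,
};

static uint8_t step_premul(uint8_t state, unsigned char c) {
  return premul_table[state + classify(c)];
}

/* Run each variant over a NUL-terminated string; state 0 is accepting. */
static int accepts_plain(const char *s) {
  uint8_t state = 0;
  for (; *s; ++s) state = step_plain(state, (unsigned char)*s);
  return state == 0;
}

static int accepts_premul(const char *s) {
  uint8_t state = 0;
  for (; *s; ++s) state = step_premul(state, (unsigned char)*s);
  return state == 0;
}

/* Sanity check: both encodings must agree on any input. */
static void check_same(const char *s) {
  assert(accepts_plain(s) == accepts_premul(s));
}
```

Once the states are stored as offsets, the row width no longer has to be a power of two for the indexing to be cheap, which is consistent with the quoted note that the rows could shrink from 16 entries to 12. Calling `check_same("abab")` or `check_same("abx")` confirms that the two encodings walk the same automaton.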
Ah, I see, that's confusing. OK, will close.
What problem is this PR intended to solve?
Related to #2722
Update gumbo's UTF-8 decode() from https://bjoern.hoehrmann.de/utf-8/decoder/dfa/, which apparently saves a shift instruction for every byte. Benchmarking doesn't find a discernible performance improvement, though.
Have you included adequate test coverage?
N/A
Does this change affect the behavior of either the C or the Java implementations?
N/A