-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src: general C++ cleanup in node_url.cc #19598
Conversation
- Merge `domain` and `opaque` storage in URL parser: This just simplifies the code a bit, having multiple fields in an union with the same type is usually just overhead. - Add move variant of `URLHost::ToString()`: This helps avoid unnecessary string copy operations, especially since we control the lifetime of `URLHost` objects pretty well. - Use const refs in node_url.cc where appropriate - Remove or reduce overly generous `.reserve()` calls: These would otherwise keep a lot of unused memory lying around. - Return return values instead of unnecessary pointer arguments - Use more common/expressive variable names - Avoid macro use, reduce number of unnecessary JS strings: There’s no reason for `GET`, `GET_AND_SET` and `UTF8STRING` to be macros. Also, `GET` would previously create a JS string instance for each single call, even though the strings it was called with were compile-time constants. - Avoid unnecessary JS casts when the type of a value is known - Avoid (commonly unnecessary) copy for whitespace stripping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, thank you for this!
Just wondering, do the simplifications have any tangible effects on the benchmarks? Never mind, didn't see the triggered benchmark CI.
@@ -930,7 +925,7 @@ void URLHost::ParseIPv4Host(const char* input, size_t length, bool* is_ipv4) { | |||
void URLHost::ParseOpaqueHost(const char* input, size_t length) { | |||
CHECK_EQ(type_, HostType::H_FAILED); | |||
std::string output; | |||
output.reserve(length * 3); | |||
output.reserve(length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TimothyGu i’m not sure whether the spec definition says that this must be ASCII even before percent encoding, but either way, I would personally expect it to be ASCII most or all of the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the tests still pass. :)
src/node_url.cc
Outdated
bool uflag = false; | ||
bool atflag = false; // Set when @ has been seen. | ||
bool square_bracket_flag = false; // Set inside of [...] | ||
bool inside_password_flag = false; // Set after a : after an username. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a data point, the URL Standard calls it passwordTokenSeenFlag, but seems like you figured out what it meant on yourself :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TimothyGu Idk, do you want me to change it? I think both of these are okay names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd slightly prefer the name in the spec for better correspondence for future readers.
Yes, and a pretty good one :) But I wanted to wait for the benchmark CI before I started putting numbers on this… ;) |
Benchmark CI finished. The results are as follows (omitting some files with no or statistically insignificant results):
That's some quite substantial speedups!
The only performance decreases look fluky. |
Landed in ae70e2b |
- Merge `domain` and `opaque` storage in URL parser: This just simplifies the code a bit, having multiple fields in an union with the same type is usually just overhead. - Add move variant of `URLHost::ToString()`: This helps avoid unnecessary string copy operations, especially since we control the lifetime of `URLHost` objects pretty well. - Use const refs in node_url.cc where appropriate - Remove or reduce overly generous `.reserve()` calls: These would otherwise keep a lot of unused memory lying around. - Return return values instead of unnecessary pointer arguments - Use more common/expressive variable names - Avoid macro use, reduce number of unnecessary JS strings: There’s no reason for `GET`, `GET_AND_SET` and `UTF8STRING` to be macros. Also, `GET` would previously create a JS string instance for each single call, even though the strings it was called with were compile-time constants. - Avoid unnecessary JS casts when the type of a value is known - Avoid (commonly unnecessary) copy for whitespace stripping PR-URL: #19598 Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
- Merge `domain` and `opaque` storage in URL parser: This just simplifies the code a bit, having multiple fields in an union with the same type is usually just overhead. - Add move variant of `URLHost::ToString()`: This helps avoid unnecessary string copy operations, especially since we control the lifetime of `URLHost` objects pretty well. - Use const refs in node_url.cc where appropriate - Remove or reduce overly generous `.reserve()` calls: These would otherwise keep a lot of unused memory lying around. - Return return values instead of unnecessary pointer arguments - Use more common/expressive variable names - Avoid macro use, reduce number of unnecessary JS strings: There’s no reason for `GET`, `GET_AND_SET` and `UTF8STRING` to be macros. Also, `GET` would previously create a JS string instance for each single call, even though the strings it was called with were compile-time constants. - Avoid unnecessary JS casts when the type of a value is known - Avoid (commonly unnecessary) copy for whitespace stripping PR-URL: #19598 Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Merge
domain
andopaque
storage in URL parser:This just simplifies the code a bit, having multiple fields
in an union with the same type is usually just overhead.
Add move variant of
URLHost::ToString()
:This helps avoid unnecessary string copy operations, especially
since we control the lifetime of
URLHost
objects pretty well.Use const refs in node_url.cc where appropriate
Remove or reduce overly generous
.reserve()
calls:These would otherwise keep a lot of unused memory lying around.
Return return values instead of unnecessary pointer arguments
Use more common/expressive variable names
Avoid macro use, reduce number of unnecessary JS strings:
There’s no reason for
GET
,GET_AND_SET
andUTF8STRING
to bemacros. Also,
GET
would previously create a JS string instancefor each single call, even though the strings it was called
with were compile-time constants.
Avoid unnecessary JS casts when the type of a value is known
Avoid (commonly unnecessary) copy for whitespace stripping
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes