-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows: Faster getenvW
and a standalone environment variable test
#23272
base: master
Are you sure you want to change the base?
Conversation
getenvW
and standalone environment variable testgetenvW
and a standalone environment variable test
976ece0
to
ebbf214
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice tests! I've got a bunch of comments, but they're mostly suggestions for more work, so feel free to ignore them as this seems like an improvement to me as-is.
Oh, one more suggestion: maybe expand the comment on getenvW
to say that it does linear searches and if the caller is going to frequently fetch environment variables, it may be worthwhile to use std.process.getEnvMap
to be able to query for them more efficiently?
My hunch is that you'd have to be querying them absurdly frequently before it starts mattering, and I'm assuming the lack of allocation in |
…uired This code previously added 4 NUL code units, but that was likely due to a misinterpretation of this part of the CreateProcess documentation: > A Unicode environment block is terminated by four zero bytes: two for the last string, two more to terminate the block. (four zero *bytes* means *two* zero code units) Additionally, the second zero code unit is only actually needed when the environment is empty due to a quirk of the CreateProcess implementation. In the case of a non-empty environment, there always ends up being two trailing NUL code units since one will come after the last environment variable in the block.
ebbf214
to
90fc9ce
Compare
Looks great to me! |
Tests all environment variable APIs in std.process
Both sliceTo and indexOfScalarPos use SIMD when available to speed up the search. On my x86_64 machine, this leads to getenvW being around 2-3x faster overall. Additionally, any future improvements to sliceTo/indexOfScalarPos will benefit getenvW.
90fc9ce
to
66dcebc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems okay from a quick glance. Feel free to merge when you're happy with it.
Inspired by #23265, I thought I'd try applying the same strategy to the Windows implementation. The strategy itself didn't end up having the same benefit, but optimizations were still found while exploring it.
Also adds a standalone test to make sure the functionality remains the same.
Both sliceTo and indexOfScalarPos use SIMD when available to speed up the search. On my x86_64 machine, this leads to getenvW being around 2-3x faster overall.
Additionally, any future improvements to sliceTo/indexOfScalarPos will benefit getenvW.
Benchmark code
(presumably, the magnitude of any speedup would also (on average) increase as the number of environment variables in the environment increases)
Initially, I tried using the same early return loop strategy as #23265, but later realized that the switch to using
sliceTo
to find theNUL
terminator was accounting for pretty much all of the speed gains. The early return loop version can be found here:squeek502@573e707
The version currently in this PR is slightly faster and easier to understand IMO, so that's what I went with.
benchmark results comparing the implementations
benchenv-sliceto-indexof.exe
is the implementation in this PR currently,benchenv-loop.exe
is squeek502@573e707