-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve performance and reduce allocations of UUID methods #96
Conversation
c3c8d45
to
5f181ae
Compare
Codecov ReportBase: 100.00% // Head: 100.00% // No change to project coverage 👍
Additional details and impacted files@@ Coverage Diff @@
## master #96 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 4 4
Lines 411 448 +37
=========================================
+ Hits 411 448 +37
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@cameracker @theckman I see you've both recently contributed to and reviewed PRs for this repo. Are you the right people to ping for a review here? |
Yep. I may have some time this coming weekend to go over this. Not sure about Cameron. |
Thanks the for quick response and no hurries on the review, I just wanted to make sure someone was aware of this PR. |
@theckman any chance you have time for a review and if not is there someone else I should ping? Thanks! |
b468ef4
to
4665a99
Compare
Bump, this is a great improvement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functional changes seem reasonable and the benchmark improvements are definitely a win. I had some non-functional feedback, though.
|
||
// Parse parses the UUID stored in the string text. Parsing and supported | ||
// formats are the same as UnmarshalText. | ||
func (u *UUID) Parse(s string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make Parse()
delegate to UnmarshalText()
or vice versa? The conversion from string
to []byte
or []byte
to string
is almost always optimized out by the compiler and this is non-trivial code to be duplicating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion from string to []byte or []byte to string is almost always optimized out by the compiler and this is non-trivial code to be duplicating.
In some cases this is true and the runtime can stack allocate or use a temp buffer for the string
to []byte
conversion (see: stringtoslicebyte()
), but that is not true here since ([]byte)(s)
escapes to heap (you can check this by making the suggested change and running go build -i -gcflags='-m'
).
TLDR: delegating to UnmarshalText
would require an allocation and a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, delegating to UnmarshalText
could be DOS vector if this was used to parse untrusted input (a malicious actor could pass very large "UUID" causing the application to allocate large amounts of memory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that this is a fairly complex block of code that's identical to UnmarshalText()
other than one processes string
and the other []byte
. It would be better in the long run to not have the duplication. What's the result if UnmarshalText()
delegates to Parse()
instead?
Also, since the logic is the same, any DOS vector in UnmarshalText()
exists here as well. Looking at the code, though, neither path allocates memory based on the input. Each loops over 32 characters of the input data after cleaning up "decorations" like braces, etc. and writes to corresponding locations in the already-allocated [16]byte
that is the underlying value of the UUID
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that this is a fairly complex block of code that's identical to UnmarshalText() other than one processes string and the other []byte. It would be better in the long run to not have the duplication.
I don't like the duplication either, but this is the only way to do this without allocating. The fact that we have 100% test coverage considerably alleviates these concerns IMO. In the long run this could be replaced by a generic function that takes both strings and byte slices, but that should only be done once Go 1.17 is no longer in wide use.
What's the result if UnmarshalText() delegates to Parse() instead?
Same result and same issues around creating a copy of untrusted input.
Also, since the logic is the same, any DOS vector in UnmarshalText() exists here as well.
The DOS vector only exists if we delegate the call to UnmarshalText/Parse
from Parse/UnmarshalText
and have to allocate to create a copy of untrusted input.
@@ -90,7 +91,7 @@ type fromStringTest struct { | |||
} | |||
|
|||
// Run runs the FromString test in a subtest of t, named by fst.variant. | |||
func (fst fromStringTest) Run(t *testing.T) { | |||
func (fst fromStringTest) TestFromString(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this method renamed? The convention with these tests is that each one has a Run()
method that does the needful.
It would be preferable to add a 2nd t.Run()
than rename the method and add a separate TestUnmarshalText()
method with a single t.Run()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion here, but the method was renamed because I added TestUnmarshalText
to the fromStringTest
type (and IMHO neither method should exist and the logic should actually live in the top-level test functions).
Since we can't have two Run()
methods, we are testing two unique code paths, and duplicating this type and it's test cases (just to have a Run()
method) seems like an undue maintenance burden. I thought it would be better to add the TestUnmarshalText
method and rename Run
to TestFromString
.
It would be preferable to add a 2nd t.Run() than rename the method and add a separate TestUnmarshalText() method with a single t.Run().
IMHO, invoking the UnmarshalText
tests separately from the top-level TestUnmarshalText
function makes for cleaner test output (e.g. TestUnmarshalText/Valid/Canonical
. If we had two Run
methods then the output (depending on how its implemented) would look something like:
TestFromString
TestFromString/Valid
TestFromString/Valid/Canonical
...
# UnmarshalText (test names are repeated)
TestFromString
TestFromString/Valid
TestFromString/Valid/Canonical
Or if we name the subtest "UnmarshalText" it'll look like:
TestFromString
TestFromString/Valid
TestFromString/Valid/Canonical
...
TestFromString/UnmarshalText
TestFromString/UnmarshalText/Valid
TestFromString/UnmarshalTextValid/Canonical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I don't have a strong opinion either. My comment was mostly about sticking to the existing conventions, which pre-dated me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've convinced me. After another pass over the code I agree with the change, if for slightly different reasons (fromStringTest
is the only type here with a Run()
method and it's there to wrap executing the batch of tests. we're doing 2 batches so 2 names does make sense).
@@ -83,27 +83,30 @@ func (u *NullUUID) Scan(src interface{}) error { | |||
return u.UUID.Scan(src) | |||
} | |||
|
|||
var nullJSON = []byte("null") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why declare a package-level variable that's only referenced once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullJSON
is declared here so that NullUUID.MarshalJSON()
does not allocated when it nil/invalid. This is common practice: encoding/json/decode.go#L602
.
|
||
if err := json.Unmarshal(b, &u.UUID); err != nil { | ||
return err | ||
if n := len(b); n >= 2 && b[0] == '"' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this removing leading and trailing quotes? If so, it's quite unintuitive to my eye.
bytes.Trim(b, "\"")
is functionally equivalent and would be much less "clever" (as in wouldn't require an explanation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes.Trim(b, "\"")
would make invalid inputs that have either extraneous quotes or a quote on only one side valid since it would transform ""UUID""
to UUID
and "UUID
to UUID
.
} | ||
n := NullUUID{UUID: u, Valid: true} | ||
for i := 0; i < b.N; i++ { | ||
n.MarshalJSON() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarking gotcha: the compiler is allowed to optimize this call away since nothing is using the return values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that is true. This benchmark takes 72ns so it's clearly calling MarshalJSON
and the Go standard library rarely use the result strings/strings_test.go#L332-L350)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't suggesting that it's not calling the method, only that it's possible that the entire call could be eliminated by some future release of the compiler and that's 100% allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but it would be a breaking change since it would invalidate a large number of existing benchmarks. So if introduced it would likely exempt benchmarks (and tests) from any such optimization (you have to do the same when writing a benchmark suite in C/C++ where the compilers are much more aggressive #pragma GCC ...
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm basing this on the A note on compiler optimizations section of this blog from Dave Cheney. The compiler choosing to eliminate that call is within the spec and, since the toolchain isn't covered by the Go1 compatibility guarantee, it wouldn't even be considered breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you're entirely correct the spec does not cover the toolchain and TIL in some cases the compiler will elide the function call unless the result it used (I checked this using a simple square(x int) int
function, which being "pure" in the C sense is probably why the compiler decided it could be elided - though interestingly it does not elide the loop).
I'm happy to change this to assign to a dummy variable.
4665a99
to
80d0618
Compare
This commit improves the performance and reduces the number of allocs of most UUID methods and adds a new UUID.Parse() method for parsing string encoded UUIDs. Parsing string encoded UUIDs is now 2x faster and no longer allocates. The NullUUID MarshalJSON() and UnmarshalJSON() methods have also been improved and no longer call out to json.Unmarshal. The UUID.Format method has been improved for common cases. Benchmark results: ``` goos: linux goarch: amd64 pkg: github.com/gofrs/uuid cpu: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz name old time/op new time/op delta UnmarshalText/canonical-16 47.7ns ± 0% 23.1ns ± 1% -51.56% (p=0.000 n=8+10) UnmarshalText/urn-16 47.8ns ± 1% 23.0ns ± 1% -51.85% (p=0.000 n=10+10) UnmarshalText/braced-16 47.8ns ± 1% 23.1ns ± 1% -51.66% (p=0.000 n=9+10) ParseV4-16 86.5ns ±10% 23.0ns ± 5% -73.44% (p=0.000 n=9+10) NullMarshalJSON/Valid-16 308ns ±21% 64ns ±23% -79.18% (p=0.000 n=10+10) NullMarshalJSON/Invalid-16 41.8ns ± 2% 1.5ns ± 4% -96.51% (p=0.000 n=10+10) Format/s-16 151ns ± 3% 132ns ± 3% -12.65% (p=0.000 n=10+10) Format/S-16 305ns ± 2% 157ns ± 4% -48.70% (p=0.000 n=10+10) Format/q-16 217ns ±12% 139ns ± 6% -35.71% (p=0.000 n=10+10) Format/x-16 170ns ±13% 120ns ± 2% -29.41% (p=0.000 n=10+10) Format/X-16 324ns ±11% 142ns ± 0% -56.24% (p=0.000 n=10+10) Format/v-16 156ns ± 4% 143ns ± 9% -8.51% (p=0.000 n=10+10) Format/+v-16 155ns ± 3% 141ns ± 8% -9.11% (p=0.000 n=10+10) Format/#v-16 894ns ± 1% 860ns ± 0% -3.84% (p=0.000 n=10+10) String-16 70.1ns ±36% 71.9ns ±22% ~ (p=0.393 n=10+10) FromBytes-16 1.81ns ± 0% 1.82ns ± 2% ~ (p=0.369 n=8+10) FromString/canonical-16 94.3ns ±20% 22.1ns ± 0% -76.53% (p=0.000 n=10+10) FromString/urn-16 93.7ns ±11% 22.4ns ± 0% -76.07% (p=0.000 n=10+10) FromString/braced-16 87.1ns ± 5% 22.6ns ± 0% -74.04% (p=0.000 n=9+10) MarshalBinary-16 0.20ns ± 3% 0.20ns ± 1% ~ (p=0.515 n=10+10) MarshalText-16 115ns ±25% 19ns ± 3% -83.09% (p=0.000 n=10+9) name old alloc/op new alloc/op delta UnmarshalText/canonical-16 0.00B 0.00B ~ (all equal) UnmarshalText/urn-16 0.00B 0.00B ~ (all equal) UnmarshalText/braced-16 0.00B 0.00B ~ (all equal) ParseV4-16 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) NullMarshalJSON/Valid-16 160B ± 0% 48B ± 0% -70.00% (p=0.000 n=10+10) NullMarshalJSON/Invalid-16 8.00B ± 0% 0.00B -100.00% (p=0.000 n=10+10) Format/s-16 48.0B ± 0% 48.0B ± 0% ~ (all equal) Format/S-16 96.0B ± 0% 48.0B ± 0% -50.00% (p=0.000 n=10+10) Format/q-16 96.0B ± 0% 48.0B ± 0% -50.00% (p=0.000 n=10+10) Format/x-16 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=10+10) Format/X-16 112B ± 0% 32B ± 0% -71.43% (p=0.000 n=10+10) Format/v-16 48.0B ± 0% 48.0B ± 0% ~ (all equal) Format/+v-16 48.0B ± 0% 48.0B ± 0% ~ (all equal) Format/#v-16 128B ± 0% 16B ± 0% -87.50% (p=0.000 n=10+10) String-16 48.0B ± 0% 48.0B ± 0% ~ (all equal) FromBytes-16 0.00B 0.00B ~ (all equal) FromString/canonical-16 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) FromString/urn-16 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) FromString/braced-16 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) MarshalBinary-16 0.00B 0.00B ~ (all equal) MarshalText-16 96.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta UnmarshalText/canonical-16 0.00 0.00 ~ (all equal) UnmarshalText/urn-16 0.00 0.00 ~ (all equal) UnmarshalText/braced-16 0.00 0.00 ~ (all equal) ParseV4-16 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) NullMarshalJSON/Valid-16 4.00 ± 0% 1.00 ± 0% -75.00% (p=0.000 n=10+10) NullMarshalJSON/Invalid-16 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) Format/s-16 1.00 ± 0% 1.00 ± 0% ~ (all equal) Format/S-16 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) Format/q-16 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) Format/x-16 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) Format/X-16 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10) Format/v-16 1.00 ± 0% 1.00 ± 0% ~ (all equal) Format/+v-16 1.00 ± 0% 1.00 ± 0% ~ (all equal) Format/#v-16 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) String-16 1.00 ± 0% 1.00 ± 0% ~ (all equal) FromBytes-16 0.00 0.00 ~ (all equal) FromString/canonical-16 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) FromString/urn-16 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) FromString/braced-16 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) MarshalBinary-16 0.00 0.00 ~ (all equal) MarshalText-16 2.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) ```
80d0618
to
2a97ddf
Compare
@dylan-bourque if everything looks good, could I get another approval from @LeonanCarvalho or any other maintaner? |
I'm not a maintainer, I'm just an enthusiast, your changes to looks good to me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Apologies for nit picking but I'm a big proponent of clear over clever and these kinds of optimizations have a strong tendency to lose clarity.
@@ -90,7 +91,7 @@ type fromStringTest struct { | |||
} | |||
|
|||
// Run runs the FromString test in a subtest of t, named by fst.variant. | |||
func (fst fromStringTest) Run(t *testing.T) { | |||
func (fst fromStringTest) TestFromString(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've convinced me. After another pass over the code I agree with the change, if for slightly different reasons (fromStringTest
is the only type here with a Run()
method and it's there to wrap executing the batch of tests. we're doing 2 batches so 2 names does make sense).
@cameracker (or anyone else with write access) could I get this reviewed/merged? |
@charlievieth I'll give it a look over today with the plan to merge. Thanks so much <3 |
This commit improves the performance and reduces the number of allocs of
most UUID methods and adds a new UUID.Parse() method for parsing string
encoded UUIDs.
Parsing string encoded UUIDs is now 2x faster and no longer allocates.
The NullUUID MarshalJSON() and UnmarshalJSON() methods have also been
improved and no longer call out to json.Unmarshal. The UUID.Format
method has been improved for common cases.
Benchmark results: