runtime: freedefer performance oddity #18923
My assembly is rusty, but this seems suspicious (from
Looks like that's a call to duffzero. The xorps is just zeroing a source register for duffzero to use.
I don't think xorps is guilty. The slow case takes more stack space and has an extra call to duffcopy. I suspect that there is some extra copying going on.
@randall77, ping. /cc @josharian |
I cannot reproduce this problem on either linux or darwin. On a simple tight defer benchmark I get about 10% of the time in freedefer, and 0 samples of that are on the XORPS. That all seems normal to me. I think to make any more progress we're going to need some way to reproduce the issue. Here's my repro attempt:
I'm going to punt this to 1.10.
This still reproduces for me on both go1.8.3 and tip (b0d592c) on Darwin. This profile is from tip:
Is there some other debug data I can provide? I can also provide the exact steps I'm performing. This might be a clue:
Exact steps never hurt. Sounds like the way forward (for 1.10) is probably just to replace the line with explicit zeroing. I also see that whether time is spent in typedmemclr vs duffzero seems to depend on what's happening with the garbage collector at the time. Perhaps the XORPS mystery is just due to event skid?
Perhaps related to the recent 332719f by @ianlancetaylor. |
The line has been replaced by explicit zeroing for other reasons in https://golang.org/cl/64070. I'll close this but feel free to reopen if you want to continue to investigate the general issue.
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

What did you do?
While profiling cockroachdb I noticed `runtime.freedefer` consuming a surprising amount of time (this is from a 30s profile and, yes, we make a lot of cgo calls):

Examining where the time is going within `freedefer` shows:

`_defer` is a simple structure of 7 fields. How is clearing the structure possibly taking that long? As an experiment, I tweaked this code to "manually" clear each field:

With this change `freedefer` consumes 110ms of time for the exact same workload.

Is this a real problem or is there some sort of profile oddity going on that is pointing blame at the `*d = _defer{}` line incorrectly? Seems like something real, as the above change produces a small improvement on `BenchmarkDefer`:

The above diff is still doing too much work, as many of the fields are already clear or will be overwritten by the caller of `newdefer`:

Which results in:
Despite the repeatability of the above I'm still dubious about this change as I don't have any explanation for why it makes a difference.