Optimize testee memory allocation #215
Conversation
After this, https://go-review.googlesource.com/c/go/+/164960 and https://go-review.googlesource.com/c/go/+/164961 will also help.
go-fuzz/testee.go
Outdated
@@ -182,22 +182,51 @@ retry:
 // This goroutine also collects crash output.
 ticker := time.NewTicker(time.Second)
 const N = 1 << 20
-data := make([]byte, N)
+buf := make([]byte, 4096)
You are saying this does not allocate? That's because... we store stdoutPipe as concrete type rather than interface?
But this should cause stack growth and copy, right? As far as I remember, goroutines used to start with a 512b stack. Stacks are not garbage collected, but that can still produce some overhead.
Could you test a fuzz function that does not produce any output with this change versus replacing this whole goroutine with t.outputC <- nil? If that shows any noticeable difference, then we could start with a small read buffer as well.
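For background, a tiny illustration of the escape-analysis point being discussed (an assumed example, not code from this PR): when the reader is a concrete *os.File, the compiler knows the callee and may keep the buffer on the goroutine stack, which is why stack growth rather than heap allocation is the concern, whereas passing the same buffer through an io.Reader interface generally forces it to the heap.

package main

import (
	"io"
	"os"
)

// Build with `go build -gcflags=-m` to see the compiler's escape decisions.

func readConcrete(f *os.File) {
	buf := make([]byte, 4096) // known callee: may stay on the stack (forcing stack growth)
	f.Read(buf)
}

func readInterface(r io.Reader) {
	buf := make([]byte, 4096) // unknown callee: escapes to the heap
	r.Read(buf)
}

func main() {
	f, err := os.Open(os.DevNull)
	if err != nil {
		return
	}
	defer f.Close()
	readConcrete(f)
	readInterface(f)
}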
go-fuzz/testee.go
Outdated
+data = make([]byte, N)
+copy(data, tmp[:filled])
+}
+copy(data[filled:], buf[:n])
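To make the new shape easier to follow, here is a simplified, self-contained sketch of the pattern the hunk above implements (the diff's tmp handling is elided, and the exact control flow is an assumption): reads go through a small scratch buffer, and the large 1MB buffer is allocated only once there is output worth keeping.

import "io"

// drainOutput is a sketch of the lazy-allocation pattern, not go-fuzz's code.
func drainOutput(r io.Reader) []byte {
	const N = 1 << 20         // cap on retained output
	buf := make([]byte, 4096) // small scratch buffer; all we pay when there is no output
	var data []byte           // large buffer, allocated lazily
	filled := 0
	for {
		n, err := r.Read(buf)
		if n > 0 {
			if data == nil {
				data = make([]byte, N)
			}
			n = copy(data[filled:], buf[:n]) // silently drops output past the cap
			filled += n
		}
		if err != nil {
			return data[:filled]
		}
	}
}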
We probably could avoid copying the data as well, but I am not sure it matters (if the main intention is to avoid the "no output" case).
Nice! I like efficient standard library!
Have you considered plumbing sync.Pool in somewhere here? Also, we could potentially tune the 1MB buffer size and the 10K-iters restarts; these were chosen pretty much by a fair dice roll very early in development.

There is also another interesting option of not collecting output at all... unless the binary has crashed. Just as we run the sonar binary only when we actually need its results, we could collect output only when triaging/minimizing a crash.

Another interesting possibility is to enlarge the pipe buffer, then just read the buffered byte count with ioctl(FIONREAD), and only start doing anything if we see that the buffer is close to full. Otherwise we can simply keep the output in the pipe itself. That may have some portability issues, though.
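For concreteness, a sketch of the FIONREAD idea in Go (assumptions: Linux, the golang.org/x/sys/unix package, and an arbitrary three-quarters-full threshold; none of this is from the PR):

package sketch

import "golang.org/x/sys/unix"

// pipeBacklog reports how many unread bytes are queued in the pipe fd.
func pipeBacklog(fd int) (int, error) {
	return unix.IoctlGetInt(fd, unix.FIONREAD)
}

// shouldDrain reports whether the pipe is close enough to full that we
// need to start reading it out rather than leaving data in the kernel.
func shouldDrain(fd, pipeSize int) bool {
	n, err := pipeBacklog(fd)
	if err != nil {
		return true // on error, drain conservatively
	}
	return n > pipeSize*3/4
}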
I didn't use sync.Pool here because it tends not to work well for managing a few large allocations (where "large" is measured relative to heap size). Discussion at golang/go#22950. Ideally we could fix golang/go#29696 and golang/go#18802 and then build a sync.Pool replacement better suited to that kind of use. :) (Also interesting is that the real cost here isn't so much the allocation as the memclr.)

However, re-reading your comments, it occurs to me that we can just hang a giant buffer off of the long-lived TestBinary and reuse it across restarts. I'll try that approach and report back.
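A minimal sketch of that approach (struct and field names are illustrative, not go-fuzz's actual ones): allocate the buffer once on the long-lived object and hand the same slice to every restart.

package sketch

type testBinary struct {
	// outputBuf is allocated once and reused across testee restarts,
	// avoiding a 1MB allocation, and its memclr, on every restart.
	outputBuf []byte
}

func newTestBinary() *testBinary {
	return &testBinary{outputBuf: make([]byte, 1<<20)}
}

func (b *testBinary) startTestee() {
	data := b.outputBuf // reuse; stale bytes are fine because a fill index tracks valid data
	_ = data
	// ... hand data to the output-collecting goroutine ...
}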
I do plan to revisit the 10k-iters restarts; I want to do two things there.
These are small, but they get used a lot. Also, binary.Read and binary.Write go through reflect, which allocates. Not a lot, but this is very high volume.
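As an illustration of the kind of change described here (the two-field header is hypothetical, not go-fuzz's actual wire format): binary.Read and binary.Write take interface{} and go through reflect, while the fixed-width byte-order helpers encode in place without allocating.

package sketch

import "encoding/binary"

// putHeader encodes two hypothetical counters without reflect or allocation.
func putHeader(buf []byte, res, execs uint64) {
	binary.LittleEndian.PutUint64(buf[0:8], res)
	binary.LittleEndian.PutUint64(buf[8:16], execs)
}

// getHeader is the matching allocation-free decoder.
func getHeader(buf []byte) (res, execs uint64) {
	return binary.LittleEndian.Uint64(buf[0:8]), binary.LittleEndian.Uint64(buf[8:16])
}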
For a fast-running Fuzz function, we might restart a testee multiple times a second. This giant allocation gets expensive in that scenario. It is also a significant slowdown once go-fuzz starts finding crashers. Making testee restarts cheaper is also an important step towards supporting Fuzz functions that are not idempotent, e.g. because they are hacked into an existing large code base, such as cmd/compile.

Instead of allocating 1MB on every restart, allocate a single re-usable buffer up front. We don't even need to clear it between runs, which makes this particularly cheap and effective.
The io.Reader contract specifies that Read may return n > 0 and err != nil, and that in that scenario, the caller should process those n bytes. This is rare enough that it probably wasn't causing an actual problem. Fix it nevertheless; one fewer latent bug.
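The loop shape the contract requires looks roughly like this (a generic sketch, not the PR's exact code):

import "io"

// consume processes everything readable from r, honoring the io.Reader
// contract: when Read returns n > 0 together with a non-nil error, the
// n bytes must be handled before the error is acted on.
func consume(r io.Reader, process func([]byte)) error {
	buf := make([]byte, 4096)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			process(buf[:n]) // handle the bytes first, even if err != nil
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}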
(force-pushed from 67d1929 to 77a1f72)
Done. Much nicer. Thanks for pushing back.
Future work. :)
@@ -184,8 +190,7 @@ retry:
 // deadlock we periodically read out stdout.
 // This goroutine also collects crash output.
 ticker := time.NewTicker(time.Second)
-const N = 1 << 20
-data := make([]byte, N)
+data := buffer
Eeee... this buffer is returned from TestBinary.test and then leaked to crasherQueue, so we can't just do this.
We can do something like this, but we can reuse the buffer only on periodic testee restarts. On crashes we can either hand the buffer off to the caller (assuming crashes are not so frequent, or if they are, perf does not matter) or make a copy of the used part of the buffer and then reuse the original buffer.
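A sketch of that compromise (names are illustrative): reuse the buffer on the frequent periodic restarts, and only on the rare crash path copy out the used portion so the original buffer stays reusable.

package sketch

// finishRun returns the output to report, if any, plus the buffer to reuse.
func finishRun(buf []byte, filled int, crashed bool) (output, reusable []byte) {
	if !crashed {
		// Periodic restart: nothing downstream retains the output,
		// so the whole buffer can be reused as-is.
		return nil, buf
	}
	// Crash: copy just the used part for crasherQueue; the original
	// buffer remains ours to reuse.
	out := make([]byte, filled)
	copy(out, buf[:filled])
	return out, buf
}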
I apologize, I don't follow. A few lines down, around line 215, I see:
trimmed := make([]byte, filled)
copy(trimmed, data)
t.outputC <- trimmed
So it seems like data/buffer doesn't leak. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right!
I assumed we send the buffer to the caller as-is. I guess I did it to not pin the large buffer... perhaps...
As a data point, before this PR, fuzzing github.com/dvyukov/go-fuzz-corpus/png maxes out on my machine at around 16k execs/sec. After this PR, that increases to about 26k execs/sec.
Please review carefully; the "avoid the giant buffer" commit in particular is subtle.