internal/fuzz: revisit use of shared memory-mapped files for marshaled inputs #48163
Comments
Checking in on this issue as it's labeled a release blocker for Go 1.18. Is there any update?
@jayconrod do you still think this is a release blocker? Reading it over, it feels more like an optimization that we might not necessarily need right now, but there may be context I'm not seeing here.
This probably doesn't need to be a release blocker. #48731 is related though, that's mostly the one I'm worried about.
Unassigning, since I'm leaving. I did benchmark whether there would be a significant slowdown to passing inputs through RPC vs pipes: there wasn't a measurable difference for inputs of ~100 bytes. The marshaling / unmarshaling cost is far higher, so I don't think communication overhead is a reason to implement this or not. Reading entries from corpus files in workers is a good idea regardless. Then the coordinator doesn't need to read them at all. #48731 is my main concern. We need to be able to reconstruct the entry that caused a problem, whether we're fuzzing or minimizing. Writing the entry to shared memory before every call to the fuzz target is too expensive because of the marshaling overhead, so we need to be able to rebuild it from the input entry and some known sequence of transformations. For minimization, I think we should store the sequence of transformations in shared memory. For normal fuzzing, we already store initial RNG state and a count, and that works well enough. In any case, we do still need shared memory, but we could use a lot less of it since we don't need to store inputs there.
Change https://golang.org/cl/356229 mentions this issue.
Change https://go.dev/cl/393660 mentions this issue.
Currently, the fuzzing coordinator process creates a 100MB temporary file for each worker process. Both the coordinator and worker read and write the file via a shared memory map. The file is currently used for 1) a few "header" fields such as iteration count and PRNG state, and 2) passing marshaled inputs for fuzzing in both directions.
We still need to use shared memory for the header fields. If a worker process terminates unexpectedly, the coordinator needs to be able to reconstruct the input that caused the crash using the initial input (sent from the coordinator), the call count, and the PRNG state.
However, it's not strictly necessary to write marshaled inputs to shared memory. It would be simpler to pass these through the pipes we use for RPCs. If we only supported unmarshaled byte slices, there would be a performance advantage to writing and mutating those directly in shared memory without incurring the cost of marshaling and pipe I/O. Since we need to marshal inputs anyway, it's not clear the extra complexity is worthwhile.
We should investigate the performance difference and pass inputs through pipes if it's not too bad. This would simplify our implementation, and would let us use much smaller shared memory files.
Additionally, for inputs that can be read from files, such as those in testdata or the cache, we can pass file names over pipes instead of reading the data and sending it over pipes. The coordinator then doesn't need to hold that data in virtual memory at all.