Now that optional forking per-call is implemented, theft could support using multiple worker processes to search concurrently. For nontrivial property tests, both searching for failures and shrinking them should parallelize well; the main serialization point would be handling cases where multiple workers have concurrently found different failures, or where multiple tactics shrink successfully. For the former, the hashes for the other failures could be pushed back and shrunk immediately after the first shrink completes (with a check for duplicates). For the latter, the result from the lower tactic ID should win.
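As a rough sketch of that tie-break, assuming a hypothetical `shrink_result` record (not part of theft's API) that workers report back to the main process:

```c
#include <stdint.h>

/* Hypothetical sketch: when several workers report successful shrinks
 * of the same trial, keep the one from the lowest tactic ID, matching
 * what a sequential shrinking pass would have tried first. */
struct shrink_result {
    uint32_t tactic_id;     /* tactic that produced this shrink */
    uint64_t hash;          /* hash to reconstruct the shrunken instance */
};

static const struct shrink_result *
pick_winner(const struct shrink_result *a, const struct shrink_result *b) {
    return (a->tactic_id <= b->tactic_id) ? a : b;
}
```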
This will need (at least) the following changes:
- Making the queue of trial hashes an explicit thing that can be stepped, rather than something implicit in the control flow of `theft_run` (see the queue sketch after this list).
- Making a queue of hashes to shrink (either saving the instances or, preferably, reconstructing them on demand), rather than just storing shrinking context in `theft_trial_info` and local variables in `theft_shrink.c`.
- Restructuring the forking and poll loop in `theft_call.c` so that it checks whether it should fork a new worker, whether there is output from any worker(s), whether any have terminated, and whether it is searching or shrinking. This should be modeled as a state machine, and ideally the design will cleanly account for when forking isn't used at all (see the poll-loop sketch below).
- Capturing and buffering stdout from each worker, so their output is not interleaved. (stderr could be buffered just long enough to prefix each line with a pid and print it immediately, or left alone.)
- Adding a command protocol over a pipe. This will probably be used for setting failure IDs, exiting with a particular `theft_trial_res` status, and a few other things. It should be modeled as a distributed state machine, with per-worker context (see the message sketch below).
- Clarifying which hooks are called from which process.
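A minimal sketch of what the first two items (an explicit, steppable trial queue) might look like; all of the names here (`trial_queue`, `trial_entry`, and so on) are hypothetical, not part of theft:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum trial_kind { TRIAL_SEARCH, TRIAL_SHRINK };

struct trial_entry {
    enum trial_kind kind;
    uint64_t hash;          /* seed/hash to reconstruct the instance on demand */
};

/* Ring buffer of pending trials. Only the main process touches it,
 * so no locking is needed. */
struct trial_queue {
    size_t head, count, ceil;
    struct trial_entry *entries;
};

/* Pop the next trial to hand to an idle worker. Returns false when
 * the queue is empty. */
static bool trial_queue_step(struct trial_queue *q, struct trial_entry *out) {
    if (q->count == 0) { return false; }
    *out = q->entries[q->head];
    q->head = (q->head + 1) % q->ceil;
    q->count--;
    return true;
}

/* Push a failure's hash back to be shrunk later, e.g. a second
 * failure found while the first is still being shrunk. */
static bool trial_queue_push(struct trial_queue *q, struct trial_entry e) {
    if (q->count == q->ceil) { return false; }  /* caller grows or drops */
    q->entries[(q->head + q->count) % q->ceil] = e;
    q->count++;
    return true;
}
```

Queuing hashes rather than instances keeps the queue small and avoids copying instances between processes, since they can be regenerated on demand.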
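For the poll loop, here is a sketch of one supervisor step, under the assumption of one stdout pipe per worker; the state names and `supervisor_step` are illustrative only:

```c
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 8

enum worker_state { W_IDLE, W_RUNNING, W_EXITED };

struct worker {
    enum worker_state state;
    pid_t pid;
    int stdout_fd;          /* read end of the worker's stdout pipe */
    char buf[4096];         /* per-worker buffer, so output from
                             * concurrent workers isn't interleaved */
    size_t buf_used;
};

/* One step of the supervisor loop: collect worker output, then reap
 * terminated workers. Forking new workers and switching between
 * search and shrink modes would hang off the same loop. Assumes
 * count <= MAX_WORKERS. */
static void supervisor_step(struct worker *ws, size_t count) {
    struct pollfd pfds[MAX_WORKERS];
    for (size_t i = 0; i < count; i++) {
        pfds[i].fd = (ws[i].state == W_RUNNING) ? ws[i].stdout_fd : -1;
        pfds[i].events = POLLIN;
    }

    if (poll(pfds, count, 100 /* msec */) > 0) {
        for (size_t i = 0; i < count; i++) {
            if (pfds[i].revents & POLLIN) {
                ssize_t n = read(ws[i].stdout_fd,
                    ws[i].buf + ws[i].buf_used,
                    sizeof(ws[i].buf) - ws[i].buf_used);
                if (n > 0) { ws[i].buf_used += (size_t)n; }
            }
        }
    }

    /* Reap any terminated workers without blocking. */
    int status = 0;
    pid_t pid = 0;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        for (size_t i = 0; i < count; i++) {
            if (ws[i].pid == pid) { ws[i].state = W_EXITED; }
        }
    }
}
```

Modeling this as a state machine also gives the non-forking case a natural home: with zero workers, the same loop can simply run trials inline.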
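And a sketch of what messages on the command pipe might look like; the message types and layout here are made up for illustration:

```c
#include <stdint.h>

enum worker_msg_type {
    MSG_HELLO = 1,          /* worker started */
    MSG_TRIAL_RESULT,       /* trial finished; carries a theft_trial_res */
    MSG_FAILURE_ID,         /* failure found; carries its hash */
    MSG_SHUTDOWN,           /* worker is exiting cleanly */
};

/* Fixed-size messages, so the supervisor can read each one with a
 * single read(2) and update that worker's state machine context. */
struct worker_msg {
    uint8_t type;           /* enum worker_msg_type */
    uint8_t trial_res;      /* valid for MSG_TRIAL_RESULT */
    uint16_t pad;
    uint32_t worker_id;
    uint64_t payload;       /* hash for MSG_FAILURE_ID, otherwise 0 */
};
```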
On the upside, there would not be any shared mutable state -- just shared state with copy-on-write -- so all decisions could be kept in the main process. It wouldn't need to coordinate concurrent updates to the bloom filter or queues.
The property functions will run in worker processes, but the type_info callbacks will need to run in the main process, since copy-on-write means a worker's allocations aren't visible to the parent. This could be a bottleneck, particularly when the property function itself is relatively quick.
Aside from user callbacks, the theft runner spends virtually all of its time in the random number generator. If that ends up being a significant bottleneck when running with several workers, it may be worth experimenting with moving the RNG out of the main process.