New process architecture for benchexec in container mode #875

PhilippWendler · 2022-11-07T12:49:17Z

Currently, the architecture of benchexec in the localexecution.py module is such that it starts a number of worker threads depending on --numOfThreads and each of the worker threads repeatedly executes runs using RunExecutor. In container mode, execution of a run involves calling clone() to create a subprocess.
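
For readers unfamiliar with the module, the shape of this architecture can be sketched roughly as follows. This is an illustration only, not the code in localexecution.py: `execute_run`, `run_with_threads`, and the queue handling are hypothetical names, and `execute_run` merely stands in for the call into RunExecutor.

```python
import queue
import threading


def execute_run(run_id):
    """Hypothetical stand-in for the call into RunExecutor.

    In container mode, this is the point where clone() would be called,
    i.e., from inside a worker thread of a multi-threaded process.
    """
    return run_id, threading.current_thread().name


def run_with_threads(run_ids, num_of_threads):
    """Start num_of_threads workers that each execute runs until none are left."""
    pending = queue.Queue()
    for run_id in run_ids:
        pending.put(run_id)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                run_id = pending.get_nowait()
            except queue.Empty:
                return  # no runs left for this worker
            # Preprocessing, execution, and postprocessing happen here, one run
            # at a time per thread; the Python parts contend for the GIL.
            outcome = execute_run(run_id)
            with results_lock:
                results.append(outcome)

    threads = [
        threading.Thread(target=worker, name=f"worker-{i}")
        for i in range(num_of_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```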

This architecture has several problems:

Glibc's clone() is not safely usable in processes with more than one thread and can produce deadlocks (BenchExec subprocess hangs in __malloc_fork_lock_parent #656). We currently have a workaround for this, but it is not a full solution.
Since Python 3.8, the API documentation requests that subprocesses are only created from the main thread, not from worker threads as we do. We have not encountered problems related to this so far, but it could happen.
With a high (3-digit) number of threads we have not seen the expected throughput. While no profiling has been performed yet, a plausible cause is that all pre- and postprocessing of runs (e.g., log analysis, writing results) is performed in Python threads, of which only one can be active at a time due to the GIL.

(Note that none of these problems affect users of runexec / containerexec, where the run execution is started from a single-threaded process.)

So in the long term it would probably be good to change this architecture. There are at least two potential solutions:

Switch from worker threads to worker processes, as the multiprocessing module does. Each worker process would be single-threaded.
Have one designated (single-threaded) subprocess that is created in the beginning and whose sole responsibility is to spawn all further subprocesses on request. (Android uses this and calls it the Zygote process.)
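
As a rough illustration of the first option, the following minimal sketch uses single-threaded worker processes that pull run IDs from a queue. It assumes Linux and the "fork" start method of multiprocessing; `execute_run`, `worker`, and `run_all` are hypothetical names and not part of BenchExec.

```python
import multiprocessing as mp
import os


def execute_run(run_id):
    """Hypothetical stand-in for preprocessing, executing, and postprocessing a run."""
    return run_id, os.getpid()


def worker(tasks, results):
    # Each worker is a single-threaded process with its own interpreter and GIL,
    # so creating subprocesses from here would not happen in a multi-threaded process.
    for run_id in iter(tasks.get, None):  # None is the shutdown sentinel
        results.put(execute_run(run_id))


def run_all(run_ids, num_workers):
    run_ids = list(run_ids)
    ctx = mp.get_context("fork")  # assumption: Linux; workers inherit loaded modules
    tasks, results = ctx.Queue(), ctx.Queue()
    workers = [
        ctx.Process(target=worker, args=(tasks, results)) for _ in range(num_workers)
    ]
    # Start all workers before the first put(): multiprocessing queues start a
    # helper thread on first use, and forking should happen before that.
    for process in workers:
        process.start()
    for run_id in run_ids:
        tasks.put(run_id)
    for _ in workers:
        tasks.put(None)
    collected = [results.get() for _ in run_ids]
    for process in workers:
        process.join()
    return collected
```

Only plain data crosses the queues in this sketch; how results and tool-info access would be shared between real worker processes is left open here.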

Instead of clone() one can also use unshare() and os.fork() for creating a container, which should be safer. However, because of how unshare() works with PID namespaces, this would involve yet another process per run and would probably complicate process handling even more than any of the other alternatives.

Things to consider:

Whether this works for, and how it affects, cases where benchexec is not called as a command-line tool but executed as part of a larger Python program (which may have created threads before benchexec is even loaded).
The subprocess that we start for each run needs to be cloned from a process that already has all the required modules loaded, because this process is inside the container for the run and might not have access to the Python interpreter's files on disk.
How communication is possible with the process that hosts the tool-info module if more than one worker process needs to communicate, or if each worker process should also get its own separate process with an instance of the tool-info module.
The fact that preprocessing, actual run execution, and postprocessing are serialized within each worker thread and there is no overlap (i.e., the next run is not already being executed while a previous run is postprocessed) is by design. Otherwise we would have to reserve some cores for the postprocessing threads, which would lead to asymmetric core assignments. However, the fact that the postprocessing steps of parallel threads compete for the GIL is not desired.
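
To make the dedicated-spawner option and the module-loading consideration above more concrete, here is a minimal sketch of a Zygote-style process. It is created while the main process is still single-threaded and forks one child per request, so every child inherits the modules already loaded in memory. All names (`zygote_loop`, `start_zygote`, the request format) are hypothetical, the sketch assumes Linux, and it waits for each child before accepting the next request, which a real spawner would not do.

```python
import multiprocessing as mp
import os


def zygote_loop(conn):
    """Runs in the single-threaded spawner process; forks one child per request."""
    for request in iter(conn.recv, None):  # None asks the spawner to shut down
        pid = os.fork()
        if pid == 0:
            # Child: forked from a process that already has all modules loaded.
            # A real implementation would set up the container and start the tool.
            os._exit(request["exit_code"])
        _, status = os.waitpid(pid, 0)
        conn.send({"pid": pid, "exit_code": os.WEXITSTATUS(status)})


def start_zygote():
    """Create the spawner; call this before the main process starts any threads."""
    ctx = mp.get_context("fork")  # assumption: Linux
    parent_conn, child_conn = ctx.Pipe()
    process = ctx.Process(target=zygote_loop, args=(child_conn,))
    process.start()
    child_conn.close()  # the main process keeps only its own end of the pipe
    return process, parent_conn
```

A caller would send a request such as `{"exit_code": 3}` over the returned connection, read back the reply, and send `None` to shut the spawner down.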

PhilippWendler added the "container" label (related to container mode) Nov 7, 2022

PhilippWendler mentioned this issue Nov 7, 2022

BenchExec subprocess hangs in __malloc_fork_lock_parent #656

Open

PhilippWendler mentioned this issue Feb 13, 2024

Add tool_pid argument to parent_setup_fn #990

Closed
