In addition to single-actor training, distributed reinforcement learning (both synchronous and asynchronous) is supported. To implement distributed reinforcement learning, we use ray (in particular, to allow actors to interact in parallel) and multiprocessing. See the flowchart and timeline for each script (single, sync, and async). The flowchart shows the flow of data between components across processes. The timeline shows the progress of work and the data communication between processes.
The single-actor train script has a main process and a manage process. In the main process, a single agent interacts with the env to collect transition data and trains the network on it. The manage process evaluates with the latest network to get a score, and records this score along with the training results from the main process.
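The main-process loop above can be sketched as follows. This is a minimal illustration, not the repository's actual API: the `Agent`, `Env`, and `run_single` names and their methods are placeholders assumed for the example.

```python
# Minimal sketch of the single-actor main process: one agent collects
# transitions from one env and trains on each of them in turn.
# All class and method names here are illustrative placeholders.
class Agent:
    def act(self, state):
        return 0  # placeholder policy: always pick action 0

    def train(self, transition):
        # update the network from one transition; return a loss-like value
        return 0.0

class Env:
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # (next_state, reward, done) with a fixed 5-step episode
        return self.t, 1.0, self.t >= 5

def run_single(agent, env, steps=10):
    state = env.reset()
    losses = []
    for _ in range(steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        losses.append(agent.train((state, action, reward, next_state, done)))
        state = env.reset() if done else next_state
    return losses
```

In the actual scripts, the manage process runs alongside this loop, periodically evaluating the latest network and logging the scores.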
The sync distributed train script also has a main process and a manage process. In the main process, multiple actors interact in parallel to collect transition data, and the learner trains the model on it. The manage process evaluates with the latest model to get a score, and records this score along with the training results from the main process.
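The synchronous pattern can be sketched as below. The repository uses ray for parallel actors; this sketch substitutes the standard-library `concurrent.futures` to stay self-contained, and the `rollout` function and its dummy transitions are assumptions for illustration. The key point is the barrier: training waits until every actor has returned its data.

```python
from concurrent.futures import ThreadPoolExecutor

def rollout(actor_id, num_steps):
    # Each actor would interact with its own env copy;
    # here we return dummy (actor_id, step, reward) transitions.
    return [(actor_id, t, 1.0) for t in range(num_steps)]

def sync_collect(num_actors=4, num_steps=3):
    # Synchronous collection: block until EVERY actor finishes,
    # then hand the combined batch to the learner.
    with ThreadPoolExecutor(max_workers=num_actors) as pool:
        futures = [pool.submit(rollout, i, num_steps) for i in range(num_actors)]
        batch = [tr for f in futures for tr in f.result()]  # waits on all actors
    return batch
```

With ray, the same shape is `ray.get([actor.rollout.remote(...) for actor in actors])`, which likewise blocks until all actors have finished.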
The async distributed train script has an interact process, a main process, and a manage process. In the interact process, multiple actors interact in parallel to collect transition data. Unlike the sync distributed train script, each actor interacts asynchronously: data is transferred only from the actors that have completed within a specific time, while slower actors keep running and deliver their data on a later cycle. In the main process, the learner trains the model on the transition data. The manage process evaluates with the latest model to get a score, and records this score along with the training results from the main process.
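The "only actors that completed within a specific time" behavior can be sketched with a wait-with-timeout, the same shape as ray's `ray.wait(..., timeout=...)`. This sketch again uses `concurrent.futures` to stay self-contained; the `slow_rollout` function and the per-actor delays are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def slow_rollout(actor_id, delay):
    # Stand-in for an actor's env interaction of varying duration.
    time.sleep(delay)
    return [(actor_id, 1.0)]

def async_collect(timeout=0.2):
    # Actors that finish within `timeout` contribute to this batch;
    # the rest stay pending and would report on a later collection cycle.
    delays = [0.01, 0.05, 1.0]  # the third actor misses the deadline
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(slow_rollout, i, d) for i, d in enumerate(delays)]
        done, pending = wait(futures, timeout=timeout)
        batch = [tr for f in done for tr in f.result()]
        num_pending = len(pending)
    return batch, num_pending
```

With ray, `done, pending = ray.wait(refs, num_returns=len(refs), timeout=t)` plays the role of `wait` here: the learner trains on whatever `done` holds without blocking on the slowest actor.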
Reference: manager/distributed_manager.py, process