[SIGCOMM 24] RedTE: A MARL-based distributed traffic engineering system, with a control loop latency of < 100𝑚s, while achieving performance comparable to centralized TE systems. RedTE's innovation is the modeling of TE as a distributed cooperative multi-agent problem, and we design a novel multi-agent deep reinforcement learning algorithm to solve it, which enables each agent to make globally informed decisions solely based on local information.
For more details, please refer to our paper from ACM SIGCOMM'24.
Fei Gui, Songtao Wang, Dan Li, Li Chen, Kaihui Gao, Congcong Min, Yi Wang, "RedTE: Mitigating Subsecond Traffic Bursts with Real-time and Distributed Traffic Engineering", ACM SIGCOMM 2024, Sydney, Australia.
Choose topologies: GEANT (23, 36) and Abi (12, 15).
How to choose:
When changing the topology, simply modify the ${topoName}
in the training (train.sh) and inference (valid.sh) scripts.
Run the command:
bash train.sh # (train.sh will loop call run_train.sh)
-
Run in the background.
-
Log information from the run is stored in
../train_abi_log/
. -
Intermediate training results (performance ratio) are saved in the folder
../log/log/hyper1-hyper2-hyper3..-hyperx
, controlled by the--stamp_type
parameter in run_train.sh.
Run the command:
bash valid.sh # (valid.sh will continuously loop run_valid.sh)
In addition to the parameters used in training, an extra parameter ckpt_idx
will be introduced to traverse all checkpoints for each set of parameters.
Test performance results are saved in ../DRLTE/log/validRes/
, controlled by the --stamp_type
parameter in run_test.sh.
Additionally, test_epoch=1
and test_episode=500
are used to control the total number of inference test steps.
All input files are located in DRLTE/inputs/
.
-
File One:
\${topoName}\_pf\_trueTM\_train4000.txt
: Records the optimal solution (maximum link utilization) obtained from linear programming. This value is used as the denominator for calculating the reward.topoName
indicates the topology name, stored under the currenttopoName
. This file needs to be specified in the run script:lpPerformFile=../inputs/\${topoName}\_pf\_train4000.txt
. -
File Two:
\${topoName}\_train4000
: Records candidate paths and traffic matrices. ThetopoName
indicates the topology name, stored under the currenttopoName
. This file also needs to be specified in the run script:file_name=\${topoName}\_train4000
. -
File Three: Topology file. This needs to be specified in the run script:
topoName=GEA
.
there are some codes which are lost in this version of RedTE, which latter maybe uploaded if founded.