This repository contains the training, testing, and simulator code used to build the DUMBO system from the research paper "Taming the Elephants: Affordable Flow Length Prediction in the Data Plane".
DUMBO is a versatile networked system that integrates a lightweight traffic classifier to enhance several downstream tasks in the data plane (e.g., packet scheduling, inter-arrival time distribution estimation, flow length estimation). Its main idea is to separate elephant flows from mouse flows and handle each class on its own, which saves memory and improves performance over standard baselines.
This document serves as a guide to install and use the DUMBO system on real traffic traces.
Follow these instructions to quickly set up the repository and reproduce the experiments on Linux (Ubuntu version >= 22).
- Dependencies
  - Install wireshark-common:
    $ sudo apt-get install wireshark-common
  - Install Python 3.9 outside of any virtual environment:
    $ sudo apt update
    $ sudo apt install python3.9
    $ python3.9 --version
  - Install and set up Rust:
    - Use v1.76.0-nightly and check your version:
      $ cargo --version
    - Install the libpython3.9-dev package on your system:
      $ sudo apt install libpython3.9-dev
    - Deactivate any virtual environment and build the repository:
      $ cargo build -r
  - Create the required Anaconda environments:
    $ chmod +x ./setup_conda.sh
    $ ./setup_conda.sh
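Before building, it can help to confirm that the tools installed above are on the PATH. The following is a minimal sketch, not part of the repository: check_tools is a hypothetical helper name, and the tool names passed to it are assumptions based on the steps above (python3.9 and cargo from the install steps, conda because the environments are created with Anaconda).

```shell
# Hypothetical helper, not part of the repository: report which of the
# given commands can be found on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        # "command -v" succeeds only if the shell can resolve the name.
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}
```

For example, `check_tools python3.9 cargo conda` prints one line per tool and returns a non-zero status if any of them is missing.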
- Data
  Download the traces (see Traffic traces below). Uncompress and store the *.pcap files in the appropriate folder:
  ./data/caida/pcap/equinix-chicago.dirA.20160121-{hour}.UTC.anon.pcap
  ./data/mawi/pcap/20190409{hour}.pcap
  ./data/uni/pcap/univ2.pcap
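A quick way to confirm that the traces are where the pipeline expects them is sketched below. check_traces is a hypothetical helper, not part of the repository. Since the {hour} values are not fixed here, it only checks that at least one file matches each path pattern; using the * wildcard in place of {hour} is an assumption.

```shell
# Hypothetical helper, not part of the repository: check that at least one
# file matches each expected trace path under a data root (default: ./data).
check_traces() {
    root="${1:-./data}"
    missing=0
    for pattern in \
        "caida/pcap/equinix-chicago.dirA.20160121-*.UTC.anon.pcap" \
        "mawi/pcap/20190409*.pcap" \
        "uni/pcap/univ2.pcap"; do
        # Expand the glob into the positional parameters. If nothing
        # matches, the first parameter is not an existing file.
        set -- "$root"/$pattern
        if [ -e "$1" ]; then
            echo "ok:      $root/$pattern"
        else
            echo "missing: $root/$pattern"
            missing=1
        fi
    done
    # Non-zero exit status if any pattern had no matching file.
    return "$missing"
}
```

Calling `check_traces` from the repository root inspects ./data; a different root can be passed as the first argument.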
- Scheduling simulator
  Clone and patch the YAPS simulator repository:
  $ git clone -n https://github.com/NetSys/simulator.git
  $ cd simulator
  $ git checkout -b scheduling_DUMBO 179b64e
  $ git apply < ../scheduling_DUMBO.patch
  $ cd ..
- Run
  Run the pipeline to reproduce the experiments:
  $ chmod +x ./run.sh
  $ ./run.sh caida # Includes trade-off analysis
  $ ./run.sh mawi
  $ ./run.sh uni
  $ chmod +x ./run_update_stresstest.sh
  $ ./run_update_stresstest.sh # Requires complete caida and mawi runs
- Plot
  Plot the results using the notebooks in ./plots/
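The run order from the Run step, where the stress test needs complete caida and mawi runs, could be wrapped as follows. This is a sketch under assumptions: run_pipeline is a hypothetical function name, it must be called from the repository root, and it assumes that run.sh exits with a non-zero status when a run fails, which this document does not confirm.

```shell
# Hypothetical wrapper, not part of the repository. Call it from the
# repository root, where run.sh and run_update_stresstest.sh live.
run_pipeline() {
    caida_ok=0
    mawi_ok=0
    # Assumption: run.sh returns a non-zero status when a run fails.
    ./run.sh caida && caida_ok=1
    ./run.sh mawi && mawi_ok=1
    ./run.sh uni || echo "uni run failed" >&2
    # The stress test requires complete caida and mawi runs.
    if [ "$caida_ok" -eq 1 ] && [ "$mawi_ok" -eq 1 ]; then
        ./run_update_stresstest.sh
    else
        echo "skipping stress test: caida or mawi run did not complete" >&2
        return 1
    fi
}
```

The function returns a non-zero status when the stress test is skipped, so it can be chained with other commands using `&&`.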
Traffic traces
Here are the traces used in the experiments:
- CAIDA
  - Trace: Equinix Chicago dir.A 2016-01-21 13:00 - 13:59
  - Link: https://www.caida.org/catalog/datasets/passive_dataset_download/ (approval required by CAIDA)
- MAWI
  - Trace: 2019-04-09 18:30 - 19:45
  - Link: https://mawi.wide.ad.jp/mawi/ditl/ditl2019/
- UNI
  - Trace: UNI2 2010-01-22 20:02 - 22:40
  - Link: https://pages.cs.wisc.edu/~tbenson/IMC_DATA/univ2_trace.tgz
You can find additional technical documentation about the simulators in ./README_SIMULATOR.md and ./README_DEV.md.
If you have found this paper useful, please cite us using:
@article{dumbo2024,
title={Taming the Elephants: Affordable Flow Length Prediction in the Data Plane},
author={Azorin, Raphael and Monterubbiano, Andrea and Castellano, Gabriele and Gallo, Massimo and Pontarelli, Salvatore and Rossi, Dario},
journal={Proceedings of the ACM on Networking},
volume={2},
number={CoNEXT1},
articleno = {5},
numpages={24},
year={2024},
publisher={ACM New York, NY, USA}
}
We would like to thank the authors of pHost and of the YAPS simulator as well as the author of the MetaCost learning implementation.