TLCBench

Benchmark scripts for TVM

Docker

Content

Requirement
Intel CPU
NVIDIA GPU

Requirement

Tested with
TVM commit id: 91e07e1f3a7 (Feb. 5, 2021)
mxnet==1.7.0
gluonnlp==0.10.0

Intel CPU

Results on AWS c5.9xlarge (Intel Xeon Platinum 8124m @ 3.00GHz 18-core)

AutoTVM

-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            5.40 ms             (0.08 ms)
mobilenet_v2       1            1.33 ms             (0.05 ms)
bert               1            31.31 ms            (0.11 ms)
-------------------------------------------------------------

AutoScheduler

-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            5.30 ms             (0.05 ms)
mobilenet_v2       1            0.91 ms             (0.02 ms)
bert               1            16.52 ms            (0.16 ms)
-------------------------------------------------------------
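To compare the two tuners, the tables above can be reduced to one speedup ratio per network. The following is a minimal sketch that hard-codes the mean latencies reported above. The dictionary and function names are illustrative and not part of TLCBench.

```python
# Mean inference times (ms) copied from the Intel CPU tables above.
AUTOTVM_MS = {"resnet_50": 5.40, "mobilenet_v2": 1.33, "bert": 31.31}
AUTOSCHEDULER_MS = {"resnet_50": 5.30, "mobilenet_v2": 0.91, "bert": 16.52}


def speedups(baseline, candidate):
    """Return baseline / candidate latency for every network present in both tables."""
    return {
        name: baseline[name] / candidate[name]
        for name in baseline
        if name in candidate
    }


if __name__ == "__main__":
    for name, ratio in speedups(AUTOTVM_MS, AUTOSCHEDULER_MS).items():
        # A ratio above 1.0 means the AutoScheduler result has lower latency.
        print(f"{name:<14} AutoTVM / AutoScheduler latency ratio: {ratio:.2f}")
```

The ratio is computed from the means only; the reported standard deviations are not taken into account.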

Benchmark All Networks

The following commands read pre-tuned logs from the directory saved_logs/latest and benchmark the latency of all networks.

Commands for AutoTVM

python3 benchmark_autotvm.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest

Commands for AutoScheduler

python3 benchmark_autoscheduler.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest

Benchmark One Network

The following commands read pre-tuned logs from the directory saved_logs/latest and benchmark the latency of one network. You can replace "resnet_50" below with "mobilenet_v2" or "bert".

Commands for AutoTVM

python3 benchmark_autotvm.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest

Commands for AutoScheduler

python3 benchmark_autoscheduler.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
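When benchmarking the networks one at a time, the command line differs only in the --network value. The sketch below builds those command lines from the flags shown above. The helper benchmark_command is hypothetical and not part of TLCBench; it only assembles arguments and prints them.

```python
import shlex

# Target string, log directory, and network names taken from the commands above.
CPU_TARGET = "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
NETWORKS = ["resnet_50", "mobilenet_v2", "bert"]


def benchmark_command(script, network, target=CPU_TARGET, logdir="saved_logs/latest"):
    """Build the argument list for one benchmark run (hypothetical helper)."""
    return [
        "python3", script,
        "--network", network,
        "--target", target,
        "--logdir", logdir,
    ]


if __name__ == "__main__":
    for network in NETWORKS:
        cmd = benchmark_command("benchmark_autoscheduler.py", network)
        # Print a shell-quoted command line. To execute it instead, pass the
        # list to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems with the target string, which contains spaces.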

Tuning

The following commands auto-tune one or all networks and save the tuning logs to the directory tmp_logs. After tuning, you can benchmark with these logs by running the benchmark commands above with the last argument replaced by --logdir tmp_logs.

Commands for AutoTVM

# Tune one network
python3 tune_autotvm.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
# Tune all networks
python3 tune_autotvm.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"

Commands for AutoScheduler

# Tune one network
python3 tune_autoscheduler.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
# Tune all networks
python3 tune_autoscheduler.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
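The tune-then-benchmark workflow described in this section can be sketched as a pair of commands run in order. This is an illustration under stated assumptions: the helper names are hypothetical, and only the script names and flags shown above are used. By default the sketch prints the commands instead of running them.

```python
import shlex
import subprocess

CPU_TARGET = "llvm -mcpu=skylake-avx512 -model=platinum-8124m"


def tune_then_benchmark_commands(tuner, network, target=CPU_TARGET):
    """Return the tuning command followed by the benchmark command that reads tmp_logs."""
    if tuner not in ("autotvm", "autoscheduler"):
        raise ValueError(f"unknown tuner: {tuner}")
    tune = ["python3", f"tune_{tuner}.py", "--network", network, "--target", target]
    bench = [
        "python3", f"benchmark_{tuner}.py",
        "--network", network,
        "--target", target,
        "--logdir", "tmp_logs",
    ]
    return [tune, bench]


def run_all(commands, dry_run=True):
    """Print each command when dry_run is set; otherwise run them in order,
    stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(tune_then_benchmark_commands("autoscheduler", "resnet_50"))
```

Call run_all(..., dry_run=False) from the repository root to execute the commands.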

NVIDIA GPU

Results on AWS g4dn.4xlarge (NVIDIA T4)

AutoTVM

-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            3.54 ms             (0.02 ms)
mobilenet_v2       1            0.74 ms             (0.00 ms)
bert               1            89.06 ms            (1.22 ms)
-------------------------------------------------------------

AutoScheduler

-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            2.90 ms             (0.01 ms)
mobilenet_v2       1            0.57 ms             (0.00 ms)
bert               1            9.95 ms             (0.01 ms)
-------------------------------------------------------------
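The result tables in this document share a fixed layout, so they can be parsed into records for further processing. The sketch below assumes rows formatted exactly as in the tables above; the regular expression and function name are illustrative and not part of TLCBench.

```python
import re

# One data row: network name, batch size, mean in ms, std dev in ms in parentheses.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+([\d.]+) ms\s+\(([\d.]+) ms\)\s*$")


def parse_results(text):
    """Parse result rows into dicts; rule lines and the header row are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, batch, mean, std = match.groups()
            rows.append({
                "network": name,
                "batch_size": int(batch),
                "mean_ms": float(mean),
                "std_ms": float(std),
            })
    return rows


# The AutoScheduler table for the NVIDIA T4, copied from above.
SAMPLE = """\
-------------------------------------------------------------
Network Name       Batch size   Mean Inference Time (std dev)
-------------------------------------------------------------
resnet_50          1            2.90 ms             (0.01 ms)
mobilenet_v2       1            0.57 ms             (0.00 ms)
bert               1            9.95 ms             (0.01 ms)
-------------------------------------------------------------
"""

if __name__ == "__main__":
    for row in parse_results(SAMPLE):
        print(row)
```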

Benchmark All Networks

The following commands read pre-tuned logs from the directory saved_logs/latest and benchmark the latency of all networks.

Commands for AutoTVM

python3 benchmark_autotvm.py --network all --target "cuda -model=t4" --logdir saved_logs/latest

Commands for AutoScheduler

python3 benchmark_autoscheduler.py --network all --target "cuda -model=t4" --logdir saved_logs/latest

Benchmark One Network

The following commands read pre-tuned logs from the directory saved_logs/latest and benchmark the latency of one network. You can replace "resnet_50" below with "mobilenet_v2" or "bert".

Commands for AutoTVM

python3 benchmark_autotvm.py --network resnet_50 --target "cuda -model=t4" --logdir saved_logs/latest

Commands for AutoScheduler

python3 benchmark_autoscheduler.py --network resnet_50 --target "cuda -model=t4" --logdir saved_logs/latest

Tuning

The following commands auto-tune one or all networks and save the tuning logs to the directory tmp_logs. After tuning, you can benchmark with these logs by running the benchmark commands above with the last argument replaced by --logdir tmp_logs.

Commands for AutoTVM

# Tune one network
python3 tune_autotvm.py --network resnet_50 --target "cuda -model=t4"
# Tune all networks
python3 tune_autotvm.py --network all --target "cuda -model=t4"

Commands for AutoScheduler

# Tune one network
python3 tune_autoscheduler.py --network resnet_50 --target "cuda -model=t4"
# Tune all networks
python3 tune_autoscheduler.py --network all --target "cuda -model=t4"
