This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Dependency Update] Bump up cuDNN & NCCL version #15142

Merged
3 commits merged on Jun 16, 2019

Conversation

stu1130
Contributor

@stu1130 stu1130 commented Jun 4, 2019

Description

Ran three models: ResNet50 on ImageNet, LSTM on PTB, and MLP on MNIST
Performance results are shown below
Environment: P3.16xlarge Deep Learning Base AMI
Codebase: commit 1540a84 for CUDA 9/9.2/10 1540a84 for CUDA 10
I also applied the #14837 PR change
The unit of throughput is samples per second
Each throughput value is the average of 5 runs
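
For readers checking the numbers, here is a minimal sketch of how the reported figures can be reproduced. It assumes the "Performance Difference" column is the relative change of the new cuDNN/NCCL stack against the old one, which matches the values in the tables. The function names are illustrative and are not taken from the benchmark repository.

```python
from statistics import mean


def average_throughput(runs):
    """Average throughput (samples per second) over repeated benchmark runs."""
    return mean(runs)


def percent_difference(new, old):
    """Relative change of `new` against `old`, in percent.

    Negative values mean the new stack is slower than the old one.
    """
    return (new - old) / old * 100.0


# ResNet50, CUDA 10.1 row: new stack vs. cuDNN 7.5.1/NCCL 2.3.4
new_stack = 2806.35499
old_stack = 2817.18815
print(f"{percent_difference(new_stack, old_stack):.3f}%")  # prints -0.385%
```

Every difference reported in the tables is within 5%, and most are within about 1%. Since each value is an average of only 5 runs, it is unclear how much of that is run-to-run noise.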

ResNet

model: Resnet50
dataset: Imagenet
number of GPUs: 8
epochs: 3 (only to test throughput)
preprocess command: sudo pip install gluoncv==0.2.0b20180625
command: python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 128 --dtype float32 --num-data-workers 40 --num-epochs 3 --gpus 0,1,2,3,4,5,6,7 --lr 0.05 --last-gamma --mode symbolic --model resnet50_v1b --rec-train /home/ubuntu/data/train-passthrough.rec --rec-train-idx /home/ubuntu/data/train-passthrough.idx --rec-val /home/ubuntu/data/val-passthrough.rec --rec-val-idx /home/ubuntu/data/val-passthrough.idx
github repo: https://github.com/rahul003/deep-learning-benchmark-mirror.git

CUDA + MKLDNN

| Throughput (samples/sec) | cuDNN 7.6.0/NCCL 2.4.7 | cuDNN 7.5.1/NCCL 2.3.4 | Performance Difference |
| --- | --- | --- | --- |
| CUDA 10.1 | 2806.35499 | 2817.18815 | -0.385% |
| CUDA 10 | 2826.54083 | 2831.54405 | -0.178% |
| CUDA 9.2 | 2812.30931 | 2832.36803 | -0.708% |
| CUDA 9.0 | 2783.51629 | 2815.83939 | -1.148% |

Reference (only 3 runs)
without MKLDNN

| Throughput (samples/sec) | cuDNN 7.6.0/NCCL 2.4.2 |
| --- | --- |
| CUDA 10.1 | 2832.42231 |
| CUDA 10 | 2838.54 |
| CUDA 9.2 | 2838.424 |
| CUDA 9.0 | 2833.86458 |

LSTM

model: LSTM
dataset: PTB (Penn Treebank)
number of GPUs: 1
epochs: 10
command:
python2 benchmark_driver.py --framework mxnet --task-name mkl_lstm_ptb_symbolic --num-gpus 1 --epochs 10 --metrics-suffix test --kvstore local
python word_language_model/lstm_bucketing.py --num-hidden 650 --num-embed 650 --gpus 0 --epochs 10 --kv-store local

CUDA + MKLDNN

| Throughput (samples/sec) | cuDNN 7.6.0/NCCL 2.4.2 | cuDNN 7.5.1/NCCL 2.3.4 | Performance Difference |
| --- | --- | --- | --- |
| CUDA 10.1 | 1018.89083 | 1015.61785 | 0.322% |
| CUDA 10 | 852.80333 | 847.98222 | 0.569% |
| CUDA 9.2 | 1011.61122 | 1005.25185 | 0.632% |
| CUDA 9.0 | 992.34674 | 1002.59081 | -1.021% |

CUDA 10 has a performance regression on this model; see #14725 for details.

Reference (only 3 runs)
without MKLDNN

| Throughput (samples/sec) | cuDNN 7.6.0/NCCL 2.4.2 |
| --- | --- |
| CUDA 10.1 | 1010.1654 |
| CUDA 10 | 846.05572 |
| CUDA 9.2 | 1007.27178 |
| CUDA 9.0 | 978.18158 |

MLP

model: 3 dense layers with num_hidden=64 and relu as activation
dataset: MNIST
number of GPUs: 1
epochs: 10
command:
python2 benchmark_runner.py --framework mxnet --metrics-policy mlp --task-name mlp --metrics-suffix test --num-gpus 1 --command-to-execute 'python3 mlp.py' --data-set mnist

CUDA + MKLDNN

| Throughput (samples/sec) | cuDNN 7.6.0/NCCL 2.4.2 | cuDNN 7.5.1/NCCL 2.3.4 | Performance Difference |
| --- | --- | --- | --- |
| CUDA 10.1 | 4438.0091 | 4422.72478 | 0.346% |
| CUDA 10 | 4433.65315 | 4638.73873 | -4.421% |
| CUDA 9.2 | 4439.18763 | 4425.37599 | 0.312% |
| CUDA 9.0 | 4505.45334 | 4421.82611 | 1.891% |

Reference (only 3 runs)
without MKLDNN

| Throughput (samples/sec) | cuDNN 7.6.0/NCCL 2.4.2 |
| --- | --- |
| CUDA 10.1 | 4515.74059 |
| CUDA 10 | 4349.40602 |
| CUDA 9.2 | 4492.37239 |
| CUDA 9.0 | 4211.6375 |

Comments

@szha @lanking520

@stu1130 stu1130 requested a review from szha as a code owner June 4, 2019 01:12
@stu1130 stu1130 changed the title bump up cudnn version [Dependency Update] bump up cudnn version Jun 4, 2019
@stu1130 stu1130 changed the title [Dependency Update] bump up cudnn version [Dependency Update] Bump up cudnn version Jun 4, 2019
@piyushghai
Contributor

@stu1130 Can you look into the CI failures?

@mxnet-label-bot Add[pr-awaiting-review, Backend]

@marcoabreu marcoabreu added Backend Issues related to the backend of MXNet pr-awaiting-review PR is waiting for code review labels Jun 4, 2019
@stu1130 stu1130 changed the title [Dependency Update] Bump up cudnn version [Dependency Update] Bump up cuDNN & NCCL version Jun 13, 2019
@stu1130 stu1130 changed the title [Dependency Update] Bump up cuDNN & NCCL version [WIP][Dependency Update] Bump up cuDNN & NCCL version Jun 13, 2019
@stu1130 stu1130 changed the title [WIP][Dependency Update] Bump up cuDNN & NCCL version [Dependency Update] Bump up cuDNN & NCCL version Jun 16, 2019
@szha szha merged commit c4ea674 into apache:master Jun 16, 2019
@stu1130 stu1130 deleted the bump_up_cudnn branch June 16, 2019 20:27
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* bump up cudnn version

* downgrade tensorRT to 7.5

* bump up NCCL 2.4.7