[Dependency Update] CUDA10.1 Support #14887

stu1130 · 2019-05-06T04:54:43Z

Description

Upgrade to CUDA 10.1 with the latest cuDNN 7.5.1 & NCCL 2.4.2

Checklist

Ran three models: ResNet50 with ImageNet, LSTM with PTB, and MLP with MNIST.
Performance is shown below.
Environment: P3.16xlarge, Deep Learning Base AMI
Codebase: commit d87bd2a
The unit of throughput is samples per second.
Each throughput is the average of 5 runs.

ResNet

model: ResNet50
dataset: ImageNet
number of gpu: 8
epochs: 90 (the regression we found recently only has a significant impact on runs with many epochs)
preprocess command: sudo pip install gluoncv==0.2.0b20180625
command: python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 128 --dtype float32 --num-data-workers 40 --num-epochs 3 --gpus 0,1,2,3,4,5,6,7 --lr 0.05 --last-gamma --mode symbolic --model resnet50_v1b --rec-train /home/ubuntu/data/train-passthrough.rec --rec-train-idx /home/ubuntu/data/train-passthrough.idx --rec-val /home/ubuntu/data/val-passthrough.rec --rec-val-idx /home/ubuntu/data/val-passthrough.idx
github repo: https://github.com/rahul003/deep-learning-benchmark-mirror.git

Throughput	CUDA 10.1 cuDNN 7.5.1/NCCL 2.4.2	CUDA 10 cuDNN 7.5.1/NCCL 2.4.2	Performance Difference
with MKLDNN	2817.18815	2791.17069	0.932%
without MKLDNN	2821.88889	2798.57505	0.833%

LSTM

model: LSTM
dataset: PTB(Penn Treebank)
number of gpu: 1
epochs: 10
command:
python2 benchmark_driver.py --framework mxnet --task-name mkl_lstm_ptb_symbolic --num-gpus 1 --epochs 10 --metrics-suffix test --kvstore local
python word_language_model/lstm_bucketing.py --num-hidden 650 --num-embed 650 --gpus 0 --epochs 10 --kv-store local

Throughput	CUDA 10.1 cuDNN 7.5.1/NCCL 2.4.2	CUDA 10 cuDNN 7.5.1/NCCL 2.4.2	Performance Difference
with MKLDNN	1015.61785	869.05555 (Performance Regression)	16.865%
without MKLDNN	1015.01455	830.68338 (Performance Regression)	22.219%

CUDA 10 has a performance regression; see #14725 for more details.

MLP

I changed the MLP script, so the performance might be a little worse than before.
model: 3 dense layers with num_hidden=64 and ReLU activation
dataset: MNIST
number of gpu: 1
epochs: 10
command:
python2 benchmark_runner.py --framework mxnet --metrics-policy mlp --task-name mlp --metrics-suffix test --num-gpus 1 --command-to-execute 'python3 mlp.py' --data-set mnist

Throughput	CUDA 10.1 cuDNN 7.5.1/NCCL 2.4.2	CUDA 10 cuDNN 7.5.1/NCCL 2.4.2	Performance Difference
with MKLDNN	4422.72478	4403.72596	0.431%
without MKLDNN	4342.19752	4329.25058	0.299%

Comments

@szha @lanking520 @eric-haibin-lin

tools/setup_gpu_build_tools.sh

anirudhacharya · 2019-05-06T17:17:25Z

@mxnet-label-bot add [pr-awaiting-review]

lanking520 · 2019-05-06T23:35:49Z

General thought: do you think it is necessary for us to have real-time performance benchmarking whenever we do an upgrade like this?

stu1130 · 2019-05-08T17:27:55Z

@lanking520 Sorry for the late response. Does real-time benchmarking here mean testing performance in the CI system? I would prefer running the benchmark across CUDA 9/9.2/10/10.1 on the nightly build.
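A nightly cross-version run like the one proposed here would need some rule for flagging a regression. The sketch below is hypothetical and not part of MXNet's CI: the function name `find_regressions` and the 5% threshold are assumptions. It flags every CUDA version whose throughput falls more than the threshold below the best version measured in the same run.

```python
def find_regressions(throughput_by_cuda, threshold_pct=5.0):
    """Return CUDA versions whose throughput is more than threshold_pct below the best.

    throughput_by_cuda maps a CUDA version string to samples per second.
    threshold_pct is an assumed tolerance, not a project setting.
    """
    if not throughput_by_cuda:
        return []
    best = max(throughput_by_cuda.values())
    return sorted(
        version
        for version, value in throughput_by_cuda.items()
        # Drop relative to the fastest version, in percent.
        if (best - value) / best * 100.0 > threshold_pct
    )


# LSTM without MKLDNN numbers from this PR: CUDA 10 is flagged.
print(find_regressions({"10.0": 830.68338, "10.1": 1015.01455}))
```

Comparing against the fastest version rather than a fixed baseline keeps the check meaningful when new CUDA versions are added to the nightly matrix.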

lanking520 · 2019-05-09T18:10:29Z

@perdasilva I saw a recent commit by you that downgrades the CUDA version. Do you think these numbers are promising enough for us to push this to master?

lanking520

LGTM, since the performance numbers seem promising.


stu1130 requested a review from szha as a code owner May 6, 2019 04:54

szha reviewed May 6, 2019


tools/setup_gpu_build_tools.sh (resolved)

marcoabreu added the pr-awaiting-review PR is waiting for code review label May 6, 2019

stu1130 changed the title ~~[Dependency Update] CUDA10.1 Support~~ [WIP][Dependency Update] CUDA10.1 Support May 6, 2019

stu1130 changed the title ~~[WIP][Dependency Update] CUDA10.1 Support~~ [Dependency Update] CUDA10.1 Support May 8, 2019

lanking520 approved these changes May 9, 2019


stu1130 mentioned this pull request May 15, 2019

[Discussion] 1.5.0 Roadmap #14619

Closed

stu1130 added 2 commits May 16, 2019 11:08

upgrade to CUDA 10.1

5b8b25f

address the comment

dbf6041

stu1130 force-pushed the publish_cuda10_1 branch from e39ac04 to dbf6041 May 16, 2019 18:08

retrigger CI

ef584b1

szha merged commit 7b48c24 into apache:master May 21, 2019

stu1130 mentioned this pull request May 22, 2019

[Dependency Update] Bump up the CI Nvidia docker to CUDA 10.1 #14986

Merged


haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019

Merge pull request apache#14887 from stu1130/publish_cuda10_1

aaa25c3

[Dependency Update] CUDA10.1 Support
