
uploading images #24

Closed
ngam opened this issue May 11, 2022 · 9 comments

ngam commented May 11, 2022

I will try to upload some images later this week. We can at least document the process for interested community members if they have access to V100 or A100 GPUs and want some more performance!

Originally posted by @ngam in pangeo-data/pangeo-docker-images#320 (comment)

ngam commented May 11, 2022

@weiji14, let me know if you get a chance to test them.

docker pull ngam00/ngc-pt-pangeo

Note the 00 in the username above; alternatively, once it finishes uploading to the GitHub container registry:

docker pull ghcr.io/ngam/ngc-pt-pangeo

Missing packages from these images are listed in #21. I haven't had a chance to run any benchmarks yet, but I will look into that soon...
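
As a quick sanity check after pulling (a minimal sketch; it assumes the image's default python can import torch, which should hold for an NGC-based image):

docker run --rm --gpus all ghcr.io/ngam/ngc-pt-pangeo \
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

This should print the PyTorch version and True if the GPU is visible inside the container.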

@ngam ngam changed the title I will try to upload some images later this week. We can at least document the process for interested community members if they have access to V100 or A100 GPUs and want some more performance! uploading images May 11, 2022

weiji14 commented May 13, 2022

Cool, thanks @ngam, I'll try and give this a spin on my GPU over the weekend. Is there a good benchmark you'd recommend to test this on? Preferably something light that takes <16GB of GPU RAM.
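
(A simple way to keep an eye on GPU memory while trying out a candidate benchmark, using standard nvidia-smi query flags:

nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv -l 1

This just prints the device name and memory use once per second; stop it with Ctrl-C.)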

ngam commented May 16, 2022

Sorry I didn't respond here... I'm not really sure about benchmarks; I usually only run my own models, and usually in TensorFlow.

Let me know if you manage to get something going.

weiji14 commented May 16, 2022

OK, I found an easy-ish benchmark script at https://github.com/cresset-template/cresset/blob/7762a947ff567003befbab3d217364f9fcf98b67/benchmark.py. To run it, do:

git clone https://github.com/cresset-template/cresset.git
cd cresset/

Below are the tests I ran on an NVIDIA RTX A5000 Laptop GPU; the only thing I changed was the Docker image (ghcr.io/ngam/ngc-pt-pangeo vs pangeo/pytorch-notebook:2022.05.10).

NGC-based ghcr.io/ngam/ngc-pt-pangeo

docker run -it --rm \
           --gpus all \
           --volume $PWD:/home/jovyan \
           ghcr.io/ngam/ngc-pt-pangeo \
           python /home/jovyan/benchmark.py

Results:

=============
== PyTorch ==
=============

NVIDIA Release 22.04 (build 36527063)
PyTorch Version 1.12.0a0+bd13bc6

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

Python Version: 3.8.13
PyTorch Version: 1.12.0a0+bd13bc6
PyTorch CUDA Version: 11.6
PyTorch cuDNN Version: 8400
PyTorch Architecture List: ('sm_52', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_86')
GPU Device Name: NVIDIA RTX A5000 Laptop GPU
GPU Compute Capability: 8.6
NVIDIA Driver Version: 510.60.02
Automatic Mixed Precision Enabled: False.
TorchScript Enabled: False.
                                                                                               
Model: r3d_18.
Input shapes: ((1, 3, 64, 128, 128),).
Average time:  39.796 milliseconds.
Total time:  41 seconds.
                                                                                               
Model: Transformer.
Input shapes: ((1, 512, 512), (1, 512, 512)).
Average time:   5.212 milliseconds.
Total time:   5 seconds.
                                                                                               
Model: resnet50.
Input shapes: ((2, 3, 512, 512),).
Average time:  16.112 milliseconds.
Total time:  16 seconds.
                                                                                               
Model: vgg19.
Input shapes: ((1, 3, 512, 512),).
Average time:  16.649 milliseconds.
Total time:  17 seconds.
                                                                                               
Model: fcn_resnet50.
Input shapes: ((1, 3, 512, 512),).
Average time:  23.188 milliseconds.
Total time:  24 seconds.
                                                                                               
Model: deeplabv3_resnet50.
Input shapes: ((1, 3, 512, 512),).
Average time:  27.268 milliseconds.
Total time:  28 seconds.
                                                                                               
Model: retinanet_resnet50_fpn.
Input shapes: ((1, 3, 512, 512),).
Average time:  41.549 milliseconds.
Total time:  43 seconds.
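
Side note: the startup banner above warns that the default 64 MB SHMEM allocation may be insufficient for PyTorch. A re-run with the flags NVIDIA recommends in that banner would look like the following (same mount as before; the numbers above were collected without these flags):

docker run -it --rm \
           --gpus all --ipc=host \
           --ulimit memlock=-1 --ulimit stack=67108864 \
           --volume $PWD:/home/jovyan \
           ghcr.io/ngam/ngc-pt-pangeo \
           python /home/jovyan/benchmark.py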

Pangeo's image pangeo/pytorch-notebook:2022.05.10

docker run -it --rm \
           --gpus all \
           --volume $PWD:/home/jovyan \
           pangeo/pytorch-notebook:2022.05.10 \
           python /home/jovyan/benchmark.py

Results:

Python Version: 3.9.12
PyTorch Version: 1.11.0
PyTorch CUDA Version: 11.2
PyTorch cuDNN Version: 8201
PyTorch Architecture List: ('sm_35', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_50')
GPU Device Name: NVIDIA RTX A5000 Laptop GPU
GPU Compute Capability: 8.6
NVIDIA Driver Version: 510.60.02
Automatic Mixed Precision Enabled: False.
TorchScript Enabled: False.
                                                                                               
Model: r3d_18.
Input shapes: ((1, 3, 64, 128, 128),).
Average time:  37.156 milliseconds.
Total time:  38 seconds.
                                                                                               
Model: Transformer.
Input shapes: ((1, 512, 512), (1, 512, 512)).
Average time:   5.373 milliseconds.
Total time:   6 seconds.
                                                                                               
Model: resnet50.
Input shapes: ((2, 3, 512, 512),).
Average time:  18.056 milliseconds.
Total time:  18 seconds.
                                                                                               
Model: vgg19.
Input shapes: ((1, 3, 512, 512),).
Average time:  17.039 milliseconds.
Total time:  17 seconds.
                                                                                               
Model: fcn_resnet50.
Input shapes: ((1, 3, 512, 512),).
Average time:  27.493 milliseconds.
Total time:  28 seconds.
                                                                                               
Model: deeplabv3_resnet50.
Input shapes: ((1, 3, 512, 512),).
Average time:  31.933 milliseconds.
Total time:  33 seconds.
                                                                                               
Model: retinanet_resnet50_fpn.
Input shapes: ((1, 3, 512, 512),).
Average time:  43.319 milliseconds.
Total time:  44 seconds.

Differences

See https://www.diffchecker.com/ZTpD1Par. It's not exactly an apples-to-apples comparison, as there are lots of library version mismatches (e.g. CUDA 11.6 vs CUDA 11.2, cuDNN 8400 vs cuDNN 8201, etc.), but in general the differences seem fairly minor.

Other than the r3d_18 model, where pangeo/pytorch-notebook was faster than ghcr.io/ngam/ngc-pt-pangeo by 3 seconds, it seems like ghcr.io/ngam/ngc-pt-pangeo is faster for the other models (generally the deeper/more complicated ones). The biggest difference was for deeplabv3_resnet50, where ghcr.io/ngam/ngc-pt-pangeo took 28 seconds while pangeo/pytorch-notebook took 33 seconds, a difference of 5 seconds.

I'd be tempted to update the Pangeo notebook with newer CUDA/cuDNN/PyTorch versions to make the comparison fair before saying confidently that the NGC containers win out, but the NGC-based one is definitely in the lead right now 😃
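
For a quick side-by-side, the average times reported above (in milliseconds, lower is better):

Model                     ngc-pt-pangeo    pytorch-notebook
r3d_18                           39.796              37.156
Transformer                       5.212               5.373
resnet50                         16.112              18.056
vgg19                            16.649              17.039
fcn_resnet50                     23.188              27.493
deeplabv3_resnet50               27.268              31.933
retinanet_resnet50_fpn           41.549              43.319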

ngam commented May 16, 2022

Yes, but I'm glad it's only minor! I think what we can do is try harder to push the conda-forge feedstocks to copy the NGC builds... I'm already doing that with TensorFlow.

weiji14 commented May 16, 2022

Yeah, but like you said, those tiny differences might add up. Say someone was training a neural network for 1 hour: 10 seconds saved per minute would mean 10 × 60 = 600 seconds, or 10 minutes less per hour. If you expand that to 1 day/24 hours, that's 240 minutes, or 4 hours saved!

If you can pin the ngc-pt-pangeo Docker image to pytorch=1.11.0 (down from 1.12.0a0+bd13bc6), I can try to work on updating the CUDA version in the pytorch-notebook image to CUDA 11.6; then we can maybe get a fairer benchmark comparison.

ngam commented May 16, 2022

1.12.0a0+bd13bc6

this weird pin is from NGC... https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_22-04.html#rel_22-04

weiji14 commented May 16, 2022

1.12.0a0+bd13bc6

this weird pin is from NGC... https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_22-04.html#rel_22-04

Interesting, so they are pinning specific PyTorch commits?! I'm usually OK with bleeding-edge software, but I'm not sure this is OK for general Pangeo users 😅

ngam commented May 17, 2022

Yeah, but like you said, those tiny differences might add up. Say someone was training a neural network for 1 hour: 10 seconds saved per minute would mean 10 × 60 = 600 seconds, or 10 minutes less per hour. If you expand that to 1 day/24 hours, that's 240 minutes, or 4 hours saved!

You're absolutely right on this, btw. Also, take into account an additional point: toy models are double-edged swords; they're somewhat optimized and relatively light. I suspect that for an actual researcher who ends up paying close attention to performance, the time saved will be a bit more. So I don't want to discount this premise; it is very important --- this is what drove me to do this to begin with :)
