This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[FEATURE]Migrating all CD pipelines to Ninja build + fix cu112 CD pipeline #19974

Merged
merged 19 commits into apache:v1.x
Mar 12, 2021

access2rohit
Contributor

@access2rohit access2rohit commented Mar 3, 2021

Description

Migrates all CD builds from make to Ninja.
Fixes the CUDA 11.2 pipeline for v1.x.
Updates Docker test images to Ubuntu 18 and fixes dependencies.
Fixes build issues with CMake.
Provides the correct path for some make builds.
Makes CD-generated binaries compliant with Apache.
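
As a rough illustration of the make-to-Ninja switch listed above, the sketch below assembles a CMake configure command that selects the Ninja generator, plus the matching build command. The helper name `ninja_build_commands`, the `build` directory, and the example options are assumptions made for illustration; the CD scripts changed in this PR may pass different flags.

```python
import shlex
from typing import Dict, List, Tuple


def ninja_build_commands(
    source_dir: str, build_dir: str, options: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (configure, build) command lines for a CMake + Ninja build.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    # "-G Ninja" selects the Ninja generator instead of Unix Makefiles.
    configure = ["cmake", "-G", "Ninja", "-S", source_dir, "-B", build_dir]
    # Each option becomes a -DNAME=VALUE cache entry; sorted for stable output.
    for name, value in sorted(options.items()):
        configure.append(f"-D{name}={value}")
    # "cmake --build" invokes ninja in the configured build directory.
    build = ["cmake", "--build", build_dir]
    return configure, build


if __name__ == "__main__":
    # Example options are illustrative, not taken from the PR.
    cfg, bld = ninja_build_commands(
        ".", "build", {"CMAKE_BUILD_TYPE": "Release", "USE_CUDA": "ON"}
    )
    print(shlex.join(cfg))
    print(shlex.join(bld))
```

Passing the resulting lists to `subprocess.run(..., check=True)` would execute the two steps in order.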

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Testing

CD pipeline is working: https://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/job/mxnet-cd-pipeline-1.x-rohit/

@mxnet-bot

Hey @access2rohit, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, miscellaneous, windows-gpu, edge, centos-cpu, unix-gpu, centos-gpu, windows-cpu, unix-cpu, website, clang]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@access2rohit access2rohit changed the base branch from master to v1.x March 3, 2021 02:58
@access2rohit access2rohit changed the title Cd ninja Migrating all CD pipelines to Ninja build + fix cu112 pipeline Mar 3, 2021
@access2rohit access2rohit changed the title Migrating all CD pipelines to Ninja build + fix cu112 pipeline [DO NOT MERGE]Migrating all CD pipelines to Ninja build + fix cu112 pipeline Mar 3, 2021
@access2rohit access2rohit changed the title [DO NOT MERGE]Migrating all CD pipelines to Ninja build + fix cu112 pipeline [DO NOT MERGE][v1.x]Migrating all CD pipelines to Ninja build + fix cu112 pipeline Mar 3, 2021
@access2rohit access2rohit reopened this Mar 3, 2021
@access2rohit access2rohit changed the title [DO NOT MERGE][v1.x]Migrating all CD pipelines to Ninja build + fix cu112 pipeline [DO NOT MERGE][v1.x]Migrating all CD pipelines to Ninja build + fix cu112 CD pipeline Mar 3, 2021
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 4, 2021
cmake/Modules/FindCUDNN.cmake Outdated
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Mar 4, 2021
@access2rohit access2rohit force-pushed the cd_ninja branch 2 times, most recently from f323a76 to eb14659 Compare March 5, 2021 00:11
@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 5, 2021
@lanking520 lanking520 added the pr-awaiting-testing PR is reviewed and waiting CI build and test label Mar 11, 2021
@access2rohit
Contributor Author

access2rohit commented Mar 11, 2021

Apparently the MNIST dataset cannot be downloaded from http://yann.lecun.com/exdb/mnist/, so I added a backup URL to data.sh to stabilize the unix-gpu CI.
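
The backup-URL idea in the comment above can be sketched as a small fallback loop: try each source in order and return the first one that responds. This is a Python illustration only, not the actual change to data.sh; the function name `fetch_with_fallback`, the injectable `fetch` parameter, and the mirror URL in the usage example are hypothetical.

```python
import urllib.request
from typing import Callable, Iterable, List, Optional


def fetch_with_fallback(
    urls: Iterable[str],
    fetch: Optional[Callable[[str], bytes]] = None,
    timeout: float = 30.0,
) -> bytes:
    """Return the body of the first URL that can be fetched.

    Hypothetical helper: `fetch` can be swapped out so the fallback
    logic is testable without network access.
    """
    if fetch is None:
        def fetch(url: str) -> bytes:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()

    errors: List[str] = []
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))


if __name__ == "__main__":
    primary = "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
    # Placeholder mirror; the real backup URL is not shown in this thread.
    backup = "https://example.com/mnist-mirror/train-images-idx3-ubyte.gz"
    try:
        data = fetch_with_fallback([primary, backup])
        print(f"downloaded {len(data)} bytes")
    except RuntimeError as err:
        print(err)
```

Listing the original site first keeps it as the preferred source whenever it is reachable, so the mirror is only used when the primary download fails.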

@lanking520 lanking520 added pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Mar 11, 2021
@access2rohit
Contributor Author

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Mar 11, 2021
@access2rohit
Contributor Author

access2rohit commented Mar 11, 2021

Will also disable the flaky tests tracked in #20011 in this PR if the current CI run fails.

@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 11, 2021
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Mar 11, 2021
Contributor

@Zha0q1 Zha0q1 left a comment

LGTM! Thanks!

@Zha0q1 Zha0q1 merged commit fc37c75 into apache:v1.x Mar 12, 2021
access2rohit added a commit to access2rohit/incubator-mxnet that referenced this pull request Mar 12, 2021
…eline (apache#19974)

* migrating cd builds to ninja + removing static links to nvidia libs and legacy cuda versions

* installing NCCL manually for cuda11.2 container

* set MSHADOW_USE_CUDNN=1 in CMakelists of mshadow to build properly for CUDNN support

* adding coverage to cd requirements file to fix cu100, cu101 and cu102 tests

* updating cd_test containers to ubuntu 18

* adding cmake config for linux native and adding USE_KV_STORE in linux_cpu

* updating zmq builds to statically link to libmxnet.so

* updating toolchains for R, clang and llvm for ubuntu18. OpenBLAS static link for 'distribution' build type only. Fix caffe build to use OpenCV 3. Remove legacy Clang 3.9 from CI

* fix versions for pip install in ubuntu_core_sh; add new search path for cuDNN

* fixing cudnn link problem for CUDA<=11.0

* adding library paths for libjpegturbo and lapack to fix failing CI on ubuntu 18 images

* removing ASAN integration test from miscellaneous CI as it's not required

* fix lapack path for gpu builds

* correctly installing libjpegturbo for ubuntu 18

* updating docker images of R, Jekyll, Julia, etc. test containers + fix Java version to 8

* installing libomp.so

* removing debug test as it's not required. Code clean-up

* adding alternate URL source for MNIST dataset as original website is down

* skipping flaky tests, issue tracked in apache#20011

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
mseth10 added a commit that referenced this pull request Mar 14, 2021
…20015)

* [BACKPORT]Enable CUDA 11.0 on nightly + CUDA 11.2 on pip (#19295)(#19764) (#19930)

* Enable CUDA 11.0 on nightly development builds (#19295)

Remove CUDA 9.2 and CUDA 10.0

* [PIP] add build variant for cuda 11.2 (#19764)

* adding ci docker files for cu111 and cu112

* removing previous CUDA make versions and adding support for cuda11.2

Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com>
Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>

* [FEATURE]Migrating all CD pipelines to Ninja build + fix cu112 CD pipeline (#19974)

* migrating cd builds to ninja + removing static links to nvidia libs and legacy cuda versions

* installing NCCL manually for cuda11.2 container

* set MSHADOW_USE_CUDNN=1 in CMakelists of mshadow to build properly for CUDNN support

* adding coverage to cd requirements file to fix cu100, cu101 and cu102 tests

* updating cd_test containers to ubuntu 18

* adding cmake config for linux native and adding USE_KV_STORE in linux_cpu

* updating zmq builds to statically link to libmxnet.so

* updating toolchains for R, clang and llvm for ubuntu18. OpenBLAS static link for 'distribution' build type only. Fix caffe build to use OpenCV 3. Remove legacy Clang 3.9 from CI

* fix versions for pip install in ubuntu_core_sh; add new search path for cuDNN

* fixing cudnn link problem for CUDA<=11.0

* adding library paths for libjpegturbo and lapack to fix failing CI on ubuntu 18 images

* removing ASAN integration test from miscellaneous CI as it's not required

* fix lapack path for gpu builds

* correctly installing libjpegturbo for ubuntu 18

* updating docker images of R, Jekyll, Julia, etc. test containers + fix Java version to 8

* installing libomp.so

* removing debug test as it's not required. Code clean-up

* adding alternate URL source for MNIST dataset as original website is down

* skipping flaky tests, issue tracked in #20011

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>

* update cudnn from 7 to 8 for cu102 (#19506)

* update cudnn from 7 to 8 for cu102 (#19522)

* downloading MNIST dataset from alternate URL (#20014)

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>

* fixing CI issue with v1.8.x

* addressing review comments

Co-authored-by: waytrue17 <52505574+waytrue17@users.noreply.github.com>
Co-authored-by: Sheng Zha <szha@users.noreply.github.com>
Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
Co-authored-by: Manu Seth <22492939+mseth10@users.noreply.github.com>
mseth10 added a commit to mseth10/incubator-mxnet that referenced this pull request Mar 15, 2021
…pache#20015)

Labels
pr-awaiting-review PR is waiting for code review

5 participants