This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[v1.x] Add new CI pipeline for building and testing with cuda 11.0. #19149

Merged: 12 commits into apache:v1.x from cuda11_v1.x on Sep 17, 2020

Conversation

josephevans
Contributor

@josephevans josephevans commented Sep 15, 2020

Description

This PR adds a new CI pipeline for testing builds under CUDA 11.0.

The new pipeline ("unix-gpu-cu110") is triggered by the full-build when the sanity build completes.

@mxnet-bot

Hey @josephevans, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [miscellaneous, sanity, unix-gpu, centos-gpu, clang, website, windows-cpu, unix-cpu, edge, windows-gpu, centos-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.
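
To illustrate the command format the bot describes, here is a minimal sketch of how such a comment could be parsed. It is not the bot's actual implementation: the function name `parse_ci_command`, the regular expression, and the choice to silently drop unknown job names are all assumptions made for illustration. Only the job names and the `@mxnet-bot run ci [...]` syntax come from the bot's message.

```python
import re

# Job names copied from the bot's "CI supported jobs" list.
SUPPORTED_JOBS = {
    "miscellaneous", "sanity", "unix-gpu", "centos-gpu", "clang", "website",
    "windows-cpu", "unix-cpu", "edge", "windows-gpu", "centos-cpu",
}


def parse_ci_command(comment):
    """Return the jobs requested by an '@mxnet-bot run ci [...]' comment.

    Hypothetical helper: returns [] when the comment holds no command,
    every supported job for '[all]', and otherwise only the requested
    names that appear in SUPPORTED_JOBS.
    """
    match = re.search(r"@mxnet-bot run ci \[([^\]]*)\]", comment)
    if match is None:
        return []
    requested = [job.strip() for job in match.group(1).split(",") if job.strip()]
    if requested == ["all"]:
        return sorted(SUPPORTED_JOBS)
    # Assumption: unknown job names are ignored rather than reported.
    return [job for job in requested if job in SUPPORTED_JOBS]


if __name__ == "__main__":
    print(parse_ci_command("@mxnet-bot run ci [windows-gpu]"))
```

Under these assumptions, a comment such as `@mxnet-bot run ci [windows-gpu]` resolves to the single `windows-gpu` job.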

@josephevans josephevans changed the title from "[WIP] [v1.x] Add new CI pipeline for testing Cuda 11.0 builds" to "[v1.x] Add new CI pipeline for building and testing with cuda 11.0." on Sep 15, 2020
@josephevans
Contributor Author

@leezu @ChaiBapchya Can you guys please review?

ci/docker/runtime_functions.sh (review thread resolved)
COPY runtime_functions.sh /work/

WORKDIR /work/mxnet
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/compat
Contributor

I remember facing issues with the cuda/compat directory while migrating from G3 to G4. Do we need this? @ptrendx @leezu, can you confirm?
It was only the TVM op that needed it, right? Since we are disabling it, we shouldn't need the compat dir.

Correct me if I'm wrong.

Contributor

@samskalicky samskalicky Sep 15, 2020

Pretty sure we need this:

/usr/local/cuda-11.0/compat/libcuda.so.450.51.06
/usr/local/cuda-11.0/compat/libcuda.so
/usr/local/cuda-11.0/compat/libcuda.so.1

Contributor Author

Without this, the cpp-package fails to build because it cannot find libcuda.so.1, which libmxnet.so is linked against.

I could also disable the cpp-package portion of the build, since it's actually not being used in the test pipeline steps.
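
The lookup failure described above can be sketched as a search over the directories in an `LD_LIBRARY_PATH`-style string. This is a simplified model, not what the dynamic linker actually does: the real loader also consults its cache and default system directories, and treats an empty path entry as the current directory, whereas this sketch skips empty entries. The function name `find_library` is hypothetical, and the demo uses a temporary directory in place of `/usr/local/cuda/compat`.

```python
import os
import tempfile


def find_library(name, search_path):
    """Return the first directory in a colon-separated path that contains `name`, or None."""
    for directory in search_path.split(":"):
        # Simplification: skip empty entries, which arise when the variable
        # was unset before ":/some/dir" was appended to it.
        if directory and os.path.exists(os.path.join(directory, name)):
            return directory
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for the compat directory that ships libcuda.so.1.
        compat = os.path.join(root, "compat")
        os.makedirs(compat)
        open(os.path.join(compat, "libcuda.so.1"), "w").close()

        # Without the compat directory on the path, the library is not found.
        print(find_library("libcuda.so.1", ""))
        # With it appended, as in the ENV line under review, the lookup succeeds.
        print(find_library("libcuda.so.1", ":" + compat))
```

In this model, appending the compat directory to the search path is what makes `libcuda.so.1` resolvable, which matches the reason given for keeping the `ENV LD_LIBRARY_PATH` line.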

Contributor

@ChaiBapchya ChaiBapchya left a comment

LGTM! Thanks!

@josephevans
Contributor Author

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

Contributor

@DickJC123 DickJC123 left a comment

Although the build configuration files are not my strong suit, this LGTM. The passing CI is a strong indication of correctness here. I'd like to see this merged as an important next step in finalizing the 1.8 release.

@samskalicky samskalicky merged commit 620d058 into apache:v1.x Sep 17, 2020
@josephevans josephevans deleted the cuda11_v1.x branch September 17, 2020 16:32
@sandeep-krishnamurthy
Contributor

Thank you very much @josephevans :-) :-)

6 participants