This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[master] CI/CD updates to be more stable #20740

Merged: 4 commits into apache:master on Dec 16, 2021

Conversation

josephevans
Contributor

@josephevans josephevans commented Nov 12, 2021

This PR fixes a few issues with CI/CD:

  1. When multiple processes attempt to install a pip package at the same time, a race condition causes them to fail intermittently. Since the website S3 push and publish steps do not run inside a container, just use the awscli already installed on the Jenkins slave (which is up to date).
  2. Remove the oneDNN repository after installing oneDNN. This prevents a corrupt or out-of-sync oneDNN repository from failing every CI pipeline, since any apt call fails while a broken repository is configured.
  3. Update the CUDA architectures built for Windows. Include 7.5 for Turing GPUs (which are on g4 instances), so we can migrate to these instances for Windows CI.
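
To illustrate item 1, here is a minimal sketch of the "use what is already installed" approach, assuming the publish step is driven from Python. The helper name `find_preinstalled_cli` and its parameters are hypothetical and are not taken from this PR's diff. It only looks up an existing executable and never falls back to `pip install`, so concurrent jobs have nothing to race on.

```python
import shutil
from typing import Optional


def find_preinstalled_cli(name: str = "aws", search_path: Optional[str] = None) -> str:
    """Return the path of an already-installed CLI (hypothetical helper).

    Raises RuntimeError instead of running `pip install`, so parallel
    jobs never try to install the same package at the same time.
    """
    # shutil.which searches PATH (or search_path, if given) for an executable.
    cli_path = shutil.which(name, path=search_path)
    if cli_path is None:
        raise RuntimeError(
            f"{name!r} not found; install it on the build host ahead of time"
        )
    return cli_path


if __name__ == "__main__":
    try:
        print(find_preinstalled_cli("aws"))
    except RuntimeError as err:
        print(err)
```

Failing loudly when the tool is missing keeps the dependency a provisioning concern of the build host rather than something each job repairs for itself.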

@mxnet-bot

Hey @josephevans, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-cpu, sanity, windows-gpu, unix-cpu, windows-cpu, miscellaneous, website, unix-gpu, centos-gpu, clang, edge]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 added the labels pr-awaiting-testing (PR is reviewed and waiting CI build and test) and pr-work-in-progress (PR is still work in progress), and removed the labels pr-awaiting-testing and pr-work-in-progress, on Nov 12, 2021
@mseth10 added the labels pr-work-in-progress (PR is still work in progress) and pr-awaiting-testing (PR is reviewed and waiting CI build and test), and removed the labels pr-awaiting-testing and pr-work-in-progress, on Dec 8, 2021
… use the awscli installed in the jenkins slave (which is updated.) When multiple processes are attempting to install a pip package at the same time, there is a race condition that causes them to fail often.
@mseth10 added the labels pr-awaiting-testing (PR is reviewed and waiting CI build and test), pr-work-in-progress (PR is still work in progress), and pr-awaiting-review (PR is waiting for code review), and removed the labels pr-work-in-progress and pr-awaiting-testing, on Dec 8, 2021
Contributor

@waytrue17 waytrue17 left a comment

LGTM, thanks

@josephevans josephevans merged commit b555b54 into apache:master Dec 16, 2021
Labels
pr-awaiting-review PR is waiting for code review
5 participants