Mellanox Open MPI CI: optimized git checkout step to reduce CI duration (v4.0.x) #7457

Conversation

artemry-nv
Signed-off-by: Artem Ryabov <artemry@mellanox.com>
@ompiteam-bot

Can one of the admins verify this patch?

@rhc54
Contributor

rhc54 commented Feb 22, 2020

ok to test

@ibm-ompi

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6

@ibm-ompi

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/ibm-ompi/88b480fb98dfd2bdac939f023e1f8fe6

@artemry-nv
Author

CC @amaslenn

@jsquyres
Member

bot:ibm:retest

@jsquyres
Member

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@jsquyres jsquyres left a comment

@gpaulsen @hppritcha Given that the Mellanox AZP stuff was committed to the v4.0.x branch, you might want this one, too -- it's an optimization that reduces the CI run time.

@gpaulsen
Member

Yes, thanks. We'll take this one post v4.0.3

@gpaulsen
Member

Jeff reached out and let us know that we already took the Mellanox AZP changes into v4.0.x, so we should consider taking this PR sooner rather than later.
I want to understand the Mellanox CI failure first, however. I'm reaching out to @jladd-mlnx and @artemry-mlnx via email to ask for help.

@artemry-nv
Author

The Mellanox Open MPI CI failure is expected (see the detailed log file). It is caused by the Open MPI behavioral changes in #7202, which were reflected in the CI scripts by mellanox-hpc/jenkins_scripts#92 (see this comment: mellanox-hpc/jenkins_scripts#92 (comment)).
The CI tests need to be forked specifically for Open MPI v4.0.x (there's no plan to port #7202 to v4.0.x, see #7202 (comment)), and the v4.0.x CI needs to be switched to them. I'll deal with this.

@jsquyres
Member

@artemry-mlnx Gotcha. FWIW, I maintain different config files for Cisco's testing of different Open MPI branches for exactly this reason (the behavior and/or CLI params change over time).

@jsquyres jsquyres added this to the v4.0.3 milestone Feb 24, 2020
@jladd-mlnx
Member

bot:mellanox:retest

@artemry-nv
Author

@jladd-mlnx
Mellanox Open MPI v4.0.x CI will pass only after #7468
Note that bot:mellanox:retest is not supported by Azure Pipelines, so please use /azp run to restart the Mellanox CI (see the README for details).

@jladd-mlnx
Member

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@hppritcha hppritcha removed this from the v4.0.3 milestone Mar 4, 2020
@gpaulsen
Member

gpaulsen commented Mar 5, 2020

Hmm. This is failing on perhaps the only test that passes --am /__w/2/ompi/test_amca.conf to mpirun. There's another test that passes --tune /__w/2/ompi/test_amca.conf. @artemry-mlnx could you please take a look?

@gpaulsen
Member

gpaulsen commented Mar 5, 2020

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@artemry-nv
Author

@gpaulsen
Mellanox Open MPI CI for v4.0.x passes after #7468

@artemry-nv
Author

Could anyone please merge this PR?

@jsquyres jsquyres added this to the v4.0.4 milestone Mar 5, 2020
@jsquyres
Member

jsquyres commented Mar 5, 2020

Milestone wasn't set -- the v4.0 RMs probably didn't see this PR.

@hppritcha hppritcha merged commit e00fc61 into open-mpi:v4.0.x Mar 6, 2020
@artemry-nv artemry-nv deleted the artemry-mlnx/reduce_mellanox_ci_time_v4 branch March 6, 2020 15:54