-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add openmpi 4.0.5 toolchain for VAN1 #8850
Conversation
User Support Ticket(s) or Story Referenced: SPAR-969
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
@e10harvey, @trilinos/framework |
|
||
if [ "$ATDM_CONFIG_NODE_TYPE" == "OPENMP" ] ; then | ||
unset OMP_PLACES | ||
unset OMP_PROC_BIND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I remember @rppawlo saying this OMP_PROC_BIND
is something we want to leave up to the user rather than set (or unset) across the board. Might be wrong, though—memory is foggy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be inclined to agree with you, but it should only affect runtime behavior, not build-time, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: Experiments with Trilinos showed that unsetting these improved the performance of running the Trilinos test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, but I think it was unsetting these improved testing, where you're firing off a bunch of small things at once, but degraded performance of larger runs where you're launching one big thing. Hence the desire to leave it up to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hence the desire to leave it up to the user.
Right, but out of the box, this should work for the Trilinos test suite or someone will need to deal with a bunch of timing out Trilinos tests.
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @sebrowne. On van1-tx2, can you run and check:
mkdir Trilinos-pr8850
cd Trilinos-pr8850
ln -s /path/to/Trilinos/cmake/std/atdm/ctest-s-local-test-driver.sh
nohup env ./ctest-s-local-test-driver.sh all &>ctest-s-local-test-driver.out &
Before merging?
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ e10harvey ]! |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: sebrowne |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_serial
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0_debug
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_10.1.105
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ e10harvey ]! |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
@e10harvey it failed, and looks unrelated to my change:
|
@sebrowne, It's not finding driver files for the new supported builds in: https://github.com/trilinos/Trilinos/tree/develop/cmake/ctest/drivers/atdm/van1-tx2/drivers. You can create these via |
Related to my epic SEPW-215 |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
4 similar comments
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebrowne, the additions look reasonable. I think all that is needed is to add the driver files:
Trilinos/cmake/ctest/drivers/atdm/van1-tx2/drivers/
Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt.sh
Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg.sh
(just copy them from the existing files Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_opt.sh
) and then run the following commands to test things:
# Check that all of the builds listed in the 'van1-txt/all_supported_builds.sh' have \
# driver files and the basic configuration works for all.
$ env Trilinos_PACKAGES=Kokkos \
./ctest-s-local-test-driver.sh all
# Run the new builds completely for all packages to see what it looks like on CDash
$ ./ctest-s-local-test-driver.sh \
van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt.sh \
van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg.sh
See ctest-s-local-test-driver.sh.
P.S. I can put in a PR that updates the instructions for adding a new configuration and what needs to be tested.
P.S. @sebrowne, just to let you know, I am happy to review updates to the ATDM Trilinos configuration when asked but I can't approve them to be merged. Someone on the @trilinos/framework needs to do that because they are responsible to traige problems triggered by such changes when they show up on CDash that result from such changes. (Or you can remove the nighty builds for this configuration that posts to CDash then that would not be an issue anymore but there obvious downsides to that. You can do that by not even listing these new builds in the van1-tx2/all_supported_builds.sh
file.)
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
1 similar comment
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
@bartlettroscoe I added the drivers, but the current WCID in use on that machine doesn't allow me to test them. |
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Is there a cdash link with results? |
Grumble I can't find it, though I feel like there should have been (probably my mistake). I'll go re-run the test driver and see what happens. |
I'm testing it now, but I'm unable to reset the proxies, which appear to prohibit the results from posting to CDash |
Console output shows all builds passed. |
@sebrowne, looks like you only ran Kokkos tests. Can you please run:
? |
Expanded below. There are a lot of failing tests (over 1400 failing out of 2400 tests). Something is seriously wrong here. DETAILS (click to expand)
|
I'm trying to run again, since they failed on all of the combinations I doubt it was this change |
@sebrowne, why is your proxies not working so you can't submit to CDash? Can we fix that? If we see results on CDash, it will be easier to see what is going wrong. |
I tried adjusting the proxies in my environment, the system always attempts to use the same one regardless of what is set |
@e10harvey, @trilinos/framework, I am also getting CDash submit failures as well from 'stria' running:
I am seeing CTest -S output showing that it is using a default proxy:
Somehow this is working with the Jenkins jobs that run and submit to CDash as shown, for example, here: We need to figure out why ctest -S is picking up this proxy and how to tell ctest to stop doing that. |
This helps to debug problems with http_proxy and HTTP_PROXY being set which breaks submitting to the Trilinos CDash site.
Something changed in CTest or on 'stria' so that the HTTP_PROXY var being set by default in the env load is getting picked up by CDash.
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
NOTE: I fixed this in commit 39becf8 on this branch but it appears that someone already fixed this in commit baf11ea on 3/8/2021 by @e10harvey. I will resolve the conflict so this can merge. |
…rilinos#8850) I resolved the conflict in the file: * cmake/ctest/drivers/atdm/utils/setup_env.sh that fixed the same issue with HTTP_PROXY on the 'develop' branch in commit baf11ea. This now allows submitting to CDash from 'stria'.
@sebrowne, can you pull this updated branch and try running:
? It should submit to CDash now. At least it is working for me as shown at: |
@sebrowne, the reason there are so many non-passing tests is that there are build errors. See: Looks like these are caused by the build_stats wrapper. I will disabled the build stats wrapper in that build and fire off again. |
FYI: I am running:
and it is submitting to CDash but now we are getting a bunch of license server errors shown at: showing:
Seems that too many people are trying to run builds on 'stria' nodes at the same time. (See CDOFA-116). |
FYI: Today I again ran:
and it posted to CDash at: showing:
with the non-passing tests shown at: showing:
Almost all of these tests are already failing in other configurations but a couple may not be. (Someone will need to triage those once this nightly build is set up and posting to CDash). @trilinos/framework, IMHO, the running of the CC: @tcfisher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @sebrowne and @bartlettroscoe. Please see a couple minor items. Going forward, please use the tip of develop; there are many changes in flight which cannot always be recalled on a repo of this size so it's best to use the tip of develop. I will discuss using the auto-tester to run tests when changes to the atdm
subdir are made with @trilinos/framework.
Full van1-tx2 tests will be posted to cdash later on 04/09.
echo | ||
echo "Printing all of the proxy env vars:" | ||
set | grep -i proxy= | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove.
echo | ||
echo "Unsetting HTTP_PROXY and http_proxy env vars for submit to CDash" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Please make any changes you want. I will leave the rest to you and the framework team. |
Closing this since #9001 has merged. |
User Support Ticket(s) or Story Referenced: SPAR-969
@trilinos/framework