ATDM/ats1: Drop intel-18 support and fix intel-19 environment #8495
- Drop support for intel-18.
- Update intel-19 support to use mpich 7.7.15, gcc 8.3.0, and git 2.21.0.
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information, Test Names:
- Trilinos_pullrequest_gcc_8.3.0
- Trilinos_pullrequest_gcc_7.2.0_serial
- Trilinos_pullrequest_gcc_7.2.0_debug
- Trilinos_pullrequest_intel_17.0.1
- Trilinos_pullrequest_cuda_9.2
- Trilinos_pullrequest_clang_10.0.0
- Trilinos_pullrequest_python_3
Pull Request Author: e10harvey
You’re going to want @bathmatt’s review on this one. IIRC, we didn’t see problems with Intel 19 running CTest in our pipelines, but rather the problems manifested running larger problems. I may be wrong there—@bathmatt can correct me. On the other hand, if this is only for the |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Build Information, Test Names:
- Trilinos_pullrequest_gcc_8.3.0
- Trilinos_pullrequest_gcc_7.2.0_serial
- Trilinos_pullrequest_gcc_7.2.0_debug
- Trilinos_pullrequest_intel_17.0.1
- Trilinos_pullrequest_cuda_9.2
- Trilinos_pullrequest_clang_10.0.0
- Trilinos_pullrequest_python_3
Console Output (last 100 lines):
- Trilinos_pullrequest_gcc_8.3.0 # 3065
- Trilinos_pullrequest_gcc_7.2.0_serial # 693
- Trilinos_pullrequest_gcc_7.2.0_debug # 1187
- Trilinos_pullrequest_intel_17.0.1 # 8544
- Trilinos_pullrequest_cuda_9.2 # 6258
- Trilinos_pullrequest_clang_10.0.0 # 1390
- Trilinos_pullrequest_python_3 # 4141
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED
Pull Request Auto Testing has PASSED
Build Information, Test Names:
- Trilinos_pullrequest_gcc_8.3.0
- Trilinos_pullrequest_gcc_7.2.0_serial
- Trilinos_pullrequest_gcc_7.2.0_debug
- Trilinos_pullrequest_intel_17.0.1
- Trilinos_pullrequest_cuda_9.2
- Trilinos_pullrequest_clang_10.0.0
- Trilinos_pullrequest_python_3
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
First off, thanks for working on this! Just a few things we need to do before merging (some of which are mentioned in comments below):
- Update `ats1/custom_builds.sh` to list `intel` in the keyword search list and update the error message to mention `(intel-19, intel, default)` as the other names for this.
- Need to remove logic from `ats1/environment.sh` that deals with `"$ATDM_CONFIG_COMPILER" == "DEFAULT"`. It is impossible to have that value.
- Use the default `sparc-cmake/3.18.1` module (just delete `module load cmake/3.14.6`).
- Need to run the full set of builds and tests for all the packages, i.e. need to run `./ctest-s-local-test-driver.sh all` and let it complete. (We need to know if there are mass test failures before we can let these submit to the `Specialized` CDash group.)
- Really we should test against SPARC as well since this build really is in service of SPARC.
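The keyword handling requested in the first item can be sketched roughly as below. This is only an illustration of the idea, not the contents of `ats1/custom_builds.sh`: the function name `select_compiler` and the value it prints are hypothetical, and the real script may match keywords differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a build name to a compiler selection.
# select_compiler and the printed value are illustrative assumptions,
# not the actual logic or variables of ats1/custom_builds.sh.
select_compiler() {
  local build_name_lc
  # Lowercase the build name so keyword matching is case-insensitive.
  build_name_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$build_name_lc" in
    # All three keywords resolve to the single supported compiler.
    *intel-19*|*intel*|*default*)
      echo "INTEL-19.0.4_MPICH-7.7.15"
      ;;
    *)
      # Name every accepted keyword in the error message.
      echo "ERROR: no supported compiler keyword in '$1';" \
           "use one of (intel-19, intel, default)" >&2
      return 1
      ;;
  esac
}
```

For example, `select_compiler ats1-default-opt` prints the intel-19 selection, while a name with none of the keywords fails with the error message. Note that in this sketch a build name containing `intel-18` would still match the generic `intel` keyword, so a real script would need to decide how to treat such names.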
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information, Test Names:
- Trilinos_pullrequest_gcc_8.3.0
- Trilinos_pullrequest_gcc_7.2.0_serial
- Trilinos_pullrequest_gcc_7.2.0_debug
- Trilinos_pullrequest_intel_17.0.1
- Trilinos_pullrequest_cuda_10.1.105
- Trilinos_pullrequest_clang_10.0.0
- Trilinos_pullrequest_python_3
Pull Request Author: e10harvey
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED
Pull Request Auto Testing has PASSED
Build Information, Test Names:
- Trilinos_pullrequest_gcc_8.3.0
- Trilinos_pullrequest_gcc_7.2.0_serial
- Trilinos_pullrequest_gcc_7.2.0_debug
- Trilinos_pullrequest_intel_17.0.1
- Trilinos_pullrequest_cuda_10.1.105
- Trilinos_pullrequest_clang_10.0.0
- Trilinos_pullrequest_python_3
@e10harvey, looking at: showing (sorted by "Start Time"):
There is something really wrong with these builds causing thousands of not-run tests that will overwhelm the cdash_analyze_and_report.py emails. Can you please make these builds "Experimental" for now so they don't overwhelm the cdash_analyze_and_report.py emails until the build errors can be resolved? |
@e10harvey, the situation is not as bad as I thought. Here is the summary from the early email today:
Because the Not-Run tests are sorted into their own table, they don't mess up our ability to look at the 'twoif=117' tests (which you can see on CDash in this query). But of these 117 failing tests, 79 of them are Timeouts as shown in this query. Therefore, given the state of these builds and the mass number of timeouts and Not-Run tests, I think we should move these builds to "Experimental" for now since this is just too much chaos. |
@bartlettroscoe I will make these Experimental. Note that I was not seeing this strange behavior when starting the tests manually with 1deb0db last night -- see https://testing.sandia.gov/cdash/index.php?project=Trilinos&parentid=8495004. |
Thanks!
Something strange going on here. But not surprising given the nature of machines like 'mutrino' and ATS-1 and the fact that we have not run on it in 3 months after a major upgrade. NOTE: I have asked for some help from Kitware in: to track down why there are missing results on CDash. But we can look into this in more detail to see what is happening. |
@bartlettroscoe: I did not make the builds experimental due to finding and addressing a quota issue. Here are the cdash results from last night. |
Looks reasonable to me. I can't find any obvious issues that should stop the merge of this to 'develop'. (And given that the old 'ats1' configuration was completely broken by the upgrade of 'mutrino' in October, there is zero risk in pushing just about anything for 'ats1' really.)
I wonder if `--exclude=nid00021,nid00020` is still needed? In any case, who cares.
This is not a reason not to merge this PR, but I am a little concerned about all of the SEACAS tests shown failing in this query for the tests:
SEACASExoIIv2for32_exodus_nc4_unit_tests
SEACASExoIIv2for32_exodus_unit_tests
SEACASExoIIv2for32_exodus_unit_tests_nc4_env
SEACASExodus_exodus_nc4_unit_tests
SEACASExodus_exodus_unit_tests
SEACASExodus_exodus_unit_tests_nc4_env
SEACASExodus_for_exodus_unit_tests
SEACASIoss_exodus32_to_unstructured_cgns
SEACASIoss_exodus64_to_unstructured_cgns
SEACASIoss_generated32_to_unstructured_cgns
SEACASIoss_generated64_to_unstructured_cgns
SEACASIoss_pamgen_to_unstructured_cgns
and the builds:
Trilinos-atdm-ats1-hsw_intel-19.0.4_mpich-7.7.15_openmp_static_opt
Trilinos-atdm-ats1-knl_intel-19.0.4_mpich-7.7.15_openmp_static_opt
(NOTE: We don't run any tests in the 'dbg' builds so they could be failing there too.)
These show errors like the ones shown here:
Wed Jan 13 05:18:45 2021: [unset]:_pmi_alps_init:alps_get_placement_info returned with error -1
Wed Jan 13 05:18:45 2021: [unset]:_pmi_init:_pmi_alps_init returned -1
Wed Jan 13 05:18:45 2021: [unset]:_pmi_alps_init:alps_get_placement_info returned with error -1
Wed Jan 13 05:18:45 2021: [unset]:_pmi_init:_pmi_alps_init returned -1
We saw a failure like this in #2815 on this same system.
Did any of the failing SPARC tests show errors like this? I can't find any detail in ATDV-409 for why those SPARC tests failed. But it seems like if there was something majorly wrong with SEACAS with this build configuration then we would be seeing a lot more failing SPARC tests than just 1 test.
We can triage these later and see what Greg S. thinks about these.
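To help with that triage, one way to check whether a given test log carries the same ALPS placement failure is to count lines with the signature quoted above. This is a minimal sketch; the helper name `count_alps_init_failures` and the log file path are hypothetical, and it assumes the test output has been saved to a plain-text file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines in a saved test log that carry
# the ALPS placement failure signature quoted in the comment above.
count_alps_init_failures() {
  # grep -c prints the number of matching lines. It exits non-zero when
  # nothing matches, so "|| true" keeps the function's status at 0.
  grep -c '_pmi_alps_init:alps_get_placement_info returned with error' "$1" || true
}
```

Running `count_alps_init_failures test_output.log` on a log from a failing SEACAS or SPARC test would show whether that failure has the same signature. A count of zero suggests the test failed for some other reason.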
# Boost 1.65.1 settings
export BOOST_ROOT=${sparc_tpl_prefix_path}/${system_name}-${node_arch}/boost-1.72.0/00000000/${system_name}-${node_arch}_intel-19.0.4

# Hdf5 1.10.5 settings
export HDF5_ROOT=${sparc_tpl_prefix_path}/${system_name}-${node_arch}/hdf5-1.10.5/00000000/${system_name}-${node_arch}_intel-19.0.4_mpich-7.7.6
So nice how much got stripped out by using the sparc-dev modules directly.
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ bartlettroscoe ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Merge on Pull Request# 8495: IS A SUCCESS - Pull Request successfully merged |
@bartlettroscoe: Thanks again for your detailed review! I updated atdv-409 with some more info on this. |
Related to:
Fix ats1 environment:
See ATDV-409 for more details.
How was this tested?
Results are shown on CDash at:
Full build and test
See full build and test results at https://testing.sandia.gov/cdash/index.php?project=Trilinos&begin=2021-01-10&end=2021-01-11&filtercount=1&showfilters=1&field1=buildname&compare1=63&value1=ats1.