Add openmpi 4.0.5 toolchain for VAN1 #8850

Closed
wants to merge 6 commits into from

Contributor

@sebrowne sebrowne commented Mar 8, 2021

User Support Ticket(s) or Story Referenced: SPAR-969

@trilinos/framework

@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
WARNING: NO REVIEWERS HAVE BEEN REQUESTED FOR THIS PULL REQUEST!

@sebrowne
Contributor Author

sebrowne commented Mar 8, 2021

@jmgate @bartlettroscoe

@bartlettroscoe
Member

@e10harvey, @trilinos/framework


if [ "$ATDM_CONFIG_NODE_TYPE" == "OPENMP" ] ; then
unset OMP_PLACES
unset OMP_PROC_BIND
Contributor

I think I remember @rppawlo saying this OMP_PROC_BIND is something we want to leave up to the user rather than set (or unset) across the board. Might be wrong, though—memory is foggy.

Contributor Author

I'd be inclined to agree with you, but it should only affect runtime behavior, not build-time, correct?

Contributor

I think that's right.

Member

FYI: Experiments with Trilinos showed that unsetting these improved the performance of running the Trilinos test suite.

Contributor

Understood, but I think unsetting these improved testing, where you're firing off a bunch of small things at once, but degraded performance of larger runs, where you're launching one big thing. Hence the desire to leave it up to the user.

Member

Hence the desire to leave it up to the user.

Right, but out of the box, this should work for the Trilinos test suite or someone will need to deal with a bunch of timing out Trilinos tests.
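
One way to reconcile the two positions would be to keep unsetting the variables by default while giving users an opt-out. The following is only a sketch: `ATDM_CONFIG_KEEP_OMP_BINDING` and the function name are hypothetical and are not part of the existing ATDM configuration scripts.

```bash
# Sketch only: ATDM_CONFIG_KEEP_OMP_BINDING is a hypothetical opt-out
# variable, not something the ATDM scripts define today.
atdm_maybe_unset_omp_binding() {
  if [ "${ATDM_CONFIG_NODE_TYPE:-}" = "OPENMP" ] \
     && [ "${ATDM_CONFIG_KEEP_OMP_BINDING:-0}" != "1" ]; then
    # Default behavior from this PR: unset the binding variables, which was
    # reported to improve the run time of the Trilinos test suite.
    unset OMP_PLACES
    unset OMP_PROC_BIND
  fi
}
```

With this shape, the test suite works out of the box, and a user launching one large run could export the opt-out variable before loading the environment to keep their own binding settings.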

@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES

e10harvey previously approved these changes Mar 9, 2021
Contributor

@e10harvey e10harvey left a comment

Thanks, @sebrowne. On van1-tx2, can you run and check:

mkdir Trilinos-pr8850
cd Trilinos-pr8850
ln -s /path/to/Trilinos/cmake/std/atdm/ctest-s-local-test-driver.sh
nohup env ./ctest-s-local-test-driver.sh all &>ctest-s-local-test-driver.out &

Before merging?

@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ e10harvey ]!

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 3853
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS
PULLREQUESTNUM 8850
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH develop
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA 039d42b
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 0f69cc4

The remaining jobs were started with the same Jenkins parameters as above:

  • Trilinos_pullrequest_gcc_7.2.0_serial: Build Num 1420, Status STARTED
  • Trilinos_pullrequest_gcc_7.2.0_debug: Build Num 1909, Status STARTED
  • Trilinos_pullrequest_intel_17.0.1: Build Num 9256, Status STARTED
  • Trilinos_pullrequest_cuda_10.1.105: Build Num 709, Status STARTED
  • Trilinos_pullrequest_clang_10.0.0: Build Num 2071, Status STARTED
  • Trilinos_pullrequest_python_3: Build Num 4745, Status STARTED

Using Repos:

Repo: TRILINOS (sebrowne/Trilinos)
  • Branch: develop
  • SHA: 039d42b
  • Mode: TEST_REPO

Pull Request Author: sebrowne

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

Test Name: Trilinos_pullrequest_gcc_8.3.0

  • Build Num: 3853
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS
PULLREQUESTNUM 8850
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH develop
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA 039d42b
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 0f69cc4

The remaining jobs ran with the same Jenkins parameters as above:

  • Trilinos_pullrequest_gcc_7.2.0_serial: Build Num 1420, Status PASSED
  • Trilinos_pullrequest_gcc_7.2.0_debug: Build Num 1909, Status PASSED
  • Trilinos_pullrequest_intel_17.0.1: Build Num 9256, Status PASSED
  • Trilinos_pullrequest_cuda_10.1.105: Build Num 709, Status PASSED
  • Trilinos_pullrequest_clang_10.0.0: Build Num 2071, Status PASSED
  • Trilinos_pullrequest_python_3: Build Num 4745, Status PASSED


CDash Test Results for PR# 8850.

@trilinos-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ e10harvey ]!

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR...

@sebrowne
Contributor Author

sebrowne commented Mar 9, 2021

@e10harvey it failed, and looks unrelated to my change:

***
*** ./ctest-s-local-test-driver.sh
***

ATDM_TRILINOS_DIR = '/lustre/sebrown/Trilinos'

Load some env to get python, cmake, etc ...

Hostname 'stria-login2' matches known ATDM host 'stria-login2' and system 'van1-tx2'
Setting compiler and build options for build-name 'default'
Using ARM ATSE compiler stack ARM-20.0_OPENMPI-4.0.2 to build DEBUG code with Kokkos node type SERIAL

The following have been reloaded with a version change:
  1) arm/20.1 => arm/20.0


Currently Loaded Modules Matching: openmpi
  1) openmpi4/4.0.5

 


The following have been reloaded with a version change:
  1) openmpi4/4.0.5 => openmpi4/4.0.2


The following have been reloaded with a version change:
  1) armpl/20.1.0 => armpl/20.0.0


The following have been reloaded with a version change:
  1) git/2.26.2 => git/2.19.2


Running builds:
    van1-tx2_arm-20.0_openmpi-4.0.2_openmp_static_opt
    van1-tx2_arm-20.0_openmpi-4.0.2_openmp_static_dbg
    van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_opt
    van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_dbg
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg

***
*** ERROR: The driver script:
***
***   /lustre/sebrown/Trilinos/cmake/ctest/drivers/atdm/van1-tx2/drivers/Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt.sh
***
*** for the specified build:
***
***   van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt
***
*** does not exist!
***

***
*** ERROR: The driver script:
***
***   /lustre/sebrown/Trilinos/cmake/ctest/drivers/atdm/van1-tx2/drivers/Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg.sh
***
*** for the specified build:
***
***   van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg
***
*** does not exist!
***

Aborting the script and not running any builds!

@e10harvey
Contributor

e10harvey commented Mar 9, 2021

@e10harvey it failed, and looks unrelated to my change:

@sebrowne, It's not finding driver files for the new supported builds in: https://github.com/trilinos/Trilinos/tree/develop/cmake/ctest/drivers/atdm/van1-tx2/drivers. You can create these via cp Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_opt.sh Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt.sh -- and similar for dbg.
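
The copy suggested above could be scripted for both build types. This is a minimal sketch: the function name `copy_openmpi_drivers` is made up for illustration, and whether the copied files need any further edits is not stated in this thread.

```bash
# Sketch: derive the openmpi-4.0.5 driver scripts from the existing 4.0.3 ones.
# Pass the drivers directory (cmake/ctest/drivers/atdm/van1-tx2/drivers) or
# run from inside it.
copy_openmpi_drivers() {
  local dir="${1:-.}"
  local prefix="Trilinos-atdm-van1-tx2_arm-20.1_openmpi"
  local kind
  for kind in opt dbg; do
    cp "${dir}/${prefix}-4.0.3_openmp_static_${kind}.sh" \
       "${dir}/${prefix}-4.0.5_openmp_static_${kind}.sh" || return 1
  done
}
```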

@bartlettroscoe
Member

Related to my epic SEPW-215

@trilinos-autotester trilinos-autotester added the AT: STALE Added by the PR autotester if too much time has elapsed since the last successful PR test iteration label Mar 15, 2021
@trilinos-autotester
Contributor

All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest....

Member

@bartlettroscoe bartlettroscoe left a comment

@sebrowne, the additions look reasonable. I think all that is needed is to add the driver files:

Trilinos/cmake/ctest/drivers/atdm/van1-tx2/drivers/
    Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt.sh
    Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg.sh

(just copy them from the existing files Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_opt.sh) and then run the following commands to test things:

# Check that all of the builds listed in 'van1-tx2/all_supported_builds.sh' have
# driver files and the basic configuration works for all.
$ env Trilinos_PACKAGES=Kokkos \
  ./ctest-s-local-test-driver.sh all

# Run the new builds completely for all packages to see what it looks like on CDash
$ ./ctest-s-local-test-driver.sh \
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt \
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg

See ctest-s-local-test-driver.sh.
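
Before launching a full run, the driver-file check can also be done by hand. The sketch below assumes the naming convention `Trilinos-atdm-<build-name>.sh` seen in the error output earlier in this thread. The function name `check_driver_files` is hypothetical, and the build names are passed as arguments because the format of all_supported_builds.sh is not shown here.

```bash
# Sketch: report build names that have no matching driver script.
# Usage: check_driver_files <drivers-dir> <build-name>...
check_driver_files() {
  local drivers_dir="$1"; shift
  local build missing=0
  for build in "$@"; do
    if [ ! -f "${drivers_dir}/Trilinos-atdm-${build}.sh" ]; then
      echo "missing driver: Trilinos-atdm-${build}.sh"
      missing=1
    fi
  done
  # Non-zero return if any driver script is missing.
  return "$missing"
}
```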

P.S. I can put in a PR that updates the instructions for adding a new configuration and what needs to be tested.

P.P.S. @sebrowne, just to let you know, I am happy to review updates to the ATDM Trilinos configuration when asked, but I can't approve them to be merged. Someone on @trilinos/framework needs to do that because they are responsible for triaging the problems that such changes trigger when they show up on CDash. (Or you can remove the nightly builds for this configuration that post to CDash, and then that would not be an issue anymore, but there are obvious downsides to that. You can do that by not listing these new builds in the van1-tx2/all_supported_builds.sh file.)

@sebrowne
Contributor Author

@bartlettroscoe I added the drivers, but the current WCID in use on that machine doesn't allow me to test them.

@trilinos-autotester trilinos-autotester removed the AT: STALE Added by the PR autotester if too much time has elapsed since the last successful PR test iteration label Mar 18, 2021
@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS NOT BEEN REVIEWED YET!

@e10harvey
Contributor

I believe my testing shows that this is good to go now.

Is there a cdash link with results?

@sebrowne
Contributor Author

sebrowne commented Apr 1, 2021

Grumble I can't find it, though I feel like there should have been (probably my mistake). I'll go re-run the test driver and see what happens.

@sebrowne
Contributor Author

sebrowne commented Apr 5, 2021

I'm testing it now, but I'm unable to reset the proxies, which appear to prohibit the results from posting to CDash

@sebrowne
Contributor Author

sebrowne commented Apr 6, 2021

pr.log

@sebrowne
Contributor Author

sebrowne commented Apr 6, 2021

Console output shows all builds passed.

@bartlettroscoe
Member

@sebrowne, looks like you only ran Kokkos tests. Can you please run:

$ ./ctest-s-local-test-driver.sh all

?

@sebrowne
Contributor Author

sebrowne commented Apr 6, 2021

pr2.log

@bartlettroscoe
Member

bartlettroscoe commented Apr 6, 2021

pr2.log

Expanded below. There are a lot of failing tests (over 1400 failing out of 2400 tests). Something is seriously wrong here.

DETAILS
Running builds:
    van1-tx2_arm-20.0_openmpi-4.0.2_openmp_static_opt
    van1-tx2_arm-20.0_openmpi-4.0.2_openmp_static_dbg
    van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_opt
    van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_dbg
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg

Tue Apr  6 09:26:08 MDT 2021

Running Jenkins driver Trilinos-atdm-van1-tx2_arm-20.0_openmpi-4.0.2_openmp_static_opt.sh ...

    See log file Trilinos-atdm-van1-tx2_arm-20.0_openmpi-4.0.2_openmp_static_opt/smart-jenkins-driver.out

real  68m36.403s
user  3m46.209s
sys 3m38.730s

39% tests passed, 1470 tests failed out of 2426

Tue Apr  6 10:34:44 MDT 2021

Running Jenkins driver Trilinos-atdm-van1-tx2_arm-20.0_openmpi-4.0.2_openmp_static_dbg.sh ...

    See log file Trilinos-atdm-van1-tx2_arm-20.0_openmpi-4.0.2_openmp_static_dbg/smart-jenkins-driver.out

real  51m28.110s
user  3m49.431s
sys 3m38.247s

40% tests passed, 1450 tests failed out of 2412

Tue Apr  6 11:26:12 MDT 2021

Running Jenkins driver Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_opt.sh ...

    See log file Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_opt/smart-jenkins-driver.out

real  69m14.829s
user  3m41.312s
sys 3m43.010s

40% tests passed, 1467 tests failed out of 2426

Tue Apr  6 12:35:27 MDT 2021

Running Jenkins driver Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_dbg.sh ...

    See log file Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.3_openmp_static_dbg/smart-jenkins-driver.out

real  44m56.496s
user  3m46.894s
sys 3m39.101s

40% tests passed, 1449 tests failed out of 2412

Tue Apr  6 13:20:24 MDT 2021

Running Jenkins driver Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt.sh ...

    See log file Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt/smart-jenkins-driver.out

real  72m3.569s
user  3m43.832s
sys 3m50.376s

40% tests passed, 1467 tests failed out of 2426

Tue Apr  6 14:32:27 MDT 2021

Running Jenkins driver Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg.sh ...

    See log file Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg/smart-jenkins-driver.out

real  50m31.156s
user  3m46.533s
sys 3m35.068s

40% tests passed, 1449 tests failed out of 2412

Tue Apr  6 15:22:58 MDT 2021

Done running all of the builds!
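
The per-build pass/fail lines in the output above can be gathered in one step. This sketch assumes the CTest summary line (for example `39% tests passed, 1470 tests failed out of 2426`) also appears in each build's smart-jenkins-driver.out log, which this thread does not confirm. The function name `summarize_builds` is hypothetical.

```bash
# Sketch: print the last "N% tests passed" line from each build's log.
# Usage: summarize_builds [base-dir]   (defaults to the current directory)
summarize_builds() {
  local base="${1:-.}" log
  for log in "${base}"/Trilinos-atdm-*/smart-jenkins-driver.out; do
    # Skip the unexpanded glob when no logs exist.
    [ -f "$log" ] || continue
    printf '%s: %s\n' "$(basename "$(dirname "$log")")" \
      "$(grep -E '[0-9]+% tests passed' "$log" | tail -n 1)"
  done
}
```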

@sebrowne
Contributor Author

sebrowne commented Apr 6, 2021

I'm trying to run again. Since they failed on all of the combinations, I doubt it was this change.

@bartlettroscoe
Member

I'm testing it now, but I'm unable to reset the proxies, which appear to prohibit the results from posting to CDash

@sebrowne, why are your proxies not working so that you can't submit to CDash? Can we fix that? If we see results on CDash, it will be easier to see what is going wrong.

@sebrowne
Contributor Author

sebrowne commented Apr 6, 2021

I tried adjusting the proxies in my environment, but the system always attempts to use the same one regardless of what is set.

@bartlettroscoe
Member

@e10harvey, @trilinos/framework, I am also getting CDash submit failures from 'stria' when running:

$ unset HTTPS_PROXY
$ unset HTTP_PROXY
$ unset http_proxy
$ unset https_proxy

$ env Trilinos_PACKAGES=Kokkos ./ctest-s-local-test-driver.sh \
      van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt \
      van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg

I am seeing CTest -S output showing that it is using a default proxy:

   Use HTTP Proxy: <redacted>
   Send to group: Experimental
   SubmitURL: http://testing.sandia.gov/cdash/submit.php?project=Trilinos
   Submit failed, waiting 3 seconds...
   Retry submission: Attempt 1 of 5
   Submit failed, waiting 3 seconds...
   ...

Somehow this is working with the Jenkins jobs that run and submit to CDash as shown, for example, here:

We need to figure out why ctest -S is picking up this proxy and how to tell ctest to stop doing that.
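
A small helper can show which proxy variables are still set at the point where ctest -S runs, for example after the environment load has finished. This is a debugging sketch only. The function names are hypothetical, and it only covers the four variables named in this thread.

```bash
# Sketch: list the proxy variables that are currently exported.
list_proxy_vars() {
  local name value
  for name in HTTP_PROXY HTTPS_PROXY http_proxy https_proxy; do
    # printenv returns non-zero when the variable is not in the environment.
    value=$(printenv "$name") && echo "${name}=${value}"
  done
  return 0
}

# Sketch: clear all four variables in one place.
clear_proxy_vars() {
  unset HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
}
```

Calling `list_proxy_vars` immediately before and after the environment load would show whether the load itself re-exports a proxy, which is what the behavior described above suggests.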

This helps to debug problems with http_proxy and HTTP_PROXY being set which
breaks submitting to the Trilinos CDash site.
Something changed in CTest or on 'stria' so that the HTTP_PROXY var being set
by default in the env load is getting picked up by CDash.
@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS NOT BEEN REVIEWED YET!

@bartlettroscoe
Member

NOTE: I fixed this in commit 39becf8 on this branch, but it appears that @e10harvey already fixed this in commit baf11ea on 3/8/2021.

I will resolve the conflict so this can merge.

…rilinos#8850)

I resolved the conflict in the file:

* cmake/ctest/drivers/atdm/utils/setup_env.sh

that fixed the same issue with HTTP_PROXY on the 'develop' branch in commit
baf11ea.

This now allows submitting to CDash from 'stria'.
@bartlettroscoe
Member

@sebrowne, can you pull this updated branch and try running:

$ ./ctest-s-local-test-driver.sh \
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt \
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg

? It should submit to CDash now. At least it is working for me as shown at:

@bartlettroscoe
Member

@sebrowne, the reason there are so many non-passing tests is that there are build errors. See:

Looks like these are caused by the build_stats wrapper. I will disable the build stats wrapper in that build and fire off again.

@bartlettroscoe
Member

FYI: I am running:

$ env Trilinos_ENABLE_BUILD_STATS=OFF \
   ./ctest-s-local-test-driver.sh \
     van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt \
     van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg

and it is submitting to CDash but now we are getting a bunch of license server errors shown at:

showing:

clang-9: error: Failed to check out a license. See below for more details.
clang-9: note: If you need further help, provide this complete error report to your supplier or support-hpc-sw@arm.com.
 - Product information location: /opt/arm/arm-linux-compiler-20.1_Generic-AArch64_RHEL-7_aarch64-linux/sw-mappings
 - Toolchain location: /opt/arm/arm-linux-compiler-20.1_Generic-AArch64_RHEL-7_aarch64-linux/llvm-bin

clang-9: note:  - Checkout feature: compiler
 - Feature version: 15.20200409
 - ALMS error code: -114
 - ALMS error message: Timed out while contacting server

Seems that too many people are trying to run builds on 'stria' nodes at the same time. (See CDOFA-116).

@bartlettroscoe
Member

bartlettroscoe commented Apr 8, 2021

FYI: Today I again ran:

$ env Trilinos_ENABLE_BUILD_STATS=OFF \
    ./ctest-s-local-test-driver.sh \
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt \
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg

***
*** ./ctest-s-local-test-driver.sh
***

...

Running builds:
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt
    van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg

Thu Apr  8 09:10:33 MDT 2021

Running Jenkins driver Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt.sh ...

    See log file Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt/smart-jenkins-driver.out

real    21m31.472s
user    6m4.819s
sys     7m9.958s

99% tests passed, 4 tests failed out of 2431

Thu Apr  8 09:32:04 MDT 2021

Running Jenkins driver Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg.sh ...

    See log file Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg/smart-jenkins-driver.out

real    70m45.035s
user    6m9.247s
sys     6m50.718s

99% tests passed, 10 tests failed out of 2417

Thu Apr  8 10:42:49 MDT 2021

Done running all of the builds!

and it posted to CDash at:

showing:

| Site | Build Name | Conf Error | Conf Warn | Conf Test Time | Build Error | Build Warn | Build Test Time | Test Not Run | Test Fail | Test Pass | Test Time | Test Proc Time | Start Test Time | Labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | 0 | 0 | 4m 11s | 0 | 0 | 9m 40s | 0 | 10 | 2407 | 55m 54s | 13h 8m 50s | 2 hours ago | (31 labels) |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt-exp | 0 | 0 | 4m 14s | 0 | 1 | 2m 54s | 0 | 4 | 2427 | 13m 25s | 3h 19m 59s | 3 hours ago | (31 labels) |

with the non-passing tests shown at:

showing:

| Site | Build Name | Test Name | Status | Time | Proc Time | Details | Build Time | Processors |
|---|---|---|---|---|---|---|---|---|
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | Belos_CustomSolverFactory_MPI_4 | Failed | 3s 380ms | 13s 520ms | Completed (Failed) | 2021-04-08T09:32:28 MDT | 4 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | Intrepid2_unit-test_Projection_Serial_Test_Convergence_HEX_MPI_1 | Failed | 10m 140ms | 10m 140ms | Completed (Timeout) | 2021-04-08T09:32:28 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | ROL_example_PDE-OPT_helmholtz_example_02_MPI_1 | Failed | 10m 140ms | 10m 140ms | Completed (Timeout) | 2021-04-08T09:32:28 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | ROL_example_PDE-OPT_navier-stokes_example_01_MPI_4 | Failed | 10m 130ms | 40m 520ms | Completed (Timeout) | 2021-04-08T09:32:28 MDT | 4 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | ROL_test_vector_StdArrayInterface_MPI_1 | Failed | 2s 780ms | 2s 780ms | Completed (Failed) | 2021-04-08T09:32:28 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | SEACASExodus_for_exodus_unit_tests | Failed | 1s 600ms | 1s 600ms | Completed (Failed) | 2021-04-08T09:32:28 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt-exp | SEACASExodus_for_exodus_unit_tests | Failed | 1s 250ms | 1s 250ms | Completed (Failed) | 2021-04-08T09:10:55 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | SEACASExoIIv2for32_exodus_nc4_unit_tests | Failed | 870ms | 870ms | Completed (Failed) | 2021-04-08T09:32:28 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt-exp | SEACASExoIIv2for32_exodus_nc4_unit_tests | Failed | 950ms | 950ms | Completed (Failed) | 2021-04-08T09:10:55 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | SEACASExoIIv2for32_exodus_unit_tests | Failed | 510ms | 510ms | Completed (Failed) | 2021-04-08T09:32:28 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt-exp | SEACASExoIIv2for32_exodus_unit_tests | Failed | 570ms | 570ms | Completed (Failed) | 2021-04-08T09:10:55 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | SEACASExoIIv2for32_exodus_unit_tests_nc4_env | Failed | 870ms | 870ms | Completed (Failed) | 2021-04-08T09:32:28 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt-exp | SEACASExoIIv2for32_exodus_unit_tests_nc4_env | Failed | 900ms | 900ms | Completed (Failed) | 2021-04-08T09:10:55 MDT | 1 |
| stria | Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg-exp | ShyLU_NodeHTS_hts_test_1 | Failed | 19s 480ms | 38s 960ms | Completed (Failed) | 2021-04-08T09:32:28 MDT | 2 |

Almost all of these tests are already failing in other configurations but a couple may not be. (Someone will need to triage those once this nightly build is set up and posting to CDash).

@trilinos/framework, IMHO, running the ctest-s-local-test-driver.sh script, posting results to CDash, and looking at the Trilinos results (like I show above) should be the job of someone on the Trilinos side who is familiar with this process. Once a customer like @sebrowne has posted the initial PR that updates the basic configuration files under the cmake/std/atdm/ directory, they should be done. If running ctest-s-local-test-driver.sh and posting to CDash reveals some major problem with the configuration, then it can be kicked back to the customer who suggested the change to see if there is a problem with the configuration. That seems like a reasonable and sustainable separation of responsibilities. (If that had been done, then someone on the Trilinos side would have immediately seen the proxy problem and resolved it, since it had already been resolved in 'develop' independently in commit baf11ea.)

CC: @tcfisher

Contributor

@e10harvey e10harvey left a comment

Thanks, @sebrowne and @bartlettroscoe. Please see a couple minor items. Going forward, please use the tip of develop; there are many changes in flight which cannot always be recalled on a repo of this size so it's best to use the tip of develop. I will discuss using the auto-tester to run tests when changes to the atdm subdir are made with @trilinos/framework.

Full van1-tx2 tests will be posted to cdash later on 04/09.

Comment on lines +18 to +21
echo
echo "Printing all of the proxy env vars:"
set | grep -i proxy=

Contributor

Please remove.

Comment on lines +35 to +37
echo
echo "Unsetting HTTP_PROXY and http_proxy env vars for submit to CDash"

Contributor

Same as above.

@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES

@bartlettroscoe
Member

bartlettroscoe commented Apr 9, 2021

Please make any changes you want. I will leave the rest to you and the framework team.

@e10harvey
Contributor

Closing this since #9001 has merged.

@e10harvey e10harvey closed this Apr 14, 2021
@jjellio jjellio mentioned this pull request Apr 21, 2021
14 tasks