
Trilinos PR testing is running with reduced CUDA testing #10147

Closed
jwillenbring opened this issue Feb 1, 2022 · 10 comments
Labels
autotester Issues related to the autotester. CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. impacting: tests The defect (bug) is primarily a test failure (vs. a build failure) MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. system: gpu type: bug The primary issue is a bug in Trilinos code or tests

Comments

@jwillenbring
Member

Bug Report

@trilinos/developers

Due to system issues, the base CUDA build currently running at pull request time is slightly modified from the typical CUDA build, and no UVM=off CUDA build is currently running at PR time. Restoring the standard set of builds will require hardware fixes that are in flight.

Longer term, we are looking to migrate these builds to different hardware.

@jwillenbring jwillenbring added type: bug The primary issue is a bug in Trilinos code or tests impacting: tests The defect (bug) is primarily a test failure (vs. a build failure) system: gpu autotester Issues related to the autotester. labels Feb 1, 2022
@jwillenbring jwillenbring pinned this issue Feb 1, 2022
@bartlettroscoe
Member

@trilinos/framework, can you please remind us which internal CDash site the Trilinos PR CUDA builds on 'vortex' are being pushed to? I know it is not testing.sandia.gov/cdash/, but I don't remember which internal CDash site it is. There is no indication in the autotester output, as in #10216, of where this other CDash site is. The summary says:

Test Name: _cuda_10.1.243

  • Build Num: 375
  • Status: ERROR

but there is no indication where those results are posted to.
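
For anyone else trying to locate the submission target: CTest reads its CDash destination from the project's CTestConfig.cmake (CTEST_DROP_SITE and CTEST_DROP_LOCATION are the standard CTest variable names). A minimal sketch, using a fabricated sample file rather than the real Trilinos configuration:

```shell
# Create a sample CTestConfig.cmake purely for illustration; the values
# below are placeholders, NOT the actual internal CDash server.
cat > CTestConfig.cmake <<'EOF'
set(CTEST_PROJECT_NAME "Trilinos")
set(CTEST_DROP_METHOD "https")
set(CTEST_DROP_SITE "cdash.example.gov")
set(CTEST_DROP_LOCATION "/submit.php?project=Trilinos")
EOF

# Extract the submission host and path, which together form the CDash URL
grep -E 'CTEST_DROP_(SITE|LOCATION)' CTestConfig.cmake
```

In a real checkout you would run the grep against the Trilinos source tree (or the PR driver scripts) instead, keeping in mind that driver scripts can override these variables at submit time.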

@jhux2
Member

jhux2 commented Feb 21, 2022

@bartlettroscoe I think it's https://trilinos-cdash.sandia.gov.

@cgcgcg
Contributor

cgcgcg commented Feb 21, 2022

If that is indeed the correct link, then I can't access it. Can we make sure that results are getting posted to a location that all developers can see?

@e10harvey
Contributor

> If that is indeed the correct link, then I can't access it. Can we make sure that results are getting posted to a location that all developers can see?

Yes. This is only temporary until the system issues (reported via #10147) affecting the cuda 10.1.105 builds are resolved.

@jwillenbring
Member Author

Today we enabled a new CUDA 11, UVM-off build in PR and dev->master testing. We tested it yesterday and it ran successfully, with 2690 tests passing, 0 failing, and no build failures, but since then we have seen some issues with this build running to completion. We will continue to monitor this new build.
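
For context, a UVM-off CUDA configuration is driven by Kokkos CMake options. A hypothetical sketch of the relevant configure flags (the option names follow Kokkos/Trilinos CMake conventions, but the exact set used by the PR testing system is not shown in this issue, and TRILINOS_SRC is a placeholder):

```shell
# Hypothetical configure sketch for a CUDA, UVM-off Trilinos build.
cmake \
  -D Trilinos_ENABLE_ALL_PACKAGES=ON \
  -D TPL_ENABLE_CUDA=ON \
  -D Kokkos_ENABLE_CUDA=ON \
  -D Kokkos_ENABLE_CUDA_UVM=OFF \
  "${TRILINOS_SRC}"
```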

@jwillenbring
Member Author

A couple of tests are failing intermittently, and there are some things we would still like to enable in this build, but the core problem for this issue has been resolved: we have a UVM=OFF build back in PR testing, so I am closing this issue.

@jwillenbring jwillenbring unpinned this issue Jul 28, 2022
@lucbv
Contributor

lucbv commented Jul 28, 2022

@jwillenbring do we have an issue to track the problem with CDash access that @cgcgcg reported above? If not, I will re-open this issue as not fully completed.

@cgcgcg
Contributor

cgcgcg commented Jul 29, 2022

I'm not aware of another ticket. Re-opening as not fixed.

@cgcgcg cgcgcg reopened this Jul 29, 2022
@github-actions

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is OK for this issue to be closed, feel free to go ahead and close it. Please do not add any comments, change any labels, or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Jul 29, 2023
@github-actions

This issue was closed due to inactivity for 395 days.

@github-actions github-actions bot added the CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. label Aug 30, 2023

6 participants