Four MueLu tests broken in CI build in push to MueLu on 2/19/2018 #2264
Comments
Wait, looking at PR #2171, it appears that the automated PR testing says that this branch passed as of a day ago (i.e. 2/18/2018). How can that be the case? For one thing, the PR testing should have been broken for all branches that changed any code in Stokhos or upstream from Stokhos since 2/15/2018 as per #2254. Given the nature of the failures shown for these tests, I can't see how any build involving these tests can be passing. For example, the test shows a failure:
That is a ParameterList error. That is not some floating point rounding error or something. Therefore, every build should generate this same error. To protect developers still using the checkin-test-sems.sh script, I will disable these tests just for that system. But if this also fails in any of the ATDM builds tomorrow morning, we will need to back out this merge commit tomorrow. Otherwise, the @trilinos/framework team should investigate how the PR testing could report this branch as passing when it should have been reported as failing (as per #2254). |
This will fix the standard CI build but not any other build. These tests will run in every other build they were running before.
Build/Test Cases Summary
Enabled Packages: MueLu
Disabled Packages: PyTrilinos,Claps,TriKota
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=76,notpassed=0 (20.46 min)
I just pushed the commit a332932:
We should see the standard CI build clear up next CI iteration. We will see what happens with the other builds. |
These four
(see the (Yea, that feature that CASL paid for works!) But as I predicted, these tests are failing in about every other build of Trilinos, including all of the "Clean" builds as shown at: (NOTE: That same query is broken on the testing.sandia.gov site :-( ) This means that all of the automated PR builds that include changes to MueLu or packages upstream from MueLu should fail. Again I ask, how is it that the automated PR testing for this branch says that it passed, only for the merge to fail everywhere? |
@bartlettroscoe I have issued a new pull request (#2266) to fix these dashboard failures. I think the main reason for this failure is that the auto-tester is not yet pushing results to a dashboard accessible by everyone. I could have realized that a package or test was not properly being tested had I been able to read what the tester had done. |
You did not do anything wrong. You were following the process as you were told to, and that process somehow allowed broken tests to get merged to 'develop'. Since the current automated PR testing process is opaque, we can only guess what happened. Therefore, I would recommend that, until the automated PR testing process displays results on CDash so we know exactly what is being tested, you do that final merge and push using the checkin-test-sems.sh script, following the tip on merging and pushing topic branches at: But in the meantime, every build of Trilinos that runs MueLu tests is currently broken (which should break every auto PR test involving changes to MueLu or upstream packages if it is doing what it should). Therefore, we either need to back out your merge commit or fix the failing tests ASAP. NOTE: If you want to fix and then push with the checkin-test-sems.sh script, you will first need to revert the targeted disable of these tests in your local branch. You can do this with:
Then you should be able to reproduce the test failures using:
And then you can fix them and verify that they are fixed by rerunning that build (see the instructions on locally running that build at https://github.com/trilinos/Trilinos/wiki/Policies-%7C-Safe-Checkin-Testing#reproducing-failures-manually). Please let me know if you need any help with this, even if it is just to revert your merge commit so you can work on the fix for this offline. |
@bartlettroscoe, I ran the checkin script with all tests enabled and it passed fine, although for some reason I did not need to follow the procedure that you outlined... here is the output of my checkin; as you can see, both the Epetra and Tpetra unit tests are passing. I did these tests in my own branch before merging the pull request, see 9b2061c. |
@lucbv, that does not seem possible given what we are seeing on CDash. The change made in 9b2061c looks to be unrelated to the failures we are seeing on CDash like at: I just did:
and now I am running:
I will post the results here when finished. |
But note that the following commits have been pushed to the 'develop' branch since the nightly tests ran last night:
Did one of these commits fix the failing tests? Me thinks one of those commits from @jhux2 might have fixed these tests. We will see. If so, that would explain why you are now seeing passing tests. |
@bartlettroscoe I think that you want to look at 605e805. I think that it should fix the issue with the unit tests; it was also included in today's pull request. The previous commit I was pointing you to was so you can see the checkin script message saying the tests have passed. |
Yup, it passed:
Looks like @jhux2 fixed the failing tests. I will push the revert of the commit a332932. But none of this explains how the PR testing reported in #2171 passed but resulted in these tests failing. Someone must try to figure that out so that the process and/or tools can be changed to avoid this in the future. I will bet almost anything that if you check out the merge commit 22e7f91 locally and then run:
that it will show these four tests as failing. |
I just pushed the commit:
which showed these four tests passing. Now we wait to see these four tests showing up again in the CI build and watch to see that all of the other builds are fixed w.r.t. these failing tests. |
@bartlettroscoe it seems that my pull request fixed these issues as the dashboard is now clean. I am closing this issue. Thanks for the help |
And we see these four tests have been added back to the standard CI build in the CI iteration this morning at: |
Just to summarize, this failure was caused by a violation of the additive test assumption of branches, arising from a 10-day time lag between when the branch was tested and when it was merged. See #2171 (comment) for details. The need to address this in the auto PR system was added to #2312. |
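For readers unfamiliar with the additive test assumption, here is a toy sketch of how it can break. It is not the actual MueLu change: the parameter names, test names, and the `run_tests` helper are all hypothetical, and the merge is modeled very crudely. The point is only that a topic branch and 'develop' can each pass on their own while their merge fails, which is what a long lag between testing and merging makes more likely.

```python
# Toy model of an additive-test-assumption violation.
# All names below are hypothetical and only for illustration.

def run_tests(valid_params, tests):
    """Return the sorted names of tests that use a parameter not in valid_params."""
    return sorted(name for name, used in tests.items() if not used <= valid_params)

# State of 'develop' at the time the topic branch was tested.
base_params = {"max levels", "smoother: type", "old option"}
base_tests = {"test_a": {"max levels"}}

# Topic branch: adds a new test that relies on "old option".
branch_params = set(base_params)
branch_tests = {**base_tests, "test_new": {"old option"}}

# Meanwhile, 'develop' stops accepting "old option".
develop_params = base_params - {"old option"}
develop_tests = dict(base_tests)

# Crude merge: the removal from 'develop' wins, tests from both sides are kept.
merged_params = develop_params & branch_params
merged_tests = {**develop_tests, **branch_tests}

branch_failures = run_tests(branch_params, branch_tests)
develop_failures = run_tests(develop_params, develop_tests)
merged_failures = run_tests(merged_params, merged_tests)

print("branch alone :", branch_failures)
print("develop alone:", develop_failures)
print("after merge  :", merged_failures)
```

In this sketch the branch and 'develop' both report no failures, but the merged state fails `test_new`. Retesting the branch against the current tip of 'develop' just before merging would catch this kind of conflict.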
CC: @trilinos/muelu, @trilinos/framework
Next Action Status
The original failure was caused by a violation of the additive test assumption of branches, arising from a 10-day time lag between when the branch was tested and when it was merged. See #2171 (comment) and #2312.
Description
The push of the merge commit:
broke the standard CI build as shown at:
which involved breaking the four tests:
This was the only merge commit pulled in the CI iteration as shown at:
Therefore, I will back out this merge commit and run the checkin-test-sems.sh script to fix this ASAP (but I will only run the MueLu tests to speed this up). Since the CI build passed before this merge commit was pushed, that should be enough testing. Otherwise, everyone's automated PR testing for changes to MueLu or upstream from MueLu will fail because of this.