Improve testing for updating from 'develop' branch to 'master' branch by querying CDash results #380

Closed
bartlettroscoe opened this issue May 21, 2016 · 14 comments
Labels
Framework tasks Framework tasks (used internally by Framework team) resolved: wontfix The development team cannot or will not address this issue story The issue corresponds to a Kanban Story (vs. Epic or Task)

Comments

@bartlettroscoe
Member

bartlettroscoe commented May 21, 2016

Next Action Status:

Waiting for #370 to be completed before getting started on this ...

Blocked By: #370

CC: @jwillenbring, @bmpersc, @mhoemmen, @trilinos/framework

Description:

Story #370 implemented the initial transition to the 'develop'/'master' branch workflow using a simple automated job that does a single build with the checkin-test.py script and then pushes the updated 'master' branch. This is a follow-on Story to put in place the infrastructure and process needed to increase the level of testing that must pass before the 'master' branch is updated. The approach will be to take advantage of the new CDash API, which allows you to download CDash query results as a JSON data structure, and then use a Python script to inspect the results and make sure that all of the targeted builds and packages passed.

This has already been implemented for CASL VERA in the Python script vera_cdash_pass_fail.py. This script is run as:

$ cd VERA/
$ ./cmake/ctest/drivers/vera_cdash_pass_fail.py --date=2016-04-28

…

Getting data from:

  https://casl-dev.ornl.gov/testing/api/v1/index.php?project=VERA&date=2016-04-28&filtercount=4&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=Nightly&field2=subprojects&compare2=92&value2=VUQCore&field3=subprojects&compare3=92&value3=VUQDemos

VERA builds failed!

Error, the build {u'buildname': u'Linux-GCC-4.8.3-MPI_RELEASE_SHARED_HEAVY', u'test': {u'fail': 2, u'timefull': 44707, u'time': u'12h 25m 7s ', u'notrun': 0, u'pass': 1750}, … } failed!
…
FINAL: One of the VERA or VERADriver builds on 2016-04-28 FAILED

This script uses the reusable Python module:

  tribits/ci_support/CDashQueryPassFail.py
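
For illustration only, a query URL like the one printed above could be assembled as follows. This is a minimal sketch and not code taken from CDashQueryPassFail.py: the function name build_cdash_query_url and its argument layout are hypothetical, and only the query parameter names are copied from the printed URL. The sketch derives filtercount from the number of filters passed in, so the example call yields filtercount=3 rather than the 4 shown in the printed URL. It is written for Python 3.

```python
from urllib.parse import urlencode


def build_cdash_query_url(base_url, project, date, filters, combine="and"):
    """Assemble a CDash index.php API query URL (hypothetical helper).

    `filters` is a list of (field, compare, value) tuples. They are numbered
    from 1 in the query string, as in the URL printed by the VERA script.
    """
    params = [
        ("project", project),
        ("date", date),
        ("filtercount", len(filters)),
        ("showfilters", 1),
        ("filtercombine", combine),
    ]
    for i, (field, compare, value) in enumerate(filters, start=1):
        params.append((f"field{i}", field))
        params.append((f"compare{i}", compare))
        params.append((f"value{i}", value))
    return base_url + "?" + urlencode(params)


# The three filters visible in the printed URL.
url = build_cdash_query_url(
    "https://casl-dev.ornl.gov/testing/api/v1/index.php",
    "VERA",
    "2016-04-28",
    [
        ("groupname", 61, "Nightly"),
        ("subprojects", 92, "VUQCore"),
        ("subprojects", 92, "VUQDemos"),
    ],
)
print(url)
```

Keeping the filters as a list of tuples means adding or removing a subproject filter does not require renumbering the field/compare/value parameters by hand.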

The script vera_cdash_pass_fail.py does one query of the VERA CDash project and one of the VERADriver CDash project (created using the TriBITS Dashboard Driver (TDD) system). The VERA CDash query is shown above and it is examined to make sure that all of the expected builds are present and that there are no failing configures, builds or tests. The VERADriver query is done to make sure that all of the outer CTest driver builds ran and that there were no failures (e.g. timeouts). Having all passing VERA builds is not sufficient without also querying the VERADriver project to make sure that none of the CTest drivers timed out (because packages may never have been tested).
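
The checks described above (all expected builds present, no failing configures, builds or tests) might look roughly like the sketch below. It is not the code in vera_cdash_pass_fail.py. The keys 'buildname' and 'test'/'fail' appear in the error output above; the 'buildgroups', 'builds', 'configure', 'compilation' and 'error' keys are assumptions about the JSON layout, and the function name and the build name "Some-Other-Build" are hypothetical.

```python
# Sections of a build record to inspect, and the counter in each that
# must be zero. Key names other than 'test'/'fail' are assumptions.
CHECKS = {
    "configure": "error",
    "compilation": "error",
    "test": "fail",
}


def find_cdash_failures(query_result, expected_buildnames):
    """Return a list of problem descriptions; an empty list means pass."""
    builds = [
        build
        for group in query_result.get("buildgroups", [])
        for build in group.get("builds", [])
    ]
    problems = []

    # Every expected build must have submitted results.
    found = {b.get("buildname") for b in builds}
    for name in sorted(set(expected_buildnames) - found):
        problems.append(f"missing expected build: {name}")

    # Each build that did submit must have no configure, build or test failures.
    for b in builds:
        name = b.get("buildname")
        for section, key in CHECKS.items():
            data = b.get(section)
            if data is None:
                # A step that never reported is treated as a failure.
                problems.append(f"{name}: no {section} results")
            elif data.get(key, 0) > 0:
                problems.append(f"{name}: {section} {key} = {data[key]}")
    return problems


sample = {
    "buildgroups": [
        {
            "builds": [
                {
                    "buildname": "Linux-GCC-4.8.3-MPI_RELEASE_SHARED_HEAVY",
                    "configure": {"error": 0},
                    "compilation": {"error": 0},
                    "test": {"fail": 2, "notrun": 0, "pass": 1750},
                }
            ]
        }
    ]
}
problems = find_cdash_failures(
    sample,
    ["Linux-GCC-4.8.3-MPI_RELEASE_SHARED_HEAVY", "Some-Other-Build"],
)
for p in problems:
    print(p)
```

Returning the list of problems rather than a bare boolean lets the calling script print every reason for a failure before exiting with a nonzero status.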

Because Trilinos uses Jenkins to run CTest build drivers, a query of Jenkins may be needed to ensure that none of the selected builds timed out. However, it appears that Jenkins supports a remote API so developing queries of Jenkins should be possible as well.
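
A query of the Jenkins remote API could be sketched as below. Jenkins exposes a JSON description of a build at the build URL followed by /api/json, with a "result" field (for example SUCCESS, FAILURE or ABORTED) and a "building" flag. How a timed-out CTest driver shows up there depends on how the job is configured, so this sketch simply treats anything other than a finished SUCCESS as not okay. The function names are hypothetical, and the job URL would be whatever the Trilinos Jenkins server uses.

```python
import json
from urllib.request import urlopen


def fetch_last_build(job_url, timeout=30):
    """Fetch the JSON description of the last build of a Jenkins job."""
    api_url = job_url.rstrip("/") + "/lastBuild/api/json"
    with urlopen(api_url, timeout=timeout) as response:
        return json.load(response)


def jenkins_build_ok(build_info):
    """Return True only for a finished build whose result is SUCCESS.

    A build that is still running, or that ended as FAILURE, UNSTABLE or
    ABORTED (which is how a killed or timed-out job may be reported),
    is treated as not okay.
    """
    if build_info.get("building"):
        return False
    return build_info.get("result") == "SUCCESS"


# Offline examples using hand-written build records.
print(jenkins_build_ok({"building": False, "result": "SUCCESS"}))
print(jenkins_build_ok({"building": False, "result": "ABORTED"}))
```

Separating the fetch from the pass/fail decision keeps the decision logic testable without access to a Jenkins server.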

Tasks:

  1. Upgrade the Trilinos CDash server to a current version that supports the new CDash API interface. (NOTE: This might also require upgrading the server hardware and env so that it can handle the required volume of Trilinos CDash submits.)
  2. Develop a Python script trilinos_cdash_pass_fail.py that will run and inspect one or more queries of the Trilinos CDash project and return false if any of these queries show failures. Also, do queries of the Jenkins CDash drivers to make sure that the outer CTest driver invocations don't time out or otherwise fail.
  3. Select a (small) initial set of Trilinos builds and set of packages in one or more CDash queries that must pass before updating the 'master' branch from the 'develop' branch. (NOTE: Adding these extra builds is likely to be done in separate stories. This story will just set up the infrastructure to query CDash correctly and get a small number of extra builds into the initial set. Every new build you add creates "value" and can therefore be its own Story.)
@bartlettroscoe bartlettroscoe added Framework tasks Framework tasks (used internally by Framework team) story The issue corresponds to a Kanban Story (vs. Epic or Task) labels May 21, 2016
@bartlettroscoe
Member Author

The email below indicates that we should have at least one OSX build in the set of queried builds on CDash.

Several other builds will obviously need to be added as well. As stated above, adding these extra builds is likely to be done in separate stories. This story will just set up the infrastructure to query CDash correctly and get a small number of extra builds into the initial set.


From: Trilinos-developers [mailto:trilinos-developers-bounces@trilinos.org] On Behalf Of Tobias Wiesner
Sent: Tuesday, May 17, 2016 10:32 AM
To: trilinos-developers@trilinos.org
Subject: Re: [Trilinos-developers] [EXTERNAL] Mac clang build error in nightly build

Thanks, Stefan.
It's my fault. I will fix it right away.

Tobias
On 05/17/2016 07:14 AM, Domino, Stefan Paul wrote:
Greetings,

Did anyone else see this build error? This occurred on Mac clang this morning. At this point, I have not looked at my other tests.

Regards,

Stefan

/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/src/CrsMatrix/Xpetra_CrsMatrixFactory.hpp:105:25: error: 
      allocating an object of abstract class type 'TpetraCrsMatrix<double, int,
      long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial,
      Kokkos::HostSpace> >'
        return rcp( new TpetraCrsMatrix<Scalar, LocalOrdinal, GlobalOrdi...
                        ^
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/sup/Matrix/Xpetra_CrsMatrixWrap.hpp:121:37: note: 
      in instantiation of member function 'Xpetra::CrsMatrixFactory<double, int,
      long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial,
      Kokkos::HostSpace> >::Build' requested here
    matrixData_ = CrsMatrixFactory::Build(rowMap, NumEntriesPerRowToAllo...
                                    ^
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/sup/Utils/Xpetra_MatrixMatrix.hpp:1559:23: note: 
      in instantiation of member function 'Xpetra::CrsMatrixWrap<double, int,
      long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial,
      Kokkos::HostSpace> >::CrsMatrixWrap' requested here
          C = rcp(new Xpetra::CrsMatrixWrap<SC,LO,GO,NO>(A.getRowMap(), ...
                      ^
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/src/CrsMatrix/Xpetra_CrsMatrix.hpp:226:18: note: 
      unimplemented pure virtual method 'leftScale' in 'TpetraCrsMatrix'
    virtual void leftScale (const Vector<Scalar, LocalOrdinal, GlobalOrd...
                 ^
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/src/CrsMatrix/Xpetra_CrsMatrix.hpp:229:18: note: 
      unimplemented pure virtual method 'rightScale' in 'TpetraCrsMatrix'
    virtual void rightScale (const Vector<Scalar, LocalOrdinal, GlobalOr...
                 ^
In file included from /Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/sup/Utils/Xpetra_IteratorOps.cpp:47:
In file included from /Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/sup/Utils/Xpetra_IteratorOps.hpp:52:
In file included from /Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/sup/Matrix/Xpetra_Matrix.hpp:60:
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/src/CrsMatrix/Xpetra_CrsMatrixFactory.hpp:89:25: error: 
      allocating an object of abstract class type 'TpetraCrsMatrix<double, int,
      long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial,
      Kokkos::HostSpace> >'
        return rcp( new TpetraCrsMatrix<Scalar, LocalOrdinal, GlobalOrdi...
                        ^
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/sup/Matrix/Xpetra_CrsMatrixWrap.hpp:111:37: note: 
      in instantiation of member function 'Xpetra::CrsMatrixFactory<double, int,
      long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial,
      Kokkos::HostSpace> >::Build' requested here
    matrixData_ = CrsMatrixFactory::Build (rowMap, maxNumEntriesPerRow, pftype);
                                    ^
/Users/naluIt/gitHubWork/nightlyBuildAndTest/Trilinos/packages/xpetra/sup/Utils/Xpetra_MatrixMatrix.hpp:1577:23: note: 
      in instantiation of member function 'Xpetra::CrsMatrixWrap<double, int,
      long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial,
      Kokkos::HostSpace> >::CrsMatrixWrap' requested here
          C = rcp(new Xpetra::CrsMatrixWrap<SC,LO,GO,NO>(A.getRowMap(), ...
                      ^
2 errors generated.

@mhoemmen
Contributor

Thanks Ross!

Select a (small) initial set of Trilinos builds and set of packages in one or more CDash queries that must pass before updating the 'master' branch from the 'develop' branch.

I proposed in #370 that these builds should enable as many TPLs as possible, since enabling a TPL in Trilinos generally enables more code than it disables. (Amesos(2) and Zoltan(2) have good examples -- each TPL has interface code and tests.) We should at least enable the latest or "most common" versions of all the TPLs that we claim to support in the following list:

https://trilinos.org/about/tpl-version-compatibility/

@bmpersc
Contributor

bmpersc commented May 23, 2016

My biggest concern with having all of the jobs enable as many TPLs as possible is that it makes it very easy for optional dependencies to become required dependencies. I agree we should do some testing with as many TPLs as is reasonable, but we should also have some builds that are minimal to protect against dependency creep.

@mhoemmen
Contributor

My biggest concern with having all of the jobs enable as many TPLs as possible is that it makes it very easy for optional dependencies to become required dependencies. I agree we should do some testing with as many TPLs as is reasonable, but we should also have some builds that are minimal to protect against dependency creep.

That's a good point. It would make sense to have both a minimal TPL build (BLAS, LAPACK, MPI), and a "typical set of enables for apps" build.

@bartlettroscoe
Member Author

Here is another build we might consider adding to the criteria for updating from 'develop' to 'master'.


-----Original Message-----
From: tpetra-developers-bounces@software.sandia.gov [mailto:tpetra-
developers-bounces@software.sandia.gov] On Behalf Of Hoemmen, Mark
Sent: Tuesday, May 31, 2016 5:22 PM
To: Bettencourt, Matthew; tpetra-developers@software.sandia.gov
Subject: Re: [Tpetra-Developers] compiler that works for tpetr now?

I need to finish a thing first (due today) before I work on this.
mfh

On 5/31/16, 3:21 PM, "tpetra-developers-bounces@software.sandia.gov on
behalf of Hoemmen, Mark" <tpetra-developers-bounces@software.sandia.gov
on behalf of mhoemme@sandia.gov> wrote:

We’re on it.
mfh

On 5/31/16, 2:46 PM, "Bettencourt, Matthew" mbetten@sandia.gov wrote:

I've noticed that 4.9.[23] no longer compile tpetra, what are people
recommending?

dir-opt-openmp (gcc) 28 $ ninja -j 1
[1/1646] Building CXX object
packages/tpetra/core/src/CMakeFiles/tpetra.dir/Tpetra_Experimental_Blo
ckCrsMatrix_DOUBLE_INT_INT_OPENMP.cpp.o
FAILED:
packages/tpetra/core/src/CMakeFiles/tpetra.dir/Tpetra_Experimental_Blo
ckCrsMatrix_DOUBLE_INT_INT_OPENMP.cpp.o

/projects/install/rhel6-x86_64/sems/compiler/gcc/4.9.3/openmpi/1.10.1/
bin/mpicxx -I.
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/core/src
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/core/src/kokkos_refa
ctor -Ipackages/tpetra/core/src -Ipackages/tpetra/kernels/src
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/kernels/src
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/kernels/src/impl
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/kernels/src/stage/gr
aph
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/kernels/src/stage/gr
aph/impl -I/home/mbetten/Trilinos/Trilinos/packages/teuchos/comm/src
-I/home/mbetten/Trilinos/Trilinos/packages/teuchos/parameterlist/src
-Ipackages/teuchos/core/src
-I/home/mbetten/Trilinos/Trilinos/packages/teuchos/core/src
-Ipackages/kokkos/core/src
-I/home/mbetten/Trilinos/Trilinos/packages/kokkos/core/src
-I/projects/install/rhel6-x86_64/sems/tpl/boost/1.59.0/gcc/4.9.3/base/
include
-Ipackages/kokkos/algorithms/src
-I/home/mbetten/Trilinos/Trilinos/packages/kokkos/algorithms/src
-Ipackages/kokkos/containers/src
-I/home/mbetten/Trilinos/Trilinos/packages/kokkos/containers/src
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/classic/LinAlg
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/classic/NodeAPI
-Ipackages/tpetra/classic/NodeAPI -Ipackages/tpetra/classic/src
-I/home/mbetten/Trilinos/Trilinos/packages/tpetra/classic/src
-Ipackages/teuchos/kokkoscomm/src
-I/home/mbetten/Trilinos/Trilinos/packages/teuchos/kokkoscomm/src
-Ipackages/teuchos/kokkoscompat/src
-I/home/mbetten/Trilinos/Trilinos/packages/teuchos/kokkoscompat/src
-I/home/mbetten/Trilinos/Trilinos/packages/teuchos/remainder/src
-Ipackages/teuchos/remainder/src
-I/home/mbetten/Trilinos/Trilinos/packages/teuchos/numerics/src
-Ipackages/epetra/src
-I/home/mbetten/Trilinos/Trilinos/packages/epetra/src -std=c++11
-fopenmp -O3 -MMD -MT
packages/tpetra/core/src/CMakeFiles/tpetra.dir/Tpetra_Experimental_Blo
ckCrsMatrix_DOUBLE_INT_INT_OPENMP.cpp.o
-MF
packages/tpetra/core/src/CMakeFiles/tpetra.dir/Tpetra_Experimental_Blo
ckCrsMatrix_DOUBLE_INT_INT_OPENMP.cpp.o.d
-o
packages/tpetra/core/src/CMakeFiles/tpetra.dir/Tpetra_Experimental_Blo
ckCrsMatrix_DOUBLE_INT_INT_OPENMP.cpp.o
-c
packages/tpetra/core/src/Tpetra_Experimental_BlockCrsMatrix_DOUBLE_IN
T_INT_OPENMP.cpp
In file included from
packages/tpetra/core/src/Tpetra_Experimental_BlockCrsMatrix_DOUBLE_IN
T_INT_OPENMP.cpp:71:0:
/home/mbetten/Trilinos/Trilinos/packages/tpetra/core/src/Tpetra_Experime
ntal_BlockCrsMatrix_def.hpp:
In lambda function:
/home/mbetten/Trilinos/Trilinos/packages/tpetra/core/src/Tpetra_Experime
ntal_BlockCrsMatrix_def.hpp:1464:15:
internal compiler error: in gimplify_var_or_parm_decl, at gimplify.c:1741
} ); // for each workset of rows
^
0x926bb6 gimplify_var_or_parm_decl
../.././gcc/gimplify.c:1741
0x92aaaf gimplify_expr(tree_node**, gimple_statement_base**,
gimple_statement_base**, bool (*)(tree_node*), int)
../.././gcc/gimplify.c:8058
0x9280e6 gimplify_modify_expr
../.././gcc/gimplify.c:4527
0x929b6e gimplify_expr(tree_node**, gimple_statement_base**,
gimple_statement_base**, bool (*)(tree_node*), int)
../.././gcc/gimplify.c:7627
0x92fc24 gimplify_stmt
../.././gcc/gimplify.c:5373
0x92fc24 gimplify_and_add
../.././gcc/gimplify.c:385
0x92fc24 gimplify_init_ctor_eval
../.././gcc/gimplify.c:3558
0x927492 gimplify_init_constructor
../.././gcc/gimplify.c:3904
0x927db1 gimplify_modify_expr_rhs
../.././gcc/gimplify.c:4167
0x927ee4 gimplify_modify_expr
../.././gcc/gimplify.c:4486
0x929b6e gimplify_expr(tree_node**, gimple_statement_base**,
gimple_statement_base**, bool (*)(tree_node*), int)
../.././gcc/gimplify.c:7627
0x929c1a gimplify_target_expr
../.././gcc/gimplify.c:5304
0x929c1a gimplify_expr(tree_node**, gimple_statement_base**,
gimple_statement_base**, bool (*)(tree_node*), int)
../.././gcc/gimplify.c:7994
0x929f7e gimplify_addr_expr
../.././gcc/gimplify.c:4833
0x929f7e gimplify_expr(tree_node**, gimple_statement_base**,
gimple_statement_base**, bool (*)(tree_node*), int)
../.././gcc/gimplify.c:7673
0x92c1da gimplify_arg(tree_node**, gimple_statement_base**, unsigned int)
../.././gcc/gimplify.c:2211
0x92cbaa gimplify_call_expr
../.././gcc/gimplify.c:2395
0x929a00 gimplify_expr(tree_node**, gimple_statement_base**,
gimple_statement_base**, bool (*)(tree_node*), int)
../.././gcc/gimplify.c:7598
0x92c286 gimplify_stmt(tree_node**, gimple_statement_base**)
../.././gcc/gimplify.c:5373
0x92ad04 gimplify_cleanup_point_expr
../.././gcc/gimplify.c:5149
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
ninja: build stopped: subcommand failed.
dir-opt-openmp (gcc) 29 $


Tpetra-Developers mailing list
Tpetra-Developers@software.sandia.gov
https://software.sandia.gov/mailman/listinfo/tpetra-developers



@crtrott
Member

crtrott commented May 31, 2016

The issue is compiler versions in this case. At some point last week stuff got broken for GCC 4.7.2 with an internal compiler error because Mark was using only 4.9.2 to check in. Now it's the other way around: he used 4.7.2 to check in and didn't notice that it broke 4.9 and newer with internal compiler errors.

What that clearly means is that we need multi-compiler tests sooner rather than later.

@agsalin
Contributor

agsalin commented May 31, 2016

I would like to have an Albany build (flavor TBD) to be part of the criteria for moving Trilinos code from develop to master. Albany catches issues with header installation, issues to do with headers being included in multiple files, namespace collisions, unexpected non-backward-compatible changes, CMake issues with seacas and stk, and sometimes provides code coverage beyond individual package tests.

I believe that Albany catches the vast majority of issues that slip through the current process and break Nalu.

@nmhamster
Contributor

@bartlettroscoe you might want to speak to Aaron Levine (@allevin) in CSRI since he has working Jenkins remote querying scripts which work with test checks and automated Pull Request generation from devel to master branches. This was developed for the SST project and has been successfully running for around 4 months already.

@bartlettroscoe
Member Author

bartlettroscoe commented Jun 1, 2016

I would like to have an Albany build (flavor TBD) to be part of the criteria for moving Trilinos code from develop to master.

@agsalin, that would be good if this could be made solid. First, someone would need to set up an automated build of Trilinos for the exact configuration of Trilinos used by Albany and post it to the Trilinos CDash dashboard. It would be good if this exact build was then used to install Trilinos, and the Albany build(s) would build against that installed version of Trilinos. Then that Trilinos build (or builds) could also be queried in the trilinos_cdash_pass_fail.py script.

The problem that we would have is: what criteria would we use for Albany? Does Albany keep 100% passing tests? If it does not, then how do you know if Albany is okay? For SIERRA/Trilinos integration back in 2008, I had to write a script that compared the runtest output for SIERRA with the current and the new Trilinos versions and then did the update only if there were no new SIERRA test failures. (If Trilinos was updated in SIERRA only when SIERRA had 100% passing tests, then Trilinos would never have been updated.) Something similar would need to be implemented for Albany, or currently failing Albany tests would block the update of the Trilinos 'master' branch when there was nothing really wrong with Trilinos.

I think that making this robust, so that it does not unnecessarily block updates of Trilinos 'master', could be a lot of work (it was for SIERRA). That might be enough work that it would be worth creating a separate GitHub issue for this. Does that make sense?
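
The "no new failures" criterion described for SIERRA can be sketched as a set difference between the failing tests seen with the current Trilinos version and those seen with the new one. This is not the 2008 SIERRA script; the function names and the test names in the example are hypothetical.

```python
def new_failures(baseline_failed, candidate_failed):
    """Tests failing with the new Trilinos that did not fail with the current one."""
    return sorted(set(candidate_failed) - set(baseline_failed))


def update_allowed(baseline_failed, candidate_failed):
    """Allow the update only if the new Trilinos introduces no new failures."""
    return not new_failures(baseline_failed, candidate_failed)


# Hypothetical test names: test_a already fails with the current Trilinos,
# so only test_c counts against the new version.
baseline = ["test_a", "test_b"]
candidate = ["test_a", "test_c"]
print(new_failures(baseline, candidate))
print(update_allowed(baseline, candidate))
```

Tests that fail in both runs do not block the update, which is the property that keeps pre-existing application failures from holding back 'master'.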

@bartlettroscoe
Member Author

@bartlettroscoe you might want to speak to Aaron Levine (@allevin) in CSRI since he has working Jenkins remote querying scripts which work with test checks and automated Pull Request generation from devel to master branches. This was developed for the SST project and has been successfully running for around 4 months already.

@nmhamster, it would be good to learn what was set up and how hard it is to work with the GitHub and Jenkins APIs. I will be at SNL/ABQ next week. Perhaps we can arrange a meeting then? I know @bmpersc and @jwillenbring would be interested since they mentioned this at the Trilinos Leaders Meeting today.

@agsalin
Contributor

agsalin commented Jun 1, 2016

@bartlettroscoe (re 2 comments ago): The master branch of Albany on Linux is at or near 100% much of the time, and many of the failures we do get are from Trilinos. However, there are a few times per month where master gets broken based on an Albany commit.

I suggest that, as a starting point for the automated Trilinos develop->master process, we get Albany testing into the scripts but, at the beginning, don't let an Albany failure cancel the update of master. If I and a few others can get an email with the Albany test results, we can do triage and figure out if we are detecting a problem with Trilinos or not. Over time, we can work to make the process robust enough to where we are comfortable making the automated Trilinos update contingent on success of Albany. (I expect we might need to either do what you did with Sierra, or create a "stable" branch that only gets updated from master when 100% pass.)
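
One way to picture this staged approach is to mark each check as required or advisory, so that an advisory failure (such as Albany at the beginning) produces a triage list instead of cancelling the update. This is only a sketch of the idea; the class, function and check names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    required: bool = True  # advisory checks set this to False


def decide_update(results):
    """Return (ok_to_update, names_to_triage).

    Only failed required checks block the update. Failed advisory checks
    are returned so that they can be emailed to people for triage.
    """
    blocking = [r.name for r in results if r.required and not r.passed]
    triage = [r.name for r in results if not r.required and not r.passed]
    return (not blocking, triage)


results = [
    CheckResult("trilinos-cdash-builds", passed=True),
    CheckResult("albany", passed=False, required=False),
]
ok, triage = decide_update(results)
print(ok, triage)
```

Making a check blocking later is then a one-flag change once the process is considered robust enough.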

@bartlettroscoe
Member Author

@william76, this issue, opened a long time ago, proposed using the CDash API to determine pass/fail for a set of builds before updating the 'master' branch from the 'develop' branch, as we discussed on Monday 2/26/2018.

@william76
Contributor

@bartlettroscoe, ah ok. Thanks!

I think we'll be waiting on the newer CDash since I can get the SHA1 hashes of the builds on the dashboard... once that's in place, I already have about 99% of what I'd need in a script to check the set of builds that we care about for promoting dev->master and could set up an automatic process pretty easily.
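
A possible use of the SHA1 hashes, sketched under assumptions: before promoting, confirm that every build in the set tested the same commit, and promote exactly that commit. The comment above only says the hashes would be available from the newer CDash; the key name "sha1" and the function name are hypothetical, and the hashes in the example are made up.

```python
def common_sha1(builds, sha1_key="sha1"):
    """Return the SHA1 shared by all builds, or None.

    None is returned if there are no builds, if any build lacks a SHA1,
    or if the builds tested different commits.
    """
    shas = {b.get(sha1_key) for b in builds}
    if len(shas) == 1 and None not in shas:
        return shas.pop()
    return None


# Made-up hashes for illustration.
same = [{"buildname": "gcc", "sha1": "abc123"}, {"buildname": "clang", "sha1": "abc123"}]
mixed = [{"buildname": "gcc", "sha1": "abc123"}, {"buildname": "clang", "sha1": "def456"}]
print(common_sha1(same))
print(common_sha1(mixed))
```

Promoting the returned commit, rather than whatever the tip of 'develop' happens to be at promotion time, would avoid pushing changes that none of the queried builds actually tested.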

@bartlettroscoe
Member Author

The @trilinos/framework team went with the PR testing infrastructure.

Closing as wontfix

@bartlettroscoe bartlettroscoe added the resolved: wontfix The development team cannot or will not address this issue label Nov 7, 2019

7 participants