Describe the bug
Using the tandem p2 and bp5 example from the repository, I get an error at the first time step:
di73yeq4@login03:/hppfs/work/pn49ha/di73yeq4/tandem/examples/tandem/3d> head 3451988.tandem.out -n 200
num_nodes: 4 ntasks: 192
(tandem ASCII-art banner omitted)
tandem version ee87ac9
stack size limit = 2048 MiB
Worker affinity
0---------|----------|----------|----------|--------8-|----------|
----------|----------|----------|------
Multigrid P-levels: 1 2
TS ts_checkpoint.storage_type limited
TS ts_checkpoint.save_directory checkpoint
TS ts_checkpoint.freq_step 1000
TS ts_checkpoint.freq_cputime 3.0000e+01
TS ts_checkpoint.freq_physical_time 1.0000e+10
TS ts_checkpoint.storage_limited_size 2
[checkpoint] directory created
DOFs (domain): 1891590
DOFs (fault): 167796
Mesh size: 71.6532
sigma_n = 11.0811
|tau| = 13525.3
psi = -0.220103
L = 0
U = 2924.74
F(L) = 13525.3
sigma_n = 196.612
|tau| = 26418.9
psi = -0.993655
L = 0
U = 5712.89
F(L) = 26418.9
F(U) = 1.61031e-12
sigma_n = 54.621
|tau| = 105097
psi = -6.47109
L = 0
U = 22726.5
F(L) = 105097
F(U) = 5.31919e-12
terminate called after throwing an instance of 'std::logic_error'
sigma_n = 41.6383
|tau| = 13866.2
psi = -0.204948
L = 0
U = 2998.47
F(L) = 13866.2
F(U) = 7.89669e-14
terminate called after throwing an instance of 'std::logic_error'
sigma_n = 19.8669
|tau| = 14586.5
psi = -0.25234
L = 0
U = 3154.22
F(L) = 14586.5
F(U) = 6.96785e-13
what(): F(a) and F(b) must have different sign.
F(U) = 8.03797e-13
terminate called after throwing an instance of 'std::logic_error'
sigma_n = 58.7748
|tau| = 16364
psi = -0.525257
L = 0
U = 3538.6
F(L) = 16364
F(U) = 7.50726e-13
terminate called after throwing an instance of 'std::logic_error'
sigma_n = 51.8792
|tau| = 15802.3
psi = -0.306186
L = 0
U = 3417.13
F(L) = 15802.3
F(U) = 1.0331e-12
what(): F(a) and F(b) must have different sign.
terminate called after throwing an instance of 'std::logic_error'
what(): F(a) and F(b) must have different sign.
terminate called after throwing an instance of 'std::logic_error'
what(): F(a) and F(b) must have different sign.
terminate called after throwing an instance of 'std::logic_error'
what(): F(a) and F(b) must have different sign.
what(): F(a) and F(b) must have different sign.
what(): F(a) and F(b) must have different sign.
srun: error: i01r01c05s07: task 134: Aborted (core dumped)
srun: launch/slurm: _step_signal: Terminating StepId=3451988.0
slurmstepd: error: *** STEP 3451988.0 ON i01r01c05s05 CANCELLED AT 2024-07-17T11:37:51 ***
[148]PETSC ERROR: ------------------------------------------------------------------------
Expected behavior
no error
To Reproduce
Steps to reproduce the behavior:
I'm running BP5.toml based on this branch #72 (at commit ee87ac9) which is a few commits on top of #59
Installed with Spack on SuperMUC-NG using:
spack install -j 30 tandem@tscp polynomial_degree=2 domain_dimension=3
Here is a list of the dependencies of tandem and their specs:
di73yeq4@login03:/hppfs/work/pn49ha/di73yeq4/tandem/examples/tandem/3d> spack spec -I tandem@tscp polynomial_degree=2 domain_dimension=3
Input spec
--------------------------------
 - tandem@tscp domain_dimension=3 polynomial_degree=2
Concretized
--------------------------------
 - tandem@tscp%gcc@12.2.0~cuda~ipo~libxsmm~python~rocm build_system=cmake build_type=Release domain_dimension=3 generator=make min_quadrature_order=0 polynomial_degree=2 arch=linux-sles15-skylake_avx512
[^] ^cmake@3.26.3%gcc@12.2.0~doc+ncurses+ownlibs~qt build_system=generic build_type=Release arch=linux-sles15-skylake_avx512
[^] ^ncurses@6.4%gcc@12.2.0~symlinks+termlib abi=none build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^pkgconf@1.8.0%gcc@12.2.0 build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^openssl@1.1.1t%gcc@12.2.0~docs~shared build_system=generic certs=mozilla arch=linux-sles15-skylake_avx512
[^] ^ca-certificates-mozilla@2023-01-10%gcc@12.2.0 build_system=generic arch=linux-sles15-skylake_avx512
[^] ^perl@5.36.0%gcc@12.2.0+cpanm+open+shared+threads build_system=generic arch=linux-sles15-skylake_avx512
[^] ^berkeley-db@18.1.40%gcc@12.2.0+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc arch=linux-sles15-skylake_avx512
[^] ^eigen@3.4.0%gcc@12.2.0~ipo build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-sles15-skylake_avx512
[^] ^cmake@3.26.3%gcc@12.2.0~doc+ncurses+ownlibs~qt build_system=generic build_type=Release arch=linux-sles15-skylake_avx512
[^] ^openssl@1.1.1t%gcc@12.2.0~docs~shared build_system=generic certs=mozilla arch=linux-sles15-skylake_avx512
[^] ^perl@5.36.0%gcc@12.2.0+cpanm+open+shared+threads build_system=generic arch=linux-sles15-skylake_avx512
[^] ^gdbm@1.23%gcc@12.2.0 build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^readline@8.2%gcc@12.2.0 build_system=autotools patches=bbf97f1 arch=linux-sles15-skylake_avx512
[^] ^gmake@4.4.1%gcc@12.2.0~guile build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^gmake@4.4.1%gcc@12.2.0~guile build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^intel-oneapi-mpi@2021.9.0%gcc@12.2.0+envmods~external-libfabric~generic-names~ilp64 build_system=generic arch=linux-sles15-skylake_avx512
[^] ^lua@5.4.4%gcc@12.2.0~pcfile+shared build_system=makefile fetcher=curl arch=linux-sles15-skylake_avx512
[^] ^curl@8.0.1%gcc@12.2.0~gssapi~ldap~libidn2~librtmp~libssh~libssh2~nghttp2 build_system=autotools libs=shared,static tls=openssl arch=linux-sles15-skylake_avx512
[^] ^readline@8.2%gcc@12.2.0 build_system=autotools patches=bbf97f1 arch=linux-sles15-skylake_avx512
[^] ^unzip@6.0%gcc@12.2.0 build_system=makefile arch=linux-sles15-skylake_avx512
[^] ^metis@5.1.0%gcc@12.2.0~gdb+int64~ipo~real64+shared build_system=cmake build_type=Release generator=make patches=4991da9,93a7903,b1225da arch=linux-sles15-skylake_avx512
[^] ^cmake@3.26.3%gcc@12.2.0~doc+ncurses+ownlibs~qt build_system=generic build_type=Release arch=linux-sles15-skylake_avx512
[^] ^openssl@1.1.1t%gcc@12.2.0~docs~shared build_system=generic certs=mozilla arch=linux-sles15-skylake_avx512
[^] ^perl@5.36.0%gcc@12.2.0+cpanm+open+shared+threads build_system=generic arch=linux-sles15-skylake_avx512
[^] ^parmetis@4.0.3%gcc@12.2.0~gdb+int64~ipo+shared build_system=cmake build_type=Release generator=make patches=4f89253,50ed208,704b84f arch=linux-sles15-skylake_avx512
[+] ^petsc@3.20.1%gcc@12.2.0~X~batch~cgns~complex~cuda~debug+double~exodusii~fftw+fortran~giflib+hdf5~hpddm~hwloc+hypre+int64~jpeg+knl~kokkos~libpng~libyaml~memkind+metis~mkl-pardiso~mmg~moab~mpfr+mpi+mumps~openmp~p4est~parmmg~ptscotch~random123~rocm~saws+scalapack+shared~strumpack~suite-sparse+superlu-dist~sycl~tetgen~trilinos~valgrind build_system=generic clanguage=C memalign=32 arch=linux-sles15-skylake_avx512
[^] ^diffutils@3.9%gcc@12.2.0 build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^libiconv@1.17%gcc@12.2.0 build_system=autotools libs=shared,static arch=linux-sles15-skylake_avx512
[^] ^hdf5@1.10.9%gcc@12.2.0+cxx+fortran+hl~ipo~java+mpi+shared+szip+threadsafe+tools api=default build_system=cmake build_type=Release generator=make arch=linux-sles15-skylake_avx512
[^] ^libaec@1.0.6%gcc@12.2.0~ipo+shared build_system=cmake build_type=Release generator=make arch=linux-sles15-skylake_avx512
[+] ^hypre@develop%gcc@12.2.0~caliper~complex~cuda~debug+fortran~gptune+int64~internal-superlu~magma~mixedint+mpi~openmp~rocm+shared~superlu-dist~sycl~umpire~unified-memory build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^intel-oneapi-mkl@2023.1.0%gcc@12.2.0+cluster+envmods~ilp64+shared build_system=generic threads=none arch=linux-sles15-skylake_avx512
[^] ^intel-oneapi-tbb@2021.9.0%gcc@12.2.0+envmods build_system=generic arch=linux-sles15-skylake_avx512
[+] ^mumps@5.5.1%gcc@12.2.0~blr_mt+complex+double+float~incfort~int64+metis+mpi~openmp+parmetis~ptscotch~scotch+shared build_system=generic patches=373d736 arch=linux-sles15-skylake_avx512
[^] ^python@3.10.10%gcc@12.2.0+bz2+crypt+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tkinter+uuid+zlib build_system=generic patches=0d98e93,7d40923,f2fd060 arch=linux-sles15-skylake_avx512
[^] ^bzip2@1.0.8%gcc@12.2.0~debug~pic+shared build_system=generic arch=linux-sles15-skylake_avx512
[^] ^expat@2.5.0%gcc@12.2.0+libbsd build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^libbsd@0.11.7%gcc@12.2.0 build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^libmd@1.0.4%gcc@12.2.0 build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^gdbm@1.23%gcc@12.2.0 build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^gettext@0.21.1%gcc@12.2.0+bzip2+curses+git~libunistring+libxml2+tar+xz build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^libxml2@2.10.3%gcc@12.2.0~python build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^tar@1.30%gcc@12.2.0 build_system=autotools zip=pigz arch=linux-sles15-skylake_avx512
[^] ^pigz@2.7%gcc@12.2.0 build_system=makefile arch=linux-sles15-skylake_avx512
[^] ^libffi@3.4.4%gcc@12.2.0 build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^libxcrypt@4.4.33%gcc@12.2.0~obsolete_api build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^sqlite@3.40.1%gcc@12.2.0+column_metadata+dynamic_extensions+fts~functions+rtree build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^util-linux-uuid@2.38.1%gcc@12.2.0 build_system=autotools arch=linux-sles15-skylake_avx512
[^] ^xz@5.4.1%gcc@12.2.0~pic build_system=autotools libs=shared,static arch=linux-sles15-skylake_avx512
[+] ^superlu-dist@develop%gcc@12.2.0~cuda+int64~ipo~openmp+parmetis~rocm+shared build_system=cmake build_type=Release generator=make arch=linux-sles15-skylake_avx512
[^] ^zlib@1.2.13%gcc@12.2.0+optimize+pic+shared build_system=makefile arch=linux-sles15-skylake_avx512
launched with:
#!/bin/bash
# Job Name and Files (also --job-name)
#SBATCH -J tandem
#Output and error (also --output, --error):
#SBATCH -o ./%j.%x.out
#SBATCH -e ./%j.%x.out
#Initial working directory:
#SBATCH --chdir=./
#Notification and type
#SBATCH --mail-type=END
#SBATCH --mail-user=thomas.ulrich@lmu.de
#SBATCH --no-requeue
#Setup of execution environment
#SBATCH --export=ALL
#SBATCH --account=pn49ha
#SBATCH --ntasks-per-node=48
#SBATCH --cpus-per-task=1
#EAR may impact code performance
#SBATCH --ear=off
##SBATCH --nodes=20 --partition=general --time=00:35:00
#SBATCH --nodes=4 --partition=test --time=00:30:00
#--exclude="i01r01c[01-02]s[01-12]"
module load slurm_setup
export MP_SINGLE_THREAD=yes
export OMP_NUM_THREADS=1
export MP_TASK_AFFINITY=core:$OMP_NUM_THREADS
echo 'num_nodes:' $SLURM_JOB_NUM_NODES 'ntasks:' $SLURM_NTASKS
ulimit -Ss 2097152
srun tandem bp5.toml --mg_strategy twolevel --mg_coarse_level 1 --petsc -ksp_max_it 400 -pc_type mg -mg_levels_ksp_max_it 4 -mg_levels_ksp_type cg -mg_levels_pc_type bjacobi -ksp_rtol 1.0e-6 -mg_coarse_pc_type gamg -mg_coarse_ksp_type cg -mg_coarse_ksp_rtol 1.0e-1 -ksp_type gcr -log_view
I've added some additional error logging:
diff --git a/app/localoperator/DieterichRuinaAgeing.h b/app/localoperator/DieterichRuinaAgeing.h
index 5d4b5b6..019edf0 100644
--- a/app/localoperator/DieterichRuinaAgeing.h
+++ b/app/localoperator/DieterichRuinaAgeing.h
@@ -106,7 +106,11 @@ public:
             V = zeroIn(a, b, fF);
         } catch (std::exception const&) {
             std::cout << "sigma_n = " << snAbs << std::endl
+                      << "-sn = " << -sn << std::endl
+                      << "SnPre = " << p_[index].get<SnPre>() << std::endl
                       << "|tau| = " << tauAbs << std::endl
+                      << "|tau_inc| = " << norm(tau) << std::endl
+                      << "|TauPre| = " << norm(p_[index].get<TauPre>()) << std::endl
                       << "psi = " << psi << std::endl
                       << "L = " << a << std::endl
                       << "U = " << b << std::endl
And they show tau_ini is probably correct.
sigma_n = 28.5945
-sn = 3.59447
SnPre = 25
|tau| = 7012.44
|tau_inc| = 6991.29
|TauPre| = 21.1481
psi = -0.790723
L = 0
sigma_n = 80.8889
-sn = 55.8889
SnPre = 25
I also tested v1.0 (both p1 and p2): same issue. I also tested Nico's setup.
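For readers unfamiliar with the error text: the message suggests that the root finder (`zeroIn` in the diff) requires the function to change sign across the bracket [L, U]. In the log, every printed F(L) and F(U) pair is positive (F(U) is of order 1e-12 or smaller), so that precondition fails. The sketch below illustrates the check with plain bisection as a stand-in; `bracketsRoot` and `bisect` are hypothetical names, not tandem's implementation.

```cpp
#include <functional>
#include <stdexcept>

using Fn = std::function<double(double)>;

// Hypothetical helper (not tandem's API): true when F(a) and F(b) do not
// share the same strict sign, i.e. a root is bracketed by [a, b].
inline bool bracketsRoot(Fn const& F, double a, double b) {
    double const Fa = F(a);
    double const Fb = F(b);
    return (Fa <= 0.0 && Fb >= 0.0) || (Fa >= 0.0 && Fb <= 0.0);
}

// Plain bisection used as a stand-in for a bracketing root finder.
// It throws the same kind of error as in the log when no sign change exists.
inline double bisect(Fn const& F, double a, double b, int maxIt = 200) {
    if (!bracketsRoot(F, a, b)) {
        throw std::logic_error("F(a) and F(b) must have different sign.");
    }
    double Fa = F(a);
    if (Fa == 0.0) return a;
    for (int it = 0; it < maxIt; ++it) {
        double const m = 0.5 * (a + b);
        double const Fm = F(m);
        if (Fm == 0.0) return m;
        // Keep the half interval on which the sign change persists.
        if ((Fa < 0.0) == (Fm < 0.0)) {
            a = m;
            Fa = Fm;
        } else {
            b = m;
        }
    }
    return 0.5 * (a + b);
}
```

With a function that is positive at both ends of the interval, as in the log, `bisect` throws before doing any iteration, so the interesting question is why F(U) ends up slightly above zero rather than below it.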
This was because I was not setting the PETSc parameters for the TS file! Maybe we could catch this missing parameter in the future.
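One way such a check could look is sketched below. The thread does not say which parameters were missing, so the option names are placeholders, and `missingOptions` is a hypothetical helper that is not part of tandem or PETSc. It only compares the command-line tokens against a list of required option names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of tandem or PETSc): returns every required
// option name that does not appear among the command-line tokens.
inline std::vector<std::string> missingOptions(std::vector<std::string> const& args,
                                               std::vector<std::string> const& required) {
    std::vector<std::string> missing;
    for (auto const& name : required) {
        if (std::find(args.begin(), args.end(), name) == args.end()) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

A caller could print the returned names and abort before the first time step, which would replace the root-finder exception with a message that points at the configuration.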