Schwarz preconditioned CG gives nans in develop #1535

Open
weinbe2 opened this issue Jan 23, 2025 · 0 comments

weinbe2 commented Jan 23, 2025

As described in the title.

cmake command:

cmake -DCMAKE_BUILD_TYPE=RELEASE -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_DOMAIN_WALL=ON -DQUDA_GPU_ARCH=sm_80 -DQUDA_DOWNLOAD_USQCD=ON -DQUDA_QIO=ON -DQUDA_QMP=ON -DQUDA_FAST_COMPILE_DSLASH=ON -DQUDA_FAST_COMPILE_REDUCE=ON ../quda

Test executable command (nearly the same as the one on the wiki page: https://github.com/lattice/quda/wiki/The-Multi-Splitting-Preconditioned-Conjugate-Gradient-(MSPCG),-an-application-of-the-additive-Schwarz-Method ):

mpirun -np 1 ./invert_test \
  --matpc even-even \
  --dim 12 12 12 16 \
  --Lsdim 8 \
  --gridsize 1 1 1 1 \
  --dslash-type mobius \
  --b5 2.5 --c5 1.5 \
  --inv-type pcg \
  --precon-type cg \
  --precon-schwarz-type additive \
  --tol 1e-6 \
  --tol-precondition 1e-6 \
  --niter 860 \
  --maxiter-precondition 6 \
  --mass 0.01 \
  --prec single \
  --prec-sloppy half \
  --prec-precondition half

Output (with --verbosity verbose added):

[...]
Computed plaquette is 1.226087e-01 (spatial = 1.221845e-01, temporal = 1.230330e-01)
Solution = mat, Solve = normop_pc, Solver = pcg, Precision = single, Sloppy precision = half
DiracMobius: Detected fixed real cofficients: using regular Mobius
DiracMobius: Detected fixed real cofficients: using regular Mobius
DiracMobius: Detected fixed real cofficients: using regular Mobius
DiracMobius: Detected fixed real cofficients: using regular Mobius
Source: 2.65451e+06
Prepared: source = 2.1658e+06, solution = 0
Creating a PCG solver
CG: Convergence at 6 iterations, L2 relative residual: iterated = 4.735067e-02 (requested = 1.000000e-06)
PCG:     0 iterations, <r,r> = 2.841162e+06, |r|/|b| = 1.000000e+00
ERROR: Solver appears to have diverged with residual       nan (rank 0, host ipp1-1776.nvidia.com, solver.cpp:417 in bool quda::Solver::convergence(quda::cvector<double>&, quda::cvector<double>&, quda::cvector<double>&, quda::cvector<double>&)())
       last kernel called was (name=N4quda4blas11axpyCGNorm2IdfEE,volume=6x12x12x16x8,aux=GPU-offline,vol=110592,parity=1,precision=2,order=8,Ns=4,Nc=3,n_rhs=1)
Saving 56 sets of cached parameters to /scratch/local/build/tests/tunecache/tunecache_error.tsv
QMP m0,n1@ipp1-1776.nvidia.com error: abort: 1
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
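
For readers unfamiliar with the solver structure in the command above (an outer PCG whose preconditioner is a few iterations of an inner CG, plus a divergence check on the residual), here is a minimal pure-Python sketch. It is not QUDA's implementation: the function names `cg` and `pcg` are hypothetical, the Schwarz domain restriction and the half-precision inner solve are not modeled, and the flexible (Polak-Ribiere) update is an assumption made because a truncated inner CG is not a fixed linear operator. The defaults only mirror --niter 860, --maxiter-precondition 6, --tol 1e-6 and --tol-precondition 1e-6.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def axpy(a, x, y):
    # Returns a*x + y elementwise.
    return [a * xi + yi for xi, yi in zip(x, y)]

def cg(A, b, tol, maxiter):
    """Plain CG from a zero initial guess; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r, p = list(b), list(b)
    rr, b2 = dot(r, r), dot(b, b)
    if b2 == 0.0:
        return x, 0
    for k in range(maxiter):
        if math.sqrt(rr / b2) <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)
        rr = rr_new
    return x, maxiter

def pcg(A, b, tol=1e-6, maxiter=860, inner_tol=1e-6, inner_iter=6):
    """Outer PCG preconditioned by a truncated inner CG (illustrative only)."""
    x = [0.0] * len(b)
    r = list(b)
    b2 = dot(b, b)
    if b2 == 0.0:
        return x, 0
    z, _ = cg(A, r, inner_tol, inner_iter)  # preconditioner application
    p = list(z)
    rz = dot(r, z)
    for k in range(maxiter):
        rel = math.sqrt(dot(r, r) / b2)
        # Divergence check, analogous in spirit to the error in the log.
        if not math.isfinite(rel):
            raise FloatingPointError(f"solver diverged, residual = {rel}")
        if rel <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = axpy(alpha, p, x)
        r_new = axpy(-alpha, Ap, r)
        z_new, _ = cg(A, r_new, inner_tol, inner_iter)
        # Polak-Ribiere beta tolerates a preconditioner that varies per call.
        beta = dot(z_new, [rn - ro for rn, ro in zip(r_new, r)]) / rz
        p = axpy(beta, p, z_new)
        r, z = r_new, z_new
        rz = dot(r, z)
    return x, maxiter
```

On a small symmetric positive-definite matrix, `pcg(A, b)` returns the solution and the outer iteration count. A non-finite value entering the residual at any point (for example from the preconditioner output) surfaces as `FloatingPointError` at the next convergence check, which is the kind of condition the log reports right after the first preconditioner application.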