egs++ very rare infinite loop detected #580

Open
karnigen opened this issue Feb 12, 2020 · 7 comments

@karnigen

karnigen commented Feb 12, 2020

This looks like a bug, but I don't know where the problem actually is. The loop occurs in egs_shower, and in egs_ausgab the x vector holds the position given below, which never changes:
E=0.010000000000000002 x=-0.35750032602018411 y=-3.9060383952088462 z=39.965866508062611
The looping starts within 3 minutes of the start of the run; see the log file.

  • egsnrc - github latest develop branch
  • system Mint 19.2 Tina Mate - Ubuntu 18.04 (Bionic Beaver)
  • compiler gcc, g++, gfortran 7.4.0
  • egsnrc compiled with options: 'fflags': '"-fPIC -v -g"', 'oflags': '"-O3 -march=native"', 'cflags': '"-O3 -march=native -fPIC -v -g"', 'cpp': '"-O3 -march=native -ffast-math -v -g"'

Application:
#include "egs_advanced_application.h"
APP_MAIN (EGS_AdvancedApplication);

Input egsinp file:
xdavka3.egsinp.txt

Log from calculation:
xdavka3_wxx.egslog.txt

Some debug info:
xdavka3_debug.txt

@karnigen
Author

karnigen commented Feb 15, 2020

I found an example where the loop starts earlier (after only about 40 million egsAusgab calls), which should make it easier to check.

Input egsinp file:
xdavka3.egsinp.txt

Compiling options:
egs_compiling.txt

I've also added several debug lines into egs_advanced_application.cpp to better reproduce the bug:

extern __extc__ void egsAusgab(EGS_I32 *iarg) {
    CHECK_GET_APPLICATION(app,"egsAusgab()");
    ...
    app->Np = the_stack->np-1;

    // Debug only: print the particle position for selected windows of calls.
    static unsigned long counter=0;
    static EGS_Vector v1;
    #define NNN1 39733920   // first call of the window where the cycle begins
    #define NNN2 100000000  // later window, to confirm the position is still stuck
    if((counter<20) || (counter >= NNN1 && counter <= NNN1+50) || (counter >= NNN2 && counter <= NNN2+50)) {
        v1=app->top_p.x;
        // %lu matches the unsigned long counter
        printf("egsAusgab cnt:%9lu x:%f y:%f z:%f\n", counter,v1.x,v1.y,v1.z);
    }
    counter++;

    *iarg = app->userScoring(*iarg);
}

Debug info from egsAusgab, cycle starts after around 40mil calls:
debug_data.txt
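
As an alternative to printing fixed counter windows, a stuck particle could be flagged automatically by counting consecutive calls that report an identical position. The following is only a minimal sketch: `Vec3` and `StuckDetector` are hypothetical names that are not part of egs++, and `Vec3` merely stands in for the position stored in `EGS_Vector`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a particle position (not an egs++ type).
struct Vec3 {
    double x, y, z;
};

// Hypothetical helper: flags a position that repeats `limit` times in a row.
class StuckDetector {
public:
    explicit StuckDetector(std::size_t limit) : limit_(limit) {
        assert(limit > 0);
    }

    // Feed one position per call; returns true once the same position has
    // been repeated at least `limit` consecutive times after its first sighting.
    bool update(const Vec3 &p) {
        if (haveLast_ && p.x == last_.x && p.y == last_.y && p.z == last_.z) {
            ++repeats_;
        } else {
            last_ = p;
            haveLast_ = true;
            repeats_ = 0;
        }
        return repeats_ >= limit_;
    }

    // Number of consecutive repeats of the current position.
    std::size_t repeats() const { return repeats_; }

private:
    Vec3 last_{0.0, 0.0, 0.0};
    bool haveLast_ = false;
    std::size_t repeats_ = 0;
    std::size_t limit_;
};
```

A scoring callback may legitimately see the same position several times in a row (for example, when several particles originate from one interaction site), so the limit would have to be chosen generously, and a hit should be treated as a hint to dump state rather than as proof of an infinite loop.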

Unfortunately, using different compile options or a different compiler leads to a different random sequence.
There is no cycle when using the original egsnrc compile options ('fflags': '"-fPIC"', 'oflags': '"-O2 -mtune=native"', 'cflags': '"-O2 -fPIC"', 'cpp': '"-O2 -mtune=native"'), in which case egsAusgab gives:
debug_data_orig.txt

I'm also using the compile option -march=native, which targets the specific capabilities of my CPU and makes this bug hard to reproduce on other machines.

@rtownson added the bug label Apr 14, 2020
@ftessier self-assigned this Apr 21, 2021
@ftessier
Member

Thanks for taking the time to report this @karnigen! I will try to reproduce this bug at my end with the input you provided.

@ftessier
Member

ftessier commented Apr 23, 2021

I am having trouble reproducing this here (with your second input), as expected. I tried 3 CPUs (all Intel, though); the flags here are -fPIC -O2 -mtune=native across the board. I will continue to try, with longer simulations.

In the meantime, @karnigen, can you try adjusting your geometry input so that the z_dose planes inserted inside the cd geometry "bleed" over the envelope? That is, put the first one at, say, -0.1 and the last one at 62.1 (and then do the same for the z_envelope planes).

EGSnrc is very finicky with floating-point precision issues, and one must avoid all overlapping surfaces. Typically, overlapping surfaces lead to particles getting stuck at the interface and to infinite looping such as what you observe. Let me know if that resolves the problem!
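
To illustrate how limited floating-point precision alone can freeze a position, here is a small sketch in plain C++. It does not use any EGSnrc code, and `ulpAbove` and `advances` are hypothetical helper names. It only shows that a step smaller than half the spacing between neighbouring doubles leaves the coordinate unchanged; it is not a diagnosis of this particular input.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Spacing between `pos` and the next representable double above it.
double ulpAbove(double pos) {
    return std::nextafter(pos, std::numeric_limits<double>::infinity()) - pos;
}

// Hypothetical helper: does adding a non-negative `step` change `pos` at all?
bool advances(double pos, double step) {
    assert(step >= 0.0);
    // volatile forces the sum to be rounded to a stored double before comparing.
    volatile double next = pos + step;
    return next != pos;
}
```

With the z value from the report (39.965866508062611), `ulpAbove` gives roughly 7.1e-15, so `advances(z, 1e-15)` is false while `advances(z, 1e-14)` is true. A transport step that is repeatedly computed to be that small would never move the particle off the surface.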

@ftessier
Member

I ran 3e9 histories on 256 processors with your second input, and all simulations completed as expected. This is most likely due to a rare geometry floating-point error. Try offsetting the geometry planes and let me know.

@crcrewso
Contributor

@karnigen
I have a feeling I might know what's going on here. Could you try recompiling with -march=x86-64 -mtune=x86-64 -O0 to see if you can still reproduce the error? Could you try with both the seeds you've provided and another set of seeds? I'm thinking it's a combination of your architecture and the gcc family (7.4) that you're on.

The difference between @karnigen and @ftessier is -O3 vs -O2 and -march=native vs -mtune=native.

@ftessier
Member

ftessier commented Apr 23, 2021

Oh, I had misread the second message:

There is no cycle when using original egsnrc compiling options ('fflags': '"-fPIC"', 'oflags': '"-O2 -mtune=native"', 'cflags': '"-O2 -fPIC"', 'cpp': '"-O2 -mtune=native"')

(I read that the problem did still occur with the default flags). We have had issues in the past with -O3 and -ffast-math, so indeed the recommended setting is -O2 and -mtune=native instead. I will run again with the problematic flags, for kicks.
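
When comparing builds, it can help to have the binary itself report how it was compiled. The sketch below relies on the `__FAST_MATH__` and `__OPTIMIZE__` macros that GCC and Clang predefine for -ffast-math and for optimization levels above -O0; the function names are hypothetical and nothing here is part of EGSnrc. Note that it only reflects the flags of the translation unit it is compiled in, so in a setup like the one reported (where -ffast-math appears only in the 'cpp' flags) it would have to be built with those same flags.

```cpp
#include <string>

// True if this translation unit was compiled with -ffast-math (GCC/Clang).
constexpr bool builtWithFastMath() {
#ifdef __FAST_MATH__
    return true;
#else
    return false;
#endif
}

// True if an optimization level above -O0 was used (GCC/Clang).
constexpr bool builtWithOptimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: one-line summary suitable for a log file header.
std::string buildFlagsSummary() {
    std::string s = "fast-math: ";
    s += builtWithFastMath() ? "on" : "off";
    s += ", optimization: ";
    s += builtWithOptimization() ? "on" : "off";
    return s;
}
```

Writing this summary at the top of a log would make it easier to tell apart runs like the two compared in this thread.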

@ftessier
Member

See #85 and #174.
