Floating-point exception when concatenating IAEA phase space files with addphsp #423

Closed
mchamberland opened this issue Apr 14, 2018 · 4 comments

@mchamberland
Contributor

I get a floating-point exception (IEEE_INVALID_FLAG) when concatenating IAEA phase space files. I re-compiled addphsp with the -ffpe-trap=invalid flag and no optimization to produce the backtrace below:

[~/Desktop/TrueBeam_v2_phsp_10FFF] Marc$ addphsp TrueBeam_v2_10FFF test 2 0 1 1

 Will sum from phsp file TrueBeam_v2_10FFF_w0.1.IAEAphsp
 to TrueBeam_v2_10FFF_w1.1.IAEAphsp
 And output result to test.1.IAEAphsp


 Adding TrueBeam_v2_10FFF_w0.1.IAEAphsp to test.1.IAEAphsp: 


 Header information for TrueBeam_v2_10FFF_w0.1.IAEAphsp:

  Warning: IAEA format phsp file does not store LATCH

            TOTAL NUMBER OF PARTICLES IN FILE:     45862111
                     TOTAL NUMBER OF PHOTONS:     45427146
THE REST ARE ELECTRONS/POSITRONS.
 
      MAXIMUM KINETIC ENERGY OF THE PARTICLES:       10.377 MeV
 # OF INCIDENT PARTICLES FROM ORIGINAL SOURCE:    324000000

                       Z AT WHICH PHSP SCORED:       26.700 cm



 Header information for test.1.IAEAphsp:


 First time writing to this file.
 No header data to display.


 BEGIN READING/WRITING PH-SP DATA .....


Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x10280c092
#1  0x10280b3b0
#2  0x7fff6a4bdf59
#3  0x1027fe4e6
#4  0x1027f967b
#5  0x1027d733a
#6  0x1027daaa8
#7  0x1027dd3e8
Floating point exception: 8

I'm not sure if I can provide anything more than that since those are the Varian phase space files which I'm not allowed to share. Let me know if I can help debug.

@mchamberland
Contributor Author

mchamberland commented Apr 14, 2018

Slightly more informative backtrace:

Process 32052 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_SSEEXTERR, subcode=0x1f21)
    frame #0: 0x00000001000c84e6 libiaea_phsp.dylib`iaea_record_type::read_particle() + 614
libiaea_phsp.dylib`iaea_record_type::read_particle:
->  0x1000c84e6 <+614>: mulss  %xmm1, %xmm0
    0x1000c84ea <+618>: movss  0x39da(%rip), %xmm3       ; xmm3 = mem[0],zero,zero,zero 
    0x1000c84f2 <+626>: mulss  %xmm0, %xmm3
    0x1000c84f6 <+630>: mulss  %xmm1, %xmm0
Target 0: (addphsp) stopped.

Edit: and that’s as far as I get because my debugger refuses to print any variables.

@mchamberland
Contributor Author

A bit more testing: the error does not occur with a fresh debug configuration of EGSnrc that has all optimizations turned off. But I can't figure out exactly which combination of optimization flags triggers the error.

@mchamberland
Contributor Author

I think I isolated the problem to, what else, the -ffast-math flag. Perhaps Iwan is right and it's time to move away from that pesky flag (see, e.g., #174).
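When hunting for which build still carries the flag, one possible aid is a compile-time check. The sketch below assumes GCC or Clang, both of which predefine the `__FAST_MATH__` macro when -ffast-math is in effect; the function name `built_with_fast_math` is hypothetical and does not exist in EGSnrc. It only covers C or C++ translation units such as the IAEA phase space library, not the Fortran side.

```c
/* Hypothetical helper, for illustration only: returns 1 if this
   translation unit was compiled with -ffast-math (GCC and Clang
   predefine __FAST_MATH__ in that case), and 0 otherwise. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}
```

Printing the result at program start-up would show whether a given binary was built with the flag, which may help when several configurations are installed side by side.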

ftessier added a commit that referenced this issue Apr 19, 2018
Update the default gcc optimization configuration to use -march=native
instead of -ffast-math. The latter causes various floating-point
exceptions on newer cpus and compilers. If the programs are run on a
different cpu, then one should use the corresponding -march option for
that architecture instead of "native", or else use the less aggressive
-mtune=native if the compiling and running cpus are in the same family.
ftessier added a commit that referenced this issue Apr 19, 2018
Update the default gcc optimization configuration to -mtune=native
instead of -ffast-math. The latter causes various floating-point
exceptions on newer cpus and compilers. Note that if everything is
compiled and run on identical cpu, then the more aggressive
-march=native option should be considered during configuration.
@blakewalters blakewalters self-assigned this Apr 19, 2018
@crcrewso
Contributor

Very happy to see the change from march to mtune!!!

As this code currently works on Arm32 as well, I'm hesitant to suggest we go so far as to include -march=nocona, since that's the oldest x86_64 architecture, but should we consider it as a reasonable baseline?

ftessier added a commit that referenced this issue Sep 25, 2018
Update the default gcc optimization configuration to -mtune=native
instead of -ffast-math. The latter causes various floating-point
exceptions on newer cpus and compilers. Note that if everything is
compiled and run on identical cpu, then the more aggressive
-march=native option should be considered during configuration.

Also add a test in the Fortran compiler version check to catch the
gfortran version string.
ftessier added a commit that referenced this issue Sep 26, 2018
Update the default gcc optimization configuration to -mtune=native
instead of -ffast-math. The latter causes various floating-point
exceptions on newer cpus and compilers. Note that if everything is
compiled and run on identical cpu, then the more aggressive
-march=native option should be considered during configuration.

Also add a test in the Fortran compiler version check to catch the
gfortran version string, and fix a duplicate echo for the default
fortran debugger flag.
ftessier added a commit that referenced this issue Sep 26, 2018
Update the default gcc optimization configuration to -mtune=native
instead of -ffast-math. The latter causes various floating-point
exceptions on newer cpus and compilers. Note that if everything is
compiled and run on identical cpu, then the more aggressive
-march=native option should be considered during configuration.

Change the default optimization level to -O2 instead of -O3. There have
been cases where upgrading to a newer compiler revealed bugs under -O3,
and more aggressive optimization does not always lead to increased
performance. The -O2 option is a better default, and another level can
be selected at configuration time.

Also add a test in the Fortran compiler version check to catch the
gfortran version string, and fix a duplicate echo for the default
fortran debugger flag.
@rtownson rtownson closed this as completed Oct 3, 2018