add second flush for beam IO #480

SeverinDiederichs · 2021-05-05T22:07:36Z

This PR solves a bug we found in the diagnostics:

In rare occasions (since commit 5d4bb74, but not on the commit before), the beam IO would show undefined behaviour in parallel runs. Although in the print statement in SaveRealProperty showed the correct values, the openPMD file had NaNs or 0s in there.

This was presumably caused by altering the data before it was flushed. Before #453, beam and field data were dumped together at the end of the step. Now, the beam is written to file before the plasma loop. However, there was still only one flushing to file at the end of the time step (so after the beam was altered).

This is fixed in this PR and the bug does not appear anymore.

Using the following input script, the bug may be reproduced (reproducible with GCC, but not with Clang):

amr.n_cell = 16 16 100

hipace.normalized_units = 1
hipace.verbose = 1
hipace.output_period = 1
random_seed = 4
amr.blocking_factor = 2
amr.max_level = 1

max_step = 9

hipace.numprocs_x = 1
hipace.numprocs_y = 1
hipace.dt=1.

hipace.depos_order_xy = 2

geometry.coord_sys   = 0                  # 0: Cartesian
geometry.is_periodic = 1     1     0      # Is periodic?
geometry.prob_lo     = -10.   -10.   -10.    # physical domain
geometry.prob_hi     =  10.    10.    10.

beams.names = beam beam2

beam.injection_type = fixed_weight
beam.num_particles = 100000
beam.density = 200.
beam.u_mean = 20. 10. 20
beam.u_std = 100. 100. 15.
beam.position_mean = 0. 0. 0
beam.position_std = 0.1 0.1 1.41
beam.dx_per_dzeta = 0.2

beam2.injection_type = fixed_weight
beam2.num_particles = 30
beam2.density = 200.
beam2.u_mean = 8. 23. 21
beam2.u_std = 80. 120. 14.
beam2.position_mean = 0. 0. 0
beam2.position_std = 8. 0.3 1.41

plasmas.names = no_plasma
diagnostic.diag_type = xz

Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
Tested (describe the tests in the PR description)
Runs on GPU (basic: the code compiles and run well with the new module)
Contains an automated test (checksum and/or comparison with theory)
Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
Constified (All that can be const is const)
Code is clean (no unwanted comments, )
Style and code conventions are respected at the bottom of https://github.com/Hi-PACE/hipace
Proper label and GitHub project, if applicable

MaxThevenet

(discussed offline) Look great, thanks for this fix! A few tests fail. Could you check that the code is correct now for these tests (meaning it was incorrect before)? For instance, you could compute the checksum (or print all values to std::out or something) directly in the code right before flushing. You could also run a simulation where the checksum is obvious (e.g. with 1000 particles having uz=100, the checksum for uz defined as np.sum(np.abs(Q)) should be exactly 1.e5). If the results agree with the new version, this PR is a fix indeed. If it agrees with the old, there's still something we're missing.

SeverinDiederichs · 2021-05-06T07:54:57Z

I did manage to generate a test showing that this PR is correct.
Adapting the beam_evolution.1Rank.sh test to

# Run the simulation
mpiexec -n 1 $HIPACE_EXECUTABLE $HIPACE_EXAMPLE_DIR/inputs_normalized \
        amr.n_cell = 32 32 10 \
        max_step = 20 \
        geometry.prob_lo = -2. -2. -2. \
        geometry.prob_hi =  2.  2.  2. \
        hipace.dt = 10. \
        hipace.output_period = 20 \
        beam.density = 1.e-8 \
        beam.radius = 1. \
        beam.injection_type = fixed_weight \
        beam.num_particles = 10000 \
        beam.position_mean = 0 0 0 \
        beam.position_std = 0.1 0.1 0.1 \
        beam.ppc = 4 4 1 \
        hipace.external_ExmBy_slope = .0 \
        hipace.external_Ez_uniform = 1.0 \
        hipace.file_prefix = $TEST_NAME

Now the uz benchmark should be npart * (uz_init - dt*max_step*Ez) = 8e6. uz_init is 1e3 as in the input file. In this PR, I do get the correct result. On development, I get 7.9e6. This shows exactly what goes wrong: In the last time step, the beam is assigned to be written to file, then pushed once more, and then it is finally flushed to file, so the output is npart * (uz_init - dt*(max_step + 1)*Ez)

I will now reset the benchmarks, which were incorrect previously and then we can merge

MaxThevenet · 2021-05-06T08:04:48Z

Sounds great! Could you reset the benchmarks then?

add flush just for beam IO

eb7132b

SeverinDiederichs added bug Something isn't working component: beam About the beam species component: diagnostics About any types of diagnostics labels May 5, 2021

MaxThevenet reviewed May 6, 2021

View reviewed changes

reset benchmark because bug in IO was fixed

9cb7181

MaxThevenet merged commit 51c4bb3 into development May 6, 2021

MaxThevenet deleted the fix_IO_bug branch May 6, 2021 08:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add second flush for beam IO #480

add second flush for beam IO #480

SeverinDiederichs commented May 5, 2021 •

edited

Loading

MaxThevenet left a comment •

edited

Loading

SeverinDiederichs commented May 6, 2021 •

edited

Loading

MaxThevenet commented May 6, 2021

add second flush for beam IO #480

add second flush for beam IO #480

Conversation

SeverinDiederichs commented May 5, 2021 • edited Loading

MaxThevenet left a comment • edited Loading

Choose a reason for hiding this comment

SeverinDiederichs commented May 6, 2021 • edited Loading

MaxThevenet commented May 6, 2021

SeverinDiederichs commented May 5, 2021 •

edited

Loading

MaxThevenet left a comment •

edited

Loading

SeverinDiederichs commented May 6, 2021 •

edited

Loading