
P3M benchmark randomly fails CI #2924

Closed
espresso-ci opened this issue Jun 18, 2019 · 18 comments · Fixed by #3096 or #3358

@espresso-ci

https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/pipelines/7734

@jngrad
Member

jngrad commented Jun 18, 2019

This is a recurring error:
https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/-/jobs/125861
https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/-/jobs/126068
https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/-/jobs/126071
Re-starting the job usually fixes it. It is caused by this line:

energies = system.analysis.energy()

With the error message:

Exception: calc_long_range_energies failed: ERROR: number of cells 1 is smaller than minimum 8 (interaction range too large or min_num_cells too large) in function void dd_create_cell_grid()

The failure happens at random. The P3M benchmark script is MPI-capable, but its test currently runs without MPI. I have never had any issue running this benchmark without MPI, even with a larger number of particles.
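[Editorial note, not part of the original comment: a minimal sketch of the call sequence that triggers the failure, assuming the ESPResSo 4.1-era Python API; the box size, particle numbers, and tuning parameters below are placeholders, not the actual benchmark settings.]

import numpy as np
import espressomd
import espressomd.electrostatics

# Small charged, overall neutral system (placeholder values).
system = espressomd.System(box_l=[12.0, 12.0, 12.0])
system.time_step = 0.01
system.cell_system.skin = 0.4
system.part.add(pos=np.random.random((100, 3)) * system.box_l,
                q=np.resize([1.0, -1.0], 100))

# P3M tuning picks mesh, cao and the real-space cutoff for the requested accuracy.
system.actors.add(espressomd.electrostatics.P3M(prefactor=1.0, accuracy=1e-4))

# Skin tuning runs short test integrations; if the tuned cutoff plus skin exceeds
# half the local box length, the domain decomposition ends up with fewer than the
# minimum 8 cells and the energy call below raises the exception quoted above.
system.cell_system.tune_skin(min_skin=0.2, max_skin=1.0, tol=0.05, int_steps=100)

energies = system.analysis.energy()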

@jngrad jngrad changed the title CI build failed for merged PR P3M benchmark randomly fails CI Jun 18, 2019
@RudolfWeeber
Contributor

Either the P3M tuning does not respect box_l/2 as the maximum real-space cutoff, or tune_skin(), which is called after the P3M tuning, does not.
Maybe some code needs to be inserted to calculate max_skin from the current interaction cutoffs and the local box length.
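[Editorial note, not part of the original comment: a hedged sketch of the max_skin calculation suggested above, assuming the domain decomposition needs at least two cells per direction on each node, i.e. cell size (maximal cutoff + skin) at most half the local box length; the helper name and the example values are made up.]

import numpy as np

def safe_max_skin(box_l, node_grid, max_cut):
    # Largest skin for which each node still fits 2 cells per direction,
    # i.e. (max_cut + skin) <= min(local_box_l) / 2.
    local_box_l = np.asarray(box_l, dtype=float) / np.asarray(node_grid, dtype=float)
    return float(np.min(local_box_l) / 2.0 - max_cut)

# Example: single MPI rank, cubic box of length 12, P3M real-space cutoff 3.2
print(safe_max_skin(box_l=[12.0, 12.0, 12.0], node_grid=[1, 1, 1], max_cut=3.2))  # -> 2.8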

@fweik
Contributor

fweik commented Jun 21, 2019

I'm looking into this, I think there is a bug in the tuning.

@jngrad
Member

jngrad commented Jun 21, 2019

I can't reproduce the bug locally, nor locally inside the Docker container, but I can reproduce it on coyote8 inside the Docker container (after 175 tries). I'll try again while printing the random seed and check whether it's deterministic.

@fweik
Contributor

fweik commented Jun 21, 2019

Please don't spend any more time on this, I will fix it, I know what the error is.

@RudolfWeeber
Contributor

#2961
I assume that the resulting P3M parameters are outside the usual range of values in the CI environment.
Short term, it might be enough to disable the tune_skin() call in the test, but ultimately #2961 has to be fixed.

@RudolfWeeber
Contributor

Actually, the maximum safe skin might be available from s.cell_system.get_state()["max_skin"]
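[Editorial note, not part of the original comment: a sketch of how that value could be used, continuing the system object from the sketch further up and assuming get_state() exposes a "max_skin" entry in this version, as suggested above.]

# Cap the skin tuning range with the value reported by the cell system
# instead of hard-coding max_skin.
state = system.cell_system.get_state()
safe_skin = state["max_skin"]
system.cell_system.tune_skin(min_skin=0.2, max_skin=safe_skin, tol=0.05,
                             int_steps=100, adjust_max_skin=True)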

@RudolfWeeber
Contributor

Has this re-occurred since adjust_max_skin was added to the tune_skin() call?
If not, this can be closed.

@jngrad
Member

jngrad commented Aug 7, 2019

Hasn't yet in CI. But I was able to trigger a similar error today on coyote8 after 493 retries:

[...same as in the original logfile...]
  File "/home/espresso/espresso/build/testsuite/scripts/benchmarks/local_benchmarks/p3m_processed.py", line 164, in <module>
    adjust_max_skin=True)))
  File "cellsystem.pyx", line 308, in espressomd.cellsystem.CellSystem.tune_skin
  File "utils.pyx", line 261, in espressomd.utils.handle_errors
Exception: Error during tune_skin: ERROR: number of cells 1 is smaller than minimum 8 (interaction range too large or min_num_cells too large)

The last three lines changed: the error is now raised in espressomd.cellsystem.CellSystem.tune_skin instead of espressomd.analyze.Analysis.energy.

@RudolfWeeber
Contributor

RudolfWeeber commented Aug 7, 2019 via email

@RudolfWeeber
Contributor

RudolfWeeber commented Aug 7, 2019 via email

@fweik
Contributor

fweik commented Aug 7, 2019

The first one is correct, the second one isn't. You should also maybe have a look at #3053, which clarifies some of these things.

@jngrad
Member

jngrad commented Aug 8, 2019

I've just merged #3053 into my local copy of the python branch and was able to get the same error message, plus a new one:

resulting parameters: mesh: (22 22 22), cao: 7, r_cut_iL: 3.6018e-01,
                      alpha_L: 9.0136e+00, accuracy: 9.9536e-05, time: 12.38

0: rs_mesh overflow! (pos 12.448619, nmp=24)
0: allowed coordinates: -1.600000 - 13.477258
0: rs_mesh overflow! (pos 12.482639, nmp=24)
0: allowed coordinates: -1.600000 - 13.477258
0: rs_mesh overflow! (pos 12.516053, nmp=24)
0: allowed coordinates: -1.600000 - 13.477258
[1617038c16f4:13557] *** Process received signal ***
[1617038c16f4:13557] Signal: Segmentation fault (11)
[1617038c16f4:13557] Signal code:  (128)
[1617038c16f4:13557] Failing at address: (nil)
[1617038c16f4:13557] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f2b8a88b390]
[1617038c16f4:13557] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x22)[0x7f2b8a534512]
[1617038c16f4:13557] [ 2] /home/espresso/espresso/build3/src/core/EspressoCore.so.4(Particle::~Particle()+0x3c)[0x7f2b88d3ffbc]
[1617038c16f4:13557] [ 3] /home/espresso/espresso/build3/src/core/EspressoCore.so.4(invalidate_ghosts()+0x6f)[0x7f2b88dab89f]
[1617038c16f4:13557] [ 4] /home/espresso/espresso/build3/src/core/EspressoCore.so.4(cells_resort_particles(int)+0x2b)[0x7f2b88d3babb]
[1617038c16f4:13557] [ 5] /home/espresso/espresso/build3/src/core/EspressoCore.so.4(integrate_vv(int, int)+0x25a)[0x7f2b88db6e4a]
[1617038c16f4:13557] [ 6] /home/espresso/espresso/build3/src/core/EspressoCore.so.4(mpi_integrate(int, int)+0x77)[0x7f2b88d51ed7]
[1617038c16f4:13557] [ 7] /home/espresso/espresso/build3/src/core/EspressoCore.so.4(tune_skin(double, double, double, int, bool)+0x27b)[0x7f2b88e24a9b]
[1617038c16f4:13557] [ 8] /home/espresso/espresso/build3/src/python/espressomd/cellsystem.so(+0x1282f)[0x7f2b546cc82f]
[1617038c16f4:13557] [ 9] /home/espresso/espresso/build3/src/python/espressomd/script_interface.so(+0x1999c)[0x7f2b7e6af99c]
[1617038c16f4:13557] [10] /usr/bin/python3(PyObject_Call+0x47)[0x5c20e7]
...
[1617038c16f4:13557] *** End of error message ***
Segmentation fault (core dumped)

@jngrad
Member

jngrad commented Aug 28, 2019

@jngrad
Member

jngrad commented Sep 6, 2019

Even after merging #3132, it is still possible to get the P3M benchmark to fail on coyote7 after 200 trials:

Exception: Error during tune_skin: ERROR: number of cells 1 is smaller than minimum 8 (interaction range too large or min_num_cells too large)

@RudolfWeeber
Contributor

Out of ideas. De-milestoning.

@jngrad
Member

jngrad commented Nov 28, 2019

We might as well disable this test in CI. We already know this benchmark cannot be used due to the non-deterministic nature of the P3M tuning function. It also fails due to the other, less frequent bug reported above. We can re-enable the test once the tuning function gets re-implemented.

@fweik
Contributor

fweik commented Nov 29, 2019 via email

jngrad added a commit that referenced this issue Dec 5, 2019
3358: Fix breaking tests on 4.1.1

Description of changes:
- disable benchmark tests in CI jobs where the P3M benchmark test fails repeatedly (closes #2924)
- increase tolerance of `field_coupling_fields` for i586 builds (partial fix for #3315)
@jngrad jngrad removed this from the Espresso 5 milestone Jun 13, 2022