espresso-v4.1.1: MpiCallbacks_test and matrix_vector_product fail on i586 (OpenSuse Tumbleweed) #3315

Closed
junghans opened this issue Nov 15, 2019 · 5 comments
@junghans
Member

64-bit works, but i586 fails with:

[ 1083s]  6/68 Test  #6: MpiCallbacks_test ................***Failed    2.26 sec
[ 1083s] Running 8 test cases...
[ 1083s] Running 8 test cases...
[ 1083s] /home/abuild/rpmbuild/BUILD/espresso/src/core/unit_tests/MpiCallbacks_test.cpp(55): error: in "invoke_test": check f(i, d) == (invoke<decltype(f), int, double>(f, ia)) has failed [126.14149999999999 != 126.14149999999999]
[ 1083s] /home/abuild/rpmbuild/BUILD/espresso/src/core/unit_tests/MpiCallbacks_test.cpp(55): error: in "invoke_test": check f(i, d) == (invoke<decltype(f), int, double>(f, ia)) has failed [126.14149999999999 != 126.14149999999999]
[ 1083s] 
[ 1083s] *** 1 failure is detected in the test module "MpiCallbacks test"
[ 1083s] 
[ 1083s] *** 1 failure is detected in the test module "MpiCallbacks test"
[ 1083s] -------------------------------------------------------
[ 1083s] Primary job  terminated normally, but 1 process returned
[ 1083s] a non-zero exit code.. Per user-direction, the job has been aborted.
[ 1083s] -------------------------------------------------------
[ 1083s] --------------------------------------------------------------------------
[ 1083s] mpiexec detected that one or more processes exited with non-zero status, thus causing
[ 1083s] the job to be terminated. The first process to do so was:
[ 1083s] 
[ 1083s]   Process name: [[38606,1],1]
[ 1083s]   Exit code:    201
[ 1083s] --------------------------------------------------------------------------
[ 1083s] 

and

[ 1085s] 48/68 Test #48: matrix_vector_product ............***Failed    0.01 sec
[ 1085s] Running 1 test case...
[ 1085s] /home/abuild/rpmbuild/BUILD/espresso/src/utils/tests/matrix_vector_product.cpp(34): error: in "inner_product": check result[i] == boost::inner_product(matrix[i], vector, 0.0) has failed
[ 1085s] /home/abuild/rpmbuild/BUILD/espresso/src/utils/tests/matrix_vector_product.cpp(34): error: in "inner_product": check result[i] == boost::inner_product(matrix[i], vector, 0.0) has failed
[ 1085s] /home/abuild/rpmbuild/BUILD/espresso/src/utils/tests/matrix_vector_product.cpp(34): error: in "inner_product": check result[i] == boost::inner_product(matrix[i], vector, 0.0) has failed
[ 1085s] 
[ 1085s] *** 3 failures are detected in the test module "matrix_vector_product test"
[ 1085s] 
[ 1085s]

Details here

I didn't package v4.1 for OpenSuse, so I am not sure whether this was introduced in v4.1 or v4.1.1.

@junghans
Member Author

And:

[ 1703s] 106/143 Test  #84: collision_detection ...............................***Failed   15.09 sec
[ 1703s] .ERROR: Particle 1 moved more than one local box length in one timestep.
[ 1703s] E......
[ 1703s] ======================================================================
[ 1703s] ERROR: test_bind_at_point_of_collision (__main__.CollisionDetection)
[ 1703s] ----------------------------------------------------------------------
[ 1703s] Traceback (most recent call last):
[ 1703s]   File "/home/abuild/rpmbuild/BUILD/espresso/build/testsuite/python/collision_detection.py", line 225, in test_bind_at_point_of_collision
[ 1703s]     self.run_test_bind_at_point_of_collision_for_pos(np.array((0, 0, 0)))
[ 1703s]   File "/home/abuild/rpmbuild/BUILD/espresso/build/testsuite/python/collision_detection.py", line 140, in run_test_bind_at_point_of_collision_for_pos
[ 1703s]     self.s.integrator.run(3000)
[ 1703s]   File "integrate.pyx", line 104, in espressomd.integrate.Integrator.run
[ 1703s]   File "utils.pyx", line 264, in espressomd.utils.handle_errors
[ 1703s] Exception: Encountered errors during integrate: ERROR: Particle 1 moved more than one local box length in one timestep.
[ 1703s] 
[ 1703s] ----------------------------------------------------------------------
[ 1703s] Ran 8 tests in 14.228s
[ 1703s] 
[ 1703s] FAILED (errors=1)
[ 1703s] -------------------------------------------------------
[ 1703s] Primary job  terminated normally, but 1 process returned
[ 1703s] a non-zero exit code.. Per user-direction, the job has been aborted.
[ 1703s] -------------------------------------------------------
[ 1703s] --------------------------------------------------------------------------
[ 1703s] mpiexec detected that one or more processes exited with non-zero status, thus causing
[ 1703s] the job to be terminated. The first process to do so was:
[ 1703s] 
[ 1703s]   Process name: [[2638,1],0]
[ 1703s]   Exit code:    1
[ 1703s] --------------------------------------------------------------------------
[ 1703s] 

bors bot added a commit that referenced this issue Nov 20, 2019
3327: core: Fixed bitwise comparison of floating point numbers r=fweik a=fweik

Partial fix for #3315.


Co-authored-by: Florian Weik <fweik@icp.uni-stuttgart.de>
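
For context on why a bitwise comparison of floating-point numbers is fragile: the thread does not state the exact cause on i586, but in general two mathematically equal expressions can differ in the last bit when intermediate results are rounded differently. A minimal Python sketch (not taken from the ESPResSo code base; the values and the `rel_tol` setting are purely illustrative assumptions):

```python
import math

# The same sum evaluated in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Exact (bitwise) equality fails even though both represent "0.6".
print(left == right)

# A tolerance-based comparison accepts the last-bit difference.
print(math.isclose(left, right, rel_tol=1e-12))
```

The same reasoning applies to checks such as `result[i] == boost::inner_product(...)` in the log above: comparing within a tolerance avoids depending on how a given platform rounds intermediate values.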
@jngrad
Member

jngrad commented Dec 5, 2019

Build log for python3-espressomd/openSUSE_Tumbleweed/i586:

[  997s]       Start 16: field_coupling_couplings
[  997s] 16/66 Test #16: field_coupling_couplings .........   Passed    0.01 sec
[  997s]       Start 17: field_coupling_fields
[  997s] 17/66 Test #17: field_coupling_fields ............***Failed    0.01 sec
[  997s] Running 7 test cases...
[  997s] /home/abuild/rpmbuild/BUILD/espresso/src/core/unit_tests/field_coupling_fields_test.cpp(397): error: in "interpolated_vector_field": absolute value of (interpolated_value[0] - field_value[0]).norm(){5.7034162359677394e-16} exceeds 2.2204460492503131e-16
[  997s] 
[  997s] *** 1 failure is detected in the test module "AutoParameter test"
[  997s] 
[  997s] 
[  997s]       Start 18: field_coupling_force_field
[  997s] 18/66 Test #18: field_coupling_force_field .......   Passed    0.01 sec

The machine precision tolerance of another unit test in the same file was increased last year (213101a) due to a similar issue on i386. I'll increase the test tolerance.
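
As a rough illustration of the tolerance change (a sketch only; the helper `within_tolerance` and the factor of 4 are hypothetical and not the values used in the actual commit), the deviation reported in the log above is about 2.6 times the double-precision machine epsilon, so a check against a single epsilon fails while a small multiple of it passes:

```python
import sys

# Machine epsilon for IEEE 754 doubles (matches the limit printed in the log).
eps = sys.float_info.epsilon
# Norm of the difference reported in the i586 build log.
deviation = 5.7034162359677394e-16


def within_tolerance(value, n_eps):
    """Return True if |value| does not exceed n_eps machine epsilons."""
    return abs(value) <= n_eps * eps


print(within_tolerance(deviation, 1))  # original check: one epsilon
print(within_tolerance(deviation, 4))  # hypothetical relaxed check
```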

jngrad added a commit that referenced this issue Dec 5, 2019
3358: Fix breaking tests on 4.1.1

Description of changes:
- disable benchmark tests in CI jobs where the P3M benchmark test fails repeatedly (closes #2924)
- increase tolerance of `field_coupling_fields` for i586 builds (partial fix for #3315)
jngrad added this to the Espresso 4.1.3 milestone Jan 21, 2020
@jngrad
Member

jngrad commented Jan 21, 2020

@junghans
Member Author

Fix on its way into OpenSUSE: https://build.opensuse.org/request/show/766150

@jngrad
Member

jngrad commented Feb 24, 2020

Since our CI infrastructure doesn't have i586, we cannot reproduce the collision_detection failure on our side, and the error message alone is not sufficient to determine the cause, so we will not investigate it further. The test has been disabled for i586 (openSUSE:Factory/python3-espressomd/python3-espressomd.spec at line 102), and the package now builds on i586. I'm closing this issue.
