Failure in atlas_fctest_trans_unstructured #249

Open
DJDavies2 opened this issue Dec 15, 2024 · 2 comments

Comments

@DJDavies2
Contributor

What happened?

Running atlas_fctest_trans_unstructured with certain configs gives this failure:

183/216 Test #183: atlas_fctest_trans_unstructured ...........................Subprocess aborted***Exception: 0.23 sec
Runtime Error: *** Arithmetic exception: Floating overflow - aborting
/home/users/david.davies/cylc-run/mi-bg671/work/1/get_source_atlas/atlas/src/atlas_f/trans/atlas_Trans_module.F90, line 352: Error occurred in ATLAS_TRANS_MODULE:INVTRANS_VORDIV2WIND_FIELD
/data/users/david.davies/cylc-run/mi-bg671/work/1/get_source_atlas/atlas/src/tests/trans/fctest_trans_unstructured.F90, line 84: Called by FCTEST_ATLAS_TRANS_UNSTR:TEST_TRANS
/home/users/david.davies/cylc-run/mi-bg671/work/1/build_atlas_nag/build/src/tests/trans/fctest_trans_unstructured_main.F90, line 21: Called by RUN_FCTEST_ATLAS_TRANS_UNSTR

What are the steps to reproduce the bug?

Building and running with NAG/GCC, with ectrans support enabled. However, this failure only occurs with some configs. I don't think NAG is particularly relevant here (see below).

Version

Head

Platform (OS and architecture)

Linux

Relevant log output

No response

Accompanying data

No response

Organisation

Met Office

@DJDavies2
Contributor Author

I dug around a bit and printed some stuff out. I think the failure occurs in this line in src/atlas/trans/local/VorDivToUVLocal.cc:

                    rv[ir + ji]  = -chiIm * rvor[ii + ji] - psiM1 * rdiv[ir + ji + 1] + psiP1 * rdiv[ir + ji - 1];

However, I think the root of the problem lies earlier than that. In src/tests/trans/fctest_trans_unstructured.F90 the calls to spectral%create_field for sp_div_field and friends produce arrays that have 6 elements. I have traced the path of sp_div_field down into the code and I believe it ends up in extend_truncation in the file src/atlas/trans/local/TransLocal.cc as the old_spectra parameter. Printing out some values and array indices shows that this line:

                      new_spectra[k++] = old_spectra[k_old++];

is going out of bounds in terms of the access to old_spectra. This results in undefined values being copied into new_spectra, which in some circumstances result in arithmetic exceptions in subsequent calculations such as the line noted above.

I have checked the hypothesis by adding some special code to extend_truncation so that only the first 6 elements of old_spectra are used (0 otherwise); this seems to fix atlas_fctest_trans_unstructured but is of course unacceptable as it breaks other tests.

I don't know what to do next about this. extend_truncation has no way of knowing how many elements old_spectra has, so there would be no way of generalizing my hack even if it were considered okay in principle.
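
For illustration only, here is a minimal sketch of what a size-aware variant could look like if the caller passed the length of old_spectra along with the pointer. It is not the Atlas signature: the names spectral_size, extend_truncation_checked and old_size are hypothetical, and the loop order (m, then n, then real/imaginary part, then field) is an assumption modelled on the copy line quoted above. Under that assumed layout a triangular truncation T holds (T+1)(T+2) real values per field, so a 6-element array would correspond to T = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: number of real values for triangular truncation T,
// assuming (T+1)(T+2)/2 complex coefficients stored as (real, imag) per field.
// Assumes truncation >= 0 and nb_fields >= 0.
std::size_t spectral_size(int truncation, int nb_fields) {
    return static_cast<std::size_t>(truncation + 1) * (truncation + 2) * nb_fields;
}

// Hypothetical size-aware variant (NOT the Atlas function): rejects an input
// whose length does not match old_truncation instead of reading past its end.
std::vector<double> extend_truncation_checked(int old_truncation, int nb_fields,
                                              const double* old_spectra,
                                              std::size_t old_size) {
    if (old_size != spectral_size(old_truncation, nb_fields)) {
        throw std::invalid_argument("old_spectra size does not match old_truncation");
    }
    const int new_truncation = old_truncation + 1;
    // Zero-initialised, so the added n == new_truncation entries stay 0.
    std::vector<double> new_spectra(spectral_size(new_truncation, nb_fields), 0.0);
    std::size_t k = 0, k_old = 0;
    for (int m = 0; m <= new_truncation; ++m) {
        for (int n = m; n <= new_truncation; ++n) {
            for (int imag = 0; imag < 2; ++imag) {
                for (int jfld = 0; jfld < nb_fields; ++jfld) {
                    if (n < new_truncation) {
                        new_spectra[k] = old_spectra[k_old++];  // existing coefficient
                    }
                    ++k;  // padded entries are skipped and remain zero
                }
            }
        }
    }
    assert(k_old == old_size);  // every input value was consumed exactly once
    return new_spectra;
}
```

With a 6-element input, old_truncation = 1 and one field, this sketch returns 12 values with the original data in positions 0 to 3 and 6 to 7 and zeros elsewhere; calling it with old_truncation = 2 on the same 6-element array throws rather than reading out of bounds.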

@wdeconinck
Member

Thank you @DJDavies2 for digging! I will dig myself a bit further in the New Year.
I can refactor this and add some assertions to make sure we don't go out of bounds silently, and then see how to prevent it.
These functions are implementation details.
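
As a rough sketch of the kind of assertion mentioned above, a read-only view that carries its own length can turn a silent out-of-bounds read into an immediate error. CheckedView is a hypothetical debugging aid written for this thread, not an existing Atlas class, and how the length would reach it from the Fortran side is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical debugging aid (not part of Atlas): a non-owning, read-only
// view over a double array that checks every index against a known size.
class CheckedView {
public:
    CheckedView(const double* data, std::size_t size) : data_(data), size_(size) {
        assert(data != nullptr || size == 0);  // a null pointer is only valid for an empty view
    }

    // Throws std::out_of_range instead of returning an undefined value.
    double operator[](std::size_t i) const {
        if (i >= size_) {
            throw std::out_of_range("index " + std::to_string(i) +
                                    " out of range for size " + std::to_string(size_));
        }
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    const double* data_;
    std::size_t size_;
};
```

Wrapping a 6-element buffer as CheckedView view(buffer, 6) lets view[5] through and makes view[6] throw, so a loop that overruns the input fails at the first bad index rather than later in an unrelated calculation.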
