Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ducc0 does not compile on MacOS using gcc #39

Closed
DiamonDinoia opened this issue Jul 30, 2024 · 31 comments
Closed

Ducc0 does not compile on MacOS using gcc #39

DiamonDinoia opened this issue Jul 30, 2024 · 31 comments

Comments

@DiamonDinoia
Copy link
Contributor

Dear @mreineck,

I do not know if you plan to support ducc0 on MacOS. It works with llvm on that toolchain but it breaks with gcc. It seems a small issue. Could you have a look? We would like to support ducc in finufft for MacOS GCC users.

[1/75] Building CXX object CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/math/gl_integrator.cc.o
[2/75] Building CXX object CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/infra/mav.cc.o
FAILED: CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/infra/mav.cc.o 
/usr/local/bin/g++-14  -I/Users/runner/work/finufft/finufft/build/_deps/ducc0-src/src -O3 -DNDEBUG -isysroot /Applications/Xcode_15.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -march=native -funroll-loops -ffp-contract=fast -fno-math-errno -fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math -fmerge-all-constants -ftree-vectorize -fimplicit-constexpr -fcx-limited-range -O3 -MD -MT CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/infra/mav.cc.o -MF CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/infra/mav.cc.o.d -o CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/infra/mav.cc.o -c /Users/runner/work/finufft/finufft/build/_deps/ducc0-src/src/ducc0/infra/mav.cc
/Users/runner/work/finufft/finufft/build/_deps/ducc0-src/src/ducc0/infra/mav.cc: In function 'void ducc0::detail_mav::opt_shp_str(fmav_info::shape_t&, std::vector<std::vector<long int> >&)':
Error: /Users/runner/work/finufft/finufft/build/_deps/ducc0-src/src/ducc0/infra/mav.cc:74:25: error: 'min_element' was not declared in this scope
   74 |       auto dim = size_t(min_element(strcrit.begin(),strcrit.begin()+lastdim)
      |                         ^~~~~~~~~~~
[3/75] Building CXX object CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/math/gridding_kernel.cc.o
[4/75] Building CXX object CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/infra/string_utils.cc.o
[5/75] Building CXX object CMakeFiles/ducc0.dir/_deps/ducc0-src/src/ducc0/infra/threading.cc.o
[6/75] Building CXX object CMakeFiles/finufft_f32.dir/src/finufft.cpp.o
[7/75] Building CXX object CMakeFiles/finufft_f32.dir/src/fft.cpp.o
ninja: build stopped: subcommand failed.
Error: Process completed with exit code 1.

Please see:
https://github.com/flatironinstitute/finufft/actions/runs/10167458503/job/28119953764

@mreineck
Copy link
Owner

I must admit that I don't understand what is going wrong here. min_element is defined in std in <algorithm>, and it should be visible at this location. Do you have any insights on this?

@mreineck
Copy link
Owner

Hmm ... I don't see any explicit flag for C++17 in the compiler call in the log. Perhaps this is the problem; min_element was introduced in C++17.
(As far as I know, g++ should default to C++17 as language standard, but I'm not sure whether it will include the proper headers for the tandard library on Mac OS...)

@DiamonDinoia
Copy link
Contributor Author

It seems fixed in master. The v34 seems to have the issue, it is missing #include <algorithm>.

@DiamonDinoia
Copy link
Contributor Author

However master fft breaks one of our accuracy tests by a small amount:
https://github.com/flatironinstitute/finufft/actions/runs/10184292576/job/28174805483

@mreineck
Copy link
Owner

mreineck commented Aug 1, 2024

It seems fixed in master. The v34 seems to have the issue, it is missing #include .

How do we address this? Should I release version 0.35 soon?

@mreineck
Copy link
Owner

mreineck commented Aug 1, 2024

However master fft breaks one of our accuracy tests by a small amount:

OK, this is strange. I need to take a closer look at the details...

@DiamonDinoia
Copy link
Contributor Author

It seems fixed in master. The v34 seems to have the issue, it is missing #include .

How do we address this? Should I release version 0.35 soon?

v35 still has the issue of the missing include. I can point finufft to master but we should check why the accuracy is slightly off with no fast-math

@mreineck
Copy link
Owner

mreineck commented Aug 2, 2024

Version 0.35 is not out yet ... when you say "v35 still has the issue of the missing include", what did you test against?

It is possible that FFT accuracy is a tiny little bit worse without fast math, since then FMA instructions are disabled, which incurs two rounding errors instead of one for a multiply-and-add. Still, the difference is so small that it really shouldn't make the difference between pass and fail in a NUFFT test...

@DiamonDinoia
Copy link
Contributor Author

DiamonDinoia commented Aug 2, 2024

Version 0.35 is not out yet ... when you say "v35 still has the issue of the missing include", what did you test against?

The commit that updated the changelog here: 920155c

It is possible that FFT accuracy is a tiny little bit worse without fast math, since then FMA instructions are disabled, which incurs two rounding errors instead of one for a multiply-and-add. Still, the difference is so small that it really shouldn't make the difference between pass and fail in a NUFFT test...

We manually add flags for FMA to ducc and finufft to avoid that issue. But it seems that in single precision for type 3 (that does multiple FFTs) the test does not pass. We do a NUFFT with requested precision 1e-5 and we allow the result to be off by 2e-4 but we get an error of 0.000275 when doing a big FFT and checking one resulting point against the analytical formula.

It is a tiny difference but worth investigating.

@mreineck
Copy link
Owner

mreineck commented Aug 2, 2024

The commit that updated the changelog here: 920155c

Ah, OK, thanks! I update the ChangeLog for the upcoming version from time to time, to keep track of the new features. As long as the version number in setup.py doesn't change, this doesn't mean much :-)

We do a NUFFT with requested precision 1e-5 and we allow the result to be off by 2e-4 but we get an error of 0.000275 when doing a big FFT and checking one resulting point against the analytical formula.

I'm looking at roughly line 188 in finufft3d_test.cpp. Is that the correct place?
If so, I'm not entirely sure that the error calculation does the correct thing. Imagine having a single nonuniform output point, which (let's assume) happens to have value F=0. In that situation, everything explodes, and for small F, measured errors increase a lot. I don't think that measuring the relative error on the output side will be the correct way of testing; you probably need to compare to absolute magnitudes on the input side, but it's definitely not trivial.

@DiamonDinoia
Copy link
Contributor Author

@ahbarnett, @lu1and10 what do you think?

@lu1and10
Copy link

lu1and10 commented Aug 2, 2024

@ahbarnett, @lu1and10 what do you think?

I'm not sure about what the test check tolerance should be, @ahbarnett may have more insight.
In the failing test for single precision, the code with ducc0 resulting error is 2.75e-4 without fp:fast flag, the test check tolerance is 2e-4, they are on the same magnitude though, the test will pass if we adjust tolerance to 3e-4.

@ahbarnett The resulting failed error 2.75e-4 is from the line https://github.com/flatironinstitute/finufft/blob/893bf6f2acf8afd4a716d13f534f517c547ccb02/test/finufft3d_test.cpp#L188

@lu1and10
Copy link

lu1and10 commented Aug 2, 2024

How do we address this? Should I release version 0.35 soon?

@mreineck it will be good to release 0.35 with the header fixed sometime, seems the default branch latest commit works, but ducc0_0_34_0 still have issue on MacOs gcc finding min_element header.
@ahbarnett makefile tries too pick ducc0_0_34_0 https://github.com/flatironinstitute/finufft/blob/7ecc0c1460e27de731d8dffb7220ccd4055bf879/makefile#L69
so may fail with macOS gcc.

@ahbarnett
Copy link

Several issues here: firstly, I thought ducc/CMakeLists.txt and cmake/setupDUCC.cmake enforced this:
target_compile_features(ducc0 PRIVATE cxx_std_17)
so I don't understand why C++17 wasn't being used on all DUCC builds?

Now to math tests in FINUFFT: indeed I have been relying on pseudorandom numbers (with fixed seed), and testing both i) the error at one output point (relative to the maximum across all outputs, ie the infinity-norm), and ii) the relative l2 error for the vector of outputs.
So, firstly, Martin's concern about "divide by a small number" cannot happen (I already thought of that in 2017...). You'll see the division by the infnorm here:
https://github.com/flatironinstitute/finufft/blob/893bf6f2acf8afd4a716d13f534f517c547ccb02/test/finufft3d_test.cpp#L188

The point of having both i) and ii) available is that:

  • i) can be evaluated for arbitrarily large N,M, but is susceptible to randomness (it's a sample-size of 1), but
  • ii), although it is almost immune to random variation, can only be computed in reasonable time via a direct summation when NM < 1e7 (say, for a quick test suite).

Both tests have their place. For small NM as in our CI, eg test/check_finufft.sh SINGLE, it makes sense to test both.

Now, errors in FINUFFT should be dominated by kernel choice, not by the FFT.
And the pseudorandom variation in FINUFFT test-codes cannot be affected by the choice of FFT.
So, making a change to DUCC0 FFT flags causing a change in FINUFFT errors is concerning.
Libin, is there a way you could drill down numerically to see what effect the fast flag has on the FINUFFT type 1 output vector (test could be done in matlab, since convenient) ? Or make sure this isn't a change in the thread number in CI? (see below).

For now I am ok with the idea that the rand generator is different in OSX from linux, so FINUFFT single-prec threshold could be raised from 2e-4 to 3e-4. I just noticed the seed is set to the omp_thread_num, so if the test is run on a different number of threads, the results will be different.
I am now thinking of removing test i) above from the exit-code in finufft?d_test.cpp, to make it less stochastic.

@lu1and10
Copy link

lu1and10 commented Aug 2, 2024

Libin, is there a way you could drill down numerically to see what effect the fast flag has on the FINUFFT type 1 output vector (test could be done in matlab, since convenient) ? Or make sure this isn't a change in the thread number in CI? (see below).

When @DiamonDinoia and I looked at the failing ci runs, it turns out not deterministic. I recall that some times, if we rerun once, the error is less than 2e-4 passing the test, some times it's 2.75e-4. All these only happens with Windows msvc, linux and Mac were fine. Maybe it's related to the multi-threading with ducc. Though with fftw it always less than 2e-4 on all OSes.

I'm not sure how to drill down which sub flags inside msvc fp:fast makes the error less. @DiamonDinoia do you know how to debug this in msvc for the detail optimization causing this? Normally I thought fp:fast increase error, but this one fp:fast reduced the error with ducc?

@DiamonDinoia
Copy link
Contributor Author

/fp:fast can make the error smaller. It might rearrange the instructions so that an FMA is possible by converting divisions to multiplications.

How to debug? Not sure, I cannot trigger the issue on my windows laptop.

Given that is not deterministic, it can be caused by the RNG, msvc is likely to use a different implementation than the others.

@lu1and10
Copy link

lu1and10 commented Aug 2, 2024

How to debug? Not sure, I cannot trigger the issue on my windows laptop.

I guess we see the error more often because of the multiple runs with msvc using the ci matrix, if it's deterministic with one thread and the same RNG seed that will be easier to debug.

@DiamonDinoia
Copy link
Contributor Author

DiamonDinoia commented Aug 3, 2024

Part of the FINUFFT de-macroize process includes using c++ RNGs, some of them like MT32_64 have the same implementation across platform that will allow to have consistent tests on all platforms. So, we can delay this as it seems a testing issue more than a FINUFFT/DUCC issue.

@mreineck
Copy link
Owner

mreineck commented Aug 5, 2024

Sorry for the delay, I took a few days off ...

So, firstly, Martin's concern about "divide by a small number" cannot happen (I already thought of that in 2017...). You'll see the division by the infnorm here:

It can happen if the number of outputs is 1, and I think that is the case here. (It could also happen for more than one output, but I agree that this becomes extremely improbable with increasing number of outputs.)

[Edit: The reason why I think this is Marco's statement further above that "when doing a big FFT and checking one resulting point against the analytical formula", but I might be misinterpreting this.]

@mreineck
Copy link
Owner

mreineck commented Aug 5, 2024

Regarding reproducibility of ducc.fft results with multithreading: the design is such that the results should be fully deterministic as long as the requested number of threads does not change, but results from runs with different numbers of threads may be very slightly different.

@DiamonDinoia
Copy link
Contributor Author

Hi Martin before releasing ducc, I would like to improve the cmake file. Can we also add a makefile so to include it in finufft?

Also, may I ask why did you revert to -march=native instead of the flags I proposed? I noticed the error gets smaller but that comes with some issues.

@mreineck
Copy link
Owner

mreineck commented Aug 6, 2024

The issue was that -fcx-limited-range is not recognized by some versions of clang (v16 and v17 give an error, v18 can deal with it), and so some CI runs failed.

We can add a Makefile as well, but I'm not sure how much value that will add. In the end, compiler flags will always depend on the "calling" application.

@DiamonDinoia
Copy link
Contributor Author

DiamonDinoia commented Aug 6, 2024

Indeed llvm does not support it yet. That can be removed. All it does is faster complex division/multiplication by skipping -nan checking. One can overload std::complex or write his own class that does skip this controls if crucial.

In cmake library flags are usually not set by the user but each flag is compiled with their own.

@mreineck
Copy link
Owner

mreineck commented Aug 7, 2024

I don't think I will go to such lengths just to avoid the -ffast-math flag. As long as it s not used in the linking step, this is not problematic. And yes, extra checks witin the complex multiplication routine would be quite bad for some parts of ducc0.

@ahbarnett
Copy link

ahbarnett commented Aug 7, 2024 via email

@DiamonDinoia
Copy link
Contributor Author

I don't think I will go to such lengths just to avoid the -ffast-math flag. As long as it s not used in the linking step, this is not problematic. And yes, extra checks witin the complex multiplication routine would be quite bad for some parts of ducc0.

I think using --ffinite-math-only would achieve the same effect with the other flags.

The problem with -ffast-math is that a python library compiled with it (python loads shared libraries when loading c/c++) broke my code by changing the floating point env of my cpu breaking the following code.

@mreineck
Copy link
Owner

mreineck commented Aug 7, 2024

I fully agree that -fftast-math can break things, but again (and the links confirm that, as far as I can tell) only if you link the shared library with that flag, but not when you compile the object files for tha shared library with the flag.

Is it possible to discern between these two situations with cmake? I'm doing this in my setup.py approach, and I'm fairly sure the resulting .so is safe to load.

@mreineck
Copy link
Owner

mreineck commented Aug 7, 2024

g++ -ffast-math -c a.cc
should be fine and does not have a bad influence on other code

g++ -ffast-math a.o -shared -o libfoo.so
is dangerous and should be avoided at all costs.

@DiamonDinoia
Copy link
Contributor Author

in cmake there are:

target_compile_options( [target] [PRIVATE|PUBLIC|INTERFACE] [flags])  # compile only
target_link_options( [target] [PRIVATE|PUBLIC|INTERFACE] [flags]) # link options

@mreineck
Copy link
Owner

mreineck commented Aug 7, 2024

Then I think having -ffast-math in target_compile_options should be safe.

@mreineck
Copy link
Owner

mreineck commented Jan 1, 2025

Closing this for now, as I think all points have been discussed.

@mreineck mreineck closed this as completed Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants