forked from mom-ocean/MOM6
-
Notifications
You must be signed in to change notification settings - Fork 62
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Intrinsics: Faster cuberoot scaling functions
This patch replaces the intrinsic-based exponent rescaling with explicit bit manipulation of the floating point number. This appears to produce a ~2.5x speedup of the solver, reducing its time from embarassingly slow to disappointingly slow. It is slightly faster than the GNU cbrt function, but still about 3x slower than the Intel SVML cbrt function. Timings (s) (16M array, -O3 -mavx -mfma) | Solver | -O2 | -O3 | |---------------------|-------|-------| | GNU x**1/3 | 0.225 | 0.198 | | GNU cuberoot before | 0.418 | 0.412 | | GNU cuberoot after | 0.208 | 0.187 | | Intel x**1/3 | 0.068 | 0.067 | | Intel before | 0.514 | 0.507 | | Intel after | 0.213 | 0.189 | At least one issue here is that Intel SVML is using fast vectorized logic operators whereas the Fortran intrinsics are replaced with slower legacy scalar versions. Not sure there is much we could even do about that without complaining to vendors. Also, I'm sure there's magic in their solvers which we are not capturing. Regardless, I think this is a major improvement. I do not believe it will change answers, but probably a good idea to verify this and get it in before committing any solutions using cuberoot().
- Loading branch information
1 parent
76f0668
commit 60cb551
Showing
1 changed file
with
98 additions
and
6 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters