fb_smc_speedup: Performance optimize on SMC grid #30
Pull Request Summary
Added extra one-dimensional arrays that duplicate data from IJKUFc, IJKVFc, and IJKCel, so the data is more contiguous in memory. This reduces memory bandwidth use and improves performance.
Description
The performance of the SMC grid is limited by memory bandwidth, because the code reads
the two-dimensional arrays IJKUFc, IJKVFc, and IJKCel many times.
In the loop, only two entries of each column (such as IJKCel(3,j) and IJKCel(4,j)) are used,
so the rest of the data prefetched with each column is wasted.
These arrays therefore behave like an "AoS" (array of structures) layout. I built extra one-dimensional arrays
in an "SoA" (structure of arrays) style to make the data more contiguous and achieve better performance.
On the PRC NMEFC workloads, these changes give a 17% performance improvement.
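The layout change described above can be sketched roughly as follows. This is a minimal illustration in C, not the Fortran code from this PR: the record width `NFIELD`, the function names, and the flat `rec` buffer are all hypothetical. The buffer stores one record per cell, record by record, which matches the memory order of a Fortran array dimensioned `(NFIELD, ncell)`.

```c
#include <stddef.h>

/* Assumed record width, for illustration only. */
enum { NFIELD = 5 };

/* Original access pattern: fields 3 and 4 (1-based) are read directly from
   the records, so every cache line fetched also carries unused fields. */
long sum_aos(const int *rec, size_t ncell)
{
    long total = 0;
    for (size_t j = 0; j < ncell; ++j)
        total += rec[j * NFIELD + 2] + rec[j * NFIELD + 3];
    return total;
}

/* One-time setup: duplicate the two hot fields into contiguous 1-D arrays. */
void split_fields(const int *rec, size_t ncell, int *f3, int *f4)
{
    for (size_t j = 0; j < ncell; ++j) {
        f3[j] = rec[j * NFIELD + 2];
        f4[j] = rec[j * NFIELD + 3];
    }
}

/* Same reduction over the contiguous copies: every byte fetched is used. */
long sum_soa(const int *f3, const int *f4, size_t ncell)
{
    long total = 0;
    for (size_t j = 0; j < ncell; ++j)
        total += f3[j] + f4[j];
    return total;
}
```

In this sketch `split_fields` would be called once after the grid is read, and the hot loop would then use `sum_soa`-style indexing. The results are identical by construction; only the amount of memory traffic differs, at the cost of storing the duplicated fields.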
Issue(s) addressed
This is an enhancement; I did not open an issue beforehand.
Check list
Testing
How were these changes tested?
This has been tested on NMEFC's workload and on regtests/ww3_tp2.16, on Intel Ice Lake and Cascade Lake platforms.
Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
Yes, tested with ww3_tp2.16.
If a new feature was added, was a new regression test added?
Have regression tests been run?
Yes
Which compiler / HPC you used to run the regression tests in the PR?
Intel ifort and mpiifort.
Please provide the summary output of matrix.comp (matrix.Diff.out, matrixCompFull.out and matrixCompSummary.out):
Summary:
******************************************************
*** compare WAVEWATCH III matrix of regression tests ***
******************************************************
base directory : /home/liuyun/WW3/regtests/
comp directory : /home/liuyun/Yun1Liu/WW3/regtests/
test(s) :
ww3_tp2.16
********************* non-identical cases ****************************
************************ identical cases *****************************
ww3_tp2.16/./work
******************** summary of comparison ***************************
********** only results of non-identical cases are listed ************
****** if less than 10 files differ for a case, they are listed ******
Full:
base directory : /home/liuyun/WW3/regtests/
comp directory : /home/liuyun/Yun1Liu/WW3/regtests/
test(s) :
ww3_tp2.16
********************* non-identical cases ****************************
************************ identical cases *****************************
ww3_tp2.16/./work
******************* full output of comparison ************************
found 36 files in base directory
found 36 files in compare directory
restart.ww3 are identical (binary)
ww3.68060600.wnd are identical
ww3.68060600.hs are identical
ww3.68060600.t01 are identical
ww3.68060603.wnd are identical
ww3.68060603.hs are identical
ww3.68060603.t01 are identical
ww3.68060606.wnd are identical
ww3.68060606.hs are identical
ww3.68060606.t01 are identical
ww3.68060609.wnd are identical
ww3.68060609.hs are identical
ww3.68060609.t01 are identical
ww3.68060612.wnd are identical
ww3.68060612.hs are identical
ww3.68060612.t01 are identical
ww3.68060615.wnd are identical
ww3.68060615.hs are identical
ww3.68060615.t01 are identical
ww3.68060618.wnd are identical
ww3.68060618.hs are identical
ww3.68060618.t01 are identical
ww3.68060621.wnd are identical
ww3.68060621.hs are identical
ww3.68060621.t01 are identical
ww3.68060700.wnd are identical
ww3.68060700.hs are identical
ww3.68060700.t01 are identical
ww3_grid.out are identical
mod_def.ww3 are identical (binary)
ww3_strt.out are identical
skipped ww3_shel.out
log.ww3 are identical (filtered)
out_grd.ww3 are identical (binary)
out_pnt.ww3 are identical (binary)
ww3_outf.out are identical
Please indicate the expected changes in the outputs (excluding the known list of non-identical tests).