Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fb_smc_speedup: Performance optimize on SMC grid #30

Closed
wants to merge 1 commit into from

Conversation

Yun1Liu
Copy link

@Yun1Liu Yun1Liu commented Jun 9, 2021

Pull Request Summary

Added extra 1-dimension arrays with duplicated data for IJKUFc, IJKVFc, IJKCel, to make the data more contiguous, which can save memory bandwidth, achieve better performance.

Description

The performance of SMC grid is limited by the memory bandwidth, because it needs to read
the 2-dimension arrays IJKUFc, IJKVFc, IJKCel many times.
In the loop, only 2 cells of the column (such as IJKCel(3,j), IJKCel(4,j)) are used,
the pre-fetch on the other cells is wasted.
So these arrays are kind of "AOS" (Array of Structure), I built some extra 1-dimension arrays
like "SOA" to make the data more contiguous, achieve better performance.

On the PRC NMEFC workloads, these changes can achieve 17% performance increasement.

Issue(s) addressed

  • Is there an issue associated with this development (bug fix, enhancement, new feature)?
    This is an enhancement, I didn't open an issue before.
  • fixes #<issue_number>
  • fixes noaa-emc/ww3/issues/<issue_number>

Check list

Testing

  • How were these changes tested?
    This has been tested on NMEFC's workload, and regtests/ww3_tp2.16, on Intel Icelake and Cascadelake platforms.

  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
    tested on ww3_tp2.16.

  • If a new feature was added, was a new regression test added?

  • Have regression tests been run?
    Yes

  • Which compiler / HPC you used to run the regression tests in the PR?
    On Intel ifort & mpiifort.

  • Please provide the summary output of matrix.comp (matrix.Diff.out, matrixCompFull.out and matrixCompSummary.out):

Summary:
******************************************************
*** compare WAVEWATCH III matrix of regression tests ***
******************************************************

base directory : /home/liuyun/WW3/regtests/
comp directory : /home/liuyun/Yun1Liu/WW3/regtests/
test(s) :
ww3_tp2.16


********************* non-identical cases ****************************



************************ identical cases *****************************


ww3_tp2.16/./work


******************** summary of comparison ***************************
********** only results of non-identical cases are listed ************
****** if less than 10 files differ for a case, they are listed ******


Full:

   ******************************************************
 ***  compare WAVEWATCH III matrix of regression tests  ***
   ******************************************************

base directory : /home/liuyun/WW3/regtests/
comp directory : /home/liuyun/Yun1Liu/WW3/regtests/
test(s) :
ww3_tp2.16


********************* non-identical cases ****************************



************************ identical cases *****************************


ww3_tp2.16/./work


******************* full output of comparison ************************


  • test case: ww3_tp2.16; test run: ./work

found 36 files in base directory
found 36 files in compare directory
restart.ww3 are identical (binary)
ww3.68060600.wnd are identical
ww3.68060600.hs are identical
ww3.68060600.t01 are identical
ww3.68060603.wnd are identical
ww3.68060603.hs are identical
ww3.68060603.t01 are identical
ww3.68060606.wnd are identical
ww3.68060606.hs are identical
ww3.68060606.t01 are identical
ww3.68060609.wnd are identical
ww3.68060609.hs are identical
ww3.68060609.t01 are identical
ww3.68060612.wnd are identical
ww3.68060612.hs are identical
ww3.68060612.t01 are identical
ww3.68060615.wnd are identical
ww3.68060615.hs are identical
ww3.68060615.t01 are identical
ww3.68060618.wnd are identical
ww3.68060618.hs are identical
ww3.68060618.t01 are identical
ww3.68060621.wnd are identical
ww3.68060621.hs are identical
ww3.68060621.t01 are identical
ww3.68060700.wnd are identical
ww3.68060700.hs are identical
ww3.68060700.t01 are identical
ww3_grid.out are identical
mod_def.ww3 are identical (binary)
ww3_strt.out are identical
skipped ww3_shel.out
log.ww3 are identical (filtered)
out_grd.ww3 are identical (binary)
out_pnt.ww3 are identical (binary)
ww3_outf.out are identical

Please indicate the expected changes in the outputs (excluding the known list of non-identical tests).

… IJKUFc, IJKVFc, IJKCel, to make the data more contiguous, which can save memory bandwidth, achieve better performance.
@ukmo-jianguo-li
Copy link

The changes are fine to me. Just need to work out how to submit my review and complete it.

Copy link

@ukmo-jianguo-li ukmo-jianguo-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are fine to me. Thanks @Yun1Liu

@ukmo-ccbunney ukmo-ccbunney deleted the branch ukmo-waves:staging_may2021 September 14, 2021 13:44
@ukmo-ccbunney
Copy link
Member

Hello @Yun1Liu
My apologies - this PR completely slipped under my radar.
I will run some tests next week and raise a PR with NOAA if all looks good.
Thank you for your contribution!

@ukmo-ccbunney
Copy link
Member

@Yun1Liu can you make sure your branch is up-to-date with the NOAA develop branch please?
Thanks.

@ukmo-ccbunney
Copy link
Member

@Yun1Liu - I have taken the liberty of updating your branch with the latest development from NOAA (there have been some quite substantial changes to the code recently).; it is available on a branch in this repo:

https://github.com/ukmo-waves/WW3/tree/fb_smc_speedup

Regression test results are unchanged.

If you like, I can raise a new PR for this change with NOAA?

@Yun1Liu
Copy link
Author

Yun1Liu commented Feb 4, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants