Moving to use -march=core-avx2 instead of -xCORE-AVX2 with Intel Fortran? #240

Closed
mathomp4 opened this issue Dec 2, 2021 · 3 comments · Fixed by #241

mathomp4 commented Dec 2, 2021

With the growing number of AMD EPYC (Rome) nodes at NAS, we might want to change the architecture flags for Intel Fortran in GEOS. Currently, we use -xCORE-AVX2 on Intel processors, but binaries built that way are Intel-specific and may not run on the AMD chips.

Instead, we can use -march=core-avx2, which works on both Intel (Haswell and later) and AMD Rome with no changes needed. (I'm not sure whether results would be zero-diff between Intel and AMD, but the builds should run. This needs to be tested at NAS.)

However, the change is non-zero-diff with respect to the current -xCORE-AVX2 builds, and possibly slower if we somehow rely crucially on code that only -xCORE-AVX2 generates. I'm doing some runs now to see whether there is a performance hit.
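
One way to confirm ahead of time that a node advertises the AVX2 feature level is to inspect the CPU feature flags that Linux reports. The following is a minimal sketch, assuming a Linux host whose `/proc/cpuinfo` contains a `flags` line. The helper names `cpu_flags` and `supports_core_avx2`, and the choice of `avx2` and `fma` as the features to check, are illustrative assumptions and are not part of GEOS or the Intel compiler.

```python
# Features assumed (for illustration only) to be needed by an AVX2-level build.
REQUIRED = {"avx2", "fma"}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def supports_core_avx2(cpuinfo_text: str) -> bool:
    """True if every assumed-required feature is advertised by the CPU."""
    return REQUIRED <= cpu_flags(cpuinfo_text)
```

On a compute node this could be called as `supports_core_avx2(open("/proc/cpuinfo").read())`. It only checks what the CPU advertises; it says nothing about whether a given binary contains a vendor check.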

mathomp4 added the question label (Further information is requested) on Dec 2, 2021

mathomp4 commented Dec 2, 2021

Here are some (ongoing) results.

These are 1-day runs of GEOSgcm on the Cascade Lake nodes at NCCS with no history and no checkpointing. I built each flag variant (xCore is -xCORE-AVX2, march is -march=core-avx2) as both Release and Aggressive; the numbers are model throughput in days/day.

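| Resolution | Release xCore (days/day) | Release march (days/day) |
| --- | --- | --- |
| C360 L072 | 135.384 | 139.923 |
| C360 L181 | 53.638 | 54.865 |
| C720 L072 | 57.344 | 58.237 |
| C720 L181 | 22.058 | 22.354 |

| Resolution | Agg xCore (days/day) | Agg march (days/day) |
| --- | --- | --- |
| C360 L072 | 154.028 | 159.308 |
| C360 L181 | 60.027 | 62.031 |
| C720 L072 | 65.070 | 66.079 |
| C720 L181 | 24.852 | 25.392 |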
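
To make the comparison easier to read, the relative change in throughput can be computed directly from the numbers above. This is a small sketch using only the values reported in this comment; the names `THROUGHPUT` and `percent_change` are introduced here for illustration.

```python
# Model throughput in days/day, copied from the results above:
# (resolution, build type) -> (-xCORE-AVX2 build, -march=core-avx2 build)
THROUGHPUT = {
    ("C360 L072", "Release"): (135.384, 139.923),
    ("C360 L181", "Release"): (53.638, 54.865),
    ("C720 L072", "Release"): (57.344, 58.237),
    ("C720 L181", "Release"): (22.058, 22.354),
    ("C360 L072", "Aggressive"): (154.028, 159.308),
    ("C360 L181", "Aggressive"): (60.027, 62.031),
    ("C720 L072", "Aggressive"): (65.070, 66.079),
    ("C720 L181", "Aggressive"): (24.852, 25.392),
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Positive values mean the -march=core-avx2 build ran faster.
CHANGES = {
    case: percent_change(xcore, march)
    for case, (xcore, march) in THROUGHPUT.items()
}

for (resolution, build), change in CHANGES.items():
    print(f"{resolution} {build:<10} {change:+.2f}%")
```

Every case comes out positive, with gains between roughly 1.3% and 3.4%. These are single 1-day runs, so the differences may be within run-to-run noise.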

mathomp4 commented Dec 3, 2021

Pending tests by @aoloso and me, I think we can recommend that @wmputman and @sdrabenh update the arch flag for Intel Fortran. Performance looks good: the -march=core-avx2 builds were slightly faster in every configuration tested.

mathomp4 commented Dec 3, 2021

Tests at NAS show that using -march=core-avx2 gives us quite a bit of "ease" with GEOS: the answers depend only on where the model runs, not on where it was built.

I built GEOSgcm using -march=core-avx2 once on pfe (Intel chip) and once on a Rome node (AMD chip). I then made four experiments:

  1. Build on Intel, Run on Intel
  2. Build on AMD, Run on Intel
  3. Build on Intel, Run on AMD
  4. Build on AMD, Run on AMD

When all was done, 1 == 2 and 3 == 4. That is, no matter where you build, you get the same answers when running on a given architecture.

Of course, a run on AMD will never be zero-diff to a run on Intel, but at least we have a "weak" form of equivalence. (Or "strong"? Maybe @tclune and I need to come up with the strong/weak version of "running on different architectures" 😄 )
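
For anyone repeating the four experiments, a zero-diff check between two runs could look something like the sketch below. The thread does not say how the comparison was actually done, so this is only one possible approach: it assumes each experiment writes its outputs (for example restarts) into its own directory, and it treats "zero-diff" as byte-identical files. The function names `file_digest` and `zero_diff` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def zero_diff(dir_a: Path, dir_b: Path) -> bool:
    """True if both directories hold the same file names with identical bytes."""
    files_a = {p.relative_to(dir_a): p for p in dir_a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(dir_b): p for p in dir_b.rglob("*") if p.is_file()}
    if files_a.keys() != files_b.keys():
        return False
    return all(file_digest(files_a[k]) == file_digest(files_b[k]) for k in files_a)
```

With the experiments numbered as above, the reported result corresponds to `zero_diff(exp1, exp2)` and `zero_diff(exp3, exp4)` both being true. A byte-level comparison is stricter than necessary if the output files embed timestamps or other metadata, in which case a field-by-field comparison would be needed instead.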
