Flags in compiler settings #1138
Replies: 25 comments 7 replies
-
@junwang-noaa connected me to this document (I hope it comes through): https://www.intel.com/content/dam/develop/external/us/en/documents/pdf/fp-consistency-121918.pdf. So -fp-model consistent should be a stricter standard.
-
Intel has some good documentation on the differences between the -fp-model settings.
-
Whatever options give us run-to-run reproducible outputs, reproducible outputs across different MPI decompositions, and reproducible outputs after a model restart should, in my opinion, be the default. If that's what the current 'REPRO' mode is, then that should be the default, something you get without setting any specific build option. If there are cases where users do not care about any of the above, then we can provide options that give ultimate performance, but that must not be the default: it must be hidden behind an option, and it should be documented with a big fat warning. Basically, reproducibility must not be an opt-in feature. It must be an opt-out. You need to explicitly say 'I do not care about reproducibility, give me faster code' by passing -Dsomething. And, again in my opinion, that 'something' should not be 'PROD', because as far as I know production runs are required to be reproducible.
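Reproducibility across MPI decompositions is the hardest of the three to get from compiler flags alone: changing the decomposition usually changes how partial sums are grouped, and floating-point addition is not associative. The toy Python sketch below is not code from ufs-weather-model; the contiguous chunking and rank-order reduction in the hypothetical `decomposed_sum` are assumptions standing in for a real MPI reduction. It sums the same four values with one and with two "ranks".

```python
from typing import List, Sequence


def decomposed_sum(values: Sequence[float], nranks: int) -> float:
    """Toy stand-in for a domain-decomposed global sum (not model code).

    Each hypothetical rank sums a contiguous chunk of `values`; the
    partial sums are then added in rank order, mimicking a simple
    reduction. Only the grouping of the additions changes with nranks.
    """
    n = len(values)
    partials: List[float] = []
    for rank in range(nranks):
        lo = rank * n // nranks
        hi = (rank + 1) * n // nranks
        local = 0.0
        for v in values[lo:hi]:
            local += v  # local sum on this "rank"
        partials.append(local)

    total = 0.0
    for p in partials:
        total += p  # combine the partial sums in rank order
    return total


if __name__ == "__main__":
    values = [1e16, 1.0, -1e16, 1.0]
    for nranks in (1, 2):
        print(nranks, decomposed_sum(values, nranks))
```

With one rank the result is 1.0. With two ranks each 1.0 is added to ±1e16, where adjacent doubles are 2.0 apart, so it rounds away and the result is 0.0. Flags such as -fp-model consistent make each rank's arithmetic repeatable, but they do not control the order in which an MPI reduction combines partial sums.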
-
We should do a timing run of PROD vs. REPRO mode to see what the differences are.
-
Also, based on the experiments being done by @RatkoVasic-NOAA (see #649 (comment)), not all results are being fixed by this.
-
We did that 2-3 years back, when REPRO was first introduced (for b4b reproducibility between IPD and CCPP; testing MPI decomposition reproducibility for regional runs wasn't a question back then). At that time the difference was about 10%, give or take, mainly due to the AVX2 flags (the others were negligible). But this was NEMSfv3gfs ATM-only back then, and in these ATM-only runs nearly all the time was spent in the dycore.
On Mar 23, 2022, at 1:08 PM, arun chawla wrote:
> we should do a timing run on prod vs repro mode to see what the differences are
-
It turns out GOCART is reproducible in REPRO mode, and we do not have to switch from -fp-model consistent to -fp-model source, which is good, because it was creating confusion as to why this was a problem. However, it does raise the question of which of the flags in PROD mode are really needed to speed up the code. Dom's suggestion is that only the AVX flag makes a big difference.
-
To recap, our opsreqtest does the following:
We want our codes to satisfy these tests. In conversations with NCO, they have indicated that they only want compiler flags that satisfy the above tests, so we need to decide which compiler options give us that.
-
From a science-reproducibility and testing point of view, we should focus our testing on only two sets of compiler settings: one for debug, and one standard set that is expected to give reproducible results. Jun pointed to this presentation, which provides a discussion. The performance hit is about 10-20%.
-
So what do we consider "safe" compiler flags? Right now, in REPRO mode, we use
-g -traceback -fpp -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -nowarn -sox -align array64byte -qno-opt-dynamic-align -O2 -debug minimal -fp-model consistent -qoverride-limits
plus small changes for 32-bit, OpenMP, etc.
-
The additional flags that we use for performance are -qopt-prefetch=3 -no-prec-div -no-prec-sqrt, plus the different AVX recipes. Perhaps we should just drop them in our testing and in operations?
-
-no-prec-div and -no-prec-sqrt are only used in double precision (64-bit), not in single precision (32-bit). In my experience, they don't do much for performance.
Aggressive prefetching can be a problem, and I didn't see a huge impact of -qopt-prefetch=3 on performance either.
In my experience, the only ones that really matter for performance are AVX2 and the -fp-model choice. Whether you need to remove SIMD (AVX2) entirely or substitute core-avx-i or something like that remains to be seen.
Note that the PDF you are pointing to is from 2012, and while it is generally valid, a lot has happened on the compiler side since then (e.g. -fp-model consistent didn't exist back then, so they used precise). Note also that they used -O3, which is more aggressive than our -O2; I'm not sure how much that helps performance or what it does to reproducibility in our case.
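To make the -no-prec-div point concrete: as we understand Intel's documentation, one transformation that relaxed division permits is computing x / y as x * (1 / y), which rounds twice instead of once. The Python sketch below only emulates that rewrite by hand (Python always performs correctly rounded IEEE division, and the compiler's actual code generation is not shown); the function names are hypothetical. It shows that the two forms can differ in the last bit, which is enough to break a bit-for-bit comparison even when the speedup is small.

```python
from typing import List, Tuple


def ieee_divide(x: float, y: float) -> float:
    """Correctly rounded IEEE-754 division (one rounding)."""
    return x / y


def reciprocal_divide(x: float, y: float) -> float:
    """Hand-written emulation of a reciprocal-multiply rewrite:
    round 1/y first, then round x * r (two roundings)."""
    r = 1.0 / y
    return x * r


def mismatches(limit: int) -> List[Tuple[int, int]]:
    """Integer pairs (x, y) in [1, limit] where the two forms disagree."""
    out: List[Tuple[int, int]] = []
    for x in range(1, limit + 1):
        for y in range(1, limit + 1):
            if ieee_divide(float(x), float(y)) != reciprocal_divide(float(x), float(y)):
                out.append((x, y))
    return out


if __name__ == "__main__":
    print(ieee_divide(49.0, 49.0), reciprocal_divide(49.0, 49.0))
    print(len(mismatches(100)), "of", 100 * 100, "pairs differ")
```

The 49 / 49 case is the classic one: exact division gives 1.0, while 49 * (1/49) lands one ulp below it.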
On Mar 28, 2022, at 7:16 AM, arun chawla wrote:
> The additional flags that we use for performance are
> -qopt-prefetch=3 -no-prec-div -no-prec-sqrt
> plus the different AVX recipes
> These are what we use for performance. Perhaps we should just drop them in our testing and in operations ?
-
I want to point out that compile options such as -ftz or -fp-model consistent do not guarantee reproducibility. So, in general, what is our approach to getting reproducibility for ufs-weather-model: using only compile options that guarantee reproducibility, or a test-and-see approach?
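A test-and-see approach ultimately reduces to checking whether two runs (rerun, different decomposition, or restart) produce bit-identical output. A minimal Python sketch, assuming the fields are available as sequences of doubles; `field_digest`, `bit_for_bit` and `max_abs_diff` are hypothetical helpers, not part of the UFS regression-test scripts. It hashes the exact IEEE-754 bit patterns and reports the largest difference separately, so a "nearly equal" result is not mistaken for a reproducible one.

```python
import hashlib
import struct
from typing import Sequence


def field_digest(values: Sequence[float]) -> str:
    """SHA-256 over the exact IEEE-754 bit patterns (little-endian doubles)."""
    h = hashlib.sha256()
    for v in values:
        h.update(struct.pack("<d", v))
    return h.hexdigest()


def bit_for_bit(run_a: Sequence[float], run_b: Sequence[float]) -> bool:
    """True only if both fields have the same length and identical bits."""
    return len(run_a) == len(run_b) and field_digest(run_a) == field_digest(run_b)


def max_abs_diff(run_a: Sequence[float], run_b: Sequence[float]) -> float:
    """Largest pointwise difference; for reporting, not for passing the test."""
    return max((abs(a - b) for a, b in zip(run_a, run_b)), default=0.0)


if __name__ == "__main__":
    run_a = [(0.1 + 0.2) + 0.3]  # one summation order
    run_b = [0.1 + (0.2 + 0.3)]  # another summation order
    print(bit_for_bit(run_a, run_b), max_abs_diff(run_a, run_b))
```

Comparing bits rather than values also catches differences such as 0.0 vs. -0.0, which compare equal numerically but are not bit-identical.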
-
Intel has some documentation as well: https://www.intel.com/content/dam/develop/external/us/en/documents/pdf/fp-consistency-121918.pdf
-
OK, we need two things, as Arun's email pointed out:
-
For now, none of the options reproduce the regional hi-res case (LAM parallel). Only the low-resolution case reproduces:
-
Better component-level testing would reduce the importance of these whole-system tests.
-
Hi, we would like to finalize this discussion this week. We would like to replace PROD and REPRO with just one set of flags; for lack of a better word, let's call it STD. So far we have the REPRO flags, to which we can add AVX. Do we need any more?
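The STD proposal (REPRO flags plus AVX, without the other PROD extras) can be sketched as a flag composition. This is a minimal illustration, not the actual CMake configuration: `REPRO_FLAGS`, `PROD_EXTRAS`, `AVX_FLAGS` and `std_flags` are hypothetical names, the lists are copied from the flags quoted in this thread (32-bit and OpenMP variations omitted), and the AVX addition is assumed to be -march=core-avx2, the option PROD used in the LAM parallel test.

```python
from typing import List

# REPRO base flags as listed in this thread (Intel Fortran), one token each.
REPRO_FLAGS: List[str] = [
    "-g", "-traceback", "-fpp", "-fno-alias", "-auto", "-safe-cray-ptr",
    "-ftz", "-assume", "byterecl", "-nowarn", "-sox", "-align", "array64byte",
    "-qno-opt-dynamic-align", "-O2", "-debug", "minimal",
    "-fp-model", "consistent", "-qoverride-limits",
]

# Extra performance flags PROD adds (from this thread); left out of STD here.
PROD_EXTRAS: List[str] = ["-qopt-prefetch=3", "-no-prec-div", "-no-prec-sqrt"]

# Assumption: the AVX recipe added to STD is the one PROD used for the LAM test.
AVX_FLAGS: List[str] = ["-march=core-avx2"]


def std_flags() -> List[str]:
    """Hypothetical STD flag set: REPRO + AVX, without the PROD extras."""
    flags = REPRO_FLAGS + AVX_FLAGS
    leaked = [f for f in flags if f in PROD_EXTRAS]
    if leaked:
        raise ValueError(f"PROD-only flags found in STD: {leaked}")
    return flags


if __name__ == "__main__":
    print(" ".join(std_flags()))
```

Keeping -fp-model consistent and adding only the SIMD target follows the thread's observation that AVX2 accounts for most of the PROD speedup; whether AVX2 itself preserves the required reproducibility still has to be tested.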
-
I agree with Dusan.
On Mar 30, 2022, at 12:36 PM, Dusan Jovic wrote:
> Why not just 'Release' and 'Debug' which are standard cmake build types, as @aerorahul suggested.
-
Me too. Then we don't need to translate the PROD and DEBUG flags into Release and Debug for CCPP anymore; we just adjust the CCPP Release type and remove the Bitforbit type.
On Mar 30, 2022, at 11:14 AM, JamesAbeles-NOAA wrote:
> I agree with Dusan.
-
Trying to put a pin in this discussion. So we are agreeing on:
Does this sound right?
-
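@BenjaminBlake-NOAA just did a test on LAM parallel using PROD vs. REPRO. PROD was about 12% faster.
Differences in compilation:
PROD: -fp-model source -qopt-prefetch=3 -march=core-avx2
REPRO: -fp-model consistent
I'm not sure how much each option contributes to the speedup/slowdown, but if necessary, we could do the tests.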
-
I do not think -qopt-prefetch=3 does anything except add compile time. I suspect the main thing will be the use of AVX2.
On Tue, Apr 5, 2022 at 1:05 PM RatkoVasic-NOAA wrote:
> @BenjaminBlake-NOAA just did test on LAM parallel using PROD vs. REPRO. PROD was about 12% faster.
> Differences in compilation:
> PROD: -fp-model source -qopt-prefetch=3 -march=core-avx2
> REPRO: -fp-model consistent
> Not sure how each option contribute to the speedup/slowdown, but if necessary, we could do the tests.
--
Jim Abeles
*IMSG* at NOAA/NWS/NCEP/EMC
301 879-3283
-
Based on the discussions, these are the final compile settings that we will use in our tests:
-
See PR #1171.
-
We have three compiler options: debug, repro, and prod.
With the prod mode settings, not everything is reproducing. Repro mode is supposed to test for reproducibility with some level of optimization. In the repro mode settings we have the option -fp-model consistent. This does not reproduce results, but with -fp-model source the results do reproduce.
Should we make this change? Also, should the model only be run in repro mode, since that is the mode that guarantees reproducibility?