Flags in compiler settings #1138

arunchawla-NOAA · 2022-03-23T18:07:46Z

arunchawla-NOAA
Mar 23, 2022
Maintainer

We have three compiler options -- debug, repro and prod

With the prod mode settings, not everything reproduces. Repro mode is supposed to test for reproducibility with some level of optimization. The repro mode settings include the option -fp-model consistent. With that option the results do not reproduce, but with -fp-model source they do.

Should we make this change? Also, should the model only be run in repro mode, since that is the mode that guarantees reproducibility?

arunchawla-NOAA · 2022-03-23T18:22:49Z

arunchawla-NOAA
Mar 23, 2022
Maintainer Author

@junwang-noaa pointed me to this document (I hope the link comes through). So consistent should be the stricter standard.

https://www.intel.com/content/dam/develop/external/us/en/documents/pdf/fp-consistency-121918.pdf

kgerheiser · 2022-03-23T18:22:59Z

kgerheiser
Mar 23, 2022

Intel has some good documentation on the differences in fp-model

-fp-model source also enables -fp-model precise, which may be where the difference in reproducibility is coming from. source is supposed to be a good tradeoff between performance and reproducibility. You can also combine -fp-model consistent -fp-model precise, if you want to keep consistent.

DusanJovic-NOAA · 2022-03-23T18:48:15Z

DusanJovic-NOAA
Mar 23, 2022
Maintainer

Whatever options give us run-to-run reproducible outputs, reproducible outputs using different MPI decompositions, and reproducible outputs after model restart should, in my opinion, be the default. If that's what the current 'REPRO' mode is, then that should be the default, something you get without setting any specific build option.

If there are cases where users do not care about any of the above, then we can provide options that give ultimate performance, but that must not be the default. It must be hidden behind an option, and it should be documented with a big fat warning.

Basically, reproducibility must not be an opt-in feature. It must be an opt-out. You need to explicitly say 'I do not care about reproducibility, give me faster code', by passing -Dsomething. And, again in my opinion, that 'something' should not be 'PROD', because as far as I know production runs are required to be reproducible.

arunchawla-NOAA · 2022-03-23T19:08:02Z

arunchawla-NOAA
Mar 23, 2022
Maintainer Author

We should do a timing run on prod vs. repro mode to see what the differences are.

arunchawla-NOAA · 2022-03-23T19:09:55Z

arunchawla-NOAA
Mar 23, 2022
Maintainer Author

Also, based on the experiments being done by @RatkoVasic-NOAA (see #649 (comment)), not all results are being fixed by this.

climbfuji · 2022-03-23T19:39:24Z

climbfuji
Mar 23, 2022
Maintainer

We did that 2-3 years back when REPRO was first introduced (for b4b reproducibility between IPD and CCPP; testing MPI decomposition reproducibility for regional runs wasn’t a question back then). At that time the difference was about 10%, give or take, mainly due to the AVX2 flags (the others were negligible). But this was NEMSfv3gfs, atmosphere only, back then, and nearly all the time was spent in the dycore for these ATM-only runs.

arunchawla-NOAA · 2022-03-24T13:19:26Z

arunchawla-NOAA
Mar 24, 2022
Maintainer Author

It turns out GOCART is reproducible in REPRO mode, so we do not have to switch from -fp-model consistent to -fp-model source, which is good because it was creating confusion as to why this was a problem. However, it does raise the question of which flags in PROD mode are really needed to speed up the code. Dom's suggestion is that only the AVX flag makes a big difference.

arunchawla-NOAA · 2022-03-25T16:01:37Z

arunchawla-NOAA
Mar 25, 2022
Maintainer Author

To recap, our opsreqtest does the following:

compile with debug and run to completion
compile with single and double precision and run to completion
compile and test reproducibility with the following options -- change threading, change MPI count, change domain decomposition, and restart from a restart file

We want our codes to satisfy these tests. In conversation with NCO, they have indicated they only want those compiler flags that satisfy the above tests.

So we need to decide which compiler options give us that.
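
The reproducibility part of these tests amounts to comparing the output of a baseline run against runs with a changed thread count, MPI task count, domain decomposition, or a restart. A minimal Python sketch of such a bit-for-bit comparison follows. It is not the actual opsreqtest implementation, and the function names, directory layout, and file names are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_tree(run_dir):
    """Map each file's relative path to the SHA-256 digest of its bytes."""
    root = Path(run_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def reproduces(baseline_dir, variant_dir):
    """True only if both runs wrote the same files with identical bytes."""
    return digest_tree(baseline_dir) == digest_tree(variant_dir)


# Self-contained demo: made-up files stand in for real model output.
with tempfile.TemporaryDirectory() as tmp:
    runs = {
        "baseline": b"\x00\x01\x02",
        "threads": b"\x00\x01\x02",  # same bytes as the baseline
        "restart": b"\x00\x01\x03",  # last byte differs from the baseline
    }
    for name, payload in runs.items():
        run_dir = Path(tmp, name)
        run_dir.mkdir()
        (run_dir / "history.bin").write_bytes(payload)

    results = {
        name: reproduces(Path(tmp, "baseline"), Path(tmp, name))
        for name in runs
        if name != "baseline"
    }

print(results)
```

A byte-level comparison is the strictest possible check. If output files embed timestamps or other run metadata, a field-by-field comparison of the data would probably be needed instead.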

arunchawla-NOAA · 2022-03-28T13:09:16Z

arunchawla-NOAA
Mar 28, 2022
Maintainer Author

From a science reproducibility and testing point of view, we should focus our testing on only two sets of compiler settings -- one for debug and one standard set that is expected to give reproducibility with the code.

Jun pointed to this presentation, which provides a discussion:

https://www.ecmwf.int/sites/default/files/elibrary/2012/14026-optimisation-weather-[…]ications-power-and-x86-architectures-focus-reproducibility.pdf

The performance hit is about 10-20%.

arunchawla-NOAA · 2022-03-28T13:13:16Z

arunchawla-NOAA
Mar 28, 2022
Maintainer Author

So what do we consider "safe compiler flags"?

Right now in repro mode we use

-g -traceback -fpp -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -nowarn -sox -align array64byte -qno-opt-dynamic-align -O2 -debug minimal -fp-model consistent -qoverride-limits

Plus small changes for 32-bit, OpenMP, etc.

arunchawla-NOAA · 2022-03-28T13:15:51Z

arunchawla-NOAA
Mar 28, 2022
Maintainer Author

The additional flags that we use for performance are

-qopt-prefetch=3 -no-prec-div -no-prec-sqrt

plus the different AVX recipes

Perhaps we should just drop them, both in our testing and in operations?

climbfuji · 2022-03-28T13:24:18Z

climbfuji
Mar 28, 2022
Maintainer

-no-prec-div -no-prec-sqrt

is only used in double precision (64-bit), not in single precision (32-bit). In my experience, these flags don’t do much for performance. Aggressive prefetching can be a problem, and I didn’t see a huge impact of -qopt-prefetch=3 on performance either. The only ones that really matter for performance are AVX2 and the -fp-model choice. Whether you need to remove SIMD (AVX2) entirely or substitute it with core-avx-i or something like that remains to be seen. Note that the PDF you are pointing to is from 2012 and, while generally valid, a lot has happened on the compiler side since then (e.g. -fp-model consistent didn’t exist back then, therefore they used precise). Note also that they used -O3, which is more aggressive than our -O2. I am not sure how much that helps with performance or what it does to reproducibility in our case.

junwang-noaa · 2022-03-28T13:36:40Z

junwang-noaa
Mar 28, 2022
Maintainer

I want to point out that compile options such as -ftz or -fp-model consistent do not guarantee reproducibility. So in general, what is our approach to getting reproducibility for ufs-weather-model: using only compile options that guarantee reproducibility, or a test-and-see approach?
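
One reason compile options alone cannot guarantee reproducibility is that floating-point addition is not associative, so a change in summation order (for example from a different MPI decomposition or thread count) can change the result. The following is a minimal Python illustration of that effect, not taken from the model code. The values and the `naive_sum` helper are made up for the example.

```python
def naive_sum(values):
    """Left-to-right accumulation, as a simple loop would do it."""
    total = 0.0
    for value in values:
        total += value
    return total


# Grouping matters even for three ordinary numbers.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False

# The same data summed on "one rank" versus split across "two ranks".
values = [1e16, 1.0, -1e16, 1.0]
one_rank = naive_sum(values)
two_ranks = naive_sum([naive_sum(values[:2]), naive_sum(values[2:])])
print(one_rank == two_ranks)  # False
```

Both comparisons print False: the partial sums round differently depending on how the additions are grouped, independent of any compiler flag.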

1 reply

arunchawla-NOAA Mar 28, 2022
Maintainer Author

I am guessing yes, as this area is always changing. But happy to hear other suggestions. One thing we need to be careful about is not to choose tests that match our codes instead of cleaning up our codes.

arunchawla-NOAA · 2022-03-28T13:43:37Z

arunchawla-NOAA
Mar 28, 2022
Maintainer Author

Intel has some documentation as well

https://www.intel.com/content/dam/develop/external/us/en/documents/pdf/fp-consistency-121918.pdf

JamesAbeles-NOAA · 2022-03-28T14:01:28Z

JamesAbeles-NOAA
Mar 28, 2022

OK, we need two things, as Arun's email pointed out:
1 - consistent, reproducible answers
2 - a configuration to meet the operational runtime requirement
We should focus on 1 first to determine which flags give reproducible results for the criteria described in earlier comments. Then we can see which configuration can meet the runtime requirement. If the runtime is too slow while meeting (1), then we need to play with flags to see if we can get a better runtime while still meeting the reproducibility requirement.

RatkoVasic-NOAA · 2022-03-28T14:06:04Z

RatkoVasic-NOAA
Mar 28, 2022
Collaborator

For now, none of the options reproduces the regional hi-res case (LAM parallel). Only the low-resolution case reproduces:
https://docs.google.com/spreadsheets/d/1RUK063Ml3ES2-wXhk-WymJpf4K1soLmOSpmf7NvTjr8/edit?usp=sharing

1 reply

arunchawla-NOAA Apr 1, 2022
Maintainer Author

Ratko, sorry I did not reply to this. Once we have chosen the final flags for release mode we will want reproducibility, so this issue will need to be addressed for HAFS as well as regional.

edwardhartnett · 2022-03-28T15:47:27Z

edwardhartnett
Mar 28, 2022

Better component-level testing would reduce the importance of these whole-system tests.

arunchawla-NOAA · 2022-03-30T16:13:09Z

arunchawla-NOAA
Mar 30, 2022
Maintainer Author

Hi, we would like to finalize this discussion this week. We would like to replace PROD and REPRO with just one set of flags; for lack of a better word, let's call it STD.

So far we have the REPRO flags, to which we can add AVX. Do we need any more?

2 replies

DusanJovic-NOAA Mar 30, 2022
Maintainer

Why not just 'Release' and 'Debug', which are standard CMake build types, as @aerorahul suggested?

arunchawla-NOAA Mar 30, 2022
Maintainer Author

Release works, whatever is the standard nomenclature. No need to invent new ones

JamesAbeles-NOAA · 2022-03-30T17:13:50Z

JamesAbeles-NOAA
Mar 30, 2022

I agree with Dusan.

climbfuji · 2022-03-30T18:28:41Z

climbfuji
Mar 30, 2022
Maintainer

Me too. Then we don’t need to translate the PROD and DEBUG flags into Release and Debug for CCPP anymore. Just adjust the CCPP Release type and remove the Bitforbit type.

arunchawla-NOAA · 2022-04-01T19:53:53Z

arunchawla-NOAA
Apr 1, 2022
Maintainer Author

Trying to put a pin in this discussion. So we are agreeing to

remove the repro mode
replace the name PROD with RELEASE
get rid of -qopt-prefetch=3 -no-prec-div -no-prec-sqrt

Does this sound right?

2 replies

JessicaMeixner-NOAA Apr 1, 2022
Collaborator

I think so. I think we also have to use -fp-model consistent instead of -fp-model source

arunchawla-NOAA Apr 1, 2022
Maintainer Author

That is correct. Huh, we use -fp-model consistent in REPRO but not in PROD mode.

RatkoVasic-NOAA · 2022-04-05T17:04:51Z

RatkoVasic-NOAA
Apr 5, 2022
Collaborator

@BenjaminBlake-NOAA just ran a test on the LAM parallel using PROD vs. REPRO. PROD was about 12% faster.
Differences in compilation:

PROD:  -fp-model source     -qopt-prefetch=3 -march=core-avx2
REPRO: -fp-model consistent

Not sure how much each option contributes to the speedup/slowdown, but if necessary, we could run the tests.

JamesAbeles-NOAA · 2022-04-05T17:50:14Z

JamesAbeles-NOAA
Apr 5, 2022

I do not think -qopt-prefetch=3 does anything except to add compile time. I suspect the main thing will be use of avx2.

1 reply

climbfuji Apr 6, 2022
Maintainer

Yes, this confirms our earlier testing (2 years ago) that avx2 speeds the code up by 10-15%.

arunchawla-NOAA · 2022-04-06T14:28:34Z

arunchawla-NOAA
Apr 6, 2022
Maintainer Author

Based on the discussions, these are the final compile settings that we will use in our tests:

remove the repro mode
replace the name PROD with RELEASE
update the flags for RELEASE:
a. get rid of the -qopt-prefetch=3 -no-prec-div -no-prec-sqrt options
b. replace -fp-model source with -fp-model consistent
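
As a rough illustration of the flag changes listed above, the following Python sketch applies them to a flag string. The `to_release` helper and the shortened `prod` string are hypothetical, assembled from flags quoted in this thread. The real flags live in the CMake build configuration, and the full PROD list is not reproduced here.

```python
import shlex

# Flags the discussion agreed to drop from the release build.
DROPPED = {"-qopt-prefetch=3", "-no-prec-div", "-no-prec-sqrt"}


def to_release(prod_flags):
    """Apply the agreed changes to a space-separated flag string."""
    tokens = shlex.split(prod_flags)
    release = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in DROPPED:
            i += 1
            continue
        # "-fp-model source" is two tokens; swap the value, keep the option.
        if token == "-fp-model" and tokens[i + 1 : i + 2] == ["source"]:
            release += ["-fp-model", "consistent"]
            i += 2
            continue
        release.append(token)
        i += 1
    return " ".join(release)


# Assumed, abbreviated PROD flag string for demonstration only.
prod = "-O2 -fp-model source -qopt-prefetch=3 -no-prec-div -no-prec-sqrt -march=core-avx2"
print(to_release(prod))  # -O2 -fp-model consistent -march=core-avx2
```

The AVX2 flag is left in place because the agreed changes do not mention removing it.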

DusanJovic-NOAA · 2022-04-12T17:06:53Z

DusanJovic-NOAA
Apr 12, 2022
Maintainer

See PR #1171
