PEM_Ld3.ne4_ne4.FC5AV1C-04P2 fails compare on cori-knl, but passes in DEBUG #1477
If you can't find a way around this with compiler flags, someone from the atmosphere group will have to look at it.
OK. I'm trying again after some changes to verify. My test on edison has still not started.
Hmm, perhaps what I'm trying is not ideal. After creating/running this test, I edit Macros.make to adjust compiler flags, then rebuild/rerun. However, since this test creates two executables and runs two runs (so it can compare), it is probably only rebuilding/rerunning one of them. Is there a suggested way to have a test such as this "try again" with an edited Macros.make?
@ndkeen: After you change compiler flags, do you clean compile the case?
Yes, I was doing that -- but there is also another entire case dir. I guess I need to also make the same change there and rebuild.
Ok. Yes, it seems like you have to do that as well. I recently ran into a lot of issues (unexpected behavior) doing something similar. For this particular problem, we just need to find out which DEBUG flag is making the test PASS. It may be the optimization level as well.
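(For anyone hitting the same thing, here is a rough sketch of the rebuild loop being described. The directory names and clean/build options are assumptions that may differ by CIME version; the point is just that a PEM test builds two executables, so an edited Macros.make has to be applied and clean-rebuilt in both case directories before resubmitting.)

```sh
# Sketch only: CASE1_DIR/CASE2_DIR are placeholders for the two case
# directories that the PEM test generates.
for casedir in "$CASE1_DIR" "$CASE2_DIR"; do
  (
    cd "$casedir" || exit 1
    # ... edit Macros.make here, e.g. change the optimization level ...
    ./case.build --clean-all   # discard the previously built executable
    ./case.build               # rebuild with the edited flags
  )
done
```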
Just a quick update. To help diagnose this, I'm changing compiler flags in config_compilers.xml, then running create_test, so I know for sure I'm testing what I think. I'm using intel/17.0.2.174 (i1702), which is the current default. Also, all of these tests were on KNL.
So, -O2 always fails, but the test passes with -O0 and -O1. It sounds like the compiler is optimizing some code differently for each build, or there is a bug somewhere in the code.
OK, there is a new-to-me setting for the fp-model flag called "consistent". So instead of -fp-model source, we would use -fp-model consistent.
Sounds like it's what we want, right? I tried this and the test passed (both forcing 1 thread and without). I can try this setting for all acme_developer tests, as well as a larger problem, to see if there are any performance issues. Update: the acme_developer tests passed on cori-haswell and cori-knl. I haven't looked closely, and would need to run a larger problem to know better, but I don't see any major perf differences.
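(A quick way to double-check which fp-model variant actually made it into a given build, and to rerun the developer suite afterwards. This is a sketch: the grep target and suite/machine names come from this thread, and the exact create_test invocation may differ by version.)

```sh
# Confirm which fp-model setting the case's generated Macros actually carries
grep -- '-fp-model' Macros.make

# Rerun the developer suite with the modified compiler settings
create_test acme_developer --machine=cori-knl
```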
Thanks @ndkeen! It's great that this flag solves this problem. I didn't know about this flag either.
Note: I edited my previous comment with the results of those tests. Should we ...
I do not anticipate it making much difference performance-wise, but it is always a good thing to check to be sure. I think that's my only concern; otherwise it is a very neat solution to this problem. If there is no major performance penalty, I don't see any other reason for not going with this solution.
OK, I ran a hi-res F compset -- the same type of experiment I was doing before when looking at KNL performance with F compsets -- with and without the new flag.
additional fields needed for environment_variables; additional environment attributes needed
Test suite: hand tested
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes cdash on blues and melvin
User interface changes?:
Code review:
Reminding myself of this issue. I'm not sure what to do. I could: (a) leave things as they are, (b) use -fp-model consistent for testing but not for production runs, (c) use it everywhere and accept the performance hit, or (d) track down and fix the code so the test passes without it.
My vote is for a (b)-like option: use -fp-model=consistent for testing, but not as the default for production runs. I agree with your suggestion that BFB reproducibility is a critical feature to maintain, but it should not be a requirement for all production runs. And then, as a low priority, work on (d) so that eventually the production runs can also pass this test.
@ndk and @mt5555, just browsing the performance data from the two runs ...
Lots of timers appear to be slower, but also
vs.
so halo exchanges are also slower, even though they should not be directly affected by '-fp-model=consistent'. This could be due to an increase in load imbalance preceding the halo exchange (from the larger computational cost), but the global statistics are
vs.
so even the minimum times are much slower. I'm not sure that I trust that '-fp-model=consistent' is the only source of the performance difference in these two experiments. A question though ... another timer with a large difference is
vs.
However, "a:eus_2d_advec" calls
but I don't see the "eus_bexchV" timer in the timing data, so I don't know how much of the performance difference is in bndry_exchangeV. Any idea why this timer is missing?
I'm not familiar with this new flag, so I'm not sure what it might affect. I ran the experiments again with my favorite June 1st master. This time it's only 4.7% slower.
I agree that (b) would be a fine solution until we find a better one. I think (d) will be the most time-consuming to figure out, but is also the most desirable. Losing BFB reproducibility is okay in production runs as long as we are sure that the non-BFB behavior is roundoff-level noise (which I think is the case here) and not systematic. Sometimes non-BFB behavior makes it hard to debug an issue, for the obvious reason of losing reproducibility.
Note that in the latest test the performance hit is less than 5% (for the F case). How much of a performance hit is too much (for option (c))?
I don't know the answer to that, but I would be fine with a 5% loss for maintaining BFB, which I believe is very important and helps a lot if something goes wrong.
Quick minor update: after the edison upgrade, I ran the 4 tests again (force-1-thread/normal and debug/opt) and they all passed.
That's great! It might just be a compiler bug which is fixed in the newest version. |
Using the most recent Intel compiler on cori-knl (18.0.0.128, which is the current NERSC default), the PEM tests pass. This is with no other change.
I think I have a "solution" here. We currently use -fp-model source for Intel; switching to -fp-model consistent (plus -fimf-use-svml for Intel 18) makes these tests pass.
Also of interest ... I will make a PR and provide more detailed data from my experiments unless there is an objection.
… restrictive and allow more tests to pass. For Intel 17, instead of -fp-model source, use -fp-model consistent. For Intel 18, instead of -fp-model source, use -fp-model consistent -fimf-use-svml. Fixes #1477
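(A minimal sketch of what that change amounts to when applied by hand to an already-created case. The permanent fix belongs in the shared compiler-flag config, config_compilers.xml, mentioned earlier in the thread; editing the case's Macros.make like this is only a quick local way to try the flags, and the file layout is an assumption.)

```sh
# Intel 17: relax "-fp-model source" to "-fp-model consistent"
sed -i 's/-fp-model source/-fp-model consistent/g' Macros.make

# Intel 18: per the commit message above, also add -fimf-use-svml
# sed -i 's/-fp-model source/-fp-model consistent -fimf-use-svml/g' Macros.make

# Then clean-rebuild both cases, as discussed earlier in the thread.
```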
On cori-knl, we might have an issue with compiler flags.
To run: create_test PEM_Ld3.ne4_ne4.FC5AV1C-04P2 --machine=cori-knl
PEM_Ld3.ne4_ne4.FC5AV1C-04P2 fails
PEM_Ld3_PMx1.ne4_ne4.FC5AV1C-04P2 fails (force 1 thread)
PEM_D_Ld3.ne4_ne4.FC5AV1C-04P2 passes (debug)
PEM_D_Ld3_PMx1.ne4_ne4.FC5AV1C-04P2 passes (debug and force 1 thread)
The same thing happens on cori-haswell -- it fails the compare but passes in DEBUG.
Tests on Edison have been passing.
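(For reference, the four variants above can be launched the same way as the single command given earlier; this just spells them out, with project/queue options for cori-knl omitted.)

```sh
create_test PEM_Ld3.ne4_ne4.FC5AV1C-04P2 --machine=cori-knl        # fails compare
create_test PEM_Ld3_PMx1.ne4_ne4.FC5AV1C-04P2 --machine=cori-knl   # fails (force 1 thread)
create_test PEM_D_Ld3.ne4_ne4.FC5AV1C-04P2 --machine=cori-knl      # passes (debug)
create_test PEM_D_Ld3_PMx1.ne4_ne4.FC5AV1C-04P2 --machine=cori-knl # passes (debug, 1 thread)
```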