-
Notifications
You must be signed in to change notification settings - Fork 0
optimizations added w.r.t. threading #1151
optimizations added w.r.t. threading #1151
Conversation
…on loops and removal of unnecessary omp barriers
…se of omp barriers
@mywoodstock Thanks for making this PR. The plan was to make two PRs -- the first that is bit-for-bit and the second that is not. Is this PR bit-for-bit and, if not, can it be separated into two PRs? Thanks again for this effort. Cheers, Todd. |
Hi Todd,
I tried to separate the two, but some of the changes were tightly integrated (dependent on each other). This PR has about 6-7 commits, I can try to break them down into two phases.
Thanks,
Abhinav
…On Nov 27, 2016, 18:23 -0800, Todd Ringler ***@***.***>, wrote:
@mywoodstock (https://github.com/mywoodstock) Thanks for making this PR. The plan was to make two PRs -- the first that is bit-for-bit and the second that is not. Is this PR bit-for-bit and, if not, can it be separated into two PRs? Thanks again for this effort. Cheers, Todd.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub (#1151 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/AAq_5pTjRwRHdiz7y0QFAzyz8_xSyqgjks5rCjszgaJpZM4K9UUF).
|
Hey Todd,
Github is not letting me create multiple pull requests for the same branch
but different commits, unless I create a separate branch for each! Do you
have an idea how this can be done. Otherwise, for the 8 commits in the
original pull request, the following says if they are or not
bit-reproducible:
9d94dfe This is NOT bit-reproducible.
4a94b40 This is NOT bit-reproducible.
45f4ab7 This is NOT bit-reproducible.
90be6ac This is bit-reproducible.
24f3317 This is bit-reproducible.
ce47433 This is bit-reproducible.
ca7842c and
bc1ddc2 These are bit reproducible
(these two go together).
Thanks,
Abhinav
On Sun, Nov 27, 2016 at 6:27 PM, Abhinav Sarje <abhinav.sarje@gmail.com>
wrote:
… Hi Todd,
I tried to separate the two, but some of the changes were tightly
integrated (dependent on each other). This PR has about 6-7 commits, I can
try to break them down into two phases.
Thanks,
Abhinav
On Nov 27, 2016, 18:23 -0800, Todd Ringler ***@***.***>,
wrote:
@mywoodstock <https://github.com/mywoodstock> Thanks for making this PR.
The plan was to make two PRs -- the first that is bit-for-bit and the
second that is not. Is this PR bit-for-bit and, if not, can it be separated
into two PRs? Thanks again for this effort. Cheers, Todd.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1151 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAq_5pTjRwRHdiz7y0QFAzyz8_xSyqgjks5rCjszgaJpZM4K9UUF>
.
|
@mywoodstock, thanks for separating and labeling the commits for b-f-b. The NOT bfb ones are first, so you can make a new local branch which from 45f4ab7 which is "NOT bit-reproducible". Push it to your fork and make a PR from it that includes "NOT bit-reproducible". Then this PR will list all of them, but be the "bit reproducible" ones once the first one is merged. Make a comment on both PRs that the other (45f4ab7) should be done first, and this one ( bc1ddc2) should be done second. |
@mywoodstock it would be helpful if you could post some results on performance improvements and difference in solution to this PR. You can post pdf or jpg, whatever you already have. |
@mywoodstock It would also be good to see the performance improvement for mpi-only for this PR, and your recommendation for the openMP versus mpi lay-out, at least on the machines and configurations that you tested. |
Running the standard test case:
That was compiled with gnu and DEBUG=true. Compiling with intel and debug will produce line numbers. |
I will try this again. I have been using Intel 17, with which it was fine.
…On Dec 7, 2016, 10:56 AM -0800, Mark Petersen ***@***.***>, wrote:
Running the standard test case: global_ocean/QU_240km/performance_test
cd /lustre/scratch3/turquoise/mpeterse/runs/t39d/ocean/global_ocean/QU_240km/performance_test/forward
wf293.localdomain> tail -n 18 log.0000.err
*********************************************************************************************************
INFO: The split explicit time integration is configured to use: 20 barotropic subcycles
*********************************************************************************************************
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x2B0C99BCF2F7
#1 0x2B0C99BCF8FE
#2 0x2B0C9B05A65F
#3 0x10336EA in __ocn_diagnostics_MOD_ocn_diagnostic_solve
#4 0x1042C2B in __ocn_init_routines_MOD_ocn_init_routines_block
#5 0x1000A5B in __ocn_forward_mode_MOD_ocn_forward_mode_init
#6 0x1150C00 in __ocn_core_MOD_ocn_core_init
#7 0x40B3F7 in __mpas_subdriver_MOD_mpas_init
#8 0x4085B8 in MAIN__ at mpas.F:0
That was compiled with gnu and DEBUG=true. Compiling with intel and debug will produce line numbers.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@mywoodstock (cc @toddringler) , I was able to run your branch on edison and was stopped by an error on your branch. It may be the same as the one posted above. I am comparing these commits:
This is where you branched off from (call it 'mpas-o')
I can run a successful test case with mpas-o, and the same one dies with your branch with a Nan on the first time step. I am compiling with intel with debug on. You can both look at my output, and try to reproduce my results. The error is where the Nan is caught, but I believe the actual error in the code is earlier when the Nan is actually computed. I am running in our standard configuration, with no openmp. To run this test, copy this directory to your own space:
Then link the executable
I ran it as follows. Interactive login: You can see the error messages in the log*err files. |
@mywoodstock Looking at the files that have been changed, I see 5 are in The immediate question is: If we committed those two types of files separately, would there be errors? That is, if we merged just the Once the later is merged, does it potentially require other cores (atmosphere, sea-ice, land-ice) to make any changes in their code in |
@mark-petersen Let me make sure that the |
@mark-petersen On another note, I forgot to mention earlier, but when I compile the code on Cori, I always get the following error, and then it compiles fine if I comment out the last argument in that routine call. Here is the error: mpas_ocn_vmix_cvmix.F(920): error #6627: This is an actual argument keyword name, and not a dummy argument name. [LENHANCED_DIFF] |
@mywoodstock I just tried to compile a fresh checkout of ocean/develop on cori and did not see the issue you mentioned here. My guess is that your version of cvmix is not up to date. We recently changed the git tag of the cvmix library we pull for the model and this new tag introduced the lenhanced_diff flag. From within MPAS if you go to src/core_ocean/cvmix and do git describe --tags,
is the current tag of cvmix for MPAS. Yet, I'm guessing for your testing, just commenting out the argument as you have done is just fine. |
@mywoodstock same question here as on #1164. What if I split the MPAS framework changes (files in directory The ocean changes could be tested and merged in right away by me. The framework changes require approval by others, so could be on a much longer timescale. Do you foresee any problems if ACME ran with only the changes in the ocean files for a while, and without the framework changes? |
This PR replaces #1151, and includes only bfb changes to framework. This includes: 1. Implementation of threading into the mpas reconstruct routine. 2. Changing MPI threading level from multiple to funneled. 3. Reorganization in buffer pack and unpack in halo exchanges to minimize use of barriers. 4. Implementation of threaded memory buffer initializations.
This PR is to merge in all the updates added to optimize threading implementation: