dstore: keeping job info in the dstor #217

Merged 2 commits into openpmix:master on Nov 29, 2016

karasevb
Contributor

Reduce memory usage on the PMIx client side by not duplicating the job data in each client:

  • Job info is no longer sent to clients when they connect to the server.
  • The server stores the job information in the dstore.
  • Clients read job info from the dstore.
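
The design in the list above can be pictured with a toy model. This is a minimal in-process sketch in Python, not PMIx code: `SharedStore`, `Server`, `Client`, and the key `"job.size"` are hypothetical names invented for the illustration, and a plain dict stands in for the shared-memory dstore.

```python
class SharedStore:
    """Stand-in for the dstore: one copy of the data, visible to every client."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class Server:
    def __init__(self, store):
        self.store = store

    def publish_job_info(self, job_info):
        # Written once by the server instead of being sent on every connect.
        for key, value in job_info.items():
            self.store.put(key, value)

    def connect(self, rank):
        # The connect reply carries no copy of the job info (hypothetical flow).
        return Client(rank, self.store)


class Client:
    def __init__(self, rank, store):
        self.rank = rank
        self._store = store  # a reference to the shared store, not a private copy

    def get_job_info(self, key):
        return self._store.get(key)


store = SharedStore()
server = Server(store)
server.publish_job_info({"job.size": 16})  # hypothetical key and value
clients = [server.connect(rank) for rank in range(16)]
```

Every client resolves `"job.size"` through the same store object, so the job data exists once per node in this model rather than once per client.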

Fixes #144

@karasevb
Contributor Author

Need to fix:
The server does not yet share the key pmix.mblob into the dstore. Without the dstore, these keys are stored in the client's global data. To provide them through the dstore, the server needs to parse pmix.mblob per rank and store the result in the dstore under keys for each rank. This will be submitted in another commit.

This is why the Mellanox Jenkins run fails on the test ./pmix_test --test-resolve-peers. The OMPI simple tests still work.
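
The per-rank split mentioned above might look roughly like the following. This is only a sketch: the real layout of pmix.mblob is not described in this thread, so the code assumes the blob has already been decoded into `(rank, key, value)` tuples. The function names, the `"hostname"` key, and the dict standing in for the dstore are all hypothetical.

```python
def share_blob_per_rank(entries, store):
    """Store each decoded (rank, key, value) entry under a per-rank key."""
    for rank, key, value in entries:
        store[(rank, key)] = value
    return store


def lookup(store, rank, key):
    """Fetch the value that a given rank published under `key`."""
    return store[(rank, key)]


# Hypothetical decoded blob: three ranks, one key each.
decoded_blob = [
    (0, "hostname", "node0"),
    (1, "hostname", "node0"),
    (2, "hostname", "node1"),
]
dstore = share_blob_per_rank(decoded_blob, {})
```

With the data keyed by rank, a client can ask for one peer's entry directly, for example `lookup(dstore, 2, "hostname")`, instead of holding the whole blob in its own memory.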

@jjhursey jjhursey added this to the v1.2.0 milestone Nov 14, 2016
@karasevb karasevb force-pushed the dstore_job_info branch 2 times, most recently from d22abbe to 3761a90 Compare November 15, 2016 10:39
@karasevb
Contributor Author

bot:retest

@artpol84
Contributor

@jjhursey we are testing this internally, but as you can see the smoke tests pass, so we can test in parallel.
@karasevb is working on the port to v1.2. It may take some time, since this change touches code outside the dstore that differs significantly between master and v1.2.

Provide the information about process location into `dstor`
@hppritcha
Contributor

@karasevb do you have an estimate on when this will be merged in?

@artpol84
Contributor

@hppritcha
As I stated above, we have some tests left, but so far it works well for us. What remains is a memory consumption analysis that we want to perform this week.

What is really important: we need others to try this before merging, since it's a substantial change.
In particular, we are waiting to hear from @jjhursey and, if I understand correctly, @hjelmn was about to try it as well.

Once we have a 👍 from others, we can merge.

@karasevb
Contributor Author

We have good results at the init stage when using the dstore for job info.
Here are the results for 16 ppn, obtained with the pmix perf tool:

PMIx             | init time   | total time
w/o dstor        | 0.00190247  | 0.17449243
w dstor          | 0.00420717  | 0.15142844
dstor + job info | 0.00067727  | 0.15024186
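
To put the table in relative terms, the short Python sketch below divides the "w/o dstor" baseline by each row. The numbers are copied from the table; the `speedup` helper is just for this illustration. A ratio above 1 means faster than the baseline.

```python
# (init time, total time) per configuration, copied from the table above.
timings = {
    "w/o dstor": (0.00190247, 0.17449243),
    "w dstor": (0.00420717, 0.15142844),
    "dstor + job info": (0.00067727, 0.15024186),
}


def speedup(baseline, candidate):
    """Ratio > 1 means the candidate is faster than the baseline."""
    return baseline / candidate


base_init, base_total = timings["w/o dstor"]
for label, (init, total) in timings.items():
    print(f"{label:18s} init x{speedup(base_init, init):.2f}  "
          f"total x{speedup(base_total, total):.2f}")
```

By this measure, init with "dstor + job info" is roughly 2.8 times faster than without the dstore, while the dstore alone is roughly 2.2 times slower at init.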

We also need to measure the memory footprint.

@artpol84
Contributor

Here is the memory consumption data (in KB) collected on 8 nodes, 16 ppn, with the updated perf tool (PR #223). PSS (the process's proportional share of each mapping) was collected.
Note that "rem" corresponds to the average over all processes on all nodes.
Also note that the perf tool submits 10 keys of 400 bytes (4 KB in total) per proc, which is more than a real process does. So the important thing to look at is the difference between "w dstor" and "dstor + jobinfo". If we vary the number of keys and their size, this difference (200+ KB/proc) remains the same.
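
For readers unfamiliar with PSS: on Linux, `/proc/<pid>/smaps` reports a `Pss:` line per mapping, and a page shared by N processes counts as 1/N of its size in each process's PSS. That is why data moved into a shared store shows up as a lower per-process figure. The sketch below sums those lines in Python. How the perf tool itself collects PSS is not shown in this thread, so this is only an assumption about one way to do it, and the sample text is made up.

```python
def total_pss_kb(smaps_text):
    """Sum the per-mapping 'Pss:' values (in kB) from smaps-formatted text."""
    total = 0
    for line in smaps_text.splitlines():
        # Match 'Pss:' exactly; newer kernels also emit 'Pss_Anon:' and similar.
        if line.startswith("Pss:"):
            total += int(line.split()[1])
    return total


def read_own_pss_kb(path="/proc/self/smaps"):
    """Linux only: total PSS of the calling process, in kB."""
    with open(path) as f:
        return total_pss_kb(f.read())


# Made-up sample: one shared mapping (Pss below Rss) and one private mapping.
sample = """\
7f0000000000-7f0000001000 r--s 00000000 00:05 1234 /dev/shm/example
Size:                  4 kB
Rss:                   4 kB
Pss:                   1 kB
7f0000001000-7f0000003000 rw-p 00000000 00:00 0
Size:                  8 kB
Rss:                   8 kB
Pss:                   8 kB
"""
sample_total = total_pss_kb(sample)
```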

[image: memory consumption results (PSS, in KB)]

@artpol84
Contributor

@hppritcha @jjhursey @rhc54 @hjelmn

We removed the "In-progress" label and have finished all of our testing. We need updates from your side so we can merge this.

@artpol84
Contributor

@karasevb could you please repeat the measurements with the key count = 0? Just out of curiosity.

@jjhursey
Member

I'm planning to run some of these performance tests on some of our systems today.

@artpol84
Contributor

@jjhursey Note that we haven't done anything in the PPC optimization direction yet. But I'm still curious to see the results.
Also, please use the latest version of our perf tool to capture the memory usage. I'm going to merge #223 now so you'll be able to get it. We will fix the output later (rename "rem" to "avg").

@jjhursey
Member

Per today's call:

@rhc54
Contributor

rhc54 commented Nov 29, 2016

Passed MTT when installed in OMPI (dstore enabled by default):

+-------------+-----------------+-------------+----------+------+------+----------+------+--------------------------------------------------------------------------+
| Phase       | Section         | MPI Version | Duration | Pass | Fail | Time out | Skip | Detailed report                                                          |
+-------------+-----------------+-------------+----------+------+------+----------+------+--------------------------------------------------------------------------+
| MPI Install | my installation | 3.0.0a1     | 00:02    | 1    |      |          |      | MPI_Install-my_installation-my_installation-3.0.0a1-my_installation.html |
| Test Build  | trivial         | 3.0.0a1     | 00:02    | 1    |      |          |      | Test_Build-trivial-my_installation-3.0.0a1-my_installation.html          |
| Test Build  | ibm             | 3.0.0a1     | 00:47    | 1    |      |          |      | Test_Build-ibm-my_installation-3.0.0a1-my_installation.html              |
| Test Build  | intel           | 3.0.0a1     | 01:32    | 1    |      |          |      | Test_Build-intel-my_installation-3.0.0a1-my_installation.html            |
| Test Build  | java            | 3.0.0a1     | 00:02    | 1    |      |          |      | Test_Build-java-my_installation-3.0.0a1-my_installation.html             |
| Test Build  | orte            | 3.0.0a1     | 00:01    | 1    |      |          |      | Test_Build-orte-my_installation-3.0.0a1-my_installation.html             |
| Test Run    | trivial         | 3.0.0a1     | 00:05    | 8    |      |          |      | Test_Run-trivial-my_installation-3.0.0a1-my_installation.html            |
| Test Run    | ibm             | 3.0.0a1     | 13:45    | 485  |      | 4        |      | Test_Run-ibm-my_installation-3.0.0a1-my_installation.html                |
| Test Run    | spawn           | 3.0.0a1     | 00:08    | 7    |      |          |      | Test_Run-spawn-my_installation-3.0.0a1-my_installation.html              |
| Test Run    | loopspawn       | 3.0.0a1     | 02:41    |      | 1    |          |      | Test_Run-loopspawn-my_installation-3.0.0a1-my_installation.html          |
| Test Run    | intel           | 3.0.0a1     | 29:02    | 464  |      | 14       |      | Test_Run-intel-my_installation-3.0.0a1-my_installation.html              |
| Test Run    | intel_skip      | 3.0.0a1     | 01:04:24 | 421  |      | 37       | 20   | Test_Run-intel_skip-my_installation-3.0.0a1-my_installation.html         |
| Test Run    | java            | 3.0.0a1     | 00:14    |      |      | 1        |      | Test_Run-java-my_installation-3.0.0a1-my_installation.html               |
| Test Run    | orte            | 3.0.0a1     | 11:17    | 10   |      | 9        |      | Test_Run-orte-my_installation-3.0.0a1-my_installation.html               |
+-------------+-----------------+-------------+----------+------+------+----------+------+--------------------------------------------------------------------------+

@rhc54 rhc54 merged commit 32deba5 into openpmix:master Nov 29, 2016
karasevb added a commit to karasevb/pmix that referenced this pull request Nov 30, 2016
Corresponds to PR openpmix#217

(cherry-picked dcb6967)
(cherry-picked 32c93e2)
@karasevb karasevb deleted the dstore_job_info branch December 29, 2016 08:35