Meeting 2019 12 12
Josh Hursey edited this page Dec 12, 2019 · 1 revision
- PMIx v3.1.5
- Pick up a number of fixes including memory
- https://github.com/openpmix/openpmix/pull/1554
- Artem has concerns about whether this was completely correct
- The resource manager (RM) is responsible for calling deregister_namespace so that the namespace is actually cleaned up. The RM is responsible for tracking 'connected' namespaces and deregistering them when they are appropriately disconnected in the session.
- The PMIx server does not have internal security protections to isolate two users' allocations from each other. If the RM requires this isolation, then it needs to create two PMIx server instances.
- It's the responsibility of the system to do this
- Tools support issue?
- Already in 3.1.4 so would be no worse in a 3.1.5
- No known problem at this time, but the concern is that this feature needs more exercise. Open MPI will exercise it, so maybe use that as a test vector.
- Possible Blockers?
- Double check the memory leak issue resolution (Dave/Artem to discuss the situation)
- Multiple init/finalize - would like to see this fixed if possible
- Bring in any tool related fixes that are ready.
- Would like to fully exercise the tools as much as possible for this release.
- Static 'get': is this targeted to v3.x or v4.x? Probably v4.x only.
- Josh to roll an rc1 this week - note that further fixes may be coming.
- Target release: Decide next week if we release before the end of the year or not.
- Multi-server make check
- Boris/Artem to take a look
- The make check CI test is failing cross-version
- We will keep it this way to help resolve the issue
- Note that it will fail the two job fence tests (-s 2 tests)
- Recommend to add back -s 1 in addition to -s 2 from this commit
- v3.2 rebranch
- Mellanox to check if they have a near-term driver for this.
- Not critical - can wait on the dstore optimizations in v4.0 (1H'2020)
- For now we will hold v3.2 - if situation changes let us know
- v4.0.x progress
- Working through testing tool support (similar to discussion above for 3.1.5)
- Python bindings coming along.
- 2-3 months away from release
- Let Open MPI exercise it a bit more
- PMIx bugs in OMPI related to PMIx v3.x
- Open MPI reported issues (things like spawn) that need some attention
- https://github.com/openpmix/openpmix/issues/1256 checklist
- Standard clarification
- Persistence option for publishing data
- Persist 'app' and 'session' - should there be a 'job'?
- Persist 'app' - one app in job may die then data is not available
- Persist 'job' - once job dies then data goes away (should be default)
- Persist 'session' - once session goes away then data goes away
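The three persistence scopes discussed above can be illustrated with a small toy model. This is only a sketch for the discussion, not PMIx code: the `Persist` enum, the `PublishedStore` class, and the `(session, job, app)` owner tuple are all hypothetical names, and the 'job' scope with its default status reflects the proposal in these notes rather than an existing option.

```python
from enum import Enum


class Persist(Enum):
    # Hypothetical scopes mirroring the discussion; 'job' is only proposed.
    SESSION = "session"
    JOB = "job"
    APP = "app"


# How many leading elements of (session, job, app) identify each scope.
DEPTH = {Persist.SESSION: 1, Persist.JOB: 2, Persist.APP: 3}


class PublishedStore:
    """Toy stand-in for a publish/lookup data store (not the PMIx API)."""

    def __init__(self):
        # key -> (value, owner tuple, persistence scope)
        self._data = {}

    def publish(self, key, value, owner, persist=Persist.JOB):
        # 'job' as the default follows the suggestion in the notes.
        self._data[key] = (value, tuple(owner), persist)

    def lookup(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[0]

    def on_end(self, ended):
        """Drop data whose scope entity is `ended` or lies inside it.

        `ended` is (session,), (session, job) or (session, job, app).
        """
        ended = tuple(ended)

        def expired(owner, persist):
            # The ended entity must be at or above the persistence scope
            # and must be an ancestor (prefix) of the publisher's identity.
            return (len(ended) <= DEPTH[persist]
                    and owner[:len(ended)] == ended)

        self._data = {k: e for k, e in self._data.items()
                      if not expired(e[1], e[2])}


store = PublishedStore()
owner = ("session0", "job0", "app0")
store.publish("a", 1, owner, Persist.APP)
store.publish("j", 2, owner)  # default: job scope
store.publish("s", 3, owner, Persist.SESSION)

store.on_end(("session0", "job0", "app0"))  # one app in the job dies
app_gone = (store.lookup("a"), store.lookup("j"), store.lookup("s"))

store.on_end(("session0", "job0"))          # the whole job ends
job_gone = (store.lookup("j"), store.lookup("s"))

store.on_end(("session0",))                 # the session goes away
session_gone = store.lookup("s")
```

In this model, data published with 'app' persistence disappears as soon as the publishing app ends, even though other apps in the same job may still want it, which is the concern raised above and the motivation for a 'job' scope.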
- PRRTE
- Restructuring - https://github.com/openpmix/prrte/pull/264
- Working on 1 node, currently crashing on 2 nodes
- Once that is working then we can commit the PR
- Once committed then add the PRRTE submodule to Open MPI issue
- Then we can iterate on the finer details
- Josh to work on getting scale CI testing infrastructure in place
- Need to develop some RTE tests