Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PMIx/job info: rank positioning #64

Open
artpol84 opened this issue Jul 1, 2020 · 13 comments
Open

PMIx/job info: rank positioning #64

artpol84 opened this issue Jul 1, 2020 · 13 comments
Assignees
Labels
Unit Test Spec Unit Test Specification

Comments

@artpol84
Copy link

artpol84 commented Jul 1, 2020

Test description

Verification of the information about this ranks position within the job.

Client-side expectations

  • [side channel check] PMIx provides expected:
    • Using wildcard process:
      • PMIX_JOB_SIZE, PMIX_UNIV_SIZE
      • PMIX_LOCAL_RANK
    • Using this rank as process:
      • PMIX_LOCAL_RANK, PMIX_NODEID, PMIX_HOSTNAME
      • PMIX_LOCAL_PEERS

Server-side expectations

  • All clients connect and disconnect
  • All client processes are successfully terminated
  • No requests (i.e. direct modex or notifications) are observed.

Reference implementation

openpmix/openpmix#1831

Side notes

It was observed that PMIx_Get(PMIX_RANK) hangs. @cpshereda what is the status?

@artpol84 artpol84 added the Unit Test Spec Unit Test Specification label Jul 1, 2020
@artpol84 artpol84 changed the title PMIx/job info: PMIx/job info: rank positioning Jul 1, 2020
@artpol84
Copy link
Author

artpol84 commented Jul 1, 2020

@jjhursey @rhc54 please review the test scenario

@jjhursey
Copy link
Member

jjhursey commented Jul 8, 2020

I think this sounds good for job level information specific to its position in the system.

You may consider adding:

  • PMIX_LOCAL_PEERS/ PMIX_LOCAL_PROCS - maybe you are saving this for a different use case
  • PMIX_APP_RANK / PMIX_APP_SIZE
  • PMIX_LOCALLDR
  • PMIX_HOSTNAME

When queried for the calling process I would not expect them to require an upcall into the server. All of that should be in the job data.

@artpol84
Copy link
Author

artpol84 commented Jul 8, 2020

@jjhursey yeah, we should have them here.

With the following exception:

  • APP related stuff assumes there are multiple apps and we are not testing this with this test. 1 app assumed.
  • I'll need to refresh my memory on what PMIX_LOCALLDR to see if it is relevant here

We definitely want to cover local peers

@cpshereda
Copy link
Contributor

Note: targeting 4 servers to mimic node-related information.

@artpol84
Copy link
Author

artpol84 commented Jul 8, 2020

TODO: update the description with exact PMIX key names

@cpshereda
Copy link
Contributor

Exact PMIX key names in this test:
PMIX_JOB_SIZE
PMIX_LOCAL_SIZE
PMIX_LOCAL_RANK
PMIX_NODEID
PMIX_LOCAL_PEERS
PMIX_HOSTNAME

Not sure about PMIX_LOCALLDR.

PMIX_NODE_RANK removed since this is a single app test, and the result would be the same as PMIX_LOCAL_RANK. Can be added later for a multi-app test.

@cpshereda
Copy link
Contributor

See openpmix/openpmix#1831.

Revised down to 2 servers.

Added PMIX_UNIV_SIZE to list.

Partly as a note to self to confirm it is not by bug and then raise it elsewhere, PMIx_Get of PMIX_RANK hangs when run from the testing framework.

@cpshereda
Copy link
Contributor

cpshereda commented Oct 9, 2020

@artpol84 The Get for PMIX_RANK still hangs for me. Maybe I'm not calling it right? Not sure, here is the relevant snippet:

    // Code hangs when PMIx_Get of PMIX_RANK is enabled
    job_proc.rank = PMIX_RANK_UNDEF;
    PMIXT_CHECK(PMIx_Get(&job_proc, PMIX_RANK, NULL, 0, &val), params, v_params);

(Edited to remove the validate call after the PMIx_Get.)

@rhc54
Copy link
Contributor

rhc54 commented Oct 9, 2020

I haven't implemented it yet - this was not previously supported 😄

@cpshereda
Copy link
Contributor

@rhc54 Ok np, was just responding to @artpol84 's status request.

@artpol84
Copy link
Author

@jjhursey @rhc54 this one is also done.
Please ack and can be closed

@jjhursey
Copy link
Member

There is just one improvement that I would suggest making before closing.

The PMIx_Get(PMIX_RANK) should work now according to openpmix/openpmix#1814 so I would suggest adding back the code in the test case seen here.

@cpshereda
Copy link
Contributor

cpshereda commented Feb 11, 2021

@jjhursey @artpol84 Ok, I'll include that in the next test suite commit and update here when that is in a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Unit Test Spec Unit Test Specification
Projects
None yet
Development

No branches or pull requests

4 participants