PMIx Fence: single-job wildcard barrier #70

Open
artpol84 opened this issue Feb 10, 2021 · 2 comments
artpol84 commented Feb 10, 2021

Test description

Verifies that PMIx_Fence actually synchronizes: no process leaves the Fence before every process in the namespace has entered it.

Test sketch

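#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <pmix.h>

#define RATIO        100.0 /* might be 100; should be selected for the particular system */
#define TOLERANCE    0.5   /* relative tolerance implementing "~"; also system-specific */
#define APPROX(a, b) (fabs((a) - (b)) <= TOLERANCE * (b))

static pmix_proc_t myproc, wildcard;

static double timestamp(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Fence over the whole namespace (rank wildcard) without data collection */
static void fence_no_collect(void)
{
    pmix_info_t info;
    bool collect = false;
    pmix_status_t rc;
    PMIX_INFO_CONSTRUCT(&info);
    PMIX_INFO_LOAD(&info, PMIX_COLLECT_DATA, &collect, PMIX_BOOL);
    rc = PMIx_Fence(&wildcard, 1, &info, 1);
    PMIX_INFO_DESTRUCT(&info);
    assert(PMIX_SUCCESS == rc);
}

static double max_fence_time(void)
{
    double fence_time = 0, ts1, ts2;
    int i;

    /* Measure the typical fence execution time */
    for (i = 0; i < 100; i++) {
        ts1 = timestamp();
        fence_no_collect();
        ts2 = timestamp();
        fence_time = fmax(fence_time, ts2 - ts1);
    }
    return fence_time;
}

int main(void)
{
    double T, fence_time, ts1, ts2;
    struct timespec delay;
    pmix_status_t rc;

    rc = PMIx_Init(&myproc, NULL, 0);
    assert(PMIX_SUCCESS == rc);
    PMIX_PROC_CONSTRUCT(&wildcard);
    strncpy(wildcard.nspace, myproc.nspace, PMIX_MAX_NSLEN);
    wildcard.rank = PMIX_RANK_WILDCARD;

    fence_time = max_fence_time();
    T = RATIO * fence_time;
    delay.tv_sec = (time_t)T;
    delay.tv_nsec = (long)((T - (double)delay.tv_sec) * 1e9);

    fence_no_collect(); /* align all ranks before the delay */

    if (0 == myproc.rank) {
        nanosleep(&delay, NULL); /* sleep(T) would truncate a sub-second T to 0 */
    }
    ts1 = timestamp();
    fence_no_collect();
    ts2 = timestamp();
    if (0 == myproc.rank) {
        assert(APPROX(ts2 - ts1, fence_time));
    } else {
        assert(APPROX(ts2 - ts1, T));
    }
    rc = PMIx_Finalize(NULL, 0);
    assert(PMIX_SUCCESS == rc);
    return 0;
}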

Execution details

  • 4 servers
  • 16 clients
  • Predefined (passed through cmdline) namespace
  • Predefined process placement: "0:0,1,2,3; 1:4,5,6,7; 2:8,9,10,11; 3:12,13,14,15;"
  • Ratio and the "~" tolerance are selected to match the system
    • The time-dependent checks can be turned off
  • Execute M times to capture race conditions
  • Rank 0 simulates the delay; the test verifies that the Fence really synchronizes
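
A test harness needs to turn the placement string above into a rank-to-node mapping. The following is a minimal sketch of one way to do that, assuming the format is exactly "<node>:<rank>,<rank>,...;" as in the example. The names parse_placement and MAX_RANKS are hypothetical and not part of PMIx.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_RANKS 64 /* hypothetical upper bound, enough for the 16 clients here */

/* Fills node_of[r] with the node index hosting rank r, or -1 if r is unplaced.
 * Returns the number of ranks placed, or -1 on a malformed string. */
static int parse_placement(const char *spec, int node_of[MAX_RANKS])
{
    const char *p = spec;
    int placed = 0;

    for (int r = 0; r < MAX_RANKS; r++) {
        node_of[r] = -1;
    }
    while (*p != '\0') {
        char *end;

        while (*p == ' ' || *p == ';') {
            p++;                               /* skip separators */
        }
        if (*p == '\0') {
            break;
        }
        long node = strtol(p, &end, 10);       /* "<node>:" */
        if (end == p || *end != ':' || node < 0) {
            return -1;
        }
        p = end + 1;
        for (;;) {                             /* "<rank>,<rank>,..." */
            long rank = strtol(p, &end, 10);
            if (end == p || rank < 0 || rank >= MAX_RANKS) {
                return -1;
            }
            node_of[rank] = (int)node;
            placed++;
            p = end;
            if (*p != ',') {
                break;
            }
            p++;
        }
    }
    return placed;
}
```

With the placement from this spec, the call is expected to report 16 placed ranks, with ranks 4 to 7 mapped to node 1.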

Client-side expectations:

  1. All PMIx calls return PMIX_SUCCESS
  2. All ranks except rank 0 observe an execution time of roughly T or more for the Fence that follows the delay.
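
The timing expectation and the "~" comparison from the sketch could be expressed as a small helper. This is only a sketch under assumptions: "~" is taken to mean a relative tolerance, and timing_cfg_t, approx and fence_timing_ok are hypothetical names. The check_timing flag reflects the note that the time-dependent checks can be turned off.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    double tolerance;    /* relative tolerance standing in for "~" (assumption) */
    bool   check_timing; /* false turns the time-dependent checks off */
} timing_cfg_t;

/* True when measured is within tolerance * expected of expected */
static bool approx(double measured, double expected, double tolerance)
{
    double diff = measured - expected;

    if (diff < 0) {
        diff = -diff;
    }
    return diff <= tolerance * expected;
}

/* Rank 0 slept before the Fence, so it should see about fence_time;
 * every other rank waited for rank 0, so it should see about T. */
static bool fence_timing_ok(int rank, double elapsed, double fence_time,
                            double T, const timing_cfg_t *cfg)
{
    if (!cfg->check_timing) {
        return true;
    }
    return approx(elapsed, 0 == rank ? fence_time : T, cfg->tolerance);
}
```

A non-zero rank that leaves the Fence after only about fence_time fails this check, which is the symptom of a Fence that does not synchronize.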

Server-side expectations:

  1. N invocations of:
    • client_connected
    • client_finalized
  2. Verify that the proc structure was set to the individual ranks.
  3. 2 Fence callback invocations with WILDCARD.
  4. The time between the Fence calls on node 0 is > T.
  5. Starting from openpmix#1135 ("modex: avoid exchange unnecessary buffer when collect flag is not set"), the size of the Fence data should be 0 B.
  6. No other callbacks are called (no direct modex requests).
  7. (Open question: any event-related activity?)
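
One way a test server could verify these expectations is to count callback invocations and evaluate the totals once the clients have finalized. The sketch below is illustrative only: server_stats_t and server_expectations_met are hypothetical names, it assumes N is the number of clients attached to the server, and it assumes the counters are filled in by the server's own callback implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int    connected;       /* client_connected invocations */
    int    finalized;       /* client_finalized invocations */
    int    wildcard_fences; /* Fence callbacks received with a WILDCARD rank */
    size_t fence_bytes;     /* total data size passed to the Fence callbacks */
    int    other_callbacks; /* anything else, e.g. direct modex requests */
    double fence_ts[2];     /* arrival times of the two Fence callbacks, seconds */
} server_stats_t;

/* Returns true when the recorded counters satisfy the server-side list above */
static bool server_expectations_met(const server_stats_t *s, int nclients, double T)
{
    return s->connected == nclients
        && s->finalized == nclients
        && s->wildcard_fences == 2
        && (s->fence_ts[1] - s->fence_ts[0]) > T
        && s->fence_bytes == 0
        && s->other_callbacks == 0;
}
```

The check on fence_ts corresponds to the node 0 expectation; on the other nodes a harness would likely skip or relax that term.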

Reference implementation:

TBD

artpol84 added the "Unit Test Spec" (Unit Test Specification) label on Feb 10, 2021
@jjhursey
Member

👍 This looks good to me.

@cpshereda
Contributor

See openpmix/openpmix#2085.
