Distributed CUDAEnsemble #1073
MPI distributed ensemble, where a RunPlanVector has its work split across multiple nodes. Should be a relatively simple chance to try out MPI, and give us a better idea of what we'd be getting into if aiming for multi-node simulations.
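For illustration only, the simplest possible split would be a static one, e.g. each rank taking every world_size-th plan in round-robin fashion; the rough plan in the comment below instead assigns runs on demand. A minimal sketch of the static version, where plan_count and the printf are hypothetical stand-ins for plans.size() and the real per-run simulate call (not actual FLAMEGPU API):

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank = 0, world_size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    const int plan_count = 100;  // stand-in for plans.size()
    // Round-robin: rank r runs plans r, r + world_size, r + 2 * world_size, ...
    for (int i = world_rank; i < plan_count; i += world_size) {
        printf("Rank %d would run plan %d\n", world_rank, i);
    }

    MPI_Finalize();
    return 0;
}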
Comments

Rough plan for the basic process:

enum EnvelopeTag : int {
  RequestJob = 0,
  AssignJob = 1
};

Rank 0

MPI_Init(NULL, NULL);
int world_rank; // This var will branch behaviour
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
int next_run = 0; // Index of the next run plan to hand out
int finalize_size = 1;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Status status;
while (finalize_size < world_size) {
  memset(&status, 0, sizeof(MPI_Status));
  // Wait for job requests from any source, these have no data
  MPI_Recv(
    void* data,             // nullptr
    int count,              // 0
    MPI_Datatype datatype,  // MPI_DATATYPE_NULL
    int source,             // MPI_ANY_SOURCE
    int tag,                // EnvelopeTag::RequestJob
    MPI_Comm communicator,  // MPI_COMM_WORLD
    MPI_Status* status)     // &status
  // Select the next unassigned job
  int next_job_index = next_run++;
  // Respond to the sender with a job assignment
  MPI_Send(
    void* data,             // &next_job_index
    int count,              // 1
    MPI_Datatype datatype,  // MPI_INT
    int destination,        // status.MPI_SOURCE
    int tag,                // EnvelopeTag::AssignJob
    MPI_Comm communicator)  // MPI_COMM_WORLD
  if (next_job_index >= plans.size())
    ++finalize_size;
}
MPI_Finalize()

Rank > 0

MPI_Init(NULL, NULL);
int world_rank; // This var will branch behaviour
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int next_run = -1;
MPI_Status status;
while (next_run < (int)plans.size()) { // cast so the initial -1 compares as intended
  memset(&status, 0, sizeof(MPI_Status));
  // Send a job request to 0, these have no data
  MPI_Send(
    void* data,             // nullptr
    int count,              // 0
    MPI_Datatype datatype,  // MPI_DATATYPE_NULL
    int destination,        // 0 (the coordinating rank)
    int tag,                // EnvelopeTag::RequestJob
    MPI_Comm communicator)  // MPI_COMM_WORLD
  // Wait for a job assignment from 0
  MPI_Recv(
    void* data,             // &next_run
    int count,              // 1
    MPI_Datatype datatype,  // MPI_INT
    int source,             // 0 (is there a better built-in macro/enum for 0?)
    int tag,                // EnvelopeTag::AssignJob
    MPI_Comm communicator,  // MPI_COMM_WORLD
    MPI_Status* status)     // &status
  // Process the job assignment (skip it if 0 sent an out-of-range index)
  do_run()
}
MPI_Finalize()

Will require some more thought as to validation (e.g. check that the count of the receive matches what was expected) and how to separate it from the regular CUDAEnsemble.
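To sanity check the protocol above, here is a compilable sketch that puts both branches in one program and switches on world_rank. It is an illustration only: plan_count and do_run() are hypothetical stand-ins for plans.size() and the real per-run work, the empty request messages use MPI_BYTE with a count of 0 rather than MPI_DATATYPE_NULL to stay on the uncontroversial side of what implementations accept, and the received count is checked with MPI_Get_count as a first pass at the validation mentioned above. It needs at least two ranks to do useful work:

#include <mpi.h>
#include <cstdio>
#include <cstring>

enum EnvelopeTag : int {
    RequestJob = 0,
    AssignJob = 1
};

// Hypothetical stand-in for running plans[job] on this node
static void do_run(int rank, int job) {
    printf("Rank %d running job %d\n", rank, job);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank = 0, world_size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    const int plan_count = 10;  // stand-in for plans.size()
    MPI_Status status;

    if (world_rank == 0) {
        // Coordinator: hand out run indices on request; once next_run passes the
        // end, each further request gets an out-of-range index, telling that
        // worker to stop.
        int next_run = 0;
        int finalize_size = 1;
        while (finalize_size < world_size) {
            memset(&status, 0, sizeof(MPI_Status));
            MPI_Recv(NULL, 0, MPI_BYTE, MPI_ANY_SOURCE,
                     EnvelopeTag::RequestJob, MPI_COMM_WORLD, &status);
            int next_job_index = next_run++;
            MPI_Send(&next_job_index, 1, MPI_INT, status.MPI_SOURCE,
                     EnvelopeTag::AssignJob, MPI_COMM_WORLD);
            if (next_job_index >= plan_count)
                ++finalize_size;
        }
    } else {
        // Worker: request jobs until the coordinator replies with an index
        // beyond the end of the plan vector.
        int next_run = -1;
        while (next_run < plan_count) {
            MPI_Send(NULL, 0, MPI_BYTE, 0,
                     EnvelopeTag::RequestJob, MPI_COMM_WORLD);
            MPI_Recv(&next_run, 1, MPI_INT, 0,
                     EnvelopeTag::AssignJob, MPI_COMM_WORLD, &status);
            // Basic validation: exactly one int should have arrived
            int count = 0;
            MPI_Get_count(&status, MPI_INT, &count);
            if (count != 1) {
                fprintf(stderr, "Rank %d: unexpected message size %d\n", world_rank, count);
                break;
            }
            if (next_run < plan_count)
                do_run(world_rank, next_run);  // skip the out-of-range "stop" index
        }
    }

    MPI_Finalize();
    return 0;
}

Something like mpirun -n 4 ./sketch would then spread the ten runs across three workers while rank 0 only coordinates.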