Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating the event manager for MPI communication #136

Merged
merged 12 commits into from
Sep 6, 2018
Merged

Conversation

cssherman
Copy link
Contributor

No description provided.

@cssherman cssherman requested review from rrsettgast and artv3 August 27, 2018 18:26
@artv3
Copy link
Contributor

artv3 commented Aug 27, 2018

@cssherman , is this ready for a review?

@cssherman
Copy link
Contributor Author

I still need to check on a few things with sub-events, I'll let you know when I think its ready for review

@@ -240,6 +240,13 @@ void EventBase::CheckEvents(real64 const time,
{
subEvent->CheckEvents(time, dt, cycle, domain);
});

// Synchronize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to occur here (instead of in event manager) because we need to know the forecast for all events and their children (to allow nesting)

@@ -377,4 +386,15 @@ void EventBase::Cleanup(real64 const& time_n,



integer EventBase::GetExitFlag()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were not propagating any exit flags upwards to the main run loop.

@cssherman
Copy link
Contributor Author

@artv3 I think this is ready for you to look at. Also, it looks like the code is only working with 1-2 partitions along each dimension?

@artv3
Copy link
Contributor

artv3 commented Aug 28, 2018

@cssherman did you merge with a different branch? There seems to be a lot of commits unrelated to the event manager.


#if USE_MPI
real64 dt_global;
MPI_Allreduce(&dt, &dt_global, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can replace the two MPI_Allreduce calls with a single MPI_Gather for a buffer containing both variables, local reduction of each component that was gathered...then a single MPI_Bcast for a buffer containing both variables...then you can overwrite the local variables with the values in the buffer.

}
});

time += dt;
++cycle;
dt = nextDt;
dt = (time + dt > maxTime) ? (maxTime - time) : dt;

#if USE_MPI
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be more efficient than the two allreduce calls for the dt and exitFlag

@@ -86,6 +86,14 @@ void HaltEvent::EstimateEventTiming(real64 const time,
m_realDt = currentTime - m_lastTime;
m_lastTime = currentTime;
integer forecast = static_cast<integer>((maxRuntime - (currentTime - m_startTime)) / m_realDt);

// The timing for the ranks may differ slightly, so synchronize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Allreduce can happen here, or we could store a common time-stamp for each of the ranks somewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will these be different? It seems that currentTime and m_lastTime will always be in sync?? You could sync up m_startTime in the constructor when it is set....wait what is going on here? Why do you care about the real world runtime? Is this for a maximum wall time?


// Because the function applied to an object may differ by rank, synchronize
// (Note: this shouldn't occur very often, since it is only called if the base forecast <= 0)
#if USE_MPI
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allreduce should be fairly uncommon, since it will only be applied when the following apply:

  1. The base forecast is 0
  2. The forecast is being modified via a function/threshold being applied to an object (not a simple time-series)

@@ -0,0 +1,107 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solver is meant to toy around with random time-step requests. I've been using it to make sure that the event manager, mpi communications are handling the time-stepping correctly.

#if USE_MPI
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
send_buffer = static_cast<real64 *>(malloc(2 * sizeof(real64)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not deallocated anywhere that I can see??

It is better just to do something like
real64 send_buffer[2];
array1d< real64 > receive_buffer( 2 * comm_size )

{
for (integer ii=0; ii<comm_size; ii++)
{
send_buffer[0] = std::min(send_buffer[0], receive_buffer[2*ii]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use dt instead of send_buffer[0] as the argument?

@@ -86,6 +86,14 @@ void HaltEvent::EstimateEventTiming(real64 const time,
m_realDt = currentTime - m_lastTime;
m_lastTime = currentTime;
integer forecast = static_cast<integer>((maxRuntime - (currentTime - m_startTime)) / m_realDt);

// The timing for the ranks may differ slightly, so synchronize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will these be different? It seems that currentTime and m_lastTime will always be in sync?? You could sync up m_startTime in the constructor when it is set....wait what is going on here? Why do you care about the real world runtime? Is this for a maximum wall time?

@@ -86,6 +86,14 @@ void HaltEvent::EstimateEventTiming(real64 const time,
m_realDt = currentTime - m_lastTime;
m_lastTime = currentTime;
integer forecast = static_cast<integer>((maxRuntime - (currentTime - m_startTime)) / m_realDt);

// The timing for the ranks may differ slightly, so synchronize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rrsettgast - This event type is meant to handle max wall time. If the forecast for this type of event is <=0, then it will set the exitFlag. Because the currentTime and m_lastTime on each rank will be slightly different, we need to do this check to make sure that all of the ranks will agree.

If you would rather restrict this event from having a executable target, we could skip the allreduce here because the global exitFlag is synced in the main run loop.

@cssherman cssherman merged commit f7990d8 into develop Sep 6, 2018
@cssherman cssherman deleted the eventManagerMPI branch September 10, 2018 19:33
rrsettgast added a commit that referenced this pull request May 14, 2024
rrsettgast added a commit that referenced this pull request May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants