Nondeterministic epoch behavior for enqueue/addAction/threads #2179

Matthew-Whitlock · 2023-07-26T17:46:01Z

Describe the bug
A few nondeterministic situations for addAction/enqueue

1. As in "Adding an action in the terminator should produce/consume on epoch?" (#649), addAction still runs with an arbitrary epoch stack state, as does enqueue. Should fcontext user-threads maintain separate epoch stacks?
2. None of these produce/consume on any epoch, so waiting for an epoch to finish does not actually guarantee completion of all enclosed work.
3. addAction on an already completed epoch executes the action immediately, blocking continuation of the current context. The same happens in any context that finishes an epoch via finishedEpoch or via releaseLocalDependency/consume on a locally rooted epoch. At best this is order enforcement that may hurt performance; at worst it deadlocks if the action uses a blocking runSchedulerThrough on an epoch that relies on that context.
3.5. Similarly, actions run sequentially, so depending on the order in which they were added, a blocking action can prevent the subsequent ones from running.
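
To make the produce/consume point concrete, here is a minimal toy model, not vt's implementation. ToyEpoch, ToyScheduler, enqueueUntracked and enqueueTracked are hypothetical names invented for this sketch. It assumes an epoch is "done" once every produced unit has been consumed, and shows how untracked work leaves the epoch looking finished while that work is still pending.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for an epoch (hypothetical, not vt's type):
// it counts as terminated once every produced unit has been consumed.
struct ToyEpoch {
  int outstanding = 0;
  void produce() { ++outstanding; }
  void consume() { assert(outstanding > 0); --outstanding; }
  bool terminated() const { return outstanding == 0; }
};

// Toy scheduler (hypothetical) with tracked and untracked enqueue.
struct ToyScheduler {
  std::deque<std::function<void()>> queue;

  // Untracked: the epoch never learns about this work.
  void enqueueUntracked(std::function<void()> fn) {
    queue.push_back(std::move(fn));
  }

  // Tracked: produce on the epoch now, consume after the work has run.
  // The epoch must outlive the queued work.
  void enqueueTracked(ToyEpoch& ep, std::function<void()> fn) {
    ep.produce();
    queue.push_back([&ep, fn = std::move(fn)] { fn(); ep.consume(); });
  }

  // Drain the queue in FIFO order.
  void runAll() {
    while (!queue.empty()) {
      auto fn = std::move(queue.front());
      queue.pop_front();
      fn();
    }
  }
};
```

With enqueueUntracked, terminated() stays true even though the queue is non-empty; with enqueueTracked, it only becomes true again after runAll() has executed the work.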

To Reproduce
1/2. See #649 (comment)

The following code deadlocks: the action for ep1 runs immediately and blocks main(), while itself depending on further progress of main().

int main() {
  auto ep1 = theTerm()->makeEpochCollective();
  auto ep2 = theTerm()->makeEpochCollective();

  theMsg()->pushEpoch(ep1);
  if (... false ...) { /* send some messages with ep1 */ }
  theMsg()->popEpoch(ep1);
  theTerm()->finishedEpoch(ep1);

  theTerm()->addAction(ep1, [=]{
    //some work
    runSchedulerThrough(ep2);
    //some more work
  });

  theMsg()->pushEpoch(ep2);
  // send some messages with ep2
  theMsg()->popEpoch(ep2);
  theTerm()->finishedEpoch(ep2);
}

Expected behavior

By default, any unit of work is included in the enclosing epoch, and the epoch stack is handled accordingly.
If the current semantics must be kept, one fix would be to add calls such as theSched()->enqueueTask(runnable) and theSched()->enqueueTask(epoch, runnable) and point users to them as the safe defaults.
The runtime only switches to other units of work when explicitly requested by the user (runScheduler, runInEpoch, finalize).
A solution to both 3/3.5 is to enqueue rather than run each lambda on epoch completion.

Using task vs. action consistently to express tracked/asynchronous vs. untracked/immediate work would be a good approach.
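
A rough sketch of the difference between the two flavors, as a toy model rather than vt code. ToyRuntime, runActionNow, enqueueTask, runScheduler and demo are hypothetical names for illustration only, and the sketch assumes the epoch in question has already completed. It records the order in which the caller and the lambda make progress.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy runtime (hypothetical): a FIFO work queue plus a log of what ran when.
struct ToyRuntime {
  std::deque<std::function<void()>> queue;
  std::vector<std::string> log;

  // "Action" flavor: the lambda runs inside the caller, before it continues.
  void runActionNow(std::function<void()> fn) { fn(); }

  // "Task" flavor: the lambda is queued and only runs when the scheduler does.
  void enqueueTask(std::function<void()> fn) { queue.push_back(std::move(fn)); }

  // Drain the queue; stands in for an explicit scheduler call by the user.
  void runScheduler() {
    while (!queue.empty()) {
      auto fn = std::move(queue.front());
      queue.pop_front();
      fn();
    }
  }
};

// Attach a lambda to an (assumed already completed) epoch, either
// immediately or deferred, and return the resulting execution order.
std::vector<std::string> demo(bool defer) {
  ToyRuntime rt;
  auto action = [&rt] { rt.log.push_back("action"); };
  if (defer) {
    rt.enqueueTask(action);
  } else {
    rt.runActionNow(action);
  }
  rt.log.push_back("caller continues");
  rt.runScheduler();
  return rt.log;
}
```

In the immediate case the lambda runs before the caller continues, so a lambda that blocks on the caller's later work can never finish. In the deferred case the caller always proceeds first and the lambda only runs once the scheduler is explicitly invoked.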


lifflander · 2023-09-14T22:08:26Z

@Matthew-Whitlock I've had more of a chance to think about this and would like to discuss with you in more detail when you get a chance.

Matthew-Whitlock added the type: bug label Jul 26, 2023

lifflander self-assigned this Aug 1, 2023

lifflander closed this as completed Oct 25, 2023
