Nondeterministic epoch behavior for enqueue/addAction/threads #2179

Closed
Matthew-Whitlock opened this issue Jul 26, 2023 · 1 comment

Matthew-Whitlock commented Jul 26, 2023

Describe the bug
addAction and enqueue behave nondeterministically with respect to epochs in a few situations:

  1. As noted in "Adding an action in the terminator should produce/consume on epoch?" (#649), addAction still runs with an arbitrary epoch stack state, as does enqueue. Should fcontext user-threads maintain separate epoch stacks?
  2. None of these produce or consume on any epoch, so waiting for an epoch to finish does not actually guarantee that all enclosing work has completed.
  3. addAction on an already completed epoch executes the action immediately, blocking the current context until it returns. The same happens in any context that finishes an epoch via finishedEpoch, or via releaseLocalDependency/consume on a locally rooted epoch. At best this is order enforcement that may hurt performance; at worst it deadlocks, if the action makes a blocking runSchedulerThrough call on an epoch that relies on that context.
    3.5. Similarly, actions run sequentially, so depending on the order in which they were added, a blocking action can prevent the subsequent ones from running.
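Item 2 can be illustrated with a small stand-alone model. This is only a sketch and not vt code: ToyEpoch, ToyScheduler, enqueueUntracked, enqueueTracked and finishedWhilePending are hypothetical names, and the epoch is reduced to a single produce/consume counter. It shows that untracked work lets the epoch report completion while the work is still queued, whereas producing at enqueue time and consuming after the work runs does not.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical toy epoch: "finished" once every produced unit is consumed.
struct ToyEpoch {
  int outstanding = 0;
  void produce() { ++outstanding; }
  void consume() { assert(outstanding > 0); --outstanding; }
  bool finished() const { return outstanding == 0; }
};

// Hypothetical single-threaded work queue standing in for a scheduler.
struct ToyScheduler {
  std::deque<std::function<void()>> queue;

  // Untracked: the epoch never learns that this work exists.
  void enqueueUntracked(std::function<void()> fn) {
    queue.push_back(std::move(fn));
  }

  // Tracked: produce when queued, consume only after the work has run.
  void enqueueTracked(ToyEpoch& ep, std::function<void()> fn) {
    ep.produce();
    queue.push_back([&ep, fn = std::move(fn)] {
      fn();
      ep.consume();
    });
  }

  // Drain the queue in FIFO order.
  void runAll() {
    while (!queue.empty()) {
      auto fn = std::move(queue.front());
      queue.pop_front();
      fn();
    }
  }
};

// Returns true if the epoch looked finished while the work was still queued.
bool finishedWhilePending(bool tracked) {
  ToyEpoch ep;
  ToyScheduler sched;
  bool ran = false;
  auto work = [&ran] { ran = true; };
  if (tracked) {
    sched.enqueueTracked(ep, work);
  } else {
    sched.enqueueUntracked(work);
  }
  bool const early = ep.finished() && !ran;
  sched.runAll();
  assert(ran && ep.finished());
  return early;
}
```

In this model, finishedWhilePending(false) reports the premature completion described in item 2, while finishedWhilePending(true) does not.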

To Reproduce
1/2. See #649 (comment)

  3. The following code deadlocks: the action for ep1 runs immediately and blocks progress on main(), while itself depending on further progress of main().
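int main() {
  auto ep1 = theTerm()->makeEpochCollective();
  auto ep2 = theTerm()->makeEpochCollective();

  theMsg()->pushEpoch(ep1);
  if (... false ...) { /* send some messages with ep1 */ }
  theMsg()->popEpoch(ep1);
  theTerm()->finishedEpoch(ep1);

  // If ep1 has already completed, this action runs immediately inside
  // addAction instead of being deferred.
  theTerm()->addAction(ep1, [=]{
    // some work
    // Blocks until ep2 terminates, but ep2 is only finished further down
    // in main(), which cannot proceed until this action returns.
    runSchedulerThrough(ep2);
    // some more work
  });

  theMsg()->pushEpoch(ep2);
  // send some messages with ep2
  theMsg()->popEpoch(ep2);
  theTerm()->finishedEpoch(ep2);
}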

Expected behavior

  1. By default, any unit of work is included in the enclosing epoch, and the epoch stack is handled accordingly.
    If the current semantics are required, one fix would be to add calls such as theSched()->enqueueTask(runnable) and theSched()->enqueueTask(epoch, runnable) and point users to them as the default, safe functions.
  2. The runtime only switches to other units of work when explicitly requested by the user (runScheduler, runInEpoch, finalize).
    A solution to both 3 and 3.5 is to enqueue each lambda on epoch completion rather than run it immediately.
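The difference between running and enqueueing an action can be sketched with another stand-alone model. Again this is not the vt implementation: ToyTerminator, its defer flag, and trace are hypothetical, and a single epoch is assumed. With defer set to false the action runs inside addAction, before the caller continues; with defer set to true it only runs when the caller explicitly drives the scheduler.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy terminator tracking a single epoch.
struct ToyTerminator {
  bool epoch_done = false;
  bool defer = false;  // false: run actions inline; true: enqueue them
  std::vector<std::function<void()>> pending;   // waiting for completion
  std::deque<std::function<void()>> run_queue;  // ready, not yet run

  // Either run the action right now or hand it to the run queue.
  void dispatch(std::function<void()> fn) {
    if (defer) {
      run_queue.push_back(std::move(fn));
    } else {
      fn();
    }
  }

  void addAction(std::function<void()> fn) {
    if (epoch_done) {
      dispatch(std::move(fn));
    } else {
      pending.push_back(std::move(fn));
    }
  }

  void finishedEpoch() {
    epoch_done = true;
    std::vector<std::function<void()>> ready;
    ready.swap(pending);
    for (auto& fn : ready) {
      dispatch(std::move(fn));
    }
  }

  // Explicit, user-requested point at which queued actions run.
  void runScheduler() {
    while (!run_queue.empty()) {
      auto fn = std::move(run_queue.front());
      run_queue.pop_front();
      fn();
    }
  }
};

// Records the order of events around addAction on a completed epoch.
std::vector<std::string> trace(bool defer) {
  ToyTerminator term;
  term.defer = defer;
  std::vector<std::string> log;
  term.finishedEpoch();
  term.addAction([&log] { log.push_back("action"); });
  log.push_back("caller continues");
  term.runScheduler();
  return log;
}
```

In the deferred mode each action is a separate queue entry, so in this model the caller is never blocked by an action it did not ask to run.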

Using "task" and "action" consistently to mean tracked/asynchronous and untracked/immediate work, respectively, would be a good approach.

@lifflander

@Matthew-Whitlock I've had more of a chance to think about this and would like to discuss with you in more detail when you get a chance.
