[Extension] Improve Interoperability with other asynchronous libraries #181

illuhad · 2019-12-17T18:06:39Z

Based on a Twitter discussion with @ax3l, we should investigate improving the interoperability of SYCL implementations with other asynchronous libraries (such as MPI or an async I/O library).
This issue serves to track possible approaches and to discuss the specific requirements for users, so feedback is appreciated :)

At the moment I am thinking in particular of two new features:

Specify command group dependencies on external events
Add callback mechanisms that trigger as soon as the SYCL implementation notices that a task is complete. (Note: depending on the task submission model used by the SYCL implementation, the callback may not be triggered right after the task completes; for example, in a batched submission model the SYCL runtime may only notice that tasks are done once the entire batch is complete.)

1. Specifying command group dependencies on some external events

I think external events could best be implemented on top of the explicit synchronization mechanism that is part of the Intel USM proposal which introduces sycl::handler::depends_on(sycl::event evt), e.g.:

q.submit([&](cl::sycl::handler& cgh){
  cgh.depends_on(some_sycl_event);
  cgh.parallel_for(...);
});

In analogy, we could introduce an overload sycl::handler::depends_on(sycl::external_event evt), with a new class external_event:

class external_event
{
public:
  external_event();
  // The user can optionally provide a function that the SYCL implementation will use
  // to test the state of the event. If this function is not provided, the event can only
  // complete when signal_completion() is called.
  // As soon as either the test_state function returns true or signal_completion()
  // is called, the SYCL runtime will consider the event complete.
  // The test_state function will not be invoked again after it has returned true for the first time.
  external_event(std::function<bool ()> test_state);

  void wait();

  // When called, signals to the SYCL runtime that this event has completed.
  void signal_completion();

  // ... plus remaining functions that are present in sycl::event for consistency
};
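
To make the intended semantics concrete, here is a minimal sketch in standard C++ with no SYCL dependency. The class name external_event_sketch and its poll() member are hypothetical and not part of the proposal; poll() stands in for whatever a runtime's scheduler would call internally. The sketch assumes a single polling thread and only illustrates the "latch on first success" rule, not a definitive implementation.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical stand-in for the proposed sycl::external_event (not a real SYCL class).
class external_event_sketch {
public:
  external_event_sketch() = default;
  explicit external_event_sketch(std::function<bool()> test_state)
      : test_state_(std::move(test_state)) {}

  // Hypothetical polling hook; assumed to be called from one scheduler thread only.
  bool poll() {
    if (done_.load(std::memory_order_acquire)) return true;  // already latched
    if (test_state_ && test_state_()) {
      done_.store(true, std::memory_order_release);  // latch: never test again
      return true;
    }
    return false;
  }

  // May be called from any thread to mark the event as complete.
  void signal_completion() { done_.store(true, std::memory_order_release); }

  // Busy-waits (yielding) until the event is complete.
  void wait() {
    while (!poll()) std::this_thread::yield();
  }

private:
  std::function<bool()> test_state_;
  std::atomic<bool> done_{false};
};

// Returns how often the user-provided test function was invoked.
int count_test_invocations() {
  int calls = 0;
  external_event_sketch evt([&calls] { return ++calls >= 3; });
  evt.wait();          // polls until the third invocation returns true
  assert(calls == 3);
  evt.poll();          // latched: the test function is not invoked again
  evt.wait();
  return calls;
}
```

An event constructed without a test function only completes through signal_completion(), which is the path an MPI or I/O completion handler would take.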

2. Add callback mechanisms

Probably the most consistent approach for SYCL would be to add a function handler::callback(std::function<void ()>) that specifies callbacks for a given command group handler. Combined with external events, this could look like this:

sycl::external_event evt([]() -> bool {
  return is_my_event_done();
});
q.submit([&](sycl::handler& cgh) {
  cgh.depends_on(evt);
  // q is captured by copy; `mutable` is needed because queue::submit() is non-const.
  cgh.callback([q]() mutable {
    // Will be called once this command group is done. Could do some additional submits:
    q.submit(...);
  });
  cgh.parallel_for(...);
});

Semantically, a callback is similar to a single_task kernel running on the host device that depends on the given kernel. However, a callback shall be executed by the SYCL implementation at its earliest convenience, whereas a regular kernel may be delayed (e.g., because of kernel reordering for better overlap of compute and data transfers, or because the runtime waits for more kernels before submitting a batch).
User data can be made available to the callback by capturing whatever is needed in the lambda, so its signature requires no parameters.
Whether executing the callback blocks the device on which its kernel ran is implementation-defined. (Open question: we could also specify that it never blocks, but this probably requires more implementation effort.)
As for depends_on(), the position of the callback() call inside the command group is irrelevant.
Submitting other SYCL kernels inside the callback is allowed.
When a command group contains several callback() calls, all of them will be invoked, even if some were registered with the same callback lambda or function object. The order in which they execute is implementation-defined and not guaranteed to be deterministic.
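
The registration rules above can be modelled without a SYCL runtime. The following is a minimal sketch in standard C++; mock_handler, notify_complete() and run_demo() are hypothetical names used only for illustration. The mock happens to run callbacks in registration order, which the proposal explicitly does not guarantee.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical mock of a command group handler; it models callback bookkeeping only.
class mock_handler {
public:
  // Every call registers one more callback, even for an identical callable.
  void callback(std::function<void()> f) { callbacks_.push_back(std::move(f)); }

  // Placeholder for the kernel; its position relative to callback() is irrelevant.
  void parallel_for() {}

  // Called by the mock "runtime" once it has noticed that the kernel is complete.
  // Registration order is used here; a real implementation may pick any order.
  void notify_complete() {
    for (auto& f : callbacks_) f();
  }

private:
  std::vector<std::function<void()>> callbacks_;
};

// Returns how many callback invocations happened for one command group.
int run_demo() {
  int hits = 0;
  auto cb = [&hits] { ++hits; };

  mock_handler cgh;
  cgh.callback(cb);    // registered before the kernel
  cgh.parallel_for();
  cgh.callback(cb);    // same callable again, registered after the kernel
  assert(hits == 0);   // nothing runs at registration time

  cgh.notify_complete();
  return hits;         // both registrations are invoked
}
```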


ax3l · 2019-12-17T21:26:11Z

MPI communication and long-running I/O routines are exactly what I face every day, yes!

Just a minor comment that probably works with this proposal: for the user-provided test function, there are cases such as the MPI_Test functions that can be queried an arbitrary number of times until they return success for the first time (and they must not be queried again after that). One can probably express this as user-defined state (a static or member variable?) when implementing this as a test_state function. I just want to mention this slightly odd use case.
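
One way to carry such user-defined state is a small wrapper that remembers the first success. This is a minimal sketch in standard C++, assuming a hypothetical helper make_latched(); the wrapped function would be where a call such as MPI_Test goes, but no MPI is used here. The wrapper is not thread-safe and assumes calls from a single thread.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical helper: wraps a test function so that it is never called again
// after it has returned true once. The state is shared between copies of the wrapper.
std::function<bool()> make_latched(std::function<bool()> test) {
  auto done = std::make_shared<bool>(false);
  return [done, test = std::move(test)]() {
    if (!*done) *done = test();  // only query while not yet complete
    return *done;
  };
}

// Returns how often the underlying test function was actually invoked.
int demo_calls() {
  int calls = 0;
  // Stand-in for a request that completes on the second query.
  auto test = make_latched([&calls] { return ++calls == 2; });

  bool first = test();   // underlying call 1: not done yet
  bool second = test();  // underlying call 2: done
  bool third = test();   // latched: the underlying function is not invoked
  assert(!first && second && third);
  return calls;
}
```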

illuhad · 2019-12-17T22:21:03Z

Thanks for the hint! I would expect that an implementation of external_event will typically contain a bool variable that specifies if the event has completed in order to support both the signal_completion() function as well as the user-provided test function. Under this assumption, the test function would probably not be called again anyway in typical implementations if the bool variable already indicates completion, which would be the behavior that you require.
In other words, since typical implementations of external_event would already work with such a restriction, I think we might as well just guarantee the user that the test function will not be called again after it returns success.
Edit: Original post with the definition of external_event now also guarantees this behavior.

ax3l · 2019-12-17T22:25:56Z

That's a good constraint, thanks!

illuhad · 2019-12-23T15:41:08Z

Added a concept for callbacks.

illuhad added the "discussion" label (General discussion about something) Dec 17, 2019

illuhad added the extension design label Jul 9, 2020

illuhad changed the title ~~Improve Interoperability with other asynchronous libraries~~ [Extension] Improve Interoperability with other asynchronous libraries Jul 9, 2020

abboomer mentioned this issue Oct 26, 2020

cuda clang++ compilation crash #356

Closed
