-
Notifications
You must be signed in to change notification settings - Fork 897
opal/accelerator: Add IPC APIs #10959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A start for adding generic IPC functionality. |
opal/mca/accelerator/accelerator.h
Outdated
* | ||
*/ | ||
typedef int (*opal_accelerator_base_module_get_ipc_handle_fn_t)( | ||
int dev_id, void *dev_ptr, void **handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a void* instead of a class object like the rest of the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New revision encapsulates handle
and size
, therefore we don't need the API to get handle size.
However, I think it's necessary to keep device id and base address - didn't find a simplistic way to hide them, maybe in IPC cache if we decide to implement that.
@edgargabriel Reminded me of this PR. I shall pick it up soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Since @bwbarrett did the initial review of this pr, it might be good to get his review also in |
I'm going to be under water with things to do the next couple of weeks, so don't wait for my review. |
@lrbison Since Brian is down, could you give it a review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty short PR. I am not tracking the IPC functions well enough to know if this is sufficient for all use cases. However I don't see anything wrong with it and it shouldn't break existing code so I see no problem in approving it and getting it merged either.
@wenduwan as I mentioned in the meeting, I think meanwhile that we are missing some functions. Assuming that the goal is to be able to remove the remaining hard-coded calls to cuda and replace them with accelerator framework API function calls, here are the functions that are still missing. Not all of them are IPC related, so not sure whether they would absolutely have to be part of this PR:
In addition, looking over the current Accelerator API, I noticed that we have an interface to create a stream, but not to destroy a stream ( similarly on how we could create an event, but not destroy one, that function is however already in the list above). |
Introduce the destroy stream operation to properly release resources. Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
Introduce the synchronize stream operation. Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
Introduce the operation to make stream wait on an event. Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
Introduce the destroy_event operation to properly release resources. Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
Signed-off-by: William Zhang <wilzhang@amazon.com> Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
@edgargabriel Thanks for the reminder - I took a second look at CUDA and ZE API docs and added a couple more that I think are important. Please let me know if that makes sense to you. I do not have extensive experience with accelerators so I'm open to suggestions to remove less common APIs. Also I want to mention that I didn't see a |
@wenduwan good point. As far as I can see, neither hip nor cuda have a Since I just read through the documentation of the ipc functionality, I am also double checking one more aspect, namely the fact that the events for IPC have to be created with certain flags set ( |
@wenduwan I would suggest to go ahead and merge the PR as is, i.e. without the closeIpcEventHandle. There is no hip or cuda function along those lines, and because of that its also not used anywhere in the code. If it turns out later that we still need one for another accelerator, we can still add the API in, we do have a bit of time before the 5.1 release |
@edgargabriel Cool thanks for double checking. @hppritcha Since you chimed in on this during the Tuesday call I want to see if you have other comments? |
@lrbison @hppritcha Can we merge this? I will open backport to 5.0 |
@wenduwan why would you backport this to 5.0? I think this is meant for 5.1, its new functionality? |
I see. Cool then we can leave it on main. |
This change implements stub APIs introduced in open-mpi#10959 Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
Signed-off-by: William Zhang wilzhang@amazon.com