-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for host operations to executor #1232
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some drawback in doing
exec->get_master()->run(...)
instead of having a separate call ? Then you dont need theGKO_REGISTER_HOST_OPERATION
at all, right ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not being logged with the executor the object lives on, which is also the reason that we need the workaround in the benchmarks. With the suggested approach, we'd need to pass an additional Executor parameter and provide overloads for both OmpExecutor and ReferenceExecutor and put them inside the
kernels
namespace, which IMO isn't a good place to putcore
functionality.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, anything computationally expensive shouldn't be happening on the host. It should be on the kernel side. That also is in line with the Ginkgo philosophy of kernels doing the hard work and the host orchestrating the calling. It is perfectly fine if these kernels then are serial (with reference or with single threaded OpenMP), but I dont think we should be doing anything intensive on the host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operations we are looking at here (symbolic factorizations, elimination tree, later on MC64, AMD, METIS, ...) are either inherently sequential or very hard to parallelize (see e.g. RCM), let alone implement on GPUs, so either we duplicate the same implementation to OpenMP and Reference and call the kernels for host data only, or we have just a single place where they are implemented, in this case
core
. I think the latter is the more sensible choice here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the operations are sequential and probably not suited for GPUs. My point is that they perform computations which are potentially expensive (with scaled up problem size) and hence they should not be on the core side. Regarding the duplication, we can just have a common kernel file (like we do for HIP and CUDA) and have the serial implementation on OpenMP (forcing it on one thread) and reference alike.
I think performing computation on the host side can have implications for future design. For example, if we want to add some stream/asynchronous tasking for these operations. Performing these operations on the kernel side would allow us to keep the asynchronous tasking/streaming kernel interface and keep the core side clean.
IMO avoiding this also keeps our logging and profiling interface clean, without fragmenting it into host operations and kernel operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to unpack the meaning of core and kernel a bit more here. core is everything in libginkgo.so, kernel is everything in libginkgo_<backend>.so. You are talking about a conceptional separation that IMO doesn't apply here - we already have some expensive operations in core, mainly factorization-related. The only thing this PR does is make them visible to profilers regardless of the executor the objects live on. The logging interface is not impacted by this change, since to the loggers, host and kernel operations look the same.
If we were to make them kernel operations, we would have to do a lot more work: Kernels can't create Ginkgo objects, so we need to operate on arrays and combine them into Ginkgo objects on the core side again. Also as a minor point, it also doubles compilation time and binary size of the resulting functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would advocate for making them kernel operations, so this might be something we discuss before merging this PR.
Allocations should not happen inside a operation, and I think that is all the more reason to have allocations on the host side and do the operations on the kernel side. We log and profile the allocations separately anyway, so having the allocations inside the host operation makes the profiling and logging muddled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We allocate a lot inside kernels, e.g. SpGEMM/SpGEAM, sparselib bindings, all kernels that use temporary memory, ..., removing these allocations/moving them into core would complicate the control flow significantly.