Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use stream in mul_add if given and allocator in subset_sum #438

Merged
merged 5 commits into from
May 8, 2024

Conversation

vincefn
Copy link
Contributor

@vincefn vincefn commented May 5, 2024

Hi Andreas,

Somehow gpuarrays' mul_add did not use the supplied stream parameter. Similarly, the allocator parameter of subset_sum was not used either.

This PR should fix that, and also adds an optional out parameter to mul_add.

Copy link
Owner

@inducer inducer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A couple nits, then this should be good to go.

@@ -2087,7 +2087,8 @@ def subset_sum(subset, a, dtype=None, stream=None, allocator=None):
from pycuda.reduction import get_subset_sum_kernel

krnl = get_subset_sum_kernel(dtype, subset.dtype, a.dtype)
return krnl(subset, a, stream=stream)
return krnl(subset, a, stream=stream,
allocator=drv.mem_alloc if allocator is None else allocator)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should try to get the allocator off of one of the two arrays.

Copy link
Contributor Author

@vincefn vincefn May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could use a's allocator.

On the other hand, looking at other reduction functions (sum, all, any,...), they all just pass allocator to the kernel, even if it's None. So maybe we should do that here for consistency.

(I actually added the if allocator is None... by mistake - the kernel call works with allocator=None, it's only some functions like to_gpu which need a real allocator)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @inducer - I've pushed a87b22f which gives the same behavior as other functions (just pass the function's allocator parameter, even if None).
Let me know if you'd rather want to use the first array's allocator, but in that case the other functions may need to be changed as well.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. Thanks!

test/test_gpuarray.py Show resolved Hide resolved
pycuda/gpuarray.py Outdated Show resolved Hide resolved
@inducer inducer merged commit 8aa0766 into inducer:main May 8, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants