[FEA] Improve cudf::gather scalability as number of columns increases #13509
Comments
I also believe that improving the performance for
If we ignore lists and strings for the moment, I think it would be pretty easy to put together a proof-of-concept for doing batched fixed-width gathers as a single kernel invocation (well, maybe 2 - one more for validity). Strings are probably not too hard an extension. Lists would definitely be tricky. I'd have to wrap my head around the list gather stuff to remember :)
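A rough CUDA sketch of what a single-launch batched gather over fixed-width columns could look like. The argument struct, byte-wise copy, and grid-stride loop are illustrative assumptions, not existing libcudf internals, and the validity (null mask) pass mentioned above is omitted:

```cuda
// Illustrative sketch only: one kernel gathers many fixed-width columns at
// once. The layout and names here are assumptions, not libcudf's kernel.
#include <cstdint>

struct batched_gather_args {
  void const* const* src;     // per-column source data pointers
  void* const* dst;           // per-column destination data pointers
  int const* elem_size;       // per-column element width in bytes
  int32_t const* gather_map;  // shared row indices, length num_rows
  int num_columns;
  int num_rows;
};

__global__ void batched_fixed_width_gather(batched_gather_args args)
{
  // Grid-stride loop over (column, row) pairs so one launch covers all columns.
  auto const total = static_cast<long long>(args.num_columns) * args.num_rows;
  for (long long i = static_cast<long long>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < total;
       i += static_cast<long long>(gridDim.x) * blockDim.x) {
    int const col  = static_cast<int>(i / args.num_rows);
    int const row  = static_cast<int>(i % args.num_rows);
    int const size = args.elem_size[col];
    char const* in = static_cast<char const*>(args.src[col]) +
                     static_cast<long long>(args.gather_map[row]) * size;
    char* out = static_cast<char*>(args.dst[col]) +
                static_cast<long long>(row) * size;
    for (int b = 0; b < size; ++b) { out[b] = in[b]; }  // byte-wise copy for simplicity
  }
}
```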
At one point cudf (possibly before libcudf!) used a stream pool for gather operations. Each gather is independent, so we can launch all the kernels on separate streams and synchronize them with an event on the input stream. I would love to reimplement this approach and see if it can improve the performance. See also #12086.
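A minimal sketch of that fork/join pattern, assuming rmm::cuda_stream_pool and a cudf::gather overload that accepts an rmm::cuda_stream_view; the pool size and event handling are illustrative, not the historical implementation:

```cpp
#include <cudf/column/column_view.hpp>
#include <cudf/copying.hpp>
#include <cudf/table/table.hpp>
#include <cudf/table/table_view.hpp>

#include <rmm/cuda_stream_pool.hpp>
#include <rmm/cuda_stream_view.hpp>

#include <cuda_runtime_api.h>

#include <memory>
#include <vector>

std::vector<std::unique_ptr<cudf::table>> gather_on_stream_pool(
  cudf::table_view const& input,
  cudf::column_view const& gather_map,
  rmm::cuda_stream_view input_stream)
{
  static rmm::cuda_stream_pool pool{8};  // pool size chosen arbitrarily here

  // Fork: make every pool stream wait for pending work on the input stream.
  cudaEvent_t fork_event;
  cudaEventCreateWithFlags(&fork_event, cudaEventDisableTiming);
  cudaEventRecord(fork_event, input_stream.value());

  std::vector<std::unique_ptr<cudf::table>> results;
  for (cudf::size_type i = 0; i < input.num_columns(); ++i) {
    auto stream = pool.get_stream();
    cudaStreamWaitEvent(stream.value(), fork_event, 0);

    // Each single-column gather is independent, so it can run concurrently.
    // Assumes a stream-accepting cudf::gather overload.
    cudf::table_view one_column{{input.column(i)}};
    results.push_back(cudf::gather(one_column, gather_map,
                                   cudf::out_of_bounds_policy::DONT_CHECK, stream));

    // Join: the input stream waits on each per-column gather before continuing.
    cudaEvent_t join_event;
    cudaEventCreateWithFlags(&join_event, cudaEventDisableTiming);
    cudaEventRecord(join_event, stream.value());
    cudaStreamWaitEvent(input_stream.value(), join_event, 0);
    cudaEventDestroy(join_event);
  }
  cudaEventDestroy(fork_event);
  return results;
}
```

Whether this actually overlaps work depends on per-kernel occupancy; many small per-column kernels are the case most likely to benefit.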
As the number of columns increases for cudf::gather with the same gather map, we see the number of kernels called increase proportionally and the runtime increase linearly. We are wondering if there are better ways to group or "batch" these calls so that we perform fewer kernel invocations, each doing more work, in hopes of amortizing some of the cost for many columns or deeply nested schemas.

A very simple example is below. It creates a column of 10 int32_t rows and adds it to a struct N times (where N is between 2 and 1024). As the column count increases by 2x, the gather kernel takes 2x longer.
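A hypothetical reconstruction of such a setup, using libcudf's test column wrappers for brevity; the exact original snippet may have differed:

```cpp
#include <cudf/column/column_factories.hpp>
#include <cudf/copying.hpp>
#include <cudf/table/table.hpp>
#include <cudf/table/table_view.hpp>

#include <cudf_test/column_wrapper.hpp>

#include <rmm/device_buffer.hpp>

#include <memory>
#include <utility>
#include <vector>

std::unique_ptr<cudf::table> gather_wide_struct(cudf::size_type num_children)
{
  // Duplicate a 10-row int32_t column num_children times as struct members.
  std::vector<std::unique_ptr<cudf::column>> children;
  for (cudf::size_type i = 0; i < num_children; ++i) {
    cudf::test::fixed_width_column_wrapper<int32_t> child{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    children.push_back(child.release());
  }
  auto structs =
    cudf::make_structs_column(10, std::move(children), 0, rmm::device_buffer{});

  // Gather every row (identity map); time this call as num_children grows.
  cudf::test::fixed_width_column_wrapper<int32_t> gather_map{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  cudf::table_view input{{structs->view()}};
  return cudf::gather(input, gather_map);
}
```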
A similar argument can be made for columns with nested types, such as arrays of structs (each with array members), where the number of calls into underlying CUB routines can increase drastically.
I am filing this issue to solicit comments/patches to see how we could improve this behavior.