[FEA] Support batch construction of strings columns #16486
Labels: feature request, libcudf, Performance
In our framework, we often see this pattern:
That means there are a lot of calls to `cudf::make_strings_column`, each involving stream synchronization (at least two stream syncs per call: one when generating the offsets and one when generating the bitmask). With 10, 20, 30 or more output columns, that many stream syncs is very inefficient.

We can do better by implementing batch construction for strings columns, deferring stream syncs until absolutely necessary. The entire process may need just one stream sync.
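To make the sync-count argument concrete, here is a minimal CPU-only sketch. Everything in it is a hypothetical stand-in rather than libcudf API: `MockStream` models a stream whose queued work only runs on `synchronize()`, `build_one` models a per-column call that syncs twice, and `build_batch` models a batched call that queues the work for every column and syncs once.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: work is queued and only runs on synchronize().
struct MockStream {
  std::vector<std::function<void()>> pending;
  int sync_count = 0;

  void enqueue(std::function<void()> work) { pending.push_back(std::move(work)); }

  void synchronize() {
    for (auto& work : pending) { work(); }
    pending.clear();
    ++sync_count;
  }
};

// Hypothetical per-column summary standing in for offsets and null mask results.
struct ColumnStats {
  std::size_t total_chars = 0;
  std::size_t null_count  = 0;
};

using Rows = std::vector<const std::string*>;  // nullptr marks a null row

// Models the per-column pattern: one sync after the offsets step, one after the mask step.
ColumnStats build_one(MockStream& stream, const Rows& rows) {
  ColumnStats stats;
  stream.enqueue([&] { for (auto* r : rows) { if (r) stats.total_chars += r->size(); } });
  stream.synchronize();
  stream.enqueue([&] { for (auto* r : rows) { if (!r) ++stats.null_count; } });
  stream.synchronize();
  return stats;
}

// Models the batched pattern: queue both steps for every column, then sync once.
std::vector<ColumnStats> build_batch(MockStream& stream, const std::vector<Rows>& columns) {
  std::vector<ColumnStats> stats(columns.size());
  for (std::size_t c = 0; c < columns.size(); ++c) {
    stream.enqueue([&stats, &columns, c] {
      for (auto* r : columns[c]) { if (r) stats[c].total_chars += r->size(); }
    });
    stream.enqueue([&stats, &columns, c] {
      for (auto* r : columns[c]) { if (!r) ++stats[c].null_count; }
    });
  }
  stream.synchronize();  // the only sync for the whole batch
  assert(stream.pending.empty());
  return stats;
}
```

In this model, building N columns through `build_one` costs 2N syncs, while `build_batch` costs one regardless of N. It says nothing about how a real implementation would stage its device-to-host copies.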
The next optimization level could be to fuse the involved kernels together: instead of calling a separate kernel to construct each column, we call just one kernel for all columns. For example, one `valid_if` kernel could generate the nullmasks for all columns, one `copy` kernel could generate the chars data of all columns, and so on.
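One common way to fuse per-column kernels is to launch over a single flattened index space covering every row of every column, with each thread mapping its global index back to a (column, row) pair. The sketch below shows that mapping on the CPU. It assumes hypothetical names (`fused_validity`, `Rows`), uses one `bool` per row instead of a bit-packed nullmask, and is not how libcudf's `valid_if` is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Rows = std::vector<const std::string*>;  // nullptr marks a null row

// Hypothetical fused validity pass: one loop over all rows of all columns.
// Each loop iteration plays the role of one GPU thread.
std::vector<std::vector<bool>> fused_validity(const std::vector<Rows>& columns) {
  // row_starts[c] is the global index of the first row of column c.
  std::vector<std::size_t> row_starts(columns.size() + 1, 0);
  for (std::size_t c = 0; c < columns.size(); ++c) {
    row_starts[c + 1] = row_starts[c] + columns[c].size();
  }

  std::vector<std::vector<bool>> valid(columns.size());
  for (std::size_t c = 0; c < columns.size(); ++c) { valid[c].resize(columns[c].size()); }

  const std::size_t total_rows = row_starts.back();
  for (std::size_t idx = 0; idx < total_rows; ++idx) {
    // Binary search for the column that owns global index idx.
    // upper_bound skips empty columns because their start equals the next start.
    auto it = std::upper_bound(row_starts.begin(), row_starts.end(), idx);
    const std::size_t col = static_cast<std::size_t>(it - row_starts.begin()) - 1;
    const std::size_t row = idx - row_starts[col];
    assert(col < columns.size() && row < columns[col].size());
    valid[col][row] = (columns[col][row] != nullptr);
  }
  return valid;
}
```

The same index mapping would apply to a fused chars copy, with the flattened space measured in bytes rather than rows. The binary search adds a small per-thread cost in exchange for a single launch.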