[FEA] acquire the semaphore after concatToHost in GpuShuffleCoalesceIterator #4395

abellina · 2021-12-20T16:09:40Z

There is an optimization opportunity in GpuShuffleCoalesceIterator: we currently acquire the semaphore before we concat on the host, so it is held during work that does not need the GPU.

The cuDF call we currently make is JCudfSerialization.concatToContiguousTable, but it is trivial to split up, since the methods it calls inside cuDF are already public. The proposal is to:

1. Without holding the semaphore, call JCudfSerialization.concatToHostBuffer to get a HostConcatResult.
2. Acquire the semaphore.
3. Get the contiguous table on the GPU by calling HostConcatResult.toContiguousTable.
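
The ordering of the three steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not the plugin's code: `HostConcatSketch`, `HostResult`, `concatOnHost`, and `toDevice` are hypothetical stand-ins for the cuDF types and calls named above, a `java.util.concurrent.Semaphore` stands in for the GPU semaphore, and byte arrays stand in for host buffers and GPU tables. Releasing the permit right after the upload is also a simplification for the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Semaphore;

// Sketch only: every name here is a hypothetical stand-in, not the cuDF or plugin API.
final class HostConcatSketch {
    // Stand-in for the GPU semaphore (a single permit keeps the bookkeeping simple).
    static final Semaphore gpuSemaphore = new Semaphore(1);
    static boolean semaphoreHeldDuringHostConcat;
    static boolean semaphoreHeldDuringUpload;

    // Stand-in for HostConcatResult: concatenated data that still lives on the host.
    static final class HostResult {
        final byte[] data;
        HostResult(byte[] data) { this.data = data; }

        // Stand-in for HostConcatResult.toContiguousTable (the host-to-GPU copy).
        byte[] toDevice() {
            semaphoreHeldDuringUpload = gpuSemaphore.availablePermits() == 0;
            return data.clone();
        }
    }

    // Stand-in for JCudfSerialization.concatToHostBuffer: host-only work.
    static HostResult concatOnHost(List<byte[]> parts) {
        semaphoreHeldDuringHostConcat = gpuSemaphore.availablePermits() == 0;
        int total = 0;
        for (byte[] p : parts) total += p.length;
        byte[] out = new byte[total];
        int offset = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, offset, p.length);
            offset += p.length;
        }
        return new HostResult(out);
    }

    static byte[] coalesce(List<byte[]> parts) {
        HostResult host = concatOnHost(parts);   // step 1: no semaphore held
        gpuSemaphore.acquireUninterruptibly();   // step 2: acquire the semaphore
        try {
            return host.toDevice();              // step 3: move the data to the "GPU"
        } finally {
            gpuSemaphore.release();              // simplification for this sketch
        }
    }

    public static void main(String[] args) {
        byte[] out = coalesce(Arrays.asList(new byte[]{1, 2}, new byte[]{3}));
        System.out.println(Arrays.toString(out)
            + " heldDuringHostConcat=" + semaphoreHeldDuringHostConcat
            + " heldDuringUpload=" + semaphoreHeldDuringUpload);
    }
}
```

The point of the sketch is only the ordering: the host-side concatenation runs while the permit is still available to other tasks, and the permit is taken just before the data is moved to the device.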

Results with this change are promising: summed across all queries, it saves about 1 minute of runtime. Most queries are above the 1x line. The queries at or below 0.9x were q52, q46, q68, q45, and q42; when executed multiple times, they all went above 1x (these are single-digit-second queries).


abellina · 2021-12-20T16:11:54Z

An even better change may be for us to delay putting these batches on the GPU until we really need them. This would be specific to joins, where the build side could rest on the GPU for a while as we wait for the stream side. In these cases what you almost want is to concatToHostBuffer, but keep that HostConcatResult around and not need to acquire the GPU at all until the stream side is also ready to go.
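
A rough sketch of that deferred idea, again in plain Java with hypothetical stand-ins only: `DeferredUploadSketch` and `LazyBuildSide` are invented for illustration, a byte array stands in for the retained HostConcatResult, and a `java.util.concurrent.Semaphore` stands in for the GPU semaphore. It assumes the build side can simply stay in host memory until it is first requested on the device.

```java
import java.util.concurrent.Semaphore;

// Sketch only: hypothetical names, not the plugin's join implementation.
final class DeferredUploadSketch {
    static final Semaphore gpuSemaphore = new Semaphore(1); // stand-in for the GPU semaphore
    static int uploads = 0;                                 // counts host-to-"GPU" copies

    static final class LazyBuildSide {
        private final byte[] hostData; // stand-in for a retained HostConcatResult
        private byte[] deviceData;     // stand-in for the GPU table; null until needed

        LazyBuildSide(byte[] hostData) { this.hostData = hostData; }

        boolean isOnDevice() { return deviceData != null; }

        // Called only once the stream side is ready; the first call pays for the upload.
        byte[] getOnDevice() {
            if (deviceData == null) {
                gpuSemaphore.acquireUninterruptibly();
                try {
                    deviceData = hostData.clone();
                    uploads++;
                } finally {
                    gpuSemaphore.release(); // simplification for this sketch
                }
            }
            return deviceData;
        }
    }

    public static void main(String[] args) {
        LazyBuildSide build = new LazyBuildSide(new byte[]{7, 8});
        System.out.println("on device before use: " + build.isOnDevice());
        build.getOnDevice();
        System.out.println("on device after use: " + build.isOnDevice());
    }
}
```

Until `getOnDevice()` is called, the sketch never touches the semaphore, which is the behavior the comment above is asking for on the build side of a join.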

jlowe · 2021-12-20T16:22:03Z

> In these cases what you almost want is to concatToHostBuffer, but keep that HostConcatResult around and not need to acquire the GPU at all until the stream side is also ready to go.

This is a join-specific optimization, and I don't think we should get too far ahead of ourselves there. The optimization described above should apply to all cases where we are doing host-side concat, and thus I think is a good optimization as-is. We can always add a more sophisticated optimization for the shuffle-into-a-join case, but let's get this one done first.

abellina added the feature request, ? - Needs Triage, and performance labels Dec 20, 2021

abellina self-assigned this Dec 20, 2021

abellina mentioned this issue Dec 20, 2021

GpuShuffleCoalesceIterator acquire semaphore after host concat #4396

Merged

abellina closed this as completed in #4396 Dec 21, 2021

sameerz removed the ? - Needs Triage label Dec 21, 2021
