Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Performance
Component
CUB
Describe the bug
In RAPIDS, after switching the CCCL branch to 3.2.0, we observe a performance regression that makes some of our code paths run about 2X slower. For example:
## [0] Quadro RTX 6000
| num_rows | depth | null_frequency | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|------------|---------|------------------|------------|-------------|------------|-------------|-----------|---------|----------|
| 1024 | 4 | 0 | 17.769 ms | 5.34% | 39.703 ms | 5.35% | 21.934 ms | 123.44% | SLOW |
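For readers checking the numbers, the `Diff` and `%Diff` columns can be reproduced from the reference and comparison times. The sketch below is only a hand calculation using the two times from the table above; the `slowdown` helper is a hypothetical name, not part of nvbench or its comparison script.

```python
def slowdown(ref_ms: float, cmp_ms: float) -> tuple[float, float, float]:
    """Return (absolute diff in ms, percent diff, cmp/ref ratio)."""
    diff_ms = cmp_ms - ref_ms
    pct = diff_ms / ref_ms * 100.0   # relative to the reference run
    ratio = cmp_ms / ref_ms          # how many times slower the comparison run is
    return diff_ms, pct, ratio


# Times taken from the table above (Ref Time, Cmp Time).
diff_ms, pct, ratio = slowdown(17.769, 39.703)
print(f"diff = {diff_ms:.3f} ms, %diff = {pct:.2f}%, ratio = {ratio:.2f}x")
```

The ratio of roughly 2.2 is what the "2X slower" description refers to.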
We had to bisect the commits on the CCCL 3.2.X branch to investigate. The bisection was not easy: we had to go back in time across several repositories at once and fix various build issues along the way (caused by build-system changes across those projects).
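The search itself follows the usual bisection pattern. The following is a minimal sketch of that pattern, not the actual procedure used here: it assumes the commits are ordered oldest to newest, that the first one is known good and the last one is known bad, and that a caller-supplied `is_bad` predicate (hypothetical) builds the given commit, runs the benchmark, and reports whether it is slow. The commit names in the demo are placeholders.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Demo with placeholder commit names; in practice is_bad would build the
# commit and compare the measured GPU time against a threshold.
history = [f"c{i}" for i in range(8)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

Each step halves the remaining range, so the number of builds grows with the logarithm of the number of commits. This matters when every build requires fixing cross-repository build issues.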
Finally, we found the commit that introduces the regression:
With that commit checked out:
| num_rows | depth | null_frequency | Samples | CPU Time | Noise | GPU Time | Noise |
|----------|-------|----------------|---------|-----------|-------|-----------|-------|
| 1024 | 4 | 0 | 374x | 40.160 ms | 5.06% | 40.154 ms | 5.06% |
With the commit immediately before it checked out, performance returns to normal:
| num_rows | depth | null_frequency | Samples | CPU Time | Noise | GPU Time | Noise |
|----------|-------|----------------|---------|-----------|-------|-----------|-------|
| 1024 | 4 | 0 | 647x | 23.169 ms | 5.68% | 23.163 ms | 5.66% |
Since a 2X slowdown is a serious regression, we would like to have it fixed in time for our 26.02 release.
How to Reproduce
In cudf, build the SET_OPS_NVBENCH benchmark and run this particular configuration:
SET_OPS_NVBENCH --benchmark have_overlap --axis null_frequency=0 --axis depth=4 --axis num_rows=1024
Expected behavior
The runtime should be around 20 ms, in line with the measurements taken before the regression.
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response