
Why is GpuCoalesceBatches performance sometimes bad? #6107

Closed
chenrui17 opened this issue Aug 27, 2020 · 8 comments
Assignees
Labels
question Further information is requested Spark Functionality that helps Spark RAPIDS

Comments

@chenrui17
Contributor

What is your question?
[screenshot: GpuCoalesceBatches node metrics]
I tested TPC-DS query 93 and found that GpuCoalesceBatches has a long tail. I want to know why, and how to find out what's happening.

@chenrui17 chenrui17 added Needs Triage Need team to review and classify question Further information is requested labels Aug 27, 2020
@jlowe
Member

jlowe commented Aug 27, 2020

GpuCoalesceBatches is not the slow part here. The actual concatenation of the batches, measured by the concat batch time metric, was only 3ms on average and 180ms worst-case. The collect batch time metric is measuring how long this query plan node took to receive all of the batches (i.e.: how long it took to drain the input iterator). That means it's really measuring the time of nodes earlier in the plan up to the beginning of the stage (usually a scan if the first stage or an exchange if a subsequent stage).

Since the image only shows this specific node I can't say offhand which node was the slowest, but it wasn't this one.
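The distinction between the two metrics can be sketched as follows. This is a hypothetical illustration, not the actual spark-rapids code: `collect batch time` wraps the drain of the upstream iterator, while `concat batch time` covers only the local concatenation work, so a slow upstream node inflates the former, not the latter.

```python
import time

def coalesce_batches(batch_iter, concat):
    """Sketch of how GpuCoalesceBatches' two timers cover different work."""
    collect_start = time.monotonic()
    batches = list(batch_iter)      # draining upstream nodes dominates this timer
    collect_s = time.monotonic() - collect_start

    concat_start = time.monotonic()
    result = concat(batches)        # the actual coalesce work, usually cheap
    concat_s = time.monotonic() - concat_start

    return result, collect_s, concat_s
```

If the upstream iterator is backed by an expensive scan or exchange, `collect_s` grows while `concat_s` stays small, which matches the 3ms-average concat time reported above.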

@jlowe jlowe removed the Needs Triage Need team to review and classify label Aug 27, 2020
@kkraus14
Collaborator

Should this issue move to https://github.com/nvidia/spark-rapids?

@kkraus14 kkraus14 added the Spark Functionality that helps Spark RAPIDS label Aug 27, 2020
@jlowe
Member

jlowe commented Aug 27, 2020

Yes, this is a question about the Spark plugin, not cudf.

@jlowe jlowe closed this as completed Aug 27, 2020
@chenrui17
Contributor Author

I'm sorry, I carelessly sent it to the wrong place.
@jlowe Thank you for your help. This is part of the DAG; I think GpuShuffledHashJoin is the slowest node, right? If so, why? Do we have any ways to optimize it? I tuned several settings, like concurrentTask and rapidsBatchSize, but performance is still bad.
[screenshot: part of the query DAG]
Here is the complete TPC-DS query 93 history UI. I want to know why this query runs slowly, and especially why stage 7 took 2.4 min while the CPU Spark run took only 1 min.
Details for TPC-DS Q93.htm.zip

@chenrui17
Contributor Author

chenrui17 commented Aug 28, 2020

Here is the CPU SparkSQL history UI; the whole query took only 1.3 min.
tpcds_q93_history_ui_CPU.zip

@jlowe
Member

jlowe commented Aug 28, 2020

That's the wrong direction in the query plan to look. The build time metric in GpuShuffledHashJoin is very similar to the collect batch time metric in GpuCoalesceBatches: it measures how long the node spent waiting to drain its input iterator to get all of the data for the build-side table. If you look at how long it took to actually perform the join once all the input was retrieved, it was very fast: join time was only 19ms on average and 657ms worst-case in any task.
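The same pattern applies here; a hypothetical sketch (not the real GpuShuffledHashJoin code) of why the build time metric mostly reflects upstream work: it times the drain of the build-side iterator, while the join time covers only the probe itself.

```python
import time
from collections import defaultdict

def shuffled_hash_join(build_iter, stream_iter):
    """Sketch: build time drains upstream input; join time is just the probe."""
    build_start = time.monotonic()
    table = defaultdict(list)
    for key, value in build_iter:   # waiting on the upstream shuffle dominates here
        table[key].append(value)
    build_s = time.monotonic() - build_start

    join_start = time.monotonic()
    joined = [(k, sv, bv)           # the actual join, usually fast
              for k, sv in stream_iter
              for bv in table.get(k, [])]
    join_s = time.monotonic() - join_start

    return joined, build_s, join_s
```

So a 59.6s worst-case build time with a 657ms worst-case join time points at the build-side input (the shuffle feeding this node), not the join itself.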

I can't readily explain why the worst-case collect batch time in GpuCoalesceBatches was only 19.3s while the worst-case build time was 59.6s, and the join itself isn't most of the difference. Tasks waiting for their turn on the GPU can factor into these metrics, but I would expect any time spent waiting for the GPU to be accounted for in both nodes' metrics.

I'll spend some time digging into the details you uploaded, but we may need an Nsight Systems profile trace to get more details on where the time is being spent in the join vs. collect.

@jlowe jlowe reopened this Aug 28, 2020
@jlowe jlowe self-assigned this Aug 28, 2020
@jlowe
Member

jlowe commented Aug 28, 2020

@chenrui17 can you file this issue in https://github.com/NVIDIA/spark-rapids? This issue needs to be tracked there.

@JustPlay

JustPlay commented Sep 6, 2020

[image]
