Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] Improve shuffle performance #1123

Open
Tracked by #391 ...
andygrove opened this issue Nov 26, 2024 · 4 comments
Open
Tracked by #391 ...

[EPIC] Improve shuffle performance #1123

andygrove opened this issue Nov 26, 2024 · 4 comments
Labels
enhancement New feature or request performance
Milestone

Comments

@andygrove
Copy link
Member

andygrove commented Nov 26, 2024

This epic is for improving shuffle / ScanExec performance.

Issues

Context

I have been comparing Comet and Ballista performance for TPC-H q3. Both execute similar native plans. I am using the comet-parquet-exec branch which uses DataFusion's ParquetExec.

Ballista is approximately 3x faster than Comet. Given that they are executing similar DataFusion native plans, I would expect performance to be similar.

The main difference between Comet and Ballista is that Comet transfers batches between JVM and native code during shuffle operations.

Most of the native execution time in Comet is spent in ScanExec which is reading Arrow batches from the JVM using Arrow FFI. This time was not included in our metrics prior to #1128 and #1111.

Screenshot from 2024-11-26 12-30-01

@andygrove andygrove added enhancement New feature or request performance labels Nov 26, 2024
@andygrove andygrove changed the title Improve shuffle performance [EPIC] Improve shuffle performance Nov 26, 2024
@andygrove andygrove reopened this Nov 26, 2024
@viirya
Copy link
Member

viirya commented Nov 27, 2024

Most of the native execution time in Comet is spent in ScanExec which is reading Arrow batches from the JVM using Arrow FFI.

Do you mean when ScanExec is used as pseudo scan node to read shuffled data into next native execution?

If so, it makes some sense because Comet shuffle reader is still JVM based. We should make it native eventually to boost shuffle reading performance. It is on the early roadmap when we started on Comet shuffle. Although it is not super urgent and high priority at that moment. But now I think it is the time we can begin to work on this.

Opened: #1125

@andygrove
Copy link
Member Author

Do you mean when ScanExec is used as pseudo scan node to read shuffled data into next native execution?

Yes, exactly.

@andygrove
Copy link
Member Author

andygrove commented Dec 1, 2024

If so, it makes some sense because Comet shuffle reader is still JVM based.

This is also an issue for shuffle writes. The child node of ShuffleWriterExec is always a ScanNode reading the output from a native plan.

We pay the FFI cost twice - once to import from native plan to JVM, then again to export to the shuffle write native plan. We have the cost of schema serde in both directions. Perhaps there is a way to shortcut this and avoid a full serde because we do not need to read the batch in the JVM in this case, just pass it from one native plan to another.

@andygrove
Copy link
Member Author

I created a Google document for collaborating on ideas around improving shuffle performance:

https://docs.google.com/document/d/1rx1ue7UZ4ljzic9Rc2kT-v35bfLB7Rhhe5FW1d0Sw4I/edit?usp=sharing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance
Projects
None yet
Development

No branches or pull requests

2 participants