
How to use/test Comet? #20

Closed
jinwenjie123 opened this issue Feb 14, 2024 · 8 comments
Labels
question Further information is requested

Comments

@jinwenjie123

jinwenjie123 commented Feb 14, 2024

Hi Team,

I am trying to evaluate the performance of the Comet plugin.
But I could not find any documentation on how to use the Comet plugin after compiling it,
in particular how to use it in cluster mode.

Thanks

@viirya
Member

viirya commented Feb 14, 2024

I think we will provide more documentation on this.

Currently, https://github.com/apache/arrow-datafusion-comet/blob/main/bin/comet-spark-shell shows a simple example of the configuration needed on the Spark side to use Comet.

You just need to build Comet, distribute the jar with Spark, and set the necessary configs to enable it.

Btw, we haven't open-sourced some performance-related features yet, so the performance numbers you measure may not be representative.
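As a rough illustration of "distribute the jar and set the configs" in cluster mode, here is a minimal sketch that assembles a `spark-submit` command line. It assumes the Comet settings used in early (2024) Comet documentation (`spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions`, `spark.comet.enabled`, `spark.comet.exec.enabled`); these key names are version-dependent, so the `bin/comet-spark-shell` script for your build is the authority. The helper `comet_submit_args`, the jar path, master URL, and application name are all hypothetical placeholders.

```python
import shlex
from typing import Dict, List

# Assumed Comet settings from early Comet docs; key names may differ in your
# Comet version, so check bin/comet-spark-shell for the build you compiled.
COMET_CONFS: Dict[str, str] = {
    "spark.sql.extensions": "org.apache.comet.CometSparkSessionExtensions",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
}


def comet_submit_args(comet_jar: str, master: str, app: str) -> List[str]:
    """Build a hypothetical spark-submit argument list with Comet enabled.

    --jars ships the Comet jar to the cluster along with the application.
    """
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster",
            "--jars", comet_jar]
    # One --conf flag per Comet setting.
    for key, value in COMET_CONFS.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Placeholder jar path, master, and application file.
    print(shlex.join(comet_submit_args("/path/to/comet-spark.jar",
                                       "yarn", "my_app.py")))
```

Building the arguments as a list (rather than one string) keeps each `--conf` value intact when it is passed to a process launcher such as `subprocess.run`.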

@viirya viirya added the question Further information is requested label Feb 15, 2024
@jinwenjie123
Author

jinwenjie123 commented Feb 15, 2024

Hi Team,

I also noticed that when Comet detects unsupported features, it falls back to the Spark engine. Does that come with the extra cost of converting columnar data to a row-based format?

Thank you for your time.

@viirya
Member

viirya commented Feb 15, 2024

Yes, that's correct. At the boundary between a Comet operator and a Spark operator, we need a ColumnarToRow operator to convert from ColumnarBatch to Spark InternalRow, which adds some overhead. Overall, we expect the gain from native operators to outweigh this cost. And as more operators get native support, such fallbacks, and with them the cost of columnar-to-row conversion, will be reduced.
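To make the conversion cost concrete, here is a toy model in plain Python. It is not Spark's ColumnarToRow implementation: a "batch" is assumed to be a dict of equal-length column lists, and the hypothetical `columnar_to_rows` function transposes it into row tuples, touching every value once. That per-value work, repeated at every Comet-to-Spark boundary, is the overhead described above.

```python
from typing import Any, Dict, List, Tuple


def columnar_to_rows(batch: Dict[str, List[Any]]) -> List[Tuple[Any, ...]]:
    """Toy ColumnarToRow: transpose a column-oriented batch (one list per
    column) into row tuples that a row-based operator could consume."""
    columns = list(batch.values())
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns in a batch must have the same length")
    # zip visits every value once, so the work grows with rows x columns.
    return list(zip(*columns))


if __name__ == "__main__":
    batch = {"id": [1, 2, 3], "price": [9.5, 3.0, 7.25]}
    for row in columnar_to_rows(batch):
        print(row)
```

Because the cost scales with the number of values crossing the boundary, plans that fall back to Spark early (before filters or aggregations shrink the data) pay the most.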

@jinwenjie123
Author

Hi Team,

I am wondering whether there will be documentation on which data types are supported by Comet/DataFusion (e.g. Decimal(16, 6), integer, ...) and therefore will not fall back to vanilla Spark and cause a regression.

Or where can I look up related information? This is very important while we are evaluating whether to use Comet.

Thank you so much !

@viirya
Member

viirya commented Feb 20, 2024

Hello, I answered in #64.

@comphead
Contributor

comphead commented Mar 2, 2024

Can it be closed via #125? @viirya

@viirya viirya closed this as completed Mar 2, 2024
@viirya
Member

viirya commented Mar 2, 2024

Okay

@viirya
Member

viirya commented Mar 2, 2024

@jinwenjie123 Feel free to open new issues on Comet usage if you still have other questions. Thanks.


3 participants