Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Enable comet tests with spark-4.0 profile #493

Merged
merged 47 commits into from
Jun 3, 2024

Conversation

kazuyukitanimura
Copy link
Contributor

Which issue does this PR close?

Part of #372

Rationale for this change

To be ready for Spark 4.0

What changes are included in this PR?

This PR enables the comet tests with the spark-4.0 profile

How are these changes tested?

Tests with the spark-4.0 profile now should pass. (But Spark tests do not yet)

@kazuyukitanimura
Copy link
Contributor Author

assume(isSpark34Plus)
// Spark 4.0 no longer fails for widening types
// https://github.com/apache/spark/commit/3361f25dc0ff6e5233903c26ee105711b79ba967
assume(isSpark34Plus && !isSpark34Plus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean assume(isSpark34Plus && !isSpark40Plus) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. Thank you!

Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kazuyukitanimura
Copy link
Contributor Author

Friendly ping @viirya @andygrove @comphead @huaxingao
I plan to merge this unless I hear back.
@parthchandra already approved.

@@ -23,7 +23,6 @@ Comet provides the following configuration settings.

| Config | Description | Default Value |
|--------|-------------|---------------|
| spark.comet.ansi.enabled | Comet does not respect ANSI mode in most cases and by default will not accelerate queries when ansi mode is enabled. Enable this setting to test Comet's experimental support for ANSI mode. This should not be used in production. | false |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highlights that our configs vary depending on the version we build against and we currently only publish one version of our configs.

Ideally we should generate configs per spark version and publish all of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will file an issue for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I plan to associate this config with spark.sql.ansi.enabled. So when Spark enables ANSI, so does Comet


trait ShimCometConf {
protected val COMET_SCHEMA_EVOLUTION_ENABLED_DEFAULT = true
protected val COMET_ANSI_MODE_ENABLED_DEFAULT = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to enable this by default when we know that we are not compatible yet?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we have to do this otherwise all plans will currently fall back to Spark because ANSI mode is enabled by default in Spark 4.

We should really only fall back to Spark for expressions that are actually affected by ANSI support rather than fall back for everything.

Perhaps we could add a note to the documentation explaining that ANSI mode in Spark 4 is not fully compatible yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking to use Spark-4.0 as a test bed for ANSI readiness. I will send another PR for enabling Spark tests and if I find any incompatibility, I will disable them per function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need it to be enabled for Spark tests, isn't?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see #509 to allow us to fall back per function if ANSI is not supported yet and #313 where we are tracking all ANSI work

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @kazuyukitanimura

I tihnk we should add a note to the documentation saying that we support Spark 4 now and also point out that ANSI mode is not yet compatible. I think it would be fine to do that as a follow-up PR.

package org.apache.comet.shims

trait ShimCometConf {
protected val COMET_SCHEMA_EVOLUTION_ENABLED_DEFAULT = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need a trailing _DEFAULT in the name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say so as we have COMET_SCHEMA_EVOLUTION_ENABLED in conf. Also this is default so that users can overwrite.

@kazuyukitanimura
Copy link
Contributor Author

I tihnk we should add a note to the documentation saying that we support Spark 4 now and also point out that ANSI mode is not yet compatible. I think it would be fine to do that as a follow-up PR.

Thank you @andygrove I plan to send a few more PRs on Spark-4.0
Yes, I will document things about Spark-4.0

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @kazuyukitanimura great PR

@@ -382,12 +384,13 @@ object CometConf {
.createWithDefault(Seq("Range,InMemoryTableScan"))

val COMET_ANSI_MODE_ENABLED: ConfigEntry[Boolean] = conf("spark.comet.ansi.enabled")
.internal()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why making it internal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in #493 (comment)
doc is not per profile yet. By making it internal, doc will ignore it.

Comment on lines +387 to +393
.internal()
.doc(
"Comet does not respect ANSI mode in most cases and by default will not accelerate " +
"queries when ansi mode is enabled. Enable this setting to test Comet's experimental " +
"support for ANSI mode. This should not be used in production.")
.booleanConf
.createWithDefault(false)
.createWithDefault(COMET_ANSI_MODE_ENABLED_DEFAULT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to mention it is enabled by default for Spark 4.0 now. It could produce incorrect results so we better don't make it implicitly enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, Spark 4.0 support is not ready at all. I will send follow up PRs

Comment on lines +2204 to +2205
val zero = Literal.default(expression.dataType)
If(EqualTo(expression, zero), Literal.create(null, expression.dataType), expression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NullIf implementation was changed to use With that Comet does not support apache/spark#43623

@kazuyukitanimura kazuyukitanimura merged commit 77c9a6c into apache:main Jun 3, 2024
43 checks passed
@kazuyukitanimura
Copy link
Contributor Author

Thank you @viirya @andygrove @comphead @parthchandra

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
## Rationale for this change

To be ready for Spark 4.0

## What changes are included in this PR?

This PR enables the comet tests with the spark-4.0 profile

## How are these changes tested?

Tests with the spark-4.0 profile now should pass. (But Spark tests do not yet)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants