Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-7690][CORE][CH][VL] GlutenConfig should support runtime configuration changes #7691

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Oct 26, 2024

What changes were proposed in this pull request?

This PR proposes to improve GlutenConfig so as it supports runtime configuration changes.
This PR also makes below improvements.

  1. Unify the variable name of SparkSession with spark is usually used by Spark.
  2. Avoid declaring configuration value as a constant. e.g. val scanOnly: Boolean = glutenConf.enableScanOnly.
  3. Other changes.

(Fixes: #7690)

How was this patch tested?

integration tests

@github-actions github-actions bot added CORE works for Gluten Core VELOX CLICKHOUSE labels Oct 26, 2024
Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@beliefer beliefer changed the title [WIP][GLUTEN-7690][CORE][CH][VL] Remove GlutenConfig from ColumnarRuleApplier [WIP][GLUTEN-7690][CORE][CH][VL] GlutenConfig should support runtime configuration changes Oct 28, 2024
@beliefer beliefer changed the title [WIP][GLUTEN-7690][CORE][CH][VL] GlutenConfig should support runtime configuration changes [GLUTEN-7690][CORE][CH][VL] GlutenConfig should support runtime configuration changes Oct 28, 2024
Copy link

#7690

Copy link

Run Gluten Clickhouse CI

@zhztheplayer
Copy link
Member

zhztheplayer commented Oct 28, 2024

GlutenConfig prevents the dynamically modified configurations in interactive scenarios.

@beliefer Would you like to post a code example of the interactive case you mentioned? Or probably to add a test case to guard against it?

@@ -37,7 +37,7 @@ import org.apache.spark.unsafe.types.UTF8String
// This rule try to make the filter condition into integer comparison, which is more efficient.
// The above example will be rewritten into
// select * from table where to_unixtime('2023-11-02', 'yyyy-MM-dd') >= unix_timestamp
class RewriteDateTimestampComparisonRule(session: SparkSession, conf: SQLConf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this kind of simplification is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, do we really need session as a parameter to create GlutenConfig? What's the issue of using GlutenConfig.getConf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an active SparkSession, the proper SQLConf associated with the thread's active session is used. So we should use spark.sessionState.conf first.
Glutenconfig.getConf actually proxies SQLConf.get, which will prioritize obtaining SQLConf from the active session, otherwise the default SQLConf will be used.

Copy link
Member

@zhztheplayer zhztheplayer Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glutenconfig.getConf actually proxies SQLConf.get

I see. Though I noticed Spark has SQLConfHelper that a lot of components (including Rule) rely on, using SQLConf.get to access session config. Is there any special consideration to avoid using SQLConf.get in Gluten? Or we just leverage a more consistent way than vanilla Spark?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This PR doesn't avoid using SQLConf.get in Gluten.
In fact, SQLConf.get and spark.sessionState.conf are functionally equivalent. But spark.sessionState.conf is faster than SQLConf.get because the latter requires visit thread local. So we can see the usage scenarios:

  1. The code rely on SQLConfHelper in Spark we can't find SparkSession in context.
  2. All the places in Spark rely on spark.sessionState.conf if exists SparkSession.

@beliefer
Copy link
Contributor Author

GlutenConfig prevents the dynamically modified configurations in interactive scenarios.

@beliefer Would you like to post a code example of the interactive case you mentioned? Or probably to add a test case to guard against it?

Spark already have a lot of test cases. So I added the code example of the interactive case into the description of issue

Copy link

Run Gluten Clickhouse CI

@beliefer
Copy link
Contributor Author

@zhztheplayer @zhouyuan @FelixYBW The failure GC seems unrelated.

@zhztheplayer
Copy link
Member

@beliefer Aside of the constructor changes for GlutenConfig, I think 2 parts of code cleanups can be split and merged first. Would you want to split out the following parts from the PR?

  1. Rule constructor simplifications: E.g., RewriteDateTimestampComparisonRule(session: SparkSession, conf: SQLConf) to RewriteDateTimestampComparisonRule(conf: SQLConf) or RewriteDateTimestampComparisonRule(session: SparkSession)
  2. Naming improvements: E.g., columnarConf to glutenConf

Thanks!

Copy link

github-actions bot commented Nov 4, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Nov 4, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Nov 4, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Nov 4, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Nov 5, 2024

Run Gluten Clickhouse CI

1 similar comment
@beliefer
Copy link
Contributor Author

beliefer commented Nov 5, 2024

Run Gluten Clickhouse CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLICKHOUSE CORE works for Gluten Core VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GlutenConfig should support runtime configuration changes
2 participants