[GLUTEN-7690][CORE][CH][VL] GlutenConfig should support runtime configuration changes #7691
base: main
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI
@beliefer Would you like to post a code example of the interactive case you mentioned? Or perhaps add a test case to guard against it?
@@ -37,7 +37,7 @@ import org.apache.spark.unsafe.types.UTF8String
// This rule try to make the filter condition into integer comparison, which is more efficient.
// The above example will be rewritten into
// select * from table where to_unixtime('2023-11-02', 'yyyy-MM-dd') >= unix_timestamp
class RewriteDateTimestampComparisonRule(session: SparkSession, conf: SQLConf)
Thanks, this kind of simplification is needed.
Though, do we really need session as a parameter to create GlutenConfig? What's the issue with using GlutenConfig.getConf?
If there is an active SparkSession, the proper SQLConf associated with the thread's active session is used. So we should use spark.sessionState.conf first.
GlutenConfig.getConf actually proxies SQLConf.get, which prioritizes obtaining the SQLConf from the active session; otherwise the default SQLConf is used.
> GlutenConfig.getConf actually proxies SQLConf.get

I see. Though I noticed Spark has SQLConfHelper, which a lot of components (including Rule) rely on, using SQLConf.get to access the session config. Is there any special consideration for avoiding SQLConf.get in Gluten? Or are we just adopting a more consistent way than vanilla Spark?
No. This PR doesn't avoid using SQLConf.get in Gluten.
In fact, SQLConf.get and spark.sessionState.conf are functionally equivalent. But spark.sessionState.conf is faster than SQLConf.get because the latter has to visit a thread local. So we can see two usage scenarios:
- Code in Spark relies on SQLConfHelper when no SparkSession can be found in context.
- All the places in Spark rely on spark.sessionState.conf if a SparkSession exists.
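The lookup order discussed in this thread (prefer the conf of the thread's active session, otherwise fall back to a default conf) can be sketched with a toy model in plain Scala. This is only an illustration of the pattern, not Spark's or Gluten's implementation: ToyConf, ToySession, ToyConfRegistry, LookupDemo and the key toy.key are all hypothetical names.

```scala
// Toy model only: ToyConf, ToySession and ToyConfRegistry are hypothetical
// names, not Spark or Gluten classes.
final class ToyConf(initial: Map[String, String]) {
  @volatile private var settings: Map[String, String] = initial

  def set(key: String, value: String): Unit = synchronized {
    settings = settings.updated(key, value)
  }

  def get(key: String, default: String): String = settings.getOrElse(key, default)
}

// A session owns its conf, so holders of the session can reach it directly.
final class ToySession(val conf: ToyConf)

object ToyConfRegistry {
  // Used when the current thread has no active session.
  val fallback = new ToyConf(Map.empty)

  // Each thread tracks its own active session; threads that never set one see null.
  private val activeSession = new ThreadLocal[ToySession]()

  def setActive(session: ToySession): Unit = activeSession.set(session)
  def clearActive(): Unit = activeSession.remove()

  // Prefer the active session's conf, otherwise use the fallback conf.
  def get: ToyConf = Option(activeSession.get()).map(_.conf).getOrElse(fallback)
}

object LookupDemo {
  // Returns (fallback used before activation, session conf used while active).
  def run(): (Boolean, Boolean) = {
    val session = new ToySession(new ToyConf(Map("toy.key" -> "session")))
    val beforeActive = ToyConfRegistry.get eq ToyConfRegistry.fallback
    ToyConfRegistry.setActive(session)
    val whileActive = ToyConfRegistry.get eq session.conf
    ToyConfRegistry.clearActive()
    (beforeActive, whileActive)
  }
}
```

In this model, code that already holds a ToySession reads session.conf with a plain field access, while code without one goes through ToyConfRegistry.get and pays for the thread-local lookup. While the session is active, both paths return the same object.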
Spark already has a lot of test cases, so I added the code example of the interactive case to the description of the issue.
@zhztheplayer @zhouyuan @FelixYBW The failure GC seems unrelated.
@beliefer Aside of the constructor changes for
Thanks!
What changes were proposed in this pull request?
This PR proposes to improve GlutenConfig so that it supports runtime configuration changes. This PR also makes the improvements below.
SparkSession with spark is usually used by Spark.
val scanOnly: Boolean = glutenConf.enableScanOnly.
(Fixes: #7690)
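As a general illustration of why runtime configuration changes can go unnoticed, the sketch below contrasts a value captured once in a val with a value re-read on each access. It is a minimal plain-Scala model under stated assumptions, not the code changed by this PR: SessionConf, SnapshotReader, LiveReader, RuntimeConfDemo and the key example.scanOnly are hypothetical.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for a mutable, session-scoped configuration.
final class SessionConf {
  private val settings = TrieMap.empty[String, String]

  def set(key: String, value: String): Unit = settings.update(key, value)

  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

// Reads the setting once at construction; later changes are not observed.
final class SnapshotReader(conf: SessionConf) {
  val scanOnly: Boolean = conf.getBoolean("example.scanOnly", default = false)
}

// Re-reads the setting on every access, so runtime changes are observed.
final class LiveReader(conf: SessionConf) {
  def scanOnly: Boolean = conf.getBoolean("example.scanOnly", default = false)
}

object RuntimeConfDemo {
  // Returns (snapshot value, live value) after the setting is changed at runtime.
  def run(): (Boolean, Boolean) = {
    val conf = new SessionConf
    val snapshot = new SnapshotReader(conf)
    val live = new LiveReader(conf)
    conf.set("example.scanOnly", "true") // change made after both readers exist
    (snapshot.scanOnly, live.scanOnly)
  }
}
```

Here RuntimeConfDemo.run() returns (false, true): the snapshot keeps the value seen at construction, while the live reader reflects the later change.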
How was this patch tested?
integration tests