[SPARK-48678][CORE] Performance optimizations for SparkConf.get(ConfigEntry) #47049
core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala
…ry.scala Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
```scala
(maybePrependedValue, value) match {
  case (Some(prependedValue), Some(value)) => Some(s"$prependedValue$prependSeparator$value")
  case (Some(prepend), None) => Some(prepend)
  case (None, Some(value)) => Some(value)
```
super nit: avoid shadowing `value`? That would also allow us to simply return `value` for case 3 (and `maybePrependedValue` for case 2).
Good idea.
I found that we can actually go a bit further and collapse all but the first case into a single case: if we don't match `(Some(prependedValue), Some(value))`, then we can just return `value.orElse(maybePrependedValue)`, since at most one of the two options can be non-empty in that branch.
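A minimal sketch of the collapsed match, written as a standalone function so it can run outside Spark. The names `PrependSketch`, `readWithPrepend`, and `lookup` are hypothetical, and the sketch omits the rest of the real `ConfigEntry` logic. It also shows the common no-prepended-key case returning the entry's own value directly.

```scala
object PrependSketch {
  /** Hypothetical reader for an entry whose value may be prefixed by another key's value. */
  def readWithPrepend(
      key: String,
      prependedKey: Option[String],
      separator: String,
      lookup: String => Option[String]): Option[String] = {
    val value = lookup(key)
    prependedKey match {
      // Common case: no prepended key, so the entry's own value is returned as-is.
      case None => value
      case Some(pk) =>
        val maybePrependedValue = lookup(pk)
        (maybePrependedValue, value) match {
          // Pattern variables use new names, so `value` is not shadowed.
          case (Some(prepended), Some(v)) => Some(s"$prepended$separator$v")
          // At most one side is defined here: return it, or None if neither is.
          case _ => value.orElse(maybePrependedValue)
        }
    }
  }
}
```

Because the wildcard case is only reached when the two options are not both defined, `value.orElse(maybePrependedValue)` covers the three remaining combinations with one expression.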
allisonwang-db left a comment
👍
Merged to master.
What changes were proposed in this pull request?
This PR proposes two micro-optimizations for `SparkConf.get(ConfigEntry)`:
1. Skip `Regex.replaceAllIn` for variable substitution when possible: if the config value does not contain the substring `${`, then it cannot possibly contain any variables, so we can completely skip the regex evaluation in such cases.
2. Avoid `List.flatten` and `AbstractIterable.mkString` for the common case where a configuration does not define a prepended configuration key.

Why are the changes needed?
Improve performance.
This is primarily motivated by unit testing and benchmarking scenarios, but it will also slightly benefit production queries.
Spark tries to avoid excessive configuration reading in hot paths (e.g. via changes like #46979). If we do accidentally introduce such read paths, though, then this PR's optimizations will greatly reduce the associated performance penalty.
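To make the first optimization concrete, here is a minimal sketch of the `${` fast path in front of a regex-based substitution. `SubstitutionSketch`, `substitute`, and `lookup` are hypothetical names, and the reference pattern `${name}` / `${prefix:name}` is an illustrative assumption, not code copied from Spark's config reader. The sketch does a single, non-recursive pass and ignores the optional prefix.

```scala
import scala.util.matching.Regex

object SubstitutionSketch {
  // Assumed reference syntax: ${name} or ${prefix:name}; the prefix group is ignored here.
  private val RefRe: Regex = """\$\{(?:(\w+?):)?(\S+?)\}""".r

  /** Replaces ${...} references using `lookup`; unresolved references are left unchanged. */
  def substitute(input: String, lookup: String => Option[String]): String = {
    // Fast path: a value without the substring "${" cannot contain any reference,
    // so the regex scan is skipped entirely.
    if (input == null || !input.contains("${")) {
      input
    } else {
      RefRe.replaceAllIn(input, (m: Regex.Match) => {
        val name = m.group(2)
        // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
        Regex.quoteReplacement(lookup(name).getOrElse(m.matched))
      })
    }
  }
}
```

The `contains` check is a single linear scan with no regex machinery and no allocation, which is why it pays off when most config values contain no references at all.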
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Correctness should be covered by existing unit tests.
To measure performance, I did some manual benchmarking by running
followed by
10 million times in a loop.
On my laptop, the optimized code achieves ~7.5x higher throughput than the original.
We can also compare the before-and-after flamegraphs from a `while(true)` configuration-reading loop, which show a clear difference in hotspots before and after this change:

Before: [flamegraph]

After: [flamegraph]
Was this patch authored or co-authored using generative AI tooling?
Generated-by: GitHub Copilot