-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22791] [SQL] [SS] Redact Output of Explain #19985
Conversation
override protected def sparkConf: SparkConf = super.sparkConf | ||
.set("spark.redaction.string.regex", "file:/[\\w_]+") | ||
|
||
test("explain - redaction") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a SS example. I believe we have more such cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because you want to use explainInternal
, so you write a SS test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because I found a potential security issue in SS.
Test build #84946 has finished for PR 19985 at commit
|
Test build #84949 has finished for PR 19985 at commit
|
retest this please |
* Redact the sensitive information in the given string. | ||
*/ | ||
private def withRedaction(message: => String): String = { | ||
Utils.redact(SparkSession.getActiveSession.map(_.sparkContext.conf).orNull, message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make Utils.redact
accept a regex parameter so that we can use session conf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A different query may need to use a different regex
, a session conf will be better and the user can change it during runtime.
|
||
val explainWithoutExtended = q.explainInternal(false) | ||
assert(explainWithoutExtended.contains(replacement)) | ||
assert(explainWithoutExtended.contains("StateStoreRestore")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add assert(!explainWithoutExtended.contains("file:/"))
to verify it does replace the correct content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
||
val explainWithExtended = q.explainInternal(true) | ||
assert(explainWithExtended.contains(replacement)) | ||
assert(explainWithExtended.contains("StateStoreRestore")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Looks good. My major comment is using a session conf instead. |
Test build #84968 has finished for PR 19985 at commit
|
Test build #85002 has finished for PR 19985 at commit
|
retest this please |
Test build #85011 has finished for PR 19985 at commit
|
retest this please |
cc @zsxwing |
.doc("Regex to decide which parts of strings produced by Spark contain sensitive " + | ||
"information. When this regex matches a string part, that string part is replaced by a " + | ||
"dummy value. This is currently used to redact the output of SQL explain commands. " + | ||
"When this conf is not set, the value from `spark.sql.redaction.string.regex` is used.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spark.sql.redaction.string.regex
-> spark.redaction.string.regex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except one nit
Test build #85077 has finished for PR 19985 at commit
|
/** | ||
* Redact the sensitive information in the given string. | ||
*/ | ||
private def withRedaction(message: => String): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=> String
looks not very useful here, we need to materialize anyway when calling Utils.redact
assert(isIncluded(df.queryExecution, replacement)) | ||
|
||
assert(isIncluded(df.queryExecution, "FileScan")) | ||
assert(!isIncluded(df.queryExecution, "file:/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what the difference between these 3 lines and https://github.com/apache/spark/pull/19985/files#diff-0c515221ed6e6eadcec71b3b9ad3a3e1R70 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
Test build #85090 has finished for PR 19985 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
When calling explain on a query, the output can contain sensitive information. We should provide an admin/user to redact such information.
Before this PR, the plan of SS is like this
After this PR, we can get the following output if users set
spark.redaction.string.regex
tofile:/[\\w_]+
How was this patch tested?
Added a test case