Skip to content

Conversation

@sarutak
Copy link
Member

@sarutak sarutak commented Aug 14, 2021

What changes were proposed in this pull request?

This PR fixes an issue that ThriftServer doesn't recognize spark.sql.redaction.string.regex.
The problem is that sensitive information included in queries can be exposed.
thrift-password1
thrift-password2

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Ran ThriftServer, connect to it and execute CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde"); with spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')
Then, confirmed UI.

thrift-hide-password1
thrift-hide-password2

@github-actions github-actions bot added the SQL label Aug 14, 2021
@sarutak sarutak changed the title [SPARK-36400][SQL] Make ThriftServer recognize spark.sql.redaction.string.regex [SPARK-36400][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex Aug 14, 2021
@SparkQA
Copy link

SparkQA commented Aug 14, 2021

Test build #142467 has finished for PR 33743 at commit a7218c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 14, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46974/

@dnskr
Copy link

dnskr commented Aug 14, 2021

Hi @sarutak,
Thank you for the PR! Looks like it also closes SPARK-36398 issue.
Could you please check it?

@HyukjinKwon
Copy link
Member

Yeah looks like it.

@HyukjinKwon
Copy link
Member

cc @juliuszsompolski and @wangyum FYI

@sarutak sarutak changed the title [SPARK-36400][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex [SPARK-36400]SPARK-36398 [SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex Aug 15, 2021
@sarutak sarutak changed the title [SPARK-36400]SPARK-36398 [SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex Aug 15, 2021
@sarutak
Copy link
Member Author

sarutak commented Aug 15, 2021

@dnskr Thank you for letting me know. Yes, SPARK-36398 seems to be resolved by this change too.

@wangyum
Copy link
Member

wangyum commented Aug 15, 2021

Looks ok to me. Could we add a test for this change?

@sarutak
Copy link
Member Author

sarutak commented Aug 16, 2021

@wangyum Yeah it's better to have, but UISeleniumSuite in sql/hive-thriftserver is broken so we should resolve SPARK-36512 (#33741) first.

@sarutak
Copy link
Member Author

sarutak commented Aug 18, 2021

Merging to master, branch-3.2 and branch-3.1 to meet Spark 3.2.
I'll add test to UISeleniumSuite after #33741 merged.

@sarutak sarutak closed this in b914ff7 Aug 18, 2021
sarutak added a commit that referenced this pull request Aug 18, 2021
…ark.sql.redaction.string.regex

### What changes were proposed in this pull request?

This PR fixes an issue that ThriftServer doesn't recognize `spark.sql.redaction.string.regex`.
The problem is that sensitive information included in queries can be exposed.
![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)
![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connect to it and execute `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`
Then, confirmed UI.

![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)
![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

Closes #33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit b914ff7)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
sarutak added a commit that referenced this pull request Aug 18, 2021
…ark.sql.redaction.string.regex

### What changes were proposed in this pull request?

This PR fixes an issue that ThriftServer doesn't recognize `spark.sql.redaction.string.regex`.
The problem is that sensitive information included in queries can be exposed.
![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)
![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connect to it and execute `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`
Then, confirmed UI.

![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)
![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

Closes #33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit b914ff7)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Comment on lines 188 to +189
logInfo(s"Submitting query '$statement' with $statementId")
val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarutak it would be great to also log the redactedStatement in the logInfo one line above, otherwise the credentials needlessly leak to log4j logs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, let me open a followup PR. Thanks.

sarutak added a commit that referenced this pull request Sep 2, 2021
…ation in UI by config

### What changes were proposed in this pull request?

This PR adds a test for SPARK-36400 (#33743).

### Why are the changes needed?

SPARK-36512 (#33741) was fixed so we can add this test now.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

Closes #33885 from sarutak/add-reduction-test.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…ark.sql.redaction.string.regex

### What changes were proposed in this pull request?

This PR fixes an issue that ThriftServer doesn't recognize `spark.sql.redaction.string.regex`.
The problem is that sensitive information included in queries can be exposed.
![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)
![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran ThriftServer, connect to it and execute `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`
Then, confirmed UI.

![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)
![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

Closes apache#33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit b914ff7)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants