Skip to content

Conversation

@markgrover
Copy link
Member

What changes were proposed in this pull request?

This change redacts senstive information (based on spark.redaction.regex property)
from the Spark Submit console logs. Such sensitive information is already being
redacted from event logs and yarn logs, etc.

How was this patch tested?

Testing was done manually to make sure that the console logs were not printing any
sensitive information.

Here's some output from the console:

Spark properties used, including those specified through
 --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))

There is a risk if new print statements were added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future.

Running unit tests to make sure nothing else is broken by this change.

…bmit console output

This change redacts senstive information (based on spark.redaction.regex property)
from the Spark Submit console logs. Such sensitive information is already being
redacted from event logs and yarn logs, etc.

Testing was done manually to make sure that the console logs were not printing any
sensitive information.
Here's some output from the console:
Spark properties used, including those specified through
 --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
@SparkQA
Copy link

SparkQA commented Feb 24, 2017

Test build #73385 has finished for PR 17047 at commit 000efb1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* @param kvs
* @return
*/
def redact(kvs: Map[String, String]): Seq[(String, String)] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Nit: I'd omit param and return if they're not filled in.)
So this is used in cases where there isn't a conf object available yet, but the argument itself has the redaction config? I was slightly worried about the parallel implementation but that would be a reasonable reason to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, that's exactly the use case - where there isn't a conf object available yet. I will update the Javadoc. Thanks for reviewing!

@SparkQA
Copy link

SparkQA commented Feb 28, 2017

Test build #73554 has finished for PR 17047 at commit 7753998.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@vanzin vanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good. Can you replace "spark submit" with "core" in the title (since that's the right component)?

private def redact(redactionPattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
kvs.map { kv =>
redactionPattern.findFirstIn(kv._1)
.map { ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/ignore/_/

.longConf
.createWithDefault(4 * 1024 * 1024)

private[spark] val SECRET_REDACTION_PROPERTY = "spark.redaction.regex"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, instead of this you could use SECRET_REDACTION_PATTERN.key and SECRET_REDACTION_PATTERN.defaultValue in Utils.scala.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vanzin Updated.

@markgrover markgrover changed the title [SPARK-19720][SPARK SUBMIT] Redact sensitive information from SparkSubmit console [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console Feb 28, 2017
kvs.map { kv =>
redactionPattern.findFirstIn(kv._1)
.map { ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
.map {ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: replace "ignore" with "_" (in case my previous comment wasn't clear). also missing a space after '{'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right I misunderstood that - I took your comment as a bash + scala way to say eliminate space (partly because it's hard to understand spacing in github comments). My bad, let me fix that.

@SparkQA
Copy link

SparkQA commented Mar 1, 2017

Test build #73625 has finished for PR 17047 at commit 4edaf67.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@markgrover
Copy link
Member Author

Seems unrelated.

@markgrover
Copy link
Member Author

Jenkins, test this again, please.

@vanzin
Copy link
Contributor

vanzin commented Mar 1, 2017

retest this please

@SparkQA
Copy link

SparkQA commented Mar 1, 2017

Test build #73626 has finished for PR 17047 at commit d6a04b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 1, 2017

Test build #73651 has finished for PR 17047 at commit d6a04b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Mar 2, 2017

Merging to master.

@asfgit asfgit closed this in 5ae3516 Mar 2, 2017
@markgrover
Copy link
Member Author

Thanks @vanzin

dmvieira pushed a commit to dmvieira/spark that referenced this pull request Jul 29, 2017
…sole

This change redacts senstive information (based on default password and secret regex)
from the Spark Submit console logs. Such sensitive information is already being
redacted from event logs and yarn logs, etc.

Testing was done manually to make sure that the console logs were not printing any
sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```
There is a risk if new print statements were added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future.

Running unit tests to make sure nothing else is broken by this change.

Using reference from Mark Grover <mark@apache.org>

Closes apache#17047 for 2.1.2 spark vesion.
dmvieira pushed a commit to dmvieira/spark that referenced this pull request Aug 1, 2017
…sole

## What changes were proposed in this pull request?
This change redacts senstive information (based on `spark.redaction.regex` property)
from the Spark Submit console logs. Such sensitive information is already being
redacted from event logs and yarn logs, etc.

## How was this patch tested?
Testing was done manually to make sure that the console logs were not printing any
sensitive information.

Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```
There is a risk if new print statements were added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future.

Running unit tests to make sure nothing else is broken by this change.

Author: Mark Grover <mark@apache.org>

Closes apache#17047 from markgrover/master_redaction.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants