[SPARK-19720][CORE] Redact sensitive information from SparkSubmit console #17047
Conversation
…bmit console output

This change redacts sensitive information (based on the `spark.redaction.regex` property) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs, YARN logs, etc. Testing was done manually to make sure that the console logs were not printing any sensitive information. Here's some output from the console:

```
Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```
Test build #73385 has finished for PR 17047 at commit
```scala
 * @param kvs
 * @return
 */
def redact(kvs: Map[String, String]): Seq[(String, String)] = {
```
(Nit: I'd omit param and return if they're not filled in.)
So this is used in cases where there isn't a conf object available yet, but the argument itself has the redaction config? I was slightly worried about the parallel implementation but that would be a reasonable reason to do it.
Correct, that's exactly the use case - where there isn't a conf object available yet. I will update the Javadoc. Thanks for reviewing!
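For readers following along, the conf-less variant under discussion might look roughly like this. This is a minimal sketch, not the exact Spark source: the key name `spark.redaction.regex` and the `*********(redacted)` replacement text come from this PR, while the default pattern and helper structure are assumptions for illustration.

```scala
import scala.util.matching.Regex

val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
val SECRET_REDACTION_DEFAULT = "(?i)secret|password"

// No SparkConf exists yet at this point in spark-submit, so the
// redaction regex is looked up in the very map being redacted,
// falling back to a default pattern.
def redact(kvs: Map[String, String]): Seq[(String, String)] = {
  val pattern: Regex =
    kvs.getOrElse("spark.redaction.regex", SECRET_REDACTION_DEFAULT).r
  kvs.toSeq.map { case (k, v) =>
    if (pattern.findFirstIn(k).isDefined) (k, REDACTION_REPLACEMENT_TEXT)
    else (k, v)
  }
}
```

Reading the pattern out of the argument map itself is what removes the need for a `SparkConf` object, at the cost of the parallel implementation the reviewer mentions.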
Test build #73554 has finished for PR 17047 at commit
vanzin left a comment
Looks good. Can you replace "spark submit" with "core" in the title (since that's the right component)?
```scala
private def redact(redactionPattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
  kvs.map { kv =>
    redactionPattern.findFirstIn(kv._1)
      .map { ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
```
nit: s/ignore/_/
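Applied to the snippet above, the nit turns the unused binding into an underscore. A sketch of the resulting helper, under the assumption that `REDACTION_REPLACEMENT_TEXT` is the `*********(redacted)` constant shown in the PR description and that the `Option` falls back to the original pair:

```scala
import scala.util.matching.Regex

val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"

def redact(redactionPattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
  kvs.map { kv =>
    redactionPattern.findFirstIn(kv._1)
      .map { _ => (kv._1, REDACTION_REPLACEMENT_TEXT) } // key matched: redact the value
      .getOrElse(kv)                                    // no match: keep the pair as-is
  }
}
```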
```scala
    .longConf
    .createWithDefault(4 * 1024 * 1024)

private[spark] val SECRET_REDACTION_PROPERTY = "spark.redaction.regex"
```
Actually, instead of this you could use SECRET_REDACTION_PATTERN.key and SECRET_REDACTION_PATTERN.defaultValue in Utils.scala.
Thanks @vanzin. Updated.
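The resulting single source of truth can be illustrated with a minimal stand-in for Spark's internal config entries. The `ConfigEntry` case class below is only illustrative (the real builder API lives in `org.apache.spark.internal.config`), and the default pattern is an assumption:

```scala
// Illustrative stand-in for a Spark ConfigEntry: one definition owns
// both the key and its default, so Utils.scala no longer needs a
// duplicate string constant.
case class ConfigEntry[T](key: String, defaultValue: T)

val SECRET_REDACTION_PATTERN: ConfigEntry[String] = ConfigEntry(
  key = "spark.redaction.regex",
  defaultValue = "(?i)secret|password")

// Utils.scala side: read the pattern via the entry instead of a
// hard-coded "spark.redaction.regex" literal.
def redactionRegex(conf: Map[String, String]): scala.util.matching.Regex =
  conf.getOrElse(SECRET_REDACTION_PATTERN.key, SECRET_REDACTION_PATTERN.defaultValue).r
```

With this shape, renaming the property or changing its default is a one-line change that both the config system and `Utils.scala` pick up.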
```scala
kvs.map { kv =>
  redactionPattern.findFirstIn(kv._1)
    .map { ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
    .map {ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
```
nit: replace "ignore" with "_" (in case my previous comment wasn't clear). also missing a space after '{'.
Ah, right, I misunderstood that - I took your comment as a bash + scala way of saying "eliminate the space" (partly because it's hard to read spacing in GitHub comments). My bad, let me fix that.
Test build #73625 has finished for PR 17047 at commit

Seems unrelated.

Jenkins, test this again, please.

retest this please

Test build #73626 has finished for PR 17047 at commit

Test build #73651 has finished for PR 17047 at commit

Merging to master.

Thanks @vanzin
…sole

This change redacts sensitive information (based on the default password and secret regex) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs, YARN logs, etc.

Testing was done manually to make sure that the console logs were not printing any sensitive information. Here's some output from the console:

```
Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

There is a risk that if new print statements are added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future. Running unit tests to make sure nothing else is broken by this change.

Using reference from Mark Grover <mark@apache.org>

Closes apache#17047 for the 2.1.2 Spark version.
…sole

## What changes were proposed in this pull request?

This change redacts sensitive information (based on the `spark.redaction.regex` property) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs, YARN logs, etc.

## How was this patch tested?

Testing was done manually to make sure that the console logs were not printing any sensitive information. Here's some output from the console:

```
Spark properties used, including those specified through --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

There is a risk that if new print statements are added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future. Running unit tests to make sure nothing else is broken by this change.

Author: Mark Grover <mark@apache.org>

Closes apache#17047 from markgrover/master_redaction.