[SPARK-17783] [SQL] Hide Credentials in CREATE and DESC FORMATTED/EXTENDED a PERSISTENT/TEMP Table for JDBC #15358
Test build #66370 has finished for PR 15358 at commit
Can you also put the after-the-fix behavior in the description?

    override def toString: String = {
      val maskedProperties = properties.map {
        case (password, _) if password.toLowerCase == "password" => (password, "###")

I think you should rename the variable "password" to just key. For a moment I thought that was the password. Same thing with url.
Also in the case of URL, do we really want to hide the URL completely? Can't we just hide the password?
Yeah, you are right.
At the beginning, I also thought maybe we just need to hide the password value instead of hiding the whole URL. Later, when I read another PR: #10452, I thought maybe we can hide the whole url. After rethinking it, users might need to see the other field values. Will follow your suggestions to hide password values only. Thanks!
In Oracle, the specification of password and user in url does not follow the same pattern we expect. Below is an example:
jdbc:oracle:thin:scott/tiger@localhost:1521:orcl
The reference link: http://docs.oracle.com/cd/B28359_01/java.111/b31224/urls.htm
So far, the dialect does not help. Let me know whether we should add more code changes for utilizing dialects. Thanks! @rxin
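
To make the concern concrete, here is a minimal sketch of value-only masking. It assumes a hypothetical helper `maskPasswordParam` and a `password=<value>` pattern terminated by `;` or `&`; neither comes from this PR. It masks the H2-style URL but leaves the Oracle-style URL, including the password, untouched:

```scala
object UrlPasswordMasking {
  // Hypothetical pattern (an assumption, not from the PR): a case-insensitive
  // `password=` key followed by its value, up to the next `;` or `&`.
  private val PasswordParam = "(?i)(password=)[^;&]*".r

  // Replace only the password value and keep the rest of the URL readable.
  def maskPasswordParam(url: String): String =
    PasswordParam.replaceAllIn(url, "$1###")
}
```

With this sketch, `jdbc:h2:mem:testdb0;user=testUser;password=testPass` becomes `jdbc:h2:mem:testdb0;user=testUser;password=###`, while `jdbc:oracle:thin:scott/tiger@localhost:1521:orcl` comes back unchanged because it contains no `password=` key. A pattern-based approach would therefore need per-database rules.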
     * for JDBC data sources.
     */
    def maskCredentials(options: Map[String, String]): Map[String, String] = {
      options.map {

we should consolidate this code path with the one above. Otherwise the two will diverge over time. Perhaps add them to some utils class?
Great idea! I did not find a good place to combine both. Will add it into util class
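
A shared helper along these lines could serve both code paths. This is only a sketch of the rule stated in the PR description (mask the `password` property, and mask the whole `url` value when it contains the keyword `password`); the enclosing object name `CredentialMasking` is hypothetical, and the merged code may differ in detail:

```scala
object CredentialMasking {
  // Placeholder shown instead of sensitive values, as in the PR's sample output.
  private val Mask = "###"

  // Sketch only: mirrors the masking rule from the PR description.
  def maskCredentials(options: Map[String, String]): Map[String, String] =
    options.map {
      // Always hide the value of a `password` option (key match is case-insensitive).
      case (key, _) if key.toLowerCase == "password" => (key, Mask)
      // Hide the entire URL when it appears to embed a password.
      case (key, value)
          if key.toLowerCase == "url" && value.toLowerCase.contains("password") =>
        (key, Mask)
      // Everything else is passed through unchanged.
      case other => other
    }
}
```

Keeping the rule in one place means `toString` and the DESC output paths cannot drift apart, which is the divergence the review comment warns about.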
@rxin Sorry, I did not finish the PR description last night. The connection dropped on the train. Will fix it soon.

@gatorsmile do you have time to get this ready for 2.1?

Let me finish this now. Thanks!

Test build #68754 has started for PR 15358 at commit

The PR description is updated.

retest this please

Test build #68791 has finished for PR 15358 at commit

cc @rxin @hvanhovell

retest this please

Test build #69115 has started for PR 15358 at commit

retest this please

Test build #69140 has finished for PR 15358 at commit

hvanhovell left a comment:
A few minor things, otherwise LGTM
    }
    val serdePropsToString = CatalogUtils.maskCredentials(properties) match {
      case props if props.isEmpty => ""
      case props => s"Properties: " + props.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")

NIT: string interpolation not needed.
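
For illustration, the same formatting can be written with a plain string literal, since `s"Properties: "` contains no placeholders. The helper below is a hypothetical standalone sketch (the object and method names are assumptions), not the code under review:

```scala
object StoragePropsFormatting {
  // Sketch: render already-masked properties as `Properties: [k=v, ...]`.
  // A plain literal is enough here because nothing is interpolated.
  def propsToString(props: Map[String, String]): String =
    if (props.isEmpty) ""
    else "Properties: " + props.map { case (k, v) => k + "=" + v }.mkString("[", ", ", "]")
}
```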
        }
      }
    }

Nit: ?

    override def argString: String = {
      s"[tableIdent:$tableIdent " +
        userSpecifiedSchema.getOrElse("") +

Does this need a space?
Yeah. Thanks! It needs a space:
ExecutedCommand
+- CreateTempViewUsing [tableIdent:`jsonTable` StructType(StructField(a,IntegerType,true), StructField(b,StringType,true))replace:true provider:org.apache.spark.sql.json.DefaultSource Map(path -> /private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/spark-88202701-92c6-44f2-9945-7f388551729b)
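
The missing separator shows up as `...true))replace:true` in the output above. One way to add it only when a schema is present is sketched below; `ViewArgs` is a hypothetical stand-in for the command's fields, not the actual Spark class, and the real fix may be written differently:

```scala
object ArgStringSpacing {
  // Hypothetical stand-in for the command's fields (an assumption for this sketch).
  final case class ViewArgs(
      tableIdent: String,
      userSpecifiedSchema: Option[String],
      replace: Boolean)

  // Append a trailing space to the schema only when one is defined, so the
  // output has exactly one space between fields in both cases.
  def argString(a: ViewArgs): String =
    s"[tableIdent:${a.tableIdent} " +
      a.userSpecifiedSchema.map(_ + " ").getOrElse("") +
      s"replace:${a.replace}]"
}
```

Mapping over the `Option` avoids a double space when no schema is specified, which a fixed `" "` literal after `getOrElse("")` would introduce.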
Test build #69176 has finished for PR 15358 at commit

LGTM. Merging to master/2.1.

…NDED a PERSISTENT/TEMP Table for JDBC

### What changes were proposed in this pull request?

We should never expose the Credentials in the EXPLAIN and DESC FORMATTED/EXTENDED command. However, below commands exposed the credentials.

In the related PR: #10452

> URL patterns to specify credential seems to be vary between different databases.

Thus, we hide the whole `url` value if it contains the keyword `password`. We also hide the `password` property.

Before the fix, the command outputs look like:

``` SQL
CREATE TABLE tab1
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:h2:mem:testdb0;user=testUser;password=testPass',
  dbtable 'TEST.PEOPLE',
  user 'testUser',
  password '$password')

DESC FORMATTED tab1
DESC EXTENDED tab1
```

Before the fix,

- The output of SQL statement EXPLAIN

```
== Physical Plan ==
ExecutedCommand
   +- CreateDataSourceTableCommand CatalogTable(
        Table: `tab1`
        Created: Wed Nov 16 23:00:10 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Provider: org.apache.spark.sql.jdbc
        Storage(Properties: [url=jdbc:h2:mem:testdb0;user=testUser;password=testPass, dbtable=TEST.PEOPLE, user=testUser, password=testPass])), false
```

- The output of `DESC FORMATTED`

```
...
|Storage Desc Parameters:    |                                                                  |       |
|  url                       |jdbc:h2:mem:testdb0;user=testUser;password=testPass               |       |
|  dbtable                   |TEST.PEOPLE                                                       |       |
|  user                      |testUser                                                          |       |
|  password                  |testPass                                                          |       |
+----------------------------+------------------------------------------------------------------+-------+
```

- The output of `DESC EXTENDED`

```
|# Detailed Table Information|CatalogTable(
        Table: `default`.`tab1`
        Created: Wed Nov 16 23:00:10 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Schema: [StructField(NAME,StringType,false), StructField(THEID,IntegerType,false)]
        Provider: org.apache.spark.sql.jdbc
        Storage(Location: file:/Users/xiaoli/IdeaProjects/sparkDelivery/spark-warehouse/tab1, Properties: [url=jdbc:h2:mem:testdb0;user=testUser;password=testPass, dbtable=TEST.PEOPLE, user=testUser, password=testPass]))|       |
```

After the fix,

- The output of SQL statement EXPLAIN

```
== Physical Plan ==
ExecutedCommand
   +- CreateDataSourceTableCommand CatalogTable(
        Table: `tab1`
        Created: Wed Nov 16 22:43:49 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Provider: org.apache.spark.sql.jdbc
        Storage(Properties: [url=###, dbtable=TEST.PEOPLE, user=testUser, password=###])), false
```

- The output of `DESC FORMATTED`

```
...
|Storage Desc Parameters:    |                                                                  |       |
|  url                       |###                                                               |       |
|  dbtable                   |TEST.PEOPLE                                                       |       |
|  user                      |testUser                                                          |       |
|  password                  |###                                                               |       |
+----------------------------+------------------------------------------------------------------+-------+
```

- The output of `DESC EXTENDED`

```
|# Detailed Table Information|CatalogTable(
        Table: `default`.`tab1`
        Created: Wed Nov 16 22:43:49 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Schema: [StructField(NAME,StringType,false), StructField(THEID,IntegerType,false)]
        Provider: org.apache.spark.sql.jdbc
        Storage(Location: file:/Users/xiaoli/IdeaProjects/sparkDelivery/spark-warehouse/tab1, Properties: [url=###, dbtable=TEST.PEOPLE, user=testUser, password=###]))|       |
```

### How was this patch tested?

Added test cases

Author: gatorsmile <gatorsmile@gmail.com>

Closes #15358 from gatorsmile/maskCredentials.

(cherry picked from commit 9f273c5)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>

@gatorsmile I cannot merge this into branch 2.0. Could you open a PR if you feel that we should have this in 2.0?

Will do it soon. Thanks!

…FORMATTED/EXTENDED a PERSISTENT/TEMP Table for JDBC

### What changes were proposed in this pull request?

This PR is to backport #15358 to Spark 2.0.

### How was this patch tested?

Added test cases

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16047 from gatorsmile/backPortSPARK-17783.