-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mask SQL String in the MSQTaskQueryMaker for secrets #13231
Conversation
indexing-service/src/main/java/org/apache/druid/indexing/worker/executor/ExecutorLifecycle.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I agree with the approach here. It seems an overkill to do a complex regex replace just to not log some specific fields.
I think the better thing to do is to just continue logging the secrets. This would be a good reason to encourage users to provide secrets using a EnvironmentVariableDynamicConfigProvider
or similar.
If we do however feel the need to this, we should look for a simpler approach. For example, just ellipsizing the string starting at the first position where a sensitive key is encountered (using String.indexOf
or something). The key motivation being to encourage users to adopt preferred techniques, not make it easier for them to use less preferred ones.
Yeah, but since the SQL was a string, there seemed to be no other alternative, esp. since druid planner isn't available in the Peons.
We should then update the documentation since the method here only masks it in the logs in the ExecutorLifecycle class. Any deserialization of the controller's task spec would reveal the query.
All of the approaches (that I was able to think off) would be based on a text search and replace, so in my opinion, we can choose any one of them. Regarding the ellipses approach suggested, while the advantage is that it is simpler (so less room for error in the logic), the downside is that it will mask the rest of the query, which might be helpful while debugging. |
IMHO the current approach seems fine as most of the companies feed the logs to a log aggregator, like Splunk, and have alerts set up for leaking secrets. |
@@ -282,6 +299,23 @@ private static Map<String, ColumnType> buildAggregationIntermediateTypeMap(final | |||
return retVal; | |||
} | |||
|
|||
@VisibleForTesting | |||
static String maskSensitiveJsonKeys(String taskJson) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a new util class for this method in org.apache.sql.msq.util
, make it public, and then it should be easy enough to test as well.
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskQueryMaker.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskQueryMaker.java
Outdated
Show resolved
Hide resolved
Nit: Please update the title of the PR as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @LakshSingla
+1 non-binding
CI failures seem unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
public void maskSensitiveJsonKeys() | ||
{ | ||
|
||
String sql1 = "\"REPLACE INTO table OVERWRITE ALL\\nWITH ext AS (SELECT *\\nFROM TABLE(\\n EXTERN(\\n '{\\\"type\\\":\\\"s3\\\",\\\"prefixes\\\":[\\\"s3://prefix\\\"],\\\"properties\\\":{\\\"accessKeyId\\\":{\\\"type\\\":\\\"default\\\",\\\"password\\\":\\\"secret_pass\\\"},\\\"secretAccessKey\\\":{\\\"type\\\":\\\"default\\\",\\\"password\\\":\\\"secret_pass\\\"}}}',\\n '{\\\"type\\\":\\\"json\\\"}',\\n '[{\\\"name\\\":\\\"time\\\",\\\"type\\\":\\\"string\\\"},{\\\"name\\\":\\\"name\\\",\\\"type\\\":\\\"string\\\"}]'\\n )\\n))\\nSELECT\\n TIME_PARSE(\\\"time\\\") AS __time,\\n name,\\n country FROM ext\\nPARTITIONED BY DAY\""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : wrapping these strings over would be easier to understand (the SQL and the assertions
Thanks, @LakshSingla for the securing druid even more. |
Description
With MSQ submitting the tasks containing SQL which might have sensitive keys like AWS's secrets in them, they can get logged in the file if the SQL contains them.
While this is not a secure way to specify the credentials since they might get logged & reported at multiple places, this PR aims to mask these sensitive keys in the log files using a simple regex search and replace.
For GCS and Azure input sources, the credentials cannot be specified in the InputSource itself therefore we donot need to search and mask their keys in the SQL.
Key changed/added classes in this PR
MSQTaskQueryMaker