Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mask SQL String in the MSQTaskQueryMaker for secrets #13231

Merged
merged 13 commits into from
Nov 3, 2022

Conversation

LakshSingla
Copy link
Contributor

@LakshSingla LakshSingla commented Oct 17, 2022

Description

With MSQ submitting the tasks containing SQL which might have sensitive keys like AWS's secrets in them, they can get logged in the file if the SQL contains them.
While this is not a secure way to specify the credentials since they might get logged & reported at multiple places, this PR aims to mask these sensitive keys in the log files using a simple regex search and replace.

For GCS and Azure input sources, the credentials cannot be specified in the InputSource itself therefore we donot need to search and mask their keys in the SQL.


Key changed/added classes in this PR
  • MSQTaskQueryMaker
  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I agree with the approach here. It seems an overkill to do a complex regex replace just to not log some specific fields.

I think the better thing to do is to just continue logging the secrets. This would be a good reason to encourage users to provide secrets using a EnvironmentVariableDynamicConfigProvider or similar.

If we do however feel the need to this, we should look for a simpler approach. For example, just ellipsizing the string starting at the first position where a sensitive key is encountered (using String.indexOf or something). The key motivation being to encourage users to adopt preferred techniques, not make it easier for them to use less preferred ones.

@LakshSingla
Copy link
Contributor Author

LakshSingla commented Oct 18, 2022

I am not sure I agree with the approach here. It seems an overkill to do a complex regex replace just to not log some specific fields.

Yeah, but since the SQL was a string, there seemed to be no other alternative, esp. since druid planner isn't available in the Peons.

I think the better thing to do is to just continue logging the secrets. This would be a good reason to encourage users to provide secrets using a EnvironmentVariableDynamicConfigProvider or similar.

We should then update the documentation since the method here only masks it in the logs in the ExecutorLifecycle class. Any deserialization of the controller's task spec would reveal the query.

If we do however feel the need to this, we should look for a simpler approach

All of the approaches (that I was able to think off) would be based on a text search and replace, so in my opinion, we can choose any one of them. Regarding the ellipses approach suggested, while the advantage is that it is simpler (so less room for error in the logic), the downside is that it will mask the rest of the query, which might be helpful while debugging.

@cryptoe
Copy link
Contributor

cryptoe commented Oct 19, 2022

IMHO the current approach seems fine as most of the companies feed the logs to a log aggregator, like Splunk, and have alerts set up for leaking secrets.

@@ -282,6 +299,23 @@ private static Map<String, ColumnType> buildAggregationIntermediateTypeMap(final
return retVal;
}

@VisibleForTesting
static String maskSensitiveJsonKeys(String taskJson)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a new util class for this method in org.apache.sql.msq.util, make it public, and then it should be easy enough to test as well.

@cryptoe
Copy link
Contributor

cryptoe commented Oct 19, 2022

Nit: Please update the title of the PR as well.

@LakshSingla LakshSingla changed the title Mask SQL in ExecutorLifecycle logs for AWS Credentials Mask SQL String in the MSQTaskQueryMaker for secrets Oct 19, 2022
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @LakshSingla
+1 non-binding

@LakshSingla
Copy link
Contributor Author

CI failures seem unrelated

Copy link
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

public void maskSensitiveJsonKeys()
{

String sql1 = "\"REPLACE INTO table OVERWRITE ALL\\nWITH ext AS (SELECT *\\nFROM TABLE(\\n EXTERN(\\n '{\\\"type\\\":\\\"s3\\\",\\\"prefixes\\\":[\\\"s3://prefix\\\"],\\\"properties\\\":{\\\"accessKeyId\\\":{\\\"type\\\":\\\"default\\\",\\\"password\\\":\\\"secret_pass\\\"},\\\"secretAccessKey\\\":{\\\"type\\\":\\\"default\\\",\\\"password\\\":\\\"secret_pass\\\"}}}',\\n '{\\\"type\\\":\\\"json\\\"}',\\n '[{\\\"name\\\":\\\"time\\\",\\\"type\\\":\\\"string\\\"},{\\\"name\\\":\\\"name\\\",\\\"type\\\":\\\"string\\\"}]'\\n )\\n))\\nSELECT\\n TIME_PARSE(\\\"time\\\") AS __time,\\n name,\\n country FROM ext\\nPARTITIONED BY DAY\"";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : wrapping these strings over would be easier to understand (the SQL and the assertions

@cryptoe cryptoe merged commit ccc55ef into apache:master Nov 3, 2022
@cryptoe
Copy link
Contributor

cryptoe commented Nov 3, 2022

Thanks, @LakshSingla for the securing druid even more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants