fix(instrumentation-aws-sdk): remove un-sanitised db.statement span attribute from DynamoDB spans #1748

Merged
merged 8 commits into from
Dec 7, 2023


ramesius
Contributor

@ramesius ramesius commented Oct 20, 2023

Which problem is this PR solving?

DynamoDB spans include the db.statement span attribute, which can contain un-sanitised sensitive data.

Short description of the changes

Remove the db.statement attribute from DynamoDB spans, because its value is not sanitised of sensitive data.

According to the semantic conventions spec:

(db.statement) Should be collected by default only if there is sanitisation that excludes sensitive information.

I have updated the tests to explicitly assert that db.statement remains undefined.
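The attribute set after this change can be sketched as follows. This is a minimal standalone sketch, not the instrumentation's code: `buildDynamoDbSpanAttributes` is a hypothetical helper, and plain string keys stand in for the `SemanticAttributes.DB_*` constants.

```typescript
// Hypothetical stand-in for the SDK command input type.
type CommandInput = Record<string, unknown>;
type SpanAttributes = Record<string, string | undefined>;

// Hypothetical helper: builds DynamoDB span attributes without db.statement.
// The keys are the semantic-convention names behind SemanticAttributes.DB_*.
function buildDynamoDbSpanAttributes(
  operation: string,
  commandInput: CommandInput
): SpanAttributes {
  const tableName = commandInput.TableName;
  return {
    'db.system': 'dynamodb',
    'db.name': typeof tableName === 'string' ? tableName : undefined,
    'db.operation': operation,
    // 'db.statement' is deliberately not set: JSON.stringify(commandInput)
    // would copy item values (e.g. e-mail addresses) into the trace.
  };
}

const attrs = buildDynamoDbSpanAttributes('PutItem', {
  TableName: 'users',
  Item: { email: { S: 'alice@example.com' } },
});
```

The test change described above amounts to asserting that `'db.statement'` is absent from this object.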

Contributes to #1552

@ramesius ramesius requested a review from a team October 20, 2023 10:40
@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 20, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@andymac4182

andymac4182 commented Oct 23, 2023

@carolabadeer @blumamir Are you able to review, please? This is impacting our services: it is pushing private database values into traces.

@srprash

srprash commented Oct 30, 2023

It is indeed not good practice to record the raw query as the db.statement span attribute, and thanks for raising this PR.
I propose that we follow an approach similar to the redis instrumentation, where a custom function can be provided to serialize the db.statement value.

By default, this instrumentation should either not record db.statement at all, or record only the non-sensitive parts of the queries.

@ramesius Would you be able to add the feature in this PR?

@ramesius
Contributor Author

@srprash Yeah, I can take care of that in this PR.
The configuration naming will match.

I will raise the change with a default implementation that redacts the whole db.statement for now.

@ramesius
Contributor Author

ramesius commented Nov 1, 2023

This now adds the dynamoDBStatementSerializer configuration option to customise statement serialization.
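As an illustration of what a user might supply through this option, here is a minimal sketch of a serializer that keeps the table name and replaces every other leaf value with `"?"`. The local types, `redactValues`, and `redactingSerializer` are hypothetical; the single-argument signature mirrors the `AwsSdkDynamoDBStatementSerializer` type at this stage of the PR.

```typescript
// Hypothetical local types mirroring the serializer shape at this stage of the PR:
// it receives the command input and returns the db.statement value (or undefined).
type CommandInput = Record<string, unknown>;
type DynamoDBStatementSerializer = (
  commandInput: CommandInput
) => string | undefined;

// Replace every leaf value with "?" so only the request's structure remains.
function redactValues(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(item => redactValues(item));
  }
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const redacted: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      redacted[key] = redactValues(source[key]);
    }
    return redacted;
  }
  return '?';
}

// Example user-supplied serializer: keep the table name, redact everything else.
const redactingSerializer: DynamoDBStatementSerializer = commandInput => {
  const { TableName, ...rest } = commandInput;
  return JSON.stringify({
    TableName,
    ...(redactValues(rest) as Record<string, unknown>),
  });
};

const statement = redactingSerializer({
  TableName: 'users',
  Key: { id: { S: '42' } },
});
```

Here `statement` is `{"TableName":"users","Key":{"id":{"S":"?"}}}`. A function like this would be passed as the `dynamoDBStatementSerializer` option; note that the parameter list was revised later in review to include the operation.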

@ramesius
Contributor Author

ramesius commented Nov 1, 2023

@srprash Good for another review when you are ready.

@ramesius
Contributor Author

ramesius commented Nov 6, 2023

@srprash Do you know when you will be able to review this again?

Member

@blumamir blumamir left a comment

Overall LGTM.
Added a few minor comments.

Sorry for taking so long to review 🙏

 const spanAttributes = {
   [SemanticAttributes.DB_SYSTEM]: DbSystemValues.DYNAMODB,
   [SemanticAttributes.DB_NAME]: normalizedRequest.commandInput?.TableName,
   [SemanticAttributes.DB_OPERATION]: operation,
-  [SemanticAttributes.DB_STATEMENT]: JSON.stringify(
+  [SemanticAttributes.DB_STATEMENT]: dbStatementSerializer(
Member

Should we record this attribute if config.dynamoDBStatementSerializer is undefined or if the config function returned undefined?
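One way to handle both cases is to set the attribute only when a serializer is configured and it actually returns a string. A minimal sketch; `statementAttributes` is a hypothetical helper and `'db.statement'` stands in for `SemanticAttributes.DB_STATEMENT`.

```typescript
type CommandInput = Record<string, unknown>;
type DynamoDBStatementSerializer = (
  commandInput: CommandInput
) => string | undefined;

// Hypothetical helper: returns { 'db.statement': ... } only when a serializer
// is configured and it produced a string; otherwise the attribute is omitted.
function statementAttributes(
  commandInput: CommandInput,
  serializer?: DynamoDBStatementSerializer
): Record<string, string> {
  const attributes: Record<string, string> = {};
  if (!serializer) {
    return attributes; // not configured: record nothing
  }
  const statement = serializer(commandInput);
  if (typeof statement === 'string') {
    attributes['db.statement'] = statement;
  }
  return attributes;
}
```

Omitting the key entirely, rather than setting it to `undefined`, lets a test assert `not.toHaveProperty` for the default case.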

Member

Since this function is supplied by the user, the instrumentation should protect itself from any exception thrown by it and prevent the patched code from crashing in that case.

There are many examples of this in other instrumentations.
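Other instrumentations typically wrap user hooks with a helper such as `safeExecuteInTheMiddle` from `@opentelemetry/instrumentation`. The self-contained sketch below shows the same idea with a plain try/catch; `safeSerialize` and its `onError` callback are hypothetical names, not part of the instrumentation.

```typescript
type CommandInput = Record<string, unknown>;
type DynamoDBStatementSerializer = (
  commandInput: CommandInput
) => string | undefined;

// Hypothetical guard around the user-supplied serializer: any exception is
// reported through onError and swallowed, so the patched SDK call continues.
function safeSerialize(
  serializer: DynamoDBStatementSerializer,
  commandInput: CommandInput,
  onError: (err: unknown) => void
): string | undefined {
  try {
    return serializer(commandInput);
  } catch (err) {
    onError(err);
    return undefined; // behave as if no statement was produced
  }
}

const errors: unknown[] = [];
const result = safeSerialize(
  () => {
    throw new Error('serializer failure');
  },
  { TableName: 'users' },
  err => {
    errors.push(err);
  }
);
```

In real code, `onError` would most likely log through the OpenTelemetry diagnostic logger rather than collect errors in an array.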

Contributor Author

Most definitely, I will add this in 👍🏻

-expect(
-  JSON.parse(attrs[SemanticAttributes.DB_STATEMENT] as string)
-).toEqual(params);
+expect(attrs).not.toHaveProperty(SemanticAttributes.DB_STATEMENT);
Member

I suggest that we also assert its expected value, not only that it is present.

Contributor Author

Something I missed; thanks for picking it up. I will sort that out.

Contributor Author

In fact, since this tests the default behaviour, and with the upcoming change the property will not be set when no serializer is configured, can this probably stay as is?

Comment on lines 702 to 706
const dynamoDBStatementSerializer: AwsSdkDynamoDBStatementSerializer = (
_command: CommandInput
): string => {
return SERIALIZED_DB_STATEMENT;
};
Member

I suggest adding a test where this serializer throws an exception, to verify that the instrumentation does not crash in that case.
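Such a test could call the instrumented operation with a serializer that always throws, then assert that the call still returns its response and that db.statement is absent. The sketch below simulates this without the AWS SDK or the test framework; `instrumentedSend` is a hypothetical, simplified stand-in for the patched client method, not the instrumentation's code.

```typescript
type CommandInput = Record<string, unknown>;
type DynamoDBStatementSerializer = (
  commandInput: CommandInput
) => string | undefined;

// Hypothetical, simplified stand-in for the patched send(): it computes span
// attributes (guarding the user serializer) and then performs the real call.
function instrumentedSend<T>(
  commandInput: CommandInput,
  send: (input: CommandInput) => T,
  serializer: DynamoDBStatementSerializer | undefined,
  recordedAttributes: Record<string, string>
): T {
  if (serializer) {
    try {
      const statement = serializer(commandInput);
      if (typeof statement === 'string') {
        recordedAttributes['db.statement'] = statement;
      }
    } catch {
      // Swallow serializer errors; the request must still go through.
    }
  }
  return send(commandInput);
}

// Test scenario: a serializer that always throws.
const throwingSerializer: DynamoDBStatementSerializer = () => {
  throw new Error('serializer failure');
};
const attributes: Record<string, string> = {};
const response = instrumentedSend(
  { TableName: 'users', Key: { id: { S: '42' } } },
  () => ({ Item: { id: { S: '42' } } }),
  throwingSerializer,
  attributes
);
```

In the real test suite, the equivalent assertions would run against the exported span and the mocked DynamoDB response.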

);
});

it('should properly execute the db statement serializer for CreateTable operation', done => {
Member

What is the gain from testing this feature with several operations? Is there anything in the code that behaves differently depending on the operation, in a way that affects how the serializer is executed?

Contributor Author

Probably a little excessive in hindsight.

Given https://github.com/open-telemetry/opentelemetry-js-contrib/pull/1748/files/52d09b71cc206a6a40a77cfcf7ab69f7809805d2#r1384667968, having multiple tests that assert on the operation would probably be the right direction.

Comment on lines 67 to 69
export type AwsSdkDynamoDBStatementSerializer = (
commandInput: CommandInput
) => string | undefined;
Member

I wonder if the serializer should also receive the operation, so that it can sanitize properly. I guess the sanitization logic could work differently for different operations?

This can also be added in the future if someone brings up the need.
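To illustrate why the operation helps, here is a minimal sketch assuming the operation name is passed as the first argument. The example policy (serialize schema-management inputs in full, record only the table name for everything else) and all names here are hypothetical choices, not behaviour of the instrumentation.

```typescript
type CommandInput = Record<string, unknown>;
// Hypothetical signature with the operation name as the first argument.
type DynamoDBStatementSerializer = (
  operation: string,
  commandInput: CommandInput
) => string | undefined;

// Example policy: treat these operations' inputs as non-sensitive.
const schemaOnlyOperations = new Set(['CreateTable', 'DescribeTable', 'DeleteTable']);

const operationAwareSerializer: DynamoDBStatementSerializer = (
  operation,
  commandInput
) => {
  if (schemaOnlyOperations.has(operation)) {
    return JSON.stringify(commandInput);
  }
  // Data-plane operations (PutItem, Query, ...) only record the table name.
  const tableName = commandInput.TableName;
  return typeof tableName === 'string'
    ? JSON.stringify({ TableName: tableName })
    : undefined;
};
```

Without the operation argument, a serializer would have to infer the operation from the shape of the input, which is fragile.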

Contributor Author

I like the suggestion. As pointed out elsewhere, the sanitize function is part of the public API, so including the operation now avoids a breaking change later 👍🏻


codecov bot commented Nov 14, 2023

Codecov Report

Merging #1748 (a61251c) into main (86a21d7) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1748   +/-   ##
=======================================
  Coverage   91.44%   91.45%           
=======================================
  Files         144      144           
  Lines        7400     7406    +6     
  Branches     1481     1483    +2     
=======================================
+ Hits         6767     6773    +6     
  Misses        633      633           
Files Coverage Δ
...entelemetry-instrumentation-aws-sdk/src/aws-sdk.ts 97.48% <100.00%> (ø)
...ntation-aws-sdk/src/services/ServicesExtensions.ts 100.00% <100.00%> (ø)
...y-instrumentation-aws-sdk/src/services/dynamodb.ts 100.00% <100.00%> (ø)
...try-instrumentation-aws-sdk/src/services/lambda.ts 97.77% <100.00%> (ø)
...emetry-instrumentation-aws-sdk/src/services/sns.ts 94.11% <100.00%> (ø)
...emetry-instrumentation-aws-sdk/src/services/sqs.ts 100.00% <100.00%> (ø)

@ramesius
Contributor Author

I think I have addressed all the feedback, ready for another round 👍🏻

@ramesius
Contributor Author

@srprash @blumamir Would you mind reviewing this again to hopefully get this merged soon? 🤞🏻

@pichlermarc pichlermarc changed the title fix: remove un-sanitised db.statement span attribute from DynamoDB spans fix(instrumentation-aws-sdk): remove un-sanitised db.statement span attribute from DynamoDB spans Nov 27, 2023
@blumamir blumamir merged commit cdbb29f into open-telemetry:main Dec 7, 2023
@dyladan dyladan mentioned this pull request Dec 7, 2023
6 participants