Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[serverless] Add DynamoDB Span Pointers #4912

Merged
merged 35 commits into from
Dec 18, 2024

Conversation

nhulston
Copy link
Contributor

@nhulston nhulston commented Nov 19, 2024

Large PR, but 1000+ lines are tests. Only ~250 lines are actual code from this PR. All these tests are necessary to test all the different possible cases. Please review commit by commit to make the review process easier.

What does this PR do?

Adds span pointers in DynamoDB for putItem, updateItem, deleteItem, transactWriteItems, and batchWriteItem requests.

For more info on the details of how span pointers work for each of these operations, see https://github.com/DataDog/dd-span-pointer-rules/blob/main/AWS/DynamoDB/Item/README.md.

Span pointers are similar to Span Links, but for cases when it is impossible to pass the Trace ID and Span ID between the spans that need to be linked.

When the calculated hashes for the upstream and downstream lambdas match, the Datadog backend will automatically link the two traces together.

Screenshot 2024-11-19 at 4 41 31 PM When clicking on the linked span, a new tab opens linking to the downstream Lambda function that was triggered by this Dynamo object update.

Motivation

This feature already exists in Python, and I'm working on adding it to all other runtimes (Node, .NET, Java, Golang).

Additional Notes

  • This handles the 'upstream' case, for Lambdas that make Dynmo requests. I will have another PR in datadog-lambda-js to handle the 'downstream' case, for Lambdas that are triggered by Dynamo updates.
  • My changes in packages/dd-trace/src/util.js are located there because it is a public API, since datadog-lambda-js will reuse the exported fields.
  • UpdateItem and DeleteItem require the primary key(s) as parameters, which makes our lives much easier. However, for PutItem, we have no way of determining which keys are the primary keys, so the user must set an environment variable specifying the primary key(s).
  • I add the telemetry key to Datadog backend in https://github.com/DataDog/dd-go/pull/161069

Testing Guidelines

Easy: Checkout this span, enable the feature flag, and you will see that it's pointing to the downstream Lambda.

Thorough testing: Run this Lambda function with the event payload

{
  "shouldUseV2": false,
  "twoKeyTable": false,
  "shouldPutItem": false,
  "shouldUpdateItem": false,
  "shouldDeleteItem": false,
  "shouldBatchWriteItems": false,
  "shouldTransactWriteItems": false
}

and change one of the bools to true. Also, try both with AWS SDK v2 and v3, and try with twoKeyTable vs oneKeyTable. Enable the span pointers feature flag, and check Datadog to ensure that the spans are properly linked.

I also added tests:

# Unit tests
yarn test packages/dd-trace/test/util.spec.js

# Integration tests
docker run -d -p 4566:4566 localstack/localstack
PLUGINS=aws-sdk SERVICES=localstack DD_DATA_STREAMS_ENABLED=true yarn test:plugins:ci 

Copy link

github-actions bot commented Nov 19, 2024

Overall package size

Self size: 8.38 MB
Deduped: 94.85 MB
No deduping: 95.36 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.3.0 | 29.43 MB | 29.43 MB | | @datadog/native-appsec | 8.3.0 | 19.37 MB | 19.38 MB | | @datadog/native-iast-taint-tracking | 3.2.0 | 13.9 MB | 13.91 MB | | @datadog/pprof | 5.4.1 | 9.76 MB | 10.13 MB | | protobufjs | 7.2.5 | 2.77 MB | 5.16 MB | | @datadog/native-iast-rewriter | 2.6.1 | 2.59 MB | 2.73 MB | | @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB | | @datadog/native-metrics | 3.1.0 | 1.06 MB | 1.46 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.11.2 | 112.74 kB | 826.22 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | lru-cache | 7.18.3 | 133.92 kB | 133.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB | | semver | 7.6.3 | 95.82 kB | 95.82 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 5.3.1 | 51.46 kB | 51.46 kB | | shell-quote | 1.8.1 | 44.96 kB | 44.96 kB | | istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB | | rfdc | 1.3.1 | 25.21 kB | 25.21 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@nhulston nhulston force-pushed the nicholas.hulston/dynamodb-span-pointers branch from 32c17f9 to 9db6a98 Compare November 19, 2024 21:47
@pr-commenter
Copy link

pr-commenter bot commented Nov 19, 2024

Benchmarks

Benchmark execution time: 2024-12-18 16:39:19

Comparing candidate commit bcc78fa in PR branch nicholas.hulston/dynamodb-span-pointers with baseline commit 6cda847 in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 777 metrics, 20 unstable metrics.

scenario:plugin-graphql-with-async-hooks-20

  • 🟩 max_rss_usage [-112.094MB; -107.470MB] or [-17.431%; -16.712%]

module.exports = {
isTrue,
isFalse,
isError,
globMatch,
calculateDDBasePath,
hasOwn,
generatePointerHash
generatePointerHash,
encodeValue,
Copy link
Contributor Author

@nhulston nhulston Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to export encodeValue for unit tests, but it's not used anywhere else; it's just a helper function for extractPrimaryKeys. Is this fine to export for testing, or is there something better practice I can do?

Copy link

codecov bot commented Nov 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.80%. Comparing base (823cfd4) to head (ed337d0).
Report is 6 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #4912       +/-   ##
===========================================
+ Coverage   65.05%   88.80%   +23.75%     
===========================================
  Files         304      146      -158     
  Lines       13950     5308     -8642     
===========================================
- Hits         9075     4714     -4361     
+ Misses       4875      594     -4281     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nhulston nhulston force-pushed the nicholas.hulston/dynamodb-span-pointers branch 2 times, most recently from 9caba79 to a65dd20 Compare November 22, 2024 22:44
@nhulston nhulston force-pushed the nicholas.hulston/dynamodb-span-pointers branch from a65dd20 to 90bfefa Compare December 3, 2024 18:41
* { UserTable: new Set(['userId', 'timestamp']) }
* )
*/
static calculatePutItemHash (tableName, item, primaryKeyConfig) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UpdateItem and DeleteItem require the primary key(s) as parameters, which makes our lives much easier. However, for PutItem, we have no way of determining which keys are the primary keys, so the user must set an environment variable specifying the primary key(s).

@nhulston nhulston marked this pull request as ready for review December 3, 2024 18:53
@nhulston nhulston requested review from a team as code owners December 3, 2024 18:53
@khanayan123 khanayan123 self-requested a review December 3, 2024 19:03
Copy link

@apiarian-datadog apiarian-datadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! i'm a little worried about the error handling flows.

also, can we add copies at least a couple of the "known hash" tests from either the rules repo or from the python tests? until we have system tests and self-monitoring covering all this stuff, i'd feel more confident that our hashing is consistent that way.

log.debug('Unable to calculate hash because missing parameters')
return
}
const keyValues = extractPrimaryKeys(keys, keys)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we're treating the keys as a set and as a data object at the same time? this is a little bit surprising to read. maybe we create a helper for this sort of situation to make it less surprising?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a little confusing. We could do something like this:

const keyNamesSet = new Set(Object.keys(keysObject));
const keyValues = extractPrimaryKeys(keyNamesSet, keys)

which is definitely more readable. However, we're sacrificing a little bit of memory and runtime to do so, so I'm not sure this change is worth it. WDYT?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i generally lean towards readability over raw performance, but perhaps the tracer team has a different approach here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also bury that logic (and not even do the set creation) under a new function name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do feel like performance is more important than readability, since this will be ran on every Lambda execution that updates S3

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree with readability in this case, it is a little confusing. I think we're missing an additional helper function prior to calling extractPrimaryKeys(keys, keys). Specifically, a function that prepares the key set to be passed into extractPrimaryKeys—rather than using the same object for both parameters WDYT?

packages/dd-trace/src/util.js Outdated Show resolved Hide resolved
packages/dd-trace/src/util.js Outdated Show resolved Hide resolved
packages/dd-trace/src/util.js Outdated Show resolved Hide resolved
packages/dd-trace/test/util.spec.js Outdated Show resolved Hide resolved
packages/dd-trace/test/util.spec.js Outdated Show resolved Hide resolved
packages/datadog-plugin-aws-sdk/test/dynamodb.spec.js Outdated Show resolved Hide resolved
packages/datadog-plugin-aws-sdk/test/dynamodb.spec.js Outdated Show resolved Hide resolved
packages/datadog-plugin-aws-sdk/test/dynamodb.spec.js Outdated Show resolved Hide resolved
@nhulston nhulston force-pushed the nicholas.hulston/dynamodb-span-pointers branch from 3ff9603 to f5649ec Compare December 9, 2024 20:03
@nhulston nhulston force-pushed the nicholas.hulston/dynamodb-span-pointers branch from 0b2058c to 0d33cf2 Compare December 9, 2024 20:52
@bengl bengl force-pushed the nicholas.hulston/dynamodb-span-pointers branch from eb2b5ed to 707b9c0 Compare December 12, 2024 18:40
@bengl bengl enabled auto-merge (squash) December 12, 2024 18:40
khanayan123
khanayan123 previously approved these changes Dec 18, 2024
# Conflicts:
#	packages/dd-trace/test/fixtures/telemetry/config_norm_rules.json
@bengl bengl merged commit 216bf5d into master Dec 18, 2024
307 checks passed
@bengl bengl deleted the nicholas.hulston/dynamodb-span-pointers branch December 18, 2024 18:02
rochdev pushed a commit that referenced this pull request Dec 18, 2024
* Add span pointer support for updateItem and deleteItem

* putItem support

* transactWriteItem support

* batchWriteItem support

* Add unit+integration tests (very large commit)

* Move `DD_AWS_SDK_DYNAMODB_TABLE_PRIMARY_KEYS` parsing logic to config.js

* Code refactoring

* Move util functions to packages/datadog-plugin-aws-sdk/

* lint

* log when encountering errors in `encodeValue`; fix test

* Send config env var as string to telemetry; handle parsing logic in dynamodb.js

* Update config_norm_rules.json

* fix test

* Add unit tests for DynamoDB generatePointerHash

* better logging + checks
@rochdev rochdev mentioned this pull request Dec 18, 2024
rochdev pushed a commit that referenced this pull request Dec 18, 2024
* Add span pointer support for updateItem and deleteItem

* putItem support

* transactWriteItem support

* batchWriteItem support

* Add unit+integration tests (very large commit)

* Move `DD_AWS_SDK_DYNAMODB_TABLE_PRIMARY_KEYS` parsing logic to config.js

* Code refactoring

* Move util functions to packages/datadog-plugin-aws-sdk/

* lint

* log when encountering errors in `encodeValue`; fix test

* Send config env var as string to telemetry; handle parsing logic in dynamodb.js

* Update config_norm_rules.json

* fix test

* Add unit tests for DynamoDB generatePointerHash

* better logging + checks
@rochdev rochdev mentioned this pull request Dec 18, 2024
rochdev pushed a commit that referenced this pull request Dec 18, 2024
* Add span pointer support for updateItem and deleteItem

* putItem support

* transactWriteItem support

* batchWriteItem support

* Add unit+integration tests (very large commit)

* Move `DD_AWS_SDK_DYNAMODB_TABLE_PRIMARY_KEYS` parsing logic to config.js

* Code refactoring

* Move util functions to packages/datadog-plugin-aws-sdk/

* lint

* log when encountering errors in `encodeValue`; fix test

* Send config env var as string to telemetry; handle parsing logic in dynamodb.js

* Update config_norm_rules.json

* fix test

* Add unit tests for DynamoDB generatePointerHash

* better logging + checks
nhulston added a commit to DataDog/datadog-lambda-js that referenced this pull request Dec 18, 2024
rochdev pushed a commit that referenced this pull request Dec 18, 2024
* Add span pointer support for updateItem and deleteItem

* putItem support

* transactWriteItem support

* batchWriteItem support

* Add unit+integration tests (very large commit)

* Move `DD_AWS_SDK_DYNAMODB_TABLE_PRIMARY_KEYS` parsing logic to config.js

* Code refactoring

* Move util functions to packages/datadog-plugin-aws-sdk/

* lint

* log when encountering errors in `encodeValue`; fix test

* Send config env var as string to telemetry; handle parsing logic in dynamodb.js

* Update config_norm_rules.json

* fix test

* Add unit tests for DynamoDB generatePointerHash

* better logging + checks
rochdev pushed a commit that referenced this pull request Dec 18, 2024
* Add span pointer support for updateItem and deleteItem

* putItem support

* transactWriteItem support

* batchWriteItem support

* Add unit+integration tests (very large commit)

* Move `DD_AWS_SDK_DYNAMODB_TABLE_PRIMARY_KEYS` parsing logic to config.js

* Code refactoring

* Move util functions to packages/datadog-plugin-aws-sdk/

* lint

* log when encountering errors in `encodeValue`; fix test

* Send config env var as string to telemetry; handle parsing logic in dynamodb.js

* Update config_norm_rules.json

* fix test

* Add unit tests for DynamoDB generatePointerHash

* better logging + checks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants