Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: datadog tracer scope #895

Closed
wants to merge 2 commits into from
Closed

Conversation

ngraef
Copy link
Contributor

@ngraef ngraef commented Mar 24, 2021

📝 Description

  • Support dd-trace's AsyncLocalStorage and AsyncResource scope manager implementations (introduced in v0.27.0 and v0.28.0) and clean up the AsyncHooks implementation.
  • Fix activation of moleculer spans in dd-trace, allowing for propagation to external modules. This also allows for log injection to connect logs and traces.
  • Allow moleculer to continue a trace that was started in an external module. For example, this will show the moleculer-web handler as a child span of the node http listener.

🎯 Relevant issues

Resolves #690
Resolves #889

💎 Type of change

  • Bug fix (non-breaking change which fixes an issue)

🚦 How Has This Been Tested?

  • Unit and integration tests added
  • Tested an environment with 5 distributed services (each running in separate containers) sending traces to Datadog. The sample screenshots below show cross-service calls, custom spans in an action handler, and modules auto-instrumented by dd-trace (http-client, tcp, dns, fs, redis, postgres).
A simple call through moleculer-web to a remote action that makes some HTTP calls and interacts with a postgres database

Screen Shot 2021-04-18 at 11 39 09 AM

A call that includes some nested remote action calls, one of which calls redis and returns an error

Screen Shot 2021-04-18 at 11 40 06 AM

The same call as the previous example, but with a remote service subscribing to an event it emits

Screen Shot 2021-04-18 at 11 16 12 PM

The initial call to a lazy-loaded module with a bunch of fs reads needed to require the library

Screen Shot 2021-04-18 at 11 40 33 AM

An action handler with some custom spans via ctx.startSpan

Screen Shot 2021-04-18 at 11 41 49 AM

🏁 Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have commented my code, particularly in hard-to-understand areas

@ngraef ngraef force-pushed the fix-datadog-scope branch 2 times, most recently from b3c7688 to bdbb137 Compare March 25, 2021 00:38
@ngraef ngraef marked this pull request as ready for review April 19, 2021 06:27
@ngraef
Copy link
Contributor Author

ngraef commented Apr 19, 2021

@icebob This is now ready for review when you have some time.

@@ -736,6 +738,7 @@ class Transit {
level: ctx.level,
tracing: ctx.tracing,
parentID: ctx.parentID,
traceID: ctx.traceID,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a breaking change in the protocol, so it can be merged only in 0.15 or it can't work with schema-based serializers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need it to work with schema-based serializers at the moment. Will the serializer implementations safely ignore the new property if they don't support it? The tracing middleware handles an undefined traceID in the context.

@icebob
Copy link
Member

icebob commented Apr 19, 2021

please merge changes from the master, some test is not up-to-date.

@ngraef ngraef force-pushed the fix-datadog-scope branch 2 times, most recently from 835e688 to 1745336 Compare April 26, 2021 04:58
@ngraef
Copy link
Contributor Author

ngraef commented Apr 26, 2021

please merge changes from the master, some test is not up-to-date.

This branch was already rebased onto master. I've rebased again to pull in the latest typing fixes.

I believe the failing test is either an incorrect implementation in src/async-storage.js or an incorrect assumption in the test.

@icebob
Copy link
Member

icebob commented Apr 27, 2021

It's an old module, it is not used in the library anymore. But somehow this PR affects the async_hooks if the unit test failed, but in the master it doesn't.

@ngraef
Copy link
Contributor Author

ngraef commented May 6, 2021

But somehow this PR affects the async_hooks if the unit test failed, but in the master it doesn't.

I looked into this a bit more. The failure might not occur on a development machine by running npm test because jest will spawn multiple workers by default, and the datadog and async-storage tests might not run in the same process. The failure can be reproduced more easily (depending on test execution order) by setting the jest option --maxWorkers=1. This is the value used by default in the 2-core GitHub Actions runners.

The tests on the master branch don't have promise execution tracking enabled because storage.enable() is never called, so the hook is never enabled. Therefore, the executionAsyncId is 0 in both then callbacks and the shared resource is retrieved.

const storage = new AsyncStorage(broker);
const context = { a: 5 };
return Promise.resolve()
.then(() => {
storage.setSessionData(context);
expect(storage.getSessionData()).toBe(context);
})
.then(() => {
expect(storage.getSessionData()).toBe(context);
});

However, this PR adds tests for dd-trace's various scope implementations, which enable promise execution tracking for the process. In that case, the async-storage test fails because the then callbacks have different executionAsyncIds, and context propagation from parent to child async resources doesn't occur because the hook in the AsyncStorage class is not enabled. Enabling it in the test with storage.enable() will still fail because the context propagation implementation is broken.

Regardless, I've pushed a change that cleans up the hooks used in dd-trace and restores the previous passing behavior for the async-storage test.

@ngraef ngraef requested a review from icebob May 10, 2021 04:47
@icebob
Copy link
Member

icebob commented May 11, 2021

@ngraef I see, I don't forget it, but I'm hesitating because it contains a lot of changes affecting the tracer, context, and the communication protocol. I don't feel it can be merged into the 0.14, instead on the next 0.15.
In the new version I will drop the schema-based serializer, and I plan to add headers feature in the Context, to store any middleware-based or user-based information which is transferred to remote nodes without causing breaking change in the protocol.

@icebob
Copy link
Member

icebob commented Sep 5, 2021

what do you think about it? DataDog/dd-trace-js#1598

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Datadog tracing not working Active span not correctly set in dd-trace
3 participants