-
Notifications
You must be signed in to change notification settings - Fork 626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datadog-exporter: don't set tags which are None #272
datadog-exporter: don't set tags which are None #272
Conversation
2a58e32
to
1cbc87d
Compare
One of the tests failed with a "performance alert" when I merged master in, not sure what that's about? |
@@ -97,7 +97,7 @@ def inject( | |||
self.SAMPLING_PRIORITY_KEY, | |||
str(constants.AUTO_KEEP if sampled else constants.AUTO_REJECT), | |||
) | |||
if constants.DD_ORIGIN in span.context.trace_state: | |||
if span.context.trace_state.get(constants.DD_ORIGIN) is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this will not happen from the latest version. Earlier TraceState
was simple dict and there were no validation upon add/update key-value pair. We have fixed that in latest release and key-value pair much adhere to W3C tracestate format rules otherwise they will be discarded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah very nice. I'll have to see if I can recreate the scenario I described in #262 with the current versions. Though, I am a little concerned that by disallowing that to be None
we'll have moved the error from this code I patched, to whatever code was actually causing it in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, rebased on the current main branch, and after some more testing, this PR still fixes the issue described in #262, so whatever change you're describing did not fix things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's strange that you are still running into that error. From here every value in the tracestate must be a str type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, and I still haven't figured out where it's coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you update the opentelemetry-api
and propagator versions you are using? Can you provide a reproducible example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is with code from the main branch of both main and contrib repositories.
I'll see if I can strip down something - so far I've only seen when making gRPC client calls to a gRPC service running the botocore instrumentation and datadog exporter, so it's a little complex to reproduce, but I might be able to simplify more. Give me a day or two.
5e4b639
to
206eafa
Compare
926fcba
to
da614b6
Compare
Aha, so I've managed to nearly re-recreate the scenario in a simplified way, and I've seen this message in the log:
I assume this is the thing you were talking about @lonewolf3739 - it does make my patch not necessary. So it's quite possible this is fixed, let me recreate all of my virtualenvs and pull in the latest, I must have missed something. edit: actually what I should be doing is trying to find the source of that log message ^ - clearly that's the actual bug. |
@alertedsnake |
@lzchen yeah, there's some more detail in #262 - at this point I have found where the problem happens, but I don't really know the intent of the code that's setting this field to None, was asking if you guys knew more about what it's supposed to be doing. This comment specifically: #262 (comment) |
da614b6
to
98a3a65
Compare
This fix no longer seems necessary as there isn't an exception thrown anymore, but the underlying bug does still seem present, see further discussion in #262 . |
Hi, so I'm going to re-open this, since it does actually fix an issue I'm finding with
The header that's incorrect looks like this:
The code in this PR does indeed fix the issue. But again, this is only fixing an incorrect issue from propagating, it's not fixing the issue from happening - see all of the previous discussion about that. |
98a3a65
to
d4095d0
Compare
Can you share some minimal reproducible example for this problem? I don't see the possibility of tracestate value to be set to None. |
As I mentioned, there's a lot of previous discussion wherein I went through this in detail, both here and in the linked issue. In this case, I've got a gRPC service that makes a query to DynamoDB, and it's when that Dynamo call gets made (with the botocore instrumentation) that one of the HTTP headers on the request is In this message I mention the code that causes the problem and asked for some insight from others who have worked on this - getting an answer here would help quite a bit: There isn't really a minimal case, I could probably make one up if necessary, but I think it's pretty clear where the problem occurs from that question. |
I haven't followed that discussion closely. If this indeed fixes the problem then there is a possibility that same problem exists in other propagators as well. If you could provide some reproducible example then we could find the root of problem and fix that instead of adding checks to prevent None getting propagated. I don't think this fix changes anything by looking at the |
Here is my test case: https://github.com/alertedsnake/otel-test with some instructions on how to run it and how to discover the problem. Let me know if this helps. |
Thanks, I will check it. |
Looks like I need api keys/creds of various service providers otherwise I won't be able to run this example. I am fine with this getting in although I think this doesn't really fix the underlying problem. @alertedsnake Did you try pip editable install or something similar to see if this really fixed your issue? |
Yeah, you'll need data dog keys, or to use one of the agent mocking Docker images, I can try to get that working. As for the fix in this PR, yes, it does prevent the exception. But yes, it doesn't fix the actual problem - I was still hoping to get some insight from the original author or someone who knows the suspect code well. |
@lonewolf3739 Ok I updated the test case to not require Datadog credentials, and added a line to the readme about the AWS credentails - apparently you don't need either to see the error happen. |
@alertedsnake Thanks for sharing the updated example. I have done the troubleshooting and found the underlying problem. This line
And this line |
This is the kind of thing I was hoping someone with more knowledge of OTEL could figure out, nice work! I guess this means my fix in this PR is the actual fix, I thought I was just fixing the symptom. |
This is often `None`, but tags are always strings, and so things get broken when spans get passed along to other client calls. Fixes open-telemetry#262
d4095d0
to
1a50126
Compare
No this is not the actual fix. We need to change tracestate to behave correctly with membership. Once we do that this is no longer needed. |
Okay, that sounds even better. I'm happy to close this in favor of a ticket in the main repo. |
In fact, I've got a fix, will make a PR shortly. |
open-telemetry/opentelemetry-python#1773 - I don't think it needs anything more than this, but let me know if so. Happy to open an issue in that repo if we think it needs one. |
@lonewolf3739 |
This can be closed. |
Description
This is often
None
, but tags are always strings, and so things get broken when spans get passed along to other client calls.Fixes #262
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
This passes existing tests, and I've added one to verify when the field is set to
None
, it's skipped.Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.