-
Notifications
You must be signed in to change notification settings - Fork 45
Infer API Gateway spans #172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Infer API Gateway spans #172
Conversation
n.b. @Hesperide the inferred spans are still showing up as the same color as the Lambda spans, and they're the same node on the trace map. I've tried setting the service name to the API Gateway URL (or equivalent), and it doesn't seem to want to take. |
How can I help unblock this? Do AJ/Darcy have any ideas? |
README.md
Outdated
@@ -87,6 +87,16 @@ Initialize the Datadog tracer when set to `true`. Defaults to `false`. | |||
|
|||
Set to `true` to merge the X-Ray trace and the Datadog trace, when using both the X-Ray and Datadog tracing. Defaults to `false`. | |||
|
|||
### DD_INFERRED_SPANS (beta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if you were planning to update this later, but when we're GA this should default to true.
We should also flag as "experimental" instead of "beta".
This is probably due to the backend tag resolution overriding? I believe the backend applies the service tag value to every span in the same trace payload, because it assumes that spans in the same trace payload come from the same service. One potential solution is to put the inferred trigger span in its own trace payload, perhaps by calling tracer.write()? |
Hey Tian, I tried explicitly calling Update: Darcy points out that I'm sending via the forwarder, and the forwarder is probably rewriting my service names. I'll try to make this work with the extension. |
…s running and the tag is not set on the function
self.inferred_span = None | ||
self.make_inferred_span = ( | ||
os.environ.get("DD_INFERRED_SPANS", "false").lower() == "true" | ||
and should_use_extension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not support inferred spans when using the forwarder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The forwarder adds tags from the tag cache to the trace payload(s) being sent back to the trace intake endpoint. This is an additional variable and an additional thing we would need to fix before the inferred spans show up as their own services. We can always remove this check and add forwarder support later.
datadog_lambda/wrapper.py
Outdated
if self.inferred_span: | ||
if status_code: | ||
self.inferred_span.set_tag("http.status_code", status_code) | ||
self.inferred_span.finish() | ||
logger.debug("datadog_lambda_wrapper _after() done") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log line is no longer at the right place.
def create_function_execution_span( | ||
context, | ||
function_name, | ||
is_cold_start, | ||
trace_context_source, | ||
merge_xray_traces, | ||
trigger_tags, | ||
upstream=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "parent" a better name?
@@ -402,6 +546,7 @@ def create_function_execution_span( | |||
else None, | |||
"datadog_lambda": datadog_lambda_version, | |||
"dd_trace": ddtrace_version, | |||
"span.name": "aws.lambda", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? Doesn't span = tracer.trace("aws.lambda", **args)
(later in this function) set the span name to aws.lambda
?
datadog_lambda/tracing.py
Outdated
SPAN_TYPE_INFERRED = "inferred" | ||
|
||
|
||
class InferredSpanFilter(TraceFilter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this class definitely deserve a comprehensive docstring to explain why it's needed and the background.
datadog_lambda/tracing.py
Outdated
return None | ||
|
||
|
||
def detect_inferrable_span_type(event): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reuse (maybe some refactoring required) the same logic from https://github.com/DataDog/datadog-lambda-python/blob/main/datadog_lambda/trigger.py? Want to avoid inconsistency between what's shown by the trace and the invocation.
datadog_lambda/tracing.py
Outdated
tags = { | ||
"operation_name": "aws.apigateway.websocket", | ||
"service.name": domain, | ||
"url": domain + endpoint, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http_tags["http.url"] = request_context["domainName"] |
http.url
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nits
|
||
Inferred Spans are spans that Datadog can create based on incoming event metadata. | ||
Set `DD_INFERRED_SPANS` to `true` to infer spans based on Lambda events. | ||
Inferring upstream spans is only supported if you are using the [Datadog Lambda Extension](https://docs.datadoghq.com/serverless/libraries_integrations/extension/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually true? I know we said we wouldn't do any specific work to make it function in the forwarder, but the span will still get generated. I thin originally I said the forwarder might remap the service name, but I don't know if we verified that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did verify it. The forwarder adds tags from the tags cache to the trace payloads it sends back. If we want, we can always fix this and add forwarder support later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i guess the forwarder applies the service tag on all the trace payloads from the same event object? if so, even the trace payload carrying the inferred span has no lambda function arn, it still assumes it's from the same function with other payloads and apply the tag? If yes, we would need to make some changes in the forwarder, hopefully a small one.
datadog_lambda/tracing.py
Outdated
SPAN_TYPE_INFERRED = "inferred" | ||
|
||
|
||
class InferredSpanFilter(TraceFilter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this warrants a comment explaining why we are using this.
datadog_lambda/tracing.py
Outdated
return None | ||
|
||
|
||
def detect_inferrable_span_type(event): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can plug this into the existing trigger detection logic. I'm wondering if we are duplicating the functionality somewhat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You and Tian are really on the same brain wave here 😄
Currently, only API Gateway events have subtypes, but I imagine we might see more in the | ||
future. | ||
This was added to support the difference in handling of e.g. HTTP-API and Websocket events vs | ||
vanilla API-Gateway events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏼 This will be helpful in making a distinction in the future if we have to for any reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
* Infer API Gateway spans (#172) * Infer spans from API Gateway events * Adding some prints. remove later * Change some info on the API Gateway span * Rename something * black * >:( * black * fix time * Support various API Gateway, HTTPAPI, and Websocket events * black * Add DD_INFERRED_SPANS env var to turn inferred spans on and off * infer spans in integration tests * specify which env var to set true in order to enable inferred spans * try setting inferred span name to inferred span URL * s/beta/experimental/ * Correctly create spans in separate services, assuming the extension is running and the tag is not set on the function * Remove function_name * Flush after closing spans * black * black * update snapshots * Make the snapshots valid json * black * Remove the inferredSpansFilter * Refactor inferred-span event type detection to use the trigger event type code * remove unused import * lines too long >=( * Finish refactor using _EventSource object * lol, remove println debugging * Update snapshots * Inferred spans get meta.span_type:inferred * [SLS-1594] Inferred spans for SQS & SNS (#190) * Simplify event source parsing logic * Add sns & sqs inferred spans * Fix default evt source bug & update comments * Use poetry for running unit tests * Add additional api gateway trigger tests * Fix bugs & refactor * Add inferred spans unit tests * Update integration tests * Run all tests for all python versions * Add missing change * Update _dd.span_type to span_type * Remove dependence on extension * Additional api gateway test * Remove log line & add todo * Fix api gateway integration test * Move call to get_first_record * Revert to using pip inside docker * [SLS-1671] Inferred Spans for Kineses, EventBridge, S3, DynamoDB (#191) * Add kinesis inferred span * Add dynamodb inferred span * Add s3 inferred span * Add eventbridge custom event inferred span * Comment out flaky units tests (TODO) * Add integration tests * Remove log line * Use timestamp instead of strftime (#194) * [SLS-1683] Add sync/async tag and set inferred span end time based on value (#195) * Add sync/async tag to inferred spans * End inferred span based on sync/async tag * Update tests * Fix issue with comparison * Fix small issues * Update integration test snapshots * [SLS-1713] Add tag for whether span should inherit service from lambda (#196) * Use service arg rather than service.name tag * Refactor inferred span metadata tags * black * Update inferred span keys * [SLS-1674] Additional attributes for different managed services (#200) * Additional attributes for sqs spans * Add additional attributes for sns * Add event subscription arn for sns * Add additional attributes for kinesis spans * Add additional attributes for dynamodb spans * Add additional attributes for s3 spans * Additional attribute for eventbridge spans * Add additional attributes for websocket spans * Fix websocket resource name * Additional attributes for apigateway v1 spans * Additional attributes for api-gateway v2 spans * Fix api-gateway v1 resource name * Fix merge issue * Add space to resource name * Update tests * black * Add typing_extensions to pyproject.toml * Update snapshots * Fix snapshots Co-authored-by: chris.agocs <chris.agocs@datadoghq.com> * [SLS-1824] SNS trace extractor (#201) * Extract trace context from sns payload * Fix extractor issue * Handle SNS message inside SQS event * Add extractor for Eventbridge context (#202) * Update expected breaking change date (#114) * change inferred_span to _inferred_span * Add extractor for eventbridge trace context * Add another test to test eventbridge extraction * Get tracing.Literal for pythons that don't have it already * Add _datadog to eventbridge extractor * Update integration tests * Remove init complete and main start logs Co-authored-by: Jorie Helwig <jorie.helwig@datadoghq.com> Co-authored-by: Harvinder Ghotra <ghotra.harvinder@gmail.com> * Remove duplicate entry * Kinesis extractor plus small fixes * Add snapshots * Add tests Co-authored-by: Christopher Agocs <agocs@users.noreply.github.com> Co-authored-by: Jorie Helwig <jorie.helwig@datadoghq.com> * Update stash * Address feedback * Default DD_TRACE_MANAGED_SERVICES to true Co-authored-by: Christopher Agocs <agocs@users.noreply.github.com> Co-authored-by: chris.agocs <chris.agocs@datadoghq.com> Co-authored-by: Jorie Helwig <jorie.helwig@datadoghq.com>
What does this PR do?
Gives dd-lambda-py the ability to infer spans for API Gateway, HTTPAPI, and Websocket invocations.
Known issue
The inferred span's service is still being overwritten with the value of the lambda function's
service
tag by logs backend. I think I've exhausted my options for fixing that on the lambda library side. I'll start working on a fix for the backend.Motivation
Testing Guidelines
I've added JSON events and unit tests for
I've also deployed a lambda function in the Sandbox account, tied to the Datadog Serverless org, instrumented with this version of the code, and have been hitting it with CURL requests and via websocket in POSTMAN.
Additional Notes
Types of Changes
Check all that apply