Skip to content

Infer API Gateway spans #172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

agocs
Copy link
Contributor

@agocs agocs commented Sep 7, 2021

What does this PR do?

Gives dd-lambda-py the ability to infer spans for API Gateway, HTTPAPI, and Websocket invocations.

image

image

Known issue

The inferred span's service is still being overwritten with the value of the lambda function's service tag by logs backend. I think I've exhausted my options for fixing that on the lambda library side. I'll start working on a fix for the backend.

Motivation

Testing Guidelines

I've added JSON events and unit tests for

  • non-proxy API Gateway requests
  • API Gateway websocket events
  • and HTTP API requests

I've also deployed a lambda function in the Sandbox account, tied to the Datadog Serverless org, instrumented with this version of the code, and have been hitting it with CURL requests and via websocket in POSTMAN.

Additional Notes

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

@agocs agocs marked this pull request as ready for review September 10, 2021 19:53
@agocs agocs requested a review from a team as a code owner September 10, 2021 19:53
@agocs agocs changed the title Chris.agocs/inferring api gateway spans poc Infer API Gateway spans Sep 10, 2021
@agocs
Copy link
Contributor Author

agocs commented Sep 10, 2021

n.b. @Hesperide the inferred spans are still showing up as the same color as the Lambda spans, and they're the same node on the trace map. I've tried setting the service name to the API Gateway URL (or equivalent), and it doesn't seem to want to take.

@Hesperide
Copy link

n.b. @Hesperide the inferred spans are still showing up as the same color as the Lambda spans, and they're the same node on the trace map. I've tried setting the service name to the API Gateway URL (or equivalent), and it doesn't seem to want to take.

How can I help unblock this? Do AJ/Darcy have any ideas?

README.md Outdated
@@ -87,6 +87,16 @@ Initialize the Datadog tracer when set to `true`. Defaults to `false`.

Set to `true` to merge the X-Ray trace and the Datadog trace, when using both the X-Ray and Datadog tracing. Defaults to `false`.

### DD_INFERRED_SPANS (beta)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you were planning to update this later, but when we're GA this should default to true.

We should also flag as "experimental" instead of "beta".

@tianchu
Copy link
Collaborator

tianchu commented Sep 13, 2021

n.b. @Hesperide the inferred spans are still showing up as the same color as the Lambda spans, and they're the same node on the trace map. I've tried setting the service name to the API Gateway URL (or equivalent), and it doesn't seem to want to take.

How can I help unblock this? Do AJ/Darcy have any ideas?

This is probably due to the backend tag resolution overriding? I believe the backend applies the service tag value to every span in the same trace payload, because it assumes that spans in the same trace payload come from the same service. One potential solution is to put the inferred trigger span in its own trace payload, perhaps by calling tracer.write()?

@agocs
Copy link
Contributor Author

agocs commented Sep 14, 2021

This is probably due to the backend tag resolution overriding? I believe the backend applies the service tag value to every span in the same trace payload, because it assumes that spans in the same trace payload come from the same service. One potential solution is to put the inferred trigger span in its own trace payload, perhaps by calling tracer.write()?

Hey Tian, I tried explicitly calling tracer.write() and it seems to do what we think it does, but the tracer still flushes the whole trace (with both spans) at the end of the execution, to the same effect.
image

Update: Darcy points out that I'm sending via the forwarder, and the forwarder is probably rewriting my service names. I'll try to make this work with the extension.

self.inferred_span = None
self.make_inferred_span = (
os.environ.get("DD_INFERRED_SPANS", "false").lower() == "true"
and should_use_extension
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not support inferred spans when using the forwarder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The forwarder adds tags from the tag cache to the trace payload(s) being sent back to the trace intake endpoint. This is an additional variable and an additional thing we would need to fix before the inferred spans show up as their own services. We can always remove this check and add forwarder support later.

if self.inferred_span:
if status_code:
self.inferred_span.set_tag("http.status_code", status_code)
self.inferred_span.finish()
logger.debug("datadog_lambda_wrapper _after() done")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log line is no longer at the right place.

def create_function_execution_span(
context,
function_name,
is_cold_start,
trace_context_source,
merge_xray_traces,
trigger_tags,
upstream=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "parent" a better name?

@@ -402,6 +546,7 @@ def create_function_execution_span(
else None,
"datadog_lambda": datadog_lambda_version,
"dd_trace": ddtrace_version,
"span.name": "aws.lambda",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? Doesn't span = tracer.trace("aws.lambda", **args) (later in this function) set the span name to aws.lambda?

SPAN_TYPE_INFERRED = "inferred"


class InferredSpanFilter(TraceFilter):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class definitely deserve a comprehensive docstring to explain why it's needed and the background.

return None


def detect_inferrable_span_type(event):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reuse (maybe some refactoring required) the same logic from https://github.com/DataDog/datadog-lambda-python/blob/main/datadog_lambda/trigger.py? Want to avoid inconsistency between what's shown by the trace and the invocation.

tags = {
"operation_name": "aws.apigateway.websocket",
"service.name": domain,
"url": domain + endpoint,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http_tags["http.url"] = request_context["domainName"]
sounds like the standard field is http.url?

Copy link
Contributor

@DarcyRaynerDD DarcyRaynerDD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits


Inferred Spans are spans that Datadog can create based on incoming event metadata.
Set `DD_INFERRED_SPANS` to `true` to infer spans based on Lambda events.
Inferring upstream spans is only supported if you are using the [Datadog Lambda Extension](https://docs.datadoghq.com/serverless/libraries_integrations/extension/).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually true? I know we said we wouldn't do any specific work to make it function in the forwarder, but the span will still get generated. I thin originally I said the forwarder might remap the service name, but I don't know if we verified that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did verify it. The forwarder adds tags from the tags cache to the trace payloads it sends back. If we want, we can always fix this and add forwarder support later on.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i guess the forwarder applies the service tag on all the trace payloads from the same event object? if so, even the trace payload carrying the inferred span has no lambda function arn, it still assumes it's from the same function with other payloads and apply the tag? If yes, we would need to make some changes in the forwarder, hopefully a small one.

SPAN_TYPE_INFERRED = "inferred"


class InferredSpanFilter(TraceFilter):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this warrants a comment explaining why we are using this.

return None


def detect_inferrable_span_type(event):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can plug this into the existing trigger detection logic. I'm wondering if we are duplicating the functionality somewhat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You and Tian are really on the same brain wave here 😄

@agocs agocs changed the base branch from main to inferred-spans-feature-branch October 26, 2021 19:23
Comment on lines +49 to +52
Currently, only API Gateway events have subtypes, but I imagine we might see more in the
future.
This was added to support the difference in handling of e.g. HTTP-API and Websocket events vs
vanilla API-Gateway events.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 This will be helpful in making a distinction in the future if we have to for any reason.

Copy link
Contributor

@hghotra hghotra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@agocs agocs merged commit c7440a5 into inferred-spans-feature-branch Oct 27, 2021
@agocs agocs deleted the chris.agocs/inferring_api_gateway_spans_poc branch October 27, 2021 15:18
agocs added a commit that referenced this pull request Feb 9, 2022
* Infer API Gateway spans (#172)

* Infer spans from API Gateway events

* Adding some prints. remove later

* Change some info on the API Gateway span

* Rename something

* black

* >:(

* black

* fix time

* Support various API Gateway, HTTPAPI, and Websocket events

* black

* Add DD_INFERRED_SPANS env var to turn inferred spans on and off

* infer spans in integration tests

* specify which env var to set true in order to enable inferred spans

* try setting inferred span name to inferred span URL

* s/beta/experimental/

* Correctly create spans in separate services, assuming the extension is running and the  tag is not set on the function

* Remove function_name

* Flush after closing spans

* black

* black

* update snapshots

* Make the snapshots valid json

* black

* Remove the inferredSpansFilter

* Refactor inferred-span event type detection to use the trigger event type code

* remove unused import

* lines too long >=(

* Finish refactor using _EventSource object

* lol, remove println debugging

* Update snapshots

* Inferred spans get meta.span_type:inferred

* [SLS-1594] Inferred spans for SQS & SNS (#190)

* Simplify event source parsing logic

* Add sns & sqs inferred spans

* Fix default evt source bug & update comments

* Use poetry for running unit tests

* Add additional api gateway trigger tests

* Fix bugs & refactor

* Add inferred spans unit tests

* Update integration tests

* Run all tests for all python versions

* Add missing change

* Update _dd.span_type to span_type

* Remove dependence on extension

* Additional api gateway test

* Remove log line & add todo

* Fix api gateway integration test

* Move call to get_first_record

* Revert to using pip inside docker

* [SLS-1671] Inferred Spans for Kineses, EventBridge, S3, DynamoDB (#191)

* Add kinesis inferred span

* Add dynamodb inferred span

* Add s3 inferred span

* Add eventbridge custom event inferred span

* Comment out flaky units tests (TODO)

* Add integration tests

* Remove log line

* Use timestamp instead of strftime (#194)

* [SLS-1683] Add sync/async tag and set inferred span end time based on value (#195)

* Add sync/async tag to inferred spans

* End inferred span based on sync/async tag

* Update tests

* Fix issue with comparison

* Fix small issues

* Update integration test snapshots

* [SLS-1713] Add tag for whether span should inherit service from lambda (#196)

* Use service arg rather than service.name tag
* Refactor inferred span metadata tags

* black

* Update inferred span keys

* [SLS-1674] Additional attributes for different managed services (#200)

* Additional attributes for sqs spans

* Add additional attributes for sns

* Add event subscription arn for sns

* Add additional attributes for kinesis spans

* Add additional attributes for dynamodb spans

* Add additional attributes for s3 spans

* Additional attribute for eventbridge spans

* Add additional attributes for websocket spans

* Fix websocket resource name

* Additional attributes for apigateway v1 spans

* Additional attributes for api-gateway v2 spans

* Fix api-gateway v1 resource name

* Fix merge issue

* Add space to resource name

* Update tests

* black

* Add typing_extensions to pyproject.toml

* Update snapshots

* Fix snapshots

Co-authored-by: chris.agocs <chris.agocs@datadoghq.com>

* [SLS-1824] SNS trace extractor (#201)

* Extract trace context from sns payload

* Fix extractor issue

* Handle SNS message inside SQS event

* Add extractor for Eventbridge context (#202)

* Update expected breaking change date (#114)

* change inferred_span to _inferred_span

* Add extractor for eventbridge trace context

* Add another test to test eventbridge extraction

* Get tracing.Literal for pythons that don't have it already

* Add _datadog to eventbridge extractor

* Update integration tests

* Remove init complete and main start logs

Co-authored-by: Jorie Helwig <jorie.helwig@datadoghq.com>
Co-authored-by: Harvinder Ghotra <ghotra.harvinder@gmail.com>

* Remove duplicate entry

* Kinesis extractor plus small fixes

* Add snapshots

* Add tests

Co-authored-by: Christopher Agocs <agocs@users.noreply.github.com>
Co-authored-by: Jorie Helwig <jorie.helwig@datadoghq.com>

* Update stash

* Address feedback

* Default DD_TRACE_MANAGED_SERVICES to true

Co-authored-by: Christopher Agocs <agocs@users.noreply.github.com>
Co-authored-by: chris.agocs <chris.agocs@datadoghq.com>
Co-authored-by: Jorie Helwig <jorie.helwig@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants