Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GraphQL] Instrument document parsing & validation and include variables in metadata #245

Merged

Conversation

alloy
Copy link
Contributor

@alloy alloy commented Aug 17, 2018

So the logic in this PR may seem a little unintuitive at first, but my understanding of the existing code is that we really want the operation span to be the parent of all GraphQL related spans and we want to use the operation name as its name.

This means we can’t create the parse and validation spans until the operation span has been created (otherwise we don’t have access to the parsed operation name), which is why my changes only record the timing but defer creation of the spans until the operation span is created.

I also ended up simplifying the tests a little, because they were a bit cumbersome to work with when adding new spans. Rather than assuming spans exist at a certain index, I added a helper function (findSpan) to find the spans based on data that the test previously asserted on. Additionally, most tests don’t really seem to need a total span count assertion, so I only left those in 2 places, which would make updating those a lot easier in the future.

@alloy
Copy link
Contributor Author

alloy commented Aug 17, 2018

Here’s a (sorted) example trace from our application:

[{
  "trace_id": 8574998873912514000,
  "span_id": 8574998873912514000,
  "parent_id": null,
  "name": "express.request",
  "resource": "POST",
  "service": "metaphysics.request",
  "error": 0,
  "meta": {
      "span.kind": "server",
      "http.url": "http://localhost:5001/",
      "http.method": "POST",
      "http.status_code": "200"
  },
  "start": 1535142832099484000,
  "duration": 504396973,
  "type": "http"
}, {
  "trace_id": 8574998873912514000,
  "span_id": 16801975833354732000,
  "parent_id": 8574998873912514000,
  "name": "graphql.query",
  "resource": "query",
  "service": "metaphysics.graphql",
  "error": 0,
  "meta": {
    "graphql.document": "query MyQuery($artistID: String!) {\n  artist(id: $artistID) {\n    ... on Artist {\n      name\n    }\n  }\n}",
    "graphql.variables": "{\"artistID\":\"banksy\"}"
  },
  "start": 1535142832102923800,
  "duration": 500329590
}, {
  "trace_id": 8574998873912514000,
  "span_id": 8415699621711013000,
  "parent_id": 16801975833354732000,
  "name": "graphql.parse",
  "resource": "graphql.parse",
  "service": "metaphysics.graphql",
  "error": 0,
  "meta": {},
  "start": 1535142832102923800,
  "duration": 92285
}, {
  "trace_id": 8574998873912514000,
  "span_id": 4505033126767931400,
  "parent_id": 16801975833354732000,
  "name": "graphql.validate",
  "resource": "graphql.validate",
  "service": "metaphysics.graphql",
  "error": 0,
  "meta": {},
  "start": 1535142832103034600,
  "duration": 619873
}, {
  "trace_id": 8574998873912514000,
  "span_id": 561450236758304900,
  "parent_id": 16801975833354732000,
  "name": "graphql.execute",
  "resource": "graphql.execute",
  "service": "metaphysics.graphql",
  "error": 0,
  "meta": {},
  "start": 1535142832103850500,
  "duration": 499399902
}, {
  "trace_id": 8574998873912514000,
  "span_id": 9994452044197814000,
  "parent_id": 561450236758304900,
  "name": "graphql.field",
  "resource": "artist",
  "service": "metaphysics.graphql",
  "error": 0,
  "meta": {},
  "start": 1535142832103961900,
  "duration": 499188232
}, {
  "trace_id": 8574998873912514000,
  "span_id": 13825540088684149000,
  "parent_id": 9994452044197814000,
  "name": "graphql.resolve",
  "resource": "artist",
  "service": "metaphysics.graphql",
  "error": 0,
  "meta": {},
  "start": 1535142832104014600,
  "duration": 499114990
}, {
  "trace_id": 8574998873912514000,
  "span_id": 3287111429158728000,
  "parent_id": 13825540088684149000,
  "name": "http.request",
  "resource": "GET",
  "service": "metaphysics.http-client",
  "error": 0,
  "meta": {
      "span.kind": "client",
      "http.method": "GET",
      "http.url": "https://stagingapi.artsy.net:443/api/v1/artist/banksy%3F",
      "http.status_code": "200"
  },
  "start": 1535142832108498200,
  "duration": 493988770,
  "type": "web"
}, {
  "trace_id": 8574998873912514000,
  "span_id": 10236300075184361000,
  "parent_id": 13825540088684149000,
  "name": "http.request",
  "resource": "GET",
  "service": "metaphysics.http-client",
  "error": 0,
  "meta": {
      "span.kind": "client",
      "http.method": "GET",
      "http.url": "https://stagingapi.artsy.net:443/api/v1/artist/banksy%3F",
      "http.status_code": "200"
  },
  "start": 1535142832108619800,
  "duration": 493860107,
  "type": "web"
}]

@@ -2,14 +2,21 @@

const agent = require('./agent')

function findSpan (spans, name, resource, service) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function removes some of the functionality of the previous tests, such as validating the span creation order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok 👍


expect(spans).to.have.length(3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep these as historically there have been bugs found from those checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok 👍

}
if (!contextValue._datadog_operation_start_time) {
Object.defineProperties(contextValue, {
_datadog_operation_start_time: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instrumenting at this layer is not safe as libraries such as express-graphql skip it and use parse and execute directly. Maybe this check can be replaced by middleware instrumentation on the web server instead? The timing of any GraphQL middleware should roughly match the timing of the entire operation. It would however mean that if used without a parent, parse, validate and query would end up as different traces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe this could start some kind of graphql.operation span that would be the parent when available. Then it would be possible to start a span directly in parse and validate instead of deferring them. When used with wrappers such as express-graphql the graphql.operation span would be skipped since this file is not used, but then you have the parent from express.

Do you think that would make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When used with wrappers such as express-graphql the graphql.operation span would be skipped since this file is not used, but then you have the parent from express.

In my naive thoughts it seems like that may not provide the greatest experience when drilling down into the data. Ideally we do want the named operation to be the parent of these, no?

Is it possible for this plugin to optionally hook into another package? i.e. if the express-graphql package is installed wrap graphqlHTTP from it and then do the same recording of _datadog_operation_start_time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it some more, maybe we can just, in order, check for the existence of document._datadog_parse_time and then document._datadog_validate_time. If either of those exists we use their start time as the start time of the operation span. That way we don’t need to hook either the graphql function or anything in express-graphql. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work. I would still prefer to distinguish between the entire operation and the execution as 2 different spans though. So basically there would be a root operation span, with parse, validate and execute each having their own spans with the operation as the parent.

The only problem I see with this is that if parse or validate fails, then execute is never called and you would not get a trace at all. I don't know however if there is a use case to get a trace in that case, especially since if there is a parent, the parse/validate error could be reported in the error.* tags of the parent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. The only problem I have is what to ‘name’ the root span. Currently the span gets the name of the operation, but that requires parsing first. We can lazily create the root span in parse, validate, and execute and use the parsed operation name, but what name do we use when parsing fails?

Also, I assume you’d want to attach the full document and variables to the root span?

)

if (document._datadog_parse_time) {
createFinishedSpan(tracer, config, 'parse-document', document._datadog_parse_time, span)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply parse?

createFinishedSpan(tracer, config, 'parse-document', document._datadog_parse_time, span)
}
if (document._datadog_validate_time) {
createFinishedSpan(tracer, config, 'validate-document', document._datadog_validate_time, span)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply validate?

docs/API.md Outdated
| Option | Default | Description |
|-----------------|--------------------------------------------------|------------------------------------------------------------------|
| service | *Service name of the app suffixed with -graphql* | The service name for this integration. |
| filterVariables | `variables => variables` | A callback that allows filtering of variables before submission. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a non-verb name to allow extending this in the future to support for example an array of object paths or a definition object such as blacklists or whitelists.

Copy link
Member

@rochdev rochdev Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the default should be to not store the variables at all. The user must provide a filter. This will avoid sensitive data to be sent by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that’s way better 👍

@alloy alloy changed the title [GraphQL] Instrument document parsing and validation [GraphQL] Instrument document parsing & validation and include variables in metadata Aug 22, 2018
@alloy alloy force-pushed the graphql-trace-query-parsing-and-validation branch 3 times, most recently from 16cb68a to 8e94d2c Compare August 24, 2018 20:12
@alloy
Copy link
Contributor Author

alloy commented Aug 24, 2018

Alright, this is ready for another review. Also see the updated example trace above.

Copy link
Member

@rochdev rochdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR should be rebased on v0.6.0 since it contains new features.

@@ -26,12 +27,33 @@ function createWrapExecute (tracer, config, defaultFieldResolver, responsePathAs
schema._datadog_patched = true
}

if (!contextValue._datadog_operation) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this check removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it was being done too late.

docs/API.md Outdated
| Option | Default | Description |
|-----------------|--------------------------------------------------|------------------------------------------------------------------------|
| service | *Service name of the app suffixed with -graphql* | The service name for this integration. |
| filterVariables | `undefined` *No variables will be recorded* | To enable recording provide a callback. E.g. `variables => variables`. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the naming here. This is technically no longer a filter. I also still think it should be a noun. Maybe just variables? Then it could eventually be either a list of variable names or a callback (just a callback for now).

'graphql.document': source
}
if (variableValues && config.filterVariables) {
tags['graphql.variables'] = JSON.stringify(config.filterVariables(variableValues))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a single tag or multiple tags? (similar to another PR that you made with the HTTP headers)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, just like the GraphQL document, you can copy-paste these directly into a debugging session.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is a use case for individual tags though (i.e. when using Trace Search)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh wait, I’m still a bit of a newb with the UI. Yeah, being able to filter traces within a named operation group would be great indeed! I’ll make the change 👍

@alloy alloy force-pushed the graphql-trace-query-parsing-and-validation branch from 5dfd48e to 12550bd Compare September 1, 2018 22:39
@alloy
Copy link
Contributor Author

alloy commented Sep 3, 2018

Alright, addressed all feedback.

@alloy
Copy link
Contributor Author

alloy commented Sep 4, 2018

@rochdev Btw, I’ve been wondering if it makes filtering traces easier to rename the graphql.query and graphql.mutation spans to just graphql.operation and add a graphql.operation: query/mutation tag?

@rochdev
Copy link
Member

rochdev commented Sep 4, 2018

@alloy Really interested to know your reasoning about the graphql.operation vs query/mutation since this has actually been a subject of discussion internally when we first implemented GraphQL. We mostly went with query/mutation since it's what felt the most natural in the UI, but there was never a clear winner.

I don't have a strong opinion about this, but in order to make such a change I'd like to know the clear benefits so that we don't end up re-having this discussion.

Copy link
Member

@rochdev rochdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few last small things but otherwise LGTM!

@alloy
Copy link
Contributor Author

alloy commented Sep 4, 2018

@rochdev Regarding graphql.operation vs query/mutation, I noticed that some of my co-workers in their head refer to operations as queries, regardless of type. One co-worker was creating a view in the DD UI that was filtering just by graphql.query, thus overlooking all the mutations coming in. Taking away this ambiguity might make working with the data slightly more intuitive.

I just want to stress that I myself have lots to learn about the UI still, so my suggestion is completely contingent on that you can equally filter on resource names or resource names + tags.

@rochdev
Copy link
Member

rochdev commented Sep 4, 2018

I see, that makes sense. So basically you're saying there is a use case to search through all types of operations at the same time? I didn't think this was necessary. But I'm no GraphQL expert either :)

Let's do this in another PR because it would be a breaking change, especially since it's the root span of the <service>-graphql service.

@alloy alloy force-pushed the graphql-trace-query-parsing-and-validation branch from 82761da to 6d7e2ac Compare September 5, 2018 10:14
@alloy alloy changed the base branch from master to v0.6.0 September 5, 2018 10:19
@alloy alloy force-pushed the graphql-trace-query-parsing-and-validation branch from 6d7e2ac to bde0be2 Compare September 5, 2018 10:19
@alloy
Copy link
Contributor Author

alloy commented Sep 5, 2018

All done 👍

I’ll initially open a ticket for that operation change, because while I’m knowledgeable about GraphQL, I’m no ops expert :) So maybe others have thoughts on this too.

@rochdev rochdev added this to the 0.6.0 milestone Sep 5, 2018
@rochdev rochdev merged commit a246b60 into DataDog:v0.6.0 Sep 5, 2018
@alloy alloy deleted the graphql-trace-query-parsing-and-validation branch September 5, 2018 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants