Missing requests in metrics #105

Closed
samuelg opened this issue May 29, 2020 · 6 comments

samuelg commented May 29, 2020

We recently noticed that swagger-stats appears to be missing some requests in the metrics exposed on the swagger-stats/metrics route. The application in question uses OpenAPI via express-openapi. We also have metrics for the HAProxy instances fronting the application, as well as a bunyan middleware within the same application that logs all requests and reports request-finish events along with a status code.

Looking at the latest metrics, HAProxy reports a little over 1.7M 200s and 100K 401s, whereas swagger-stats reports a little under 700K 200s and only 282 401s. The logs written by the bunyan middleware match what HAProxy reports.

We are looking specifically at the api_request_total metrics.

Our package versions are:

{
  "bunyan": "^1.8.12",
  "bunyan-middleware": "^1.0.0",
  "express": "^4.17.1",
  "express-async-handler": "^1.1.4",
  "express-openapi": "^6.0.0",
  "swagger-stats": "^0.95.17"
}

We are configuring the swagger-stats middleware as follows:

app.use(
  swStats.getMiddleware({
    swaggerSpec: apiDoc,
    swaggerOnly: true,
    authentication: true,
    onAuthenticate: (req, username, password) => {
      // check auth for metrics
    },
  })
);

In case it helps, here is how we are configuring the bunyan middleware:

app.use(
  bunyanMiddleware({
    obscureHeaders: [],
    logger,
    requestStart: true,
  })
);

We are using version 12.16.2 of Node.js.

The application where swagger-stats is installed has been running for 10 days straight without a restart, so we don't believe the stats were reset by a restart.

The issue is also not that some routes are missing from the swagger definition and therefore untracked: we do see some metrics for all routes, just nowhere near as many as we would expect.

Prometheus scrapes the swagger-stats/metrics route from the application directly rather than going through HAProxy, so we do not believe that is to blame for the discrepancies either, nor would it explain the difference in 401s we are seeing.

We are at a loss as to why swagger-stats appears to be missing so many requests.

Any help tracking this down would be appreciated.

sv2 self-assigned this May 29, 2020
Collaborator

sv2 commented May 29, 2020

Is it possible to run the app for some time with swagger-stats debug enabled, using the env variable DEBUG=sws:* ?
If you could provide the captured debug log, that could help to spot the issue.

As you mentioned HAProxy: are you by any chance running multiple nodes of the app behind HAProxy used as a load balancer? If so, each node would see only a subset of all API requests.
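
Where changing the launch environment is awkward, the same flag can be set from code. The following is a minimal sketch, assuming swagger-stats logs through the `debug` package, which reads `process.env.DEBUG` when it is first loaded. The helper `addDebugNamespace` is hypothetical and only merges the namespace into whatever value is already set.

```javascript
// Assumption: swagger-stats logs through the `debug` package, which reads
// process.env.DEBUG once, when the package is first loaded.

// Hypothetical helper: append a namespace to a comma-separated DEBUG value
// without duplicating it.
function addDebugNamespace(current, namespace) {
  const parts = (current || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
  if (!parts.includes(namespace)) {
    parts.push(namespace);
  }
  return parts.join(',');
}

// Set DEBUG before swagger-stats is required, so the flag is seen on load.
process.env.DEBUG = addDebugNamespace(process.env.DEBUG, 'sws:*');
// const swStats = require('swagger-stats'); // require only after DEBUG is set
```

Setting DEBUG=sws:* in the deployment environment, as suggested above, achieves the same thing without a code change.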

Author

samuelg commented Jun 1, 2020

We can work to add the debug ENV var to our deployment to gather logs.

We do have more than one node behind HAProxy, yes, but metrics are being collected by a single Prometheus and we are looking at a sum of the metrics. We also have a single Logstash server receiving logs from all nodes, and the counts there match the counts from HAProxy.
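
To make the "sum of the metrics" concrete, here is a rough sketch of adding up per-node counters by status code. The node names and numbers are placeholders, not values from our deployment, and `sumByCode` is a hypothetical helper. It assumes each node exposes its own api_request_total count per status code.

```javascript
// Hypothetical per-node samples of api_request_total, keyed by status code.
// The node names and numbers are placeholders, not values from this issue.
const perNode = [
  { node: 'node-a', counts: { 200: 120, 401: 3 } },
  { node: 'node-b', counts: { 200: 95, 401: 1 } },
];

// Add the per-node counters together so totals cover every node
// behind the load balancer.
function sumByCode(samples) {
  const totals = {};
  for (const { counts } of samples) {
    for (const [code, value] of Object.entries(counts)) {
      totals[code] = (totals[code] || 0) + value;
    }
  }
  return totals;
}

console.log(sumByCode(perNode));
```

Because the summed totals already cover every node, load balancing alone would not account for the gap.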

Author

samuelg commented Jun 2, 2020

After deploying the application with swagger-stats debugging enabled and analyzing the logs, it appears the issue is that we are using swaggerOnly: true in our config, but some of our clients (which we do not control) use a trailing / when accessing some of our API routes. These requests are routed correctly by openapi, but because they are not technically an exact match due to the trailing /, swagger-stats is not tracking them. Is that expected? If the route is part of the openapi definition and is being routed correctly, I would expect swagger-stats to track it. For now we're going to disable swaggerOnly as a workaround.

Collaborator

sv2 commented Jun 2, 2020

Thanks for letting me know! This calls for an improvement in matching. I will try to reproduce and handle the trailing / properly. Just in case, can you give me a couple of examples of your routes?

Author

samuelg commented Jun 3, 2020

Here is a truncated view of our api definition:

/v1/tokens:
    post:
      operationId: createToken
/v1/sessions:
    post:
      operationId: createSession

We are seeing access to the following in the logs:

  • /v1/tokens
  • /v1/tokens/
  • /v1/sessions
  • /v1/sessions/

The ones with a trailing / are the ones missing from swagger-stats. None of these routes have parameters in the route path. All parameters are passed via a POST body or a header, depending on the route.
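
The mismatch can be illustrated with a small sketch. This is not swagger-stats' actual matching code: `matchesExactly`, `normalizePath` and `matchesNormalized` are hypothetical names, and the only assumption is that the tracked paths are compared as exact strings against the two paths from the definition above.

```javascript
// Hypothetical illustration only: this is NOT swagger-stats' actual matching code.
// The two paths come from the truncated API definition above.
const specPaths = new Set(['/v1/tokens', '/v1/sessions']);

// Exact string comparison: "/v1/tokens/" is not equal to "/v1/tokens".
function matchesExactly(requestPath) {
  return specPaths.has(requestPath);
}

// Drop trailing slashes before comparing, but keep a bare "/" intact.
function normalizePath(requestPath) {
  const trimmed = requestPath.replace(/\/+$/, '');
  return trimmed === '' ? '/' : trimmed;
}

function matchesNormalized(requestPath) {
  return specPaths.has(normalizePath(requestPath));
}

// Print how each path seen in the logs fares under both comparisons.
for (const path of ['/v1/tokens', '/v1/tokens/', '/v1/sessions', '/v1/sessions/']) {
  console.log(path, 'exact:', matchesExactly(path), 'normalized:', matchesNormalized(path));
}
```

Under the exact comparison only the paths without a trailing / match, which is consistent with what we see in the metrics. Express itself treats /v1/tokens and /v1/tokens/ as the same route unless strict routing is enabled, which would explain why the requests are still served.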

Collaborator

sv2 commented Apr 3, 2021

Addressed in v0.99.1

sv2 closed this as completed Apr 3, 2021