📈 PaaS-friendly metrics #4874

Merged
merged 86 commits into from
Apr 19, 2020

Conversation

platan
Member

@platan platan commented Apr 5, 2020

This PR tries to fix #3946 :-)

Current metrics

Problem we want to solve

After moving to Heroku (or possibly another PaaS), we are not able to access the metrics (the /metrics endpoint) of a specific instance.

Solution

The solution was proposed in this comment:

  1. Because the individual servers can't be reached externally, metrics will need to be generated on each server and sent to the metrics server.
    This PR is an implementation of this idea.
  • We still use prom-client to collect all metrics (default + custom). No code change is needed there.
  • Every 15 seconds, each Shields server converts the metrics that prom-client returns as JSON to the Influx format and pushes them to metrics.shields.io/telegraf.
  • Telegraf (https://github.com/influxdata/telegraf) is running at metrics.shields.io. It accepts the metrics sent by the Shields servers in Influx format, converts them to Prometheus format and exposes them via HTTP. Metrics expire in Telegraf after 20 seconds (Telegraf can expire metrics automatically, which Pushgateway cannot).
  • Prometheus collects the metrics from the Telegraf instance.
  • The metrics are available in Grafana.
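
The conversion step in the list above could look roughly like this. It is a minimal sketch, not the code from this PR. It assumes the JSON shape that prom-client produces (an array of metrics, each with a `values` array of `{ value, labels }` entries). The function names `escapeTag` and `toInfluxLines`, and the measurement name `prometheus`, are hypothetical choices made for illustration.

```javascript
// Escape the characters that are special in Influx line protocol
// tag keys, tag values and field keys: comma, equals sign and space.
function escapeTag(s) {
  return String(s).replace(/([,= ])/g, '\\$1')
}

// Convert prom-client style JSON metrics to Influx line protocol.
// `extraTags` carries labels such as env and instance (assumed names).
function toInfluxLines(metrics, extraTags = {}) {
  const lines = []
  for (const metric of metrics) {
    for (const sample of metric.values) {
      const tags = { ...(sample.labels || {}), ...extraTags }
      // Sorted tag keys give a stable, predictable line.
      const tagPart = Object.keys(tags)
        .sort()
        .map(key => `,${escapeTag(key)}=${escapeTag(tags[key])}`)
        .join('')
      // Some samples may carry their own name; fall back to the metric name.
      const field = escapeTag(sample.metricName || metric.name)
      // "prometheus" as the measurement name is an assumption.
      lines.push(`prometheus${tagPart} ${field}=${sample.value}`)
    }
  }
  return lines.join('\n')
}
```

Each sample becomes one line of the form `measurement,tag=value field=value`, which is the text format an Influx-compatible HTTP listener accepts.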

Why converting to Influx format?

Currently, Telegraf's HTTP listener cannot accept metrics in Prometheus format, but there are plans to add such a feature.

How to identify instances?

Currently metrics are identified by 2 labels (can be observed at https://metrics.shields.io/d/g_1B7zhik/prom-client-default-metrics):

  • env - identifies service environment, value „shields-io-production” indicates http://shields.io
  • instance - identifies particular instance, values: „s0.shields-server.com:443”, „s1.shields-server.com:443” and „s2.shields-server.com:443”

Both labels are added to the metrics by Prometheus; the metrics exposed by the /metrics endpoint do not contain this information.

Now we have to add the information described above to the metrics ourselves.
The source of the instance value can be set using the public.metrics.influx.instanceIdFrom configuration property. Three values are allowed:

  • env-var - the value will be read from the environment variable whose name is defined by the public.metrics.influx.instanceIdEnvVarName configuration property
  • hostname - the value will be equal to os.hostname() (https://nodejs.org/api/os.html#os_os_hostname). This value can be mapped using hostnameAliases
  • random - a randomly generated value, e.g. fyytvm7el
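
The three options above can be sketched as follows. This is an illustration under assumptions, not the implementation from this PR: the function name `resolveInstanceId` and its injectable `env` and `hostname` parameters are hypothetical, and the random id format is simply hex here.

```javascript
const os = require('os')
const crypto = require('crypto')

// Resolve the instance id from a config object shaped like
// public.metrics.influx. `env` and `hostname` are injectable for testing.
function resolveInstanceId(influxConfig, env = process.env, hostname = os.hostname()) {
  const { instanceIdFrom, instanceIdEnvVarName, hostnameAliases = {} } = influxConfig
  switch (instanceIdFrom) {
    case 'env-var':
      // Read the id from the configured environment variable.
      return env[instanceIdEnvVarName]
    case 'hostname':
      // Use the alias when one is configured, otherwise the raw hostname.
      return hostnameAliases[hostname] || hostname
    case 'random':
      // 6 random bytes rendered as 12 hex characters (format is an assumption).
      return crypto.randomBytes(6).toString('hex')
    default:
      throw new Error(`Unsupported instanceIdFrom: ${instanceIdFrom}`)
  }
}
```

With `instanceIdFrom: 'hostname'` and an alias for `vps71670`, a call on that host would return the alias instead of the raw hostname.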

The env value can be defined using the public.metrics.influx.envLabel configuration property.

There is also a new configuration property, public.metrics.prometheus.endpointEnabled, which enables or disables the /metrics endpoint.

production (VPS at OVH)

We can leave the current metrics as they are, but we can also use the new approach. This configuration should enable the new metrics and disable the old ones:

public:
  metrics:
    prometheus:
      endpointEnabled: false
    influx:
      enabled: true
      url: https://metrics.shields.io/telegraf
      timeoutMilliseconds: 1000
      intervalSeconds: 15
      instanceIdFrom: hostname
      envLabel: shields-io-production
      hostnameAliases:
        vps71670: s0.servers.shields.io
        vps244529: s1.servers.shields.io
        vps117870: s2.servers.shields.io

In addition, the INFLUX_USERNAME and INFLUX_PASSWORD environment variables have to be set.
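
A sketch of how a server might turn these settings into a periodic push. It assumes that the credentials are sent with HTTP basic authentication, which the description does not state explicitly. The helpers `buildPushRequest` and `startPushing` are hypothetical names, and the returned object is a plain description of the request rather than the options of any particular HTTP library.

```javascript
// Describe the POST request for one push, from values shaped like
// public.metrics.influx plus the two credentials.
function buildPushRequest({ url, timeoutMilliseconds }, { username, password }, body) {
  // Basic auth is an assumption about how the credentials are used.
  const token = Buffer.from(`${username}:${password}`).toString('base64')
  return {
    url,
    method: 'POST',
    timeout: timeoutMilliseconds,
    headers: {
      Authorization: `Basic ${token}`,
      'Content-Type': 'text/plain',
    },
    body,
  }
}

// Call `pushOnce` every `intervalSeconds`; returns a function that stops it.
function startPushing(intervalSeconds, pushOnce) {
  const timer = setInterval(pushOnce, intervalSeconds * 1000)
  return () => clearInterval(timer)
}
```

With `intervalSeconds: 15`, `pushOnce` would run every 15 seconds, which stays inside the 20 second expiry window mentioned for Telegraf.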

The code below shows what os.hostname() returns on the OVH metrics server:

m@vps580707:~$ hostname
vps580707
m@vps580707:~$ hostname -f
vps580707.ovh.net
m@vps580707:~$ cat hostname.js
const os = require('os');

console.log('hostname: ', os.hostname());
m@vps580707:~$ ./node-v12.16.1-linux-x64/bin/node hostname.js
hostname:  vps580707

I guess our servers should return vps71670, vps244529 and vps117870 for os.hostname() (https://github.com/badges/shields/blob/master/doc/production-hosting.md#badge-servers)

Heroku

I suggest using dyno metadata (https://devcenter.heroku.com/articles/dyno-metadata#dyno-metadata) to identify instances.
HEROKU_DYNO_ID can be used as an instance id.

If we want to use Dyno metadata, we have to enable it.

In the staging app (https://shields-staging.herokuapp.com) and in review apps we can use the app name (HEROKU_APP_NAME) as the env value. This will give the values shields-staging and shields-staging-pr-XXXX. We can send metrics from all environments (production, staging, review apps) to http://metrics.shields.io/.

Telegraf is already running at http://metrics.shields.io/ and at http://metrics-test.shields.platan.space (a separate instance I use for testing). The configuration can be found here. I have already sent some metrics from my locally running app to https://metrics-test.shields.platan.space/d/g_1B7zhik/prom-client-default-metrics?orgId=1&from=1586112499614&to=1586113011638&var-env=development&var-instance=rpy2hz7cd

Follow ups:

@platan platan temporarily deployed to shields-staging-pr-4874 April 15, 2020 18:44 Inactive
server.js Outdated
- const server = (module.exports = new Server(config))
+ const server = (module.exports = new Server(config, {
+ id: process.env.INSTANCE_ID,
+ env: process.env.INSTANCE_ENV || process.env.NODE_CONFIG_ENV || 'unknown',
Member Author

@paulmelnikow would you like to handle env similarly to instanceIdFrom? I think we can simply use https://github.com/badges/shields/blob/master/config/custom-environment-variables.yml for reading env.

If we want to remove env from the piece of code above, I suggest removing the instance metadata object.

Member

Hmm, to keep things simple I'd suggest we do one of these two:

  1. Always use NODE_CONFIG_ENV
  2. Specify the influx env in config as a string literal, and rely on setting it using the env var set in custom_environment_variables

Member Author

I chose option 2, because this will work for review apps - we want to set the env value to the PR review application name.

@platan platan temporarily deployed to shields-staging-pr-4874 April 17, 2020 15:37 Inactive
@platan platan temporarily deployed to shields-staging-pr-4874 April 17, 2020 19:20 Inactive
@platan
Member Author

platan commented Apr 18, 2020

@paulmelnikow I think I have fixed all the issues. I will also update the PR description.

I would like to enable dyno metadata in our staging/review apps. Is it OK with you if I do it? https://devcenter.heroku.com/articles/dyno-metadata#usage

@platan platan temporarily deployed to shields-staging-pr-4874 April 18, 2020 11:24 Inactive
@platan platan temporarily deployed to shields-staging-pr-4874 April 18, 2020 11:31 Inactive
Member

@paulmelnikow paulmelnikow left a comment

Let's get it merged!

We probably need to add the production influx config, though that could happen in a follow-on PR.

@shields-deployment

This pull request was merged to master branch. This change is now waiting for deployment, which will usually happen within a few days. Stay tuned by joining our #ops channel on Discord!

After deployment, changes are copied to gh-pages branch:

@platan
Member Author

platan commented Apr 19, 2020

@paulmelnikow and @chris48s, thank you for the review 👍
Metrics from https://shields-staging.herokuapp.com are available at https://metrics.shields.io/d/g_1B7zhik/prom-client-default-metrics 🎉 (you have to change env to shields-staging).

@calebcartwright
Member

Glad to see this landed! Well done everyone 🎉

@paulmelnikow
Member

@platan Do you want to send me some credentials for the production server through Keybase? Or should I snag them from staging?

@platan
Member Author

platan commented Apr 20, 2020

I created a separate user for production. I have sent you the credentials through Keybase.
