Skip to content

Conversation

@trentm
Copy link
Member

@trentm trentm commented Sep 15, 2021

Also guard against:

  • metricsInterval:null resulting in metrics being enabled without any
    log message, which is perhaps slightly surprising.
  • metricsInterval:-1s (any negative value) resulting in metrics being
    collected as fast as possible via
    setInterval(collectAllTheMetrics, -1000).

An invalid value for this config var, and the other "POSTIVE_TIME_OPTS",
will result in (a) a log.warning and (b) falling back to the default
value.

A slight semantic change on the internal Metrics object means that the
getOrCreate...() metric functions now return undefined when metrics
are disabled via metricsInterval. This doesn't affect any public API.

Repro

// no-metrics-please.js

require('./').start({
  serviceName: 'foo',
  metricsInterval: '0s'
})

const http = require('http')

const hostname = '127.0.0.1'
const port = 3000
const server = http.createServer((req, res) => {
  res.end('pong')
})
server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`)
})

Run the above script, then do at least one request via curl -i localhost:3000/.

The agent's internal Metrics object will have enabled === false and calls:

      this[registrySymbol] = new MetricsRegistry(this[agentSymbol], {
        reporterOptions: {
          defaultReportingIntervalInSeconds: 0,
          enabled: false,
          unrefTimers: !refTimers,
          logger: new NoopLogger()
        }
      })

The enabled: false there is just for our MetricsRegistry class.
Internally the Reporter from measured-reporting gets the defaultReportingIntervalInSeconds: 0 and does this:

    this._defaultReportingIntervalInSeconds =
      options.defaultReportingIntervalInSeconds || DEFAULT_REPORTING_INTERVAL_IN_SECONDS;

hence defaulting to its internal 10s interval. Then the first time <reporter>.reportMetricOnInterval(metricKey) is called (which happens for breakdown metrics on that curl ... call above), that defaultReportingIntervalInSeconds: 10 is used in a setInterval to call metrics collectors.

We don't report any metrics to APM server because of this in MetricsRegistry:

  _reportMetrics (metrics) {
    if (!this.enabled) return
...

However, collection can still be happening, which is work that shouldn't be done.

Repro metricsInterval=-1s

With a negative value you get approaching 100% CPU usage as system and runtime metrics are collected in a setInterval with a negative delay -- which means run as fast as possible (subject to a JS engine-defined minimum delay).

Checklist

  • Implement code
  • Update documentation
  • Add CHANGELOG.asciidoc entry

@trentm trentm self-assigned this Sep 15, 2021
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Sep 15, 2021
@ghost
Copy link

ghost commented Sep 15, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-09-20T16:16:25.921+0000

  • Duration: 18 min 19 sec

  • Commit: 60b9d2a

Test stats 🧪

Test Results
Failed 0
Passed 20
Skipped 0
Total 20

Trends 🧪

Image of Build Times

Image of Tests

…val=0s

Also guard against:
- `metricsInterval:null` resulting in metrics being enabled without any
  log message, which is perhaps slightly surprising.
- `metricsInterval:-1s` (any negative value) resulting in metrics being
  collected *as fast as possible* via
  `setInterval(collectAllTheMetrics, -1000)`.

An invalid value for this config var, and the other "POSTIVE_TIME_OPTS",
will result in (a) a log.warning and (b) falling back to the default
value.

A slight semantic change on the internal Metrics object means that the
`getOrCreate...()` metric functions now return undefined when metrics
are disabled via `metricsInterval`. This doesn't affect any public API.
@trentm trentm force-pushed the trentm/metrics-actually-disabled branch from 8fa97e4 to a6f562a Compare September 15, 2021 23:26
@trentm trentm marked this pull request as ready for review September 15, 2021 23:26
@trentm trentm requested a review from astorm September 15, 2021 23:26
astorm
astorm previously approved these changes Sep 16, 2021
Copy link
Contributor

@astorm astorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Overall looks like a positive incremental improvement of the agent. A few questions/suggestions below but I'm comfortable with a merge as is. Approving.

logger: new NoopLogger()
}
})
if (enabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 All methods in this class have a guard clause that should prevent Cannot read property .. of undefined errors, and the symbol effectively makes this a private property.

normalizeNumbers(opts)
normalizeBytes(opts)
normalizeArrays(opts)
normalizePositiveTime(opts, logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I ran some tests and this function appears to behave as expected -- non-positive and null/undefined values use the values from DEFAULTS.

astorm
astorm previously approved these changes Sep 16, 2021
@trentm
Copy link
Member Author

trentm commented Sep 20, 2021

Failure in GH Action CI:

not ok 16 captureError is fast (<3ms): 3.688309ms

is due to the inherent race in disable-send.test.js with possibly-slow CI:

      // Test captureError speed as a proxy for testing that it avoids
      // stacktrace collection when disableSend=true. It isn't a perfect way
      // to test that.
      const durationMs = duration[0] / 1e3 + duration[1] / 1e6
      const THRESHOLD_MS = 3 // Is this long enough for slow CI?
      t.ok(durationMs < THRESHOLD_MS, `captureError is fast (<${THRESHOLD_MS}ms): ${durationMs}ms`)

Just retrying for now.

@trentm
Copy link
Member Author

trentm commented Sep 20, 2021

but I'm comfortable with a merge as is. Approving.

I'm taking that as a +1.

@trentm trentm merged commit 56c6763 into master Sep 20, 2021
@trentm trentm deleted the trentm/metrics-actually-disabled branch September 20, 2021 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-nodejs Make available for APM Agents project planning.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants