Add error rate definition
felixbarny committed Aug 20, 2020
1 parent c390771 commit 362e2a1
Showing 2 changed files with 24 additions and 0 deletions.
4 changes: 4 additions & 0 deletions specs/agents/tracing-spans.md
@@ -17,8 +17,12 @@ the error rate of service B is 100% from service A's perspective.
However, as service B doesn't receive any requests, the error rate is 0% from service B's perspective.
The `span.outcome` also allows reasoning about error rates of external services.

#### Outcome API

Agents should expose an API to manually override the outcome.
This value must always take precedence over the automatically determined value.
The documentation should clarify that spans with `unknown` outcomes are ignored in the error rate calculation.

#### Span stack traces

Spans may have an associated stack trace, in order to locate the associated source code that caused the span to occur. If there are many spans being collected this can cause a significant amount of overhead in the application, due to the capture, rendering, and transmission of potentially large stack traces. It is possible to limit the recording of span stack traces to only spans that are slower than a specified duration, using the config variable `ELASTIC_APM_SPAN_FRAMES_MIN_DURATION`.
20 changes: 20 additions & 0 deletions specs/agents/tracing-transactions.md
@@ -23,5 +23,25 @@ If an agent doesn't report the `outcome` (or reports `null`), the APM Server set

What counts as a failed or successful request depends on the protocol and does not depend on whether there are error documents associated with a transaction.

##### Error rate

The error rate of a transaction group is based on the `outcome` of its transactions.

error_rate = failure / (failure + success)

Note that when calculating the error rate,
transactions with an `unknown` or non-existent outcome are not considered.

The calculation looks only at the subset of transactions whose outcome is known and extrapolates the error rate to the total population.
This prevents `unknown` or nonexistent outcomes from reducing the error rate,
which would otherwise happen when looking at a mix of old and new agents,
or when looking at RUM data (as page load transactions have an `unknown` outcome).
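
As a rough illustration of the formula above, the following Python sketch counts only `failure` and `success` outcomes and skips everything else. The function name `error_rate`, the use of `None` for a missing outcome, and returning `None` when no outcome is known are assumptions made for this example, not part of the spec.

```python
from typing import Iterable, Optional


def error_rate(outcomes: Iterable[Optional[str]]) -> Optional[float]:
    """Return failure / (failure + success), ignoring unknown or missing outcomes."""
    failure = 0
    success = 0
    for outcome in outcomes:
        if outcome == "failure":
            failure += 1
        elif outcome == "success":
            success += 1
        # "unknown" and None (no outcome reported) are deliberately skipped.
    known = failure + success
    # Assumption of this sketch: with no known outcomes there is no error rate.
    return failure / known if known else None


# 1 failure, 3 successes, 4 unknown or missing outcomes
sample = ["failure", "success", "success", "success",
          "unknown", "unknown", None, None]
print(error_rate(sample))  # 0.25
```

Had the four unknown or missing outcomes been counted in the denominator, the same data would yield 0.125, which is the dilution the paragraph above describes.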

Also note that this only reflects the error rate as perceived by the application itself.
The error rate perceived by its clients is greater than or equal to that.

##### Outcome API

Agents should expose an API to manually override the outcome.
This value must always take precedence over the automatically determined value.
The documentation should clarify that transactions with `unknown` outcomes are ignored in the error rate calculation.
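
One possible way to satisfy the precedence requirement is to store the manually set value separately from the automatically determined one and consult it first when reading the outcome. The sketch below is hypothetical: the `Transaction` class and the method names `set_outcome` and `set_outcome_from_instrumentation` are invented for illustration and do not correspond to any particular agent's API.

```python
from typing import Optional

VALID_OUTCOMES = {"success", "failure", "unknown"}


class Transaction:
    """Hypothetical transaction object; not the API of any real agent."""

    def __init__(self) -> None:
        self._auto_outcome: Optional[str] = None  # set by instrumentation
        self._user_outcome: Optional[str] = None  # set through the public API

    def set_outcome_from_instrumentation(self, outcome: str) -> None:
        """Record the automatically determined outcome."""
        self._auto_outcome = outcome

    def set_outcome(self, outcome: str) -> None:
        """Public API: manually override the outcome."""
        if outcome not in VALID_OUTCOMES:
            raise ValueError(f"invalid outcome: {outcome!r}")
        self._user_outcome = outcome

    @property
    def outcome(self) -> Optional[str]:
        # The manual value always wins, regardless of the order of the calls.
        if self._user_outcome is not None:
            return self._user_outcome
        return self._auto_outcome


tx = Transaction()
tx.set_outcome("failure")                       # user override first
tx.set_outcome_from_instrumentation("success")  # instrumentation reports later
print(tx.outcome)  # failure
```

Keeping the two values in separate fields means instrumentation that finishes after the user call cannot overwrite the override.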
