# Run charts {#runcharts}
Run charts are meant to show a metric of interest over time. They do not rely on parametric statistical theory, so they cannot distinguish between common cause and special cause variation. However, improperly constructed control charts can't either, so run charts can be used more readily where statistical knowledge is limited. So while run charts are not always as powerful as control charts, they can still provide practical monitoring and useful insights into the process.

Run charts typically employ the median as the reference line, and they help you determine whether there are unusual runs in the data, which suggest non-random variation. They tend to be better at detecting moderate (~1.5$\sigma$) changes in a process than the control charts' typical 3$\sigma$ limits rule alone. In other words, a run chart can be more useful when trying to detect improvement while that improvement work is still going on, where a control chart may be inappropriate.

## Interpreting run charts
There are basically two "tests" for run charts (noting an astronomical data point or looking for cycles isn't a test *per se*):

- *Process shift:* The longest run expected from random variation is $log_2(n) + 3$ consecutive data points (rounded to the nearest integer) all above or all below the median line, where *n* is the number of points that do *not* fall directly on the median line; a longer run suggests a shift in the process. For example, if you have 34 points and 2 fall on the median, you have 32 observations for *n*, so the longest run should be no more than 8 points.
- *Number of crossings:* Too many or too few runs---or more simply, median line crossings---suggest a pattern inconsistent with natural variation. You can use the binomial distribution (`qbinom(0.05, n-1, 0.50)` in R, for a 5% false positive rate and because we expect 50% of values to lie on each side of the median) or a table (e.g., [Table 1 in Anhøj & Olesen 2014](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0113825#pone-0113825-t001)) to find the minimum number of crossings you would expect. Using the same data as in the example introduced in [Chapter 1](#where), we would expect the time series to cross the median at least 11 times; a small sketch that computes both thresholds by hand follows this list.
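
To make the two rules concrete, here is a minimal sketch of computing them by hand. The `run_chart_tests()` helper is hypothetical (not part of the book's code) and takes a generic numeric vector `y`; the `qic` chart below reports the same values automatically.

```{r runtestsketch}
# Hypothetical helper: compute the run chart thresholds and the observed
# longest run and number of crossings for a numeric vector y
run_chart_tests <- function(y) {
  y <- y[!is.na(y)]
  med <- median(y)
  above <- y[y != med] > med          # drop points on the median, keep side
  n <- length(above)
  data.frame(
    longest_run       = max(rle(above)$lengths),   # observed
    longest_run_limit = round(log2(n) + 3),        # expected maximum
    crossings         = sum(diff(above) != 0),     # observed
    min_crossings     = qbinom(0.05, n - 1, 0.50)  # expected minimum
  )
}

# For example: run_chart_tests(point.signal)
```
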
```{r qichart, fig.width=8, fig.height=3.5}
# The qic function in the qicharts package creates easy run and control charts
# The runvals option shows the test results on the plot
qic(point.signal, runvals=TRUE, ylab="Value", main="")
```

## Run charts with a trend {#runtrend}
When you have a trend like that seen above, you'll generally wait until the process has settled to a new, stable mean, and reset the central line and control limits accordingly.

But sometimes you have a trend that lasts for a long time, or is continuously trending. You can difference the data to remove the trend (create a new dataset by subtracting the value at time *t* from the value at time *t+1*), but that makes the chart harder to interpret. Perhaps a better idea is to use quantile regression to obtain the median line, which keeps the data on the original scale.
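
For reference, differencing is a one-liner in R. This is just a sketch, with `y` standing in for the trending series; the differenced values are period-to-period changes, which is why they are harder to read against the original metric.

```{r diffsketch, eval=FALSE}
# Differencing removes a linear trend, but puts the chart on the scale of
# period-to-period changes rather than the original metric
# (y is a placeholder for the trending series)
y_detrended <- diff(y)      # value at time t+1 minus value at time t
median(y_detrended)         # reference line for a run chart of the differences
```

Quantile regression, by contrast, keeps the chart in the original units:
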
```{r quantreg, fig.height=3}
# Generate fake process data with an upward trend
set.seed(81)
n <- 36
x <- 1:n
mb <- data.frame(x = x, y = 10000 + (x * 1.25) + rnorm(n, 0, 5))
# Plot with quantile regression for median line
ggplot(mb, aes(x, y)) +
  xlab("Subgroup") +
  ylab("Value") +
  geom_line(color = "gray70") +
  geom_point() +
  geom_quantile(quantiles = 0.5)
```

**Process shift "test"**: with $n = 33$ points that do not fall on the median line, $log_2(n) + 3$ rounds to `r round(log2(33) + 3)`, so there should be no more than 8 points in any given run.

**Crossings "test"**: `qbinom(0.05, 33, 0.50)` is `r qbinom(0.05, 33, 0.50)`, the minimum number of crossings we'd expect.

Both tests suggest there is no non-random variation in this process.
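
If you want to compute the observed run length and crossing count against the trending median line itself (rather than a flat median), one option, sketched here as an assumption rather than the book's method, is to fit the same median line with `quantreg::rq()` and apply the counts to the signs of the residuals:

```{r rqruntests}
# Sketch (an assumption, not shown in the original text): fit the median
# line with quantile regression, then count runs and crossings of the
# residual signs, just as for a flat median line
library(quantreg)
fit <- rq(y ~ x, tau = 0.5, data = mb)   # median regression, same line as geom_quantile
above <- residuals(fit)[residuals(fit) != 0] > 0   # side of the line for each point
c(longest_run = max(rle(above)$lengths),           # compare to the run limit above
  crossings   = sum(diff(above) != 0))             # compare to the minimum crossings
```

Comparing these observed counts to the two thresholds above is the same check you would make on a flat-median run chart.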