# Pooling Effect Sizes {#pooling-es}
---
<img src="_figs/pooling_es.jpg" />
<br></br>
<span class="firstcharacter">A</span> long and winding road already lies behind us. Fortunately, we have now reached the core part of every meta-analysis: the pooling of effect sizes. We hope that you were able to resist the temptation of starting directly with this chapter. We have already discussed various topics in this book, including the definition of research questions, guidelines for searching, selecting, and extracting study data, as well as how to prepare our effect sizes.
Thorough preparation is a key ingredient of a good meta-analysis, and will be immensely helpful in the steps that are about to follow. We can assure you that the time you spent working through the previous chapters was well invested.
\index{meta Package}
There are many packages which allow us to pool effect sizes in _R_. Here, we will focus on functions of the **{meta}** package, which we already installed in Chapter \@ref(packages). This package is very user-friendly and provides us with nearly all important meta-analysis results using just a few lines of code. In the previous chapter, we covered that effect sizes come in different "flavors", depending on the outcome of interest. The **{meta}** package contains specialized meta-analysis functions for each of these effect size metrics. All of the functions also follow nearly the same structure.
Thus, once we have a basic understanding of how **{meta}** works, coding meta-analyses becomes straightforward, no matter which effect size we are focusing on. In this chapter, we will cover the general structure of the **{meta}** package. And of course, we will also explore the meta-analysis functions of the package in greater detail using hands-on examples.
The **{meta}** package allows us to tweak many details about the way effect sizes are pooled. As we previously mentioned, meta-analysis comes with many "researcher degrees of freedom". There are a myriad of choices concerning the statistical techniques and approaches we can apply, and whether one method is better than another often depends on the context.
\index{Fixed-Effect Model}
\index{Random-Effects Model}
Before we begin with our analyses in _R_, we therefore have to get a basic understanding of the statistical assumptions of meta-analysis, and the maths behind it. Importantly, we will also discuss the "idea" behind meta-analyses. In statistics, this "idea" translates to a **model**, and we will have a look at what the meta-analytic model looks like.
As we will see, the nature of the meta-analysis requires us to make a fundamental decision right away: we have to assume either a **fixed-effect model** or a **random-effects model**. Knowledge of the concept behind meta-analytic pooling is needed to make an informed decision which of these two models, along with other analytic specifications, is more appropriate in which context.
<br></br>
## The Fixed-Effect and Random-Effects Model {#fem-rem}
---
Before we specify the meta-analytic model, we should first clarify what a statistical model actually is. Statistics is full of "models", and it is likely that you have heard the term in this context before. There are "linear models", "generalized linear models", "mixture models", "generalized additive models", "structural equation models", and so on.
The ubiquity of models in statistics indicates how important this concept is. In one way or the other, models build the basis of virtually all parts of our statistical toolbox. There is a model behind $t$-tests, ANOVAs, and regression. Every hypothesis test has its corresponding statistical model.
When defining a statistical model, we start with the information that is already given to us. This is, quite literally, our **data**^["Data" is derived from the Latin word **datum**, meaning "a thing that is given".]. In meta-analyses, the data are effect sizes that were observed in the included studies. Our model is used to describe the process through which these observed data were generated.
The data are seen as the product of a **black box**, and our model aims to illuminate what is going on inside that black box.
```{r model, out.width='50%', message = F, echo = F, fig.align='center'}
library(OpenImageR)
knitr::include_graphics('images/model_concept_sep.png')
```
Typically, a statistical model is like a special type of **theory**. Models try to explain the mechanisms that generated our observed data, especially when those mechanisms themselves cannot be directly observed. They are an **imitation of life**, using a mathematical formula to describe processes in the world around us in an idealized way.
This explanatory character of models is deeply ingrained in modern statistics, and meta-analysis is no exception. The conceptualization of models as a vehicle for explanation is the hallmark of a statistical "culture" to which, as Breiman [-@breiman2001statistical] famously estimated, 98% of all statisticians adhere.
By specifying a statistical model, we try to find an approximate representation of the "reality" behind our data. We want a mathematical formula that explains how we can find the **true** effect size underlying all of our studies, based on their observed results. As we learned in Chapter \@ref(what-are-mas), one of the ultimate goals of meta-analysis is to find one numerical value that characterizes our studies **as a whole**, even though the observed effect sizes vary from study to study. A meta-analysis model must therefore explain the reasons why and how much observed study results differ, even though there is only one overall effect.
There are two models which try to answer exactly this question, the **fixed-effect model** and the **random-effects model**. Although both are based on different assumptions, there is still a strong link between them, as we will soon see.
<br></br>
### The Fixed-Effect Model {#fem}
---
\index{Fixed-Effect Model}
\index{Sampling Error}
The fixed-effect model assumes that all effect sizes stem from a single, homogeneous population. It states that all studies share the **same** true effect size. This true effect is the overall effect size we want to calculate in our meta-analysis, denoted with $\theta$.
According to the fixed-effect model, the only reason why a study $k$'s observed effect size $\hat\theta_k$ deviates from $\theta$ is because of its sampling error $\epsilon_k$. The fixed-effect model tells us that the process generating studies' different effect sizes, the content of the "black box", is simple: all studies are estimators of the same true effect size. Yet, because every study can only draw somewhat bigger or smaller samples of the infinitely large study population, results are burdened by sampling error. This sampling error causes the observed effect to deviate from the overall, true effect.
We can describe the relationship like this [@borenstein2011introduction, chapter 11]:
\begin{equation}
\hat\theta_k = \theta + \epsilon_k
(\#eq:pes1)
\end{equation}
To the alert reader, this formula may seem oddly similar to the one in Chapter \@ref(what-is-es). You are not mistaken. In the previous formula, we defined that an observed effect size $\hat\theta_k$ of some study $k$ is an estimator of that study's true effect size $\theta_k$, burdened by the study's sampling error $\epsilon_k$.
There is only a tiny, but insightful, difference between the previous formula and that of the fixed-effect model. In the formula of the fixed-effect model, the true effect size is not symbolized by $\theta_k$, but by $\theta$; the subscript $k$ is dropped.
Previously, we only made statements about the true effect size of **one** individual study $k$. The fixed-effect model goes one step further. It tells us that if we find the true effect size of study $k$, this effect size is not only true for $k$ specifically, but for **all** studies in our meta-analysis. A **study's** true effect size $\theta_k$, and the **overall**, pooled effect size $\theta$, are **identical**.
```{block2, type='boxinfo'}
The **idea behind the fixed-effect model** is that observed effect sizes may vary from study to study, but this is only because of the sampling error. In reality, their true effect sizes are **all the same**: they are fixed. For this reason, the fixed-effect model is sometimes also referred to as the **"equal effects"** or **"common effect"** model.^[The term "equal effects model" is used by the **{metafor}** package, while **{meta}** uses the term "common effect model".]
```
The formula of the fixed-effect model tells us that there is only one reason why observed effect sizes $\hat\theta_k$ deviate from the true overall effect: the sampling error $\epsilon_k$. In Chapter \@ref(what-is-es), we already discussed that there is a link between the sampling error and the sample size of a study. All things being equal, as the sample size becomes larger, the sampling error becomes smaller. We also learned that the sampling error can be represented numerically by the **standard error**, which also grows smaller when the sample size increases.
Although we do not know the true overall effect size of our studies, we can exploit this relationship to arrive at the best possible estimate of the true overall effect, $\hat\theta$. We know that a smaller standard error corresponds with a smaller sampling error; therefore, studies with a small standard error should be better estimators of the true overall effect than studies with a large standard error.
We can illustrate this with a simulation. Using the `rnorm` function we already used before, we simulated a selection of studies in which the true overall effect is $\theta = 0$. We took several samples but varied the sample size so that the standard error differs between the "observed" effects. The results of the simulation can be found in Figure \@ref(fig:funnel1).
```{r funnel1, fig.height=5, fig.width=5, fig.cap='Relationship between effect size and standard error.', echo=F, fig.align='center', message=FALSE, out.width="50%"}
library(plotrix)
library(data.table)
library(ggplot2)
set.seed(1234)
res = list()
for (i in 5:54){
vec = list()
for (x in 1:50){
dat = rnorm(i, 0, sd = 10)
vec[[x]] = data.frame(value = mean(dat),
SE = std.error(dat))
}
vec = do.call(rbind, vec)
res[[i-4]] = vec
}
res = rbindlist(res)
ggplot(data = res, aes(x = value, y = log(SE))) +
geom_point(alpha = 0.5, size = 0.8) +
scale_y_reverse() +
geom_vline(xintercept = 0, color = "gray", size = 2, alpha = 0.5) +
theme_classic() +
xlab("Effect Size") +
ylab("log-Standard Error") +
theme(panel.background = element_rect(fill = "#FFFEFA",
size = 0),
plot.background = element_rect(fill = "#FFFEFA",
size = 0))
```
The results of the simulation show an interesting pattern. We see that effect sizes with a small sampling error are tightly packed around the true effect size $\theta = 0$. As the standard error on the y-axis^[We log-transformed the standard error before plotting so that the pattern can be more easily seen.] increases, the **dispersion** of effect sizes becomes larger and larger, and the observed effects deviate more and more from the true effect.
This behavior can be predicted by the formula of the fixed-effect model. We know that studies with a smaller standard error have a smaller sampling error, and their estimate of the overall effect size is therefore more likely to be closer to the truth.
\index{Weight}
We have seen that, while all observed effect sizes are estimators of the true effect, some are better than others. When we pool the effects in our meta-analysis, we should therefore give effect sizes with a higher **precision** (i.e. a smaller standard error) a greater **weight**. If we want to calculate the pooled effect size under the fixed-effect model, we therefore simply use a **weighted average** of all studies.
To calculate the weight $w_k$ for each study $k$, we can use the standard error, which we square to obtain the **variance** $s^2_k$ of each effect size. Since a **lower** variance indicates higher precision, the **inverse** of the variance is used to determine the weight of each study.
\begin{equation}
w_k = \frac{1}{s^2_k}
(\#eq:pes2)
\end{equation}
Once we know the weights, we can calculate the weighted average, our estimate of the true pooled effect $\hat\theta$. We only have to multiply each study's effect size $\hat\theta_k$ with its corresponding weight $w_k$, sum the results across all studies $K$ in our meta-analysis, and then divide by the sum of all the individual weights.
\begin{equation}
\hat\theta = \frac{\sum^{K}_{k=1} \hat\theta_kw_k}{\sum^{K}_{k=1} w_k}
(\#eq:pes3)
\end{equation}
\index{Inverse-Variance Weighting}
\index{Mantel-Haenszel Method}
\index{Peto Method}
This method is the most common approach to calculate average effects in meta-analyses. Because we use the inverse of the variance, it is often called **inverse-variance weighting** or simply **inverse-variance meta-analysis**.
For binary effect size data, there are alternative methods to calculate the weighted average, including the **Mantel-Haenszel** method, the **Peto** method, or the sample size weighting method by Bakbergenuly [-@bakbergenuly2020methods]. We will discuss these methods in Chapter \@ref(pooling-or-rr).
The **{meta}** package makes it very easy to perform a fixed-effect meta-analysis. Before we do that, however, let us try out inverse-variance pooling "manually" in _R_. In our example, we will use the `SuicidePrevention` data set, which we already imported in Chapter \@ref(data-prep-R).
\index{dmetar Package}
```{block2, type='boxdmetar'}
**The "SuicidePrevention" Data Set**
The `SuicidePrevention` data set is also included directly in the **{dmetar}** package. If you have installed **{dmetar}**, and loaded it from your library, running `data(SuicidePrevention)` automatically saves the data set in your _R_ environment. The data set is then ready to be used. If you do not have **{dmetar}** installed, you can download the data set as an _.rda_ file from the [Internet](https://www.protectlab.org/meta-analysis-in-r/data/suicideprevention.rda), save it in your working directory, and then click on it in your R Studio window to import it.
```
\index{esc Package}
\index{Standardized Mean Difference}
\index{Hedges' \textit{g}}
The `SuicidePrevention` data set contains raw effect size data, meaning that we have to calculate the effect sizes first. In this example, we calculate the small-sample adjusted standardized mean difference (Hedges' $g$). To do this, we use the `esc_mean_sd` function in the **{esc}** package (Chapter \@ref(b-group-smd)).
The function has an additional argument, `es.type`, through which we can specify that the small-sample correction should be performed (by setting `es.type = "g"`; Chapter \@ref(hedges-g)).
Since the release of _R_ version 4.2.1, we additionally have to plug our call to `esc_mean_sd` into the `pmap_dfr` function so that a standardized mean difference is calculated for each row in our data set:
```{r, message=F, eval=F}
# Load dmetar, esc and tidyverse (for pipe)
library(dmetar)
library(esc)
library(tidyverse)
# Load data set from dmetar
data(SuicidePrevention)
# Calculate Hedges' g and the Standard Error
# - We save the study names in "study".
# - We use the pmap_dfr function to calculate the effect size
# for each row.
SP_calc <- pmap_dfr(SuicidePrevention,
function(mean.e, sd.e, n.e, mean.c,
sd.c, n.c, author, ...){
esc_mean_sd(grp1m = mean.e,
grp1sd = sd.e,
grp1n = n.e,
grp2m = mean.c,
grp2sd = sd.c,
grp2n = n.c,
study = author,
es.type = "g") %>%
as.data.frame()})
# Let us catch a glimpse of the data
# The data set contains Hedges' g ("es") and standard error ("se")
glimpse(SP_calc)
```
```
## Rows: 9
## Columns: 9
## $ study <chr> "Berry et al.", "DeVries et …
## $ es <dbl> -0.14279447, -0.60770928, -0…
## $ weight <dbl> 46.09784, 34.77314, 14.97625…
## $ sample.size <dbl> 185, 146, 60, 129, 100, 220,…
## $ se <dbl> 0.1472854, 0.1695813, 0.2584…
## $ var <dbl> 0.02169299, 0.02875783, 0.06…
## $ ci.lo <dbl> -0.4314686, -0.9400826, -0.6…
## $ ci.hi <dbl> 0.145879624, -0.275335960, 0…
## $ measure <chr> "g", "g", "g", "g", "g", "g"…
```
Next, we use these results to apply the formula of the fixed-effect model:
```{r, message=F, eval=F}
# Calculate the inverse variance-weights for each study
SP_calc$w <- 1/SP_calc$se^2
# Then, we use the weights to calculate the pooled effect
pooled_effect <- sum(SP_calc$w*SP_calc$es)/sum(SP_calc$w)
pooled_effect
```
```
## [1] -0.2311121
```
The results of our calculations reveal that the pooled effect size, assuming a fixed-effect model, is $g \approx$ -0.23.
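As a quick plausibility check, the same pooled value can be reproduced with **{meta}**'s generic inverse variance function `metagen`, which we will cover in detail later in this chapter. The code below is only a sketch; it assumes that the `SP_calc` object created above is still available in your environment.
```{r, eval=F}
library(meta)
# Reproduce the manual fixed-effect result using the "es" and "se"
# columns calculated above.
m.check <- metagen(TE = es, seTE = se, studlab = study,
                   data = SP_calc, fixed = TRUE, random = FALSE)
m.check$TE.fixed
```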
<br></br>
### The Random-Effects Model {#rem}
---
\index{Random-Effects Model}
As we have seen, the fixed-effect model is one way to conceptualize the genesis of our meta-analysis data, and how effects can be pooled. However, the important question is: does this approach adequately reflect reality?
The fixed-effect model assumes that all our studies are part of a homogeneous population and that the only cause for differences in observed effects is the sampling error of studies. If we were to calculate the effect size of each study without sampling error, all true effect sizes would be absolutely the same.
\index{Heterogeneity}
Subjecting this notion to a quick reality check, we see that the assumptions of the fixed-effect model might be too simplistic in many real-world applications. It is simply unrealistic that studies in a meta-analysis are always completely homogeneous. Studies will very often differ, even if only in subtle ways. The outcome of interest may have been measured in different ways. Maybe the type of treatment was not exactly the same or the intensity and length of the treatment. The target population of the studies may not have been exactly identical, or maybe there were differences in the control groups that were used.
It is likely that the studies in your meta-analysis will not only vary on one of these aspects, but on several of them at the same time. If this is true, we can anticipate considerable between-study **heterogeneity** in the true effects.
All of this casts the validity of the fixed-effect model into doubt. If some studies used different types of a treatment, for example, it seems perfectly normal that one format is more effective than the other. It would be far-fetched to assume that these differences are only noise, produced by the studies' sampling error.
Quite the opposite, there may be countless reasons why **real** differences exist in the **true** effect sizes of studies. The random-effects model addresses this concern. It provides us with a model that often reflects the reality behind our data much better.
\index{Sampling Error}
In the random-effects model, we want to account for the fact that effect sizes show more variance than when drawn from a single homogeneous population [@hedges1998fixed]. Therefore, we assume that effects of individual studies do not only deviate due to sampling error alone but that there is **another** source of variance.
This additional variance component is introduced by the fact that studies do not stem from one single population. Instead, each study is seen as an independent draw from a “universe” of populations.
```{block2, type='boxinfo'}
The random-effects model assumes that there is not only one true effect size but a **distribution** of true effect sizes. The goal of the random-effects model is therefore not to estimate the one true effect size of all studies, but the **mean** of the **distribution** of true effects.
```
Let us see how the random-effects model can be expressed in a formula. Similar to the fixed-effect model, the random-effects model starts by assuming that an observed effect size $\hat\theta_k$ is an estimator of the study's true effect size $\theta_k$, burdened by sampling error $\epsilon_k$:
\begin{equation}
\hat\theta_k = \theta_k + \epsilon_k
(\#eq:pes4)
\end{equation}
The fact that we use $\theta_k$ instead of $\theta$ already points to an important difference. The random-effects model only assumes that $\theta_k$ is the true effect size of **one** single study $k$. It stipulates that there is a second source of error, denoted by $\zeta_k$. This second source of error is introduced by the fact that even the true effect size $\theta_k$ of study $k$ is only part of an over-arching distribution of true effect sizes with mean $\mu$.
\begin{equation}
\theta_k = \mu + \zeta_k
(\#eq:pes5)
\end{equation}
The random-effects model tells us that there is a hierarchy of two processes happening inside our black box [@thompson2001multilevel]: the observed effect sizes of a study deviate from their true value because of the sampling error. But even the true effect sizes are only a draw from a universe of true effects, whose mean $\mu$ we want to estimate as the pooled effect of our meta-analysis.
By plugging the second formula into the first one (i.e. replacing $\theta_k$ with its definition in the second formula), we can express the random-effects model in one line [@borenstein2011introduction, chapter 12]:
\begin{equation}
\hat\theta_k = \mu + \zeta_k + \epsilon_k
(\#eq:pes6)
\end{equation}
This formula makes it clear that our observed effect size deviates from the pooled effect $\mu$ because of two error terms, $\zeta_k$ and $\epsilon_k$. This relationship is visualized in Figure \@ref(fig:random).
\index{Exchangeability Assumption}
A crucial assumption of the random-effects model is that the size of $\zeta_k$ is **independent** of $k$. Put differently, we assume that there is nothing which indicates **a priori** that $\zeta_k$ in one study is higher than in another. We presuppose that the size of $\zeta_k$ is a product of chance, and chance alone.
This is known as the **exchangeability** assumption of the random-effects model [@higgins2009re; @lunn2012bugs, chapter 10.1]. All true effect sizes are assumed to be exchangeable in so far as we have nothing that could tell us how big $\zeta_k$ will be in some study $k$ before seeing the data.
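To make the two error components more tangible, here is a minimal simulation sketch of the random-effects model. All values ($\mu$, $\tau$, $K$, and the per-study standard errors) are made up for illustration only.
```{r, eval=F}
set.seed(123)
K <- 10                           # number of studies
mu <- 0.3                         # mean of the distribution of true effects
tau <- 0.2                        # SD of the distribution of true effects
se_k <- runif(K, 0.1, 0.4)        # per-study standard errors
theta_k <- rnorm(K, mean = mu, sd = tau)            # theta_k = mu + zeta_k
theta_k_hat <- rnorm(K, mean = theta_k, sd = se_k)  # observed: theta_k + epsilon_k
```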
\index{Heterogeneity}
```{block2, type='boxinfo'}
**Which Model Should I Use?**
\vspace{2mm}
In practice, it is very uncommon to find a selection of studies that is perfectly homogeneous. This is true even when we follow best practices, and try to make the scope of our analysis as precise as possible through our PICO (Chapter \@ref(research-question)).
\vspace{4mm}
In many fields, including medicine and the social sciences, it is therefore conventional to **always** use a random-effects model, since some degree of between-study heterogeneity can virtually always be anticipated. A fixed-effect model may only be used when we could not detect any between-study heterogeneity (we will discuss how this is done in Chapter \@ref(heterogeneity)) **and** when we have very good reasons to assume that the true effect is fixed. This may be the case when, for example, only exact replications of a study are considered, or when we meta-analyze subsets of one big study. Needless to say, this is seldom the case, and applications of the fixed-effect model "in the wild" are rather rare.
\vspace{4mm}
Even though it is conventional to use the random-effects model a priori, this approach is not undisputed. The random-effects model pays more attention to small studies when calculating the overall effect of a meta-analysis [@schwarzer2015meta, chapter 2.3]. Yet, small studies in particular are often fraught with biases (see Chapter \@ref(small-study-effects)). This is why some have argued that the fixed-effect model is (sometimes) preferable [@poole1999random; @furukawa2003low]. Stanley, Doucouliagos, and Ioannidis [-@stanley2022beyond] make a similar point and argue that, in some disciplines, a so-called "unrestricted weighted least squares" (UWLS) model should be used instead of the random-effects model.
```
```{r random, fig.cap='Illustration of parameters of the random-effects model.', out.width='60%', message = F, echo = F, fig.align='center'}
library(OpenImageR)
knitr::include_graphics('images/rem_sep2.png')
```
<br></br>
#### Estimators of the Between-Study Heterogeneity {#tau-estimators}
---
\index{Weight}
The challenge associated with the random-effects model is that we have to take the error $\zeta_k$ into account. To do this, we have to estimate the **variance** of the distribution of true effect sizes. This variance is known as $\tau^2$, or **tau-squared**. Once we know the value of $\tau^2$, we can include the between-study heterogeneity when determining the inverse-variance weight of each effect size.
In the random-effects model, we therefore calculate an adjusted **random-effects weight** $w^*_k$ for each observation. The formula looks like this:
\begin{equation}
w^*_k = \frac{1}{s^2_k+\tau^2}
(\#eq:pes7)
\end{equation}
\index{Inverse-Variance Weighting}
Using the adjusted random-effects weights, we then calculate the pooled effect size using the inverse variance method, just like we did using the fixed-effect model:
\begin{equation}
\hat\theta = \frac{\sum^{K}_{k=1} \hat\theta_kw^*_k}{\sum^{K}_{k=1} w^*_k}
(\#eq:pes8)
\end{equation}
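As a small illustration, the random-effects weights and pooled effect could be calculated by hand, analogous to the fixed-effect example from before. The sketch below reuses the `SP_calc` object from Chapter \@ref(fem) and assumes a made-up value of $\tau^2=$ 0.08; how $\tau^2$ is actually estimated is covered next.
```{r, eval=F}
tau2 <- 0.08                                  # hypothetical tau^2 estimate
SP_calc$w.star <- 1/(SP_calc$se^2 + tau2)     # random-effects weights
sum(SP_calc$w.star*SP_calc$es)/sum(SP_calc$w.star)  # pooled effect
```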
There are several methods to estimate $\tau^2$, most of which are too complicated to do by hand. Luckily, however, these estimators are implemented in the functions of the **{meta}** package, which does the calculations automatically for us. Here is a list of the most common estimators, and the code by which they are referenced in **{meta}**:
\index{DerSimonian-Laird Estimator}
\index{Restricted Maximum Likelihood Estimator}
\index{Maximum Likelihood}
\index{Sidik-Jonkman Estimator}
* The **DerSimonian-Laird** (`"DL"`) estimator [@dersimonian1986meta].
* The **Restricted Maximum Likelihood** (`"REML"`) or **Maximum Likelihood** (`"ML"`) procedures [@viechtbauer2005bias].
* The **Paule-Mandel** (`"PM"`) procedure [@paule1982consensus].
* The **Empirical Bayes** (`"EB"`) procedure [@sidik2019note], which is practically identical to the Paule-Mandel method.
* The **Sidik-Jonkman** (`"SJ"`) estimator [@sidik2005simple].
It is an ongoing research question which of these estimators performs best for different kinds of data. Whether one approach is better than another often depends on parameters such as the number of studies $k$, the number of participants $n$ in each study, how much $n$ varies from study to study, and how big $\tau^2$ is. Several studies have analyzed the bias of $\tau^2$ estimators under these varying scenarios [@veroniki2016methods; @viechtbauer2005bias; @sidik2007comparison; @langan2019comparison].
\index{Review Manager (RevMan)}
\index{Comprehensive Meta-Analysis (CMA)}
Arguably, the most frequently used estimator is the one by DerSimonian and Laird. The estimator is implemented in software that has commonly been used by meta-analysts in the past, such as **RevMan** (a program developed by Cochrane) or **Comprehensive Meta-Analysis**. It also used to be the default estimator in **{meta}**^[In newer versions of **{meta}**, the Restricted Maximum Likelihood (`"REML"`) estimator is now used as the default.]. Due to this historic legacy, one often finds research papers in which "using a random-effects model" is used synonymously with employing the DerSimonian-Laird estimator.
However, it has been found that this estimator can be biased, particularly when the number of studies is small and heterogeneity is high [@hartung1999alternative; @hartung2001refined; @hartung2001tests; @follmann1999valid; @makambi2004effect]. This is quite problematic because it is very common to find meta-analyses with few studies and high heterogeneity.
\index{Paule-Mandel Estimator}
\index{metafor Package}
\index{Sidik-Jonkman Estimator}
\index{Restricted Maximum Likelihood Estimator}
In an overview paper, Veroniki and colleagues [-@veroniki2016methods] reviewed evidence on the robustness of various $\tau^2$ estimators. They recommended the Paule-Mandel method for both binary and continuous effect size data, and the restricted maximum likelihood estimator for continuous outcomes. The restricted maximum-likelihood estimator is also the default method used by the **{metafor}** package.
A more recent simulation study by Langan and colleagues [-@langan2019comparison] came to a similar result but found that the Paule-Mandel estimator may be suboptimal when the sample size of studies varies drastically. Another study by Bakbergenuly and colleagues [-@bakbergenuly2020methods] found that the Paule-Mandel estimator is well suited especially when the number of studies is small. The Sidik-Jonkman estimator, also known as the **model error variance method**, is only well suited when $\tau^2$ is very large [@sidik2007comparison].
```{block2, type='boxinfo'}
**Which Estimator Should I Use?**
\vspace{2mm}
There are no iron-clad rules determining which estimator should be used in which situation. In many cases, there will only be minor differences in the results produced by various estimators, meaning that you should not worry about this issue **too** much.
\vspace{4mm}
When in doubt, you can always rerun your analyses using different $\tau^2$ estimators and see if this changes the interpretation of your results (see the sketch after this box). Here are a few tentative guidelines that you may follow in your own meta-analysis:
1. For effect sizes based on continuous outcome data, the restricted maximum likelihood estimator may be used as a good starting point.
2. For binary effect size data, the Paule-Mandel estimator is a good first choice, provided there is no extreme variation in the sample sizes.
3. When you have very good reason to believe that the heterogeneity of effects in your sample is very large, and if avoiding false positives has a very high priority, you may use the Sidik-Jonkman estimator.
4. If you want others to be able to replicate your results as precisely as possible outside _R_, the DerSimonian-Laird estimator is the method of choice.
\vspace{2mm}
```
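The following sketch shows how such a sensitivity check could look in practice. Here, `m` is a placeholder for any **{meta}** object (such as the ones we create later in this chapter); the estimator codes are the ones listed above.
```{r, eval=F}
# Re-run the same meta-analysis with different tau^2 estimators
m_reml <- update(m, method.tau = "REML")
m_pm   <- update(m, method.tau = "PM")
m_sj   <- update(m, method.tau = "SJ")
```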
Overall, estimators of $\tau^2$ fall into two categories. Some, like the DerSimonian-Laird and Sidik-Jonkman estimator, are based on **closed-form expressions**, meaning that they can be directly calculated using a formula.
The (restricted) maximum likelihood, Paule-Mandel and empirical Bayes estimators find the optimal value of $\tau^2$ through an **iterative algorithm**. The latter estimators may therefore sometimes take a little longer to calculate the results. In most real-world cases, however, these time differences are minuscule at best.
<br></br>
#### Knapp-Hartung Adjustments {#knapp-hartung}
---
\index{Knapp-Hartung Adjustment}
\index{Wald-Type Test}
\index{t-Distribution}
In addition to our selection of the $\tau^2$ estimator, we also have to decide if we want to apply so-called Knapp-Hartung adjustments^[This approach is also known as "Hartung-Knapp adjustments" or the "Hartung-Knapp-Sidik-Jonkman" (HKSJ) method.] [@knapp2003improved; @sidik2002simple]. These adjustments affect the way the standard error (and thus the confidence intervals) of our pooled effect size $\hat\theta$ is calculated.
The Knapp-Hartung adjustment tries to control for the uncertainty in our estimate of the between-study heterogeneity. While significance tests of the pooled effect usually assume a normal distribution (so-called **Wald-type** tests), the Knapp-Hartung method is based on a $t$-distribution. Knapp-Hartung adjustments can only be used in random-effects models, and usually cause the confidence intervals of the pooled effect to become slightly larger.
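A small illustration of why the intervals widen: for a 95% confidence interval, a Wald-type test uses a critical value from the standard normal distribution, while the Knapp-Hartung method uses a $t$-distribution with $K-1$ degrees of freedom. The value $K=$ 18 below is just an example for a meta-analysis with 18 studies.
```{r, eval=F}
qnorm(0.975)          # Wald-type critical value, approx. 1.96
qt(0.975, df = 18-1)  # Knapp-Hartung-type critical value, approx. 2.11
```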
\index{Heterogeneity}
```{block, type='boxreport'}
**Reporting the Type of Model Used In Your Meta-Analysis**
\vspace{2mm}
It is highly advised to specify the type of model you used in the methods section of your meta-analysis report. Here is an example:
> _"As we anticipated considerable between-study heterogeneity, a random-effects model was used to pool effect sizes. The restricted maximum likelihood estimator (Viechtbauer, 2005) was used to calculate the heterogeneity variance $\tau^2$. We used Knapp-Hartung adjustments (Knapp & Hartung, 2003) to calculate the confidence interval around the pooled effect."_
```
Applying a Knapp-Hartung adjustment is usually sensible. Several studies [@inthout2014hartung; @langan2019comparison] showed that these adjustments can reduce the chance of false positives, especially when the number of studies is small.
The use of the Knapp-Hartung adjustment, however, is not uncontroversial. Wiksten and colleagues [-@wiksten2016hartung], for example, argued that the method can cause anti-conservative results in the (rare) cases in which the effects are very homogeneous.
<br></br>
## Effect Size Pooling in _R_ {#pooling-es-r}
---
\index{meta Package}
Time to put what we learned into practice. In the rest of this chapter, we will explore how we can run meta-analyses of different effect sizes directly in _R_. The **{meta}** package we will use to do this has a special structure. It contains several meta-analysis functions which are each focused on one type of effect size data. There is a set of parameters which can be specified in the same way across all of these functions; for example, whether we want to apply a fixed- or random-effects model, or which $\tau^2$ estimator should be used. Apart from that, there are **function-specific** arguments which allow us to tweak details of our meta-analysis that are only relevant for a specific type of data.
Figure \@ref(fig:metaflow) provides an overview of **{meta}**'s structure. To determine which function to use, we first have to clarify what kind of effect size data we want to synthesize. The most fundamental distinction is the one between **raw** and **pre-calculated** effect size data. We speak of "raw" data when we have all the necessary information needed to calculate the desired effect size stored in our data frame but have not yet calculated the actual effect size. The `SuicidePrevention` data set we used earlier contains raw data: the mean, standard deviation and sample size of two groups, which is needed to calculate the standardized mean difference.
We call effect size data "pre-calculated", on the other hand, when they already contain the final effect size of each study, as well as the standard error. If we want to use a corrected version of an effect metric (such as Hedges' $g$, Chapter \@ref(hedges-g)), it is necessary that this correction has already been applied to pre-calculated effect size data before we start the pooling.
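For example, if a study only reports an uncorrected Cohen's $d$, the small-sample correction can be applied before pooling. The sketch below assumes that the `hedges_g` function of the **{esc}** package is available; the values $d=$ 0.35 and $n=$ 50 are made up.
```{r, eval=F}
library(esc)
# Convert a pre-calculated Cohen's d into Hedges' g before pooling
hedges_g(d = 0.35, totaln = 50)
```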
```{r metaflow, out.width='100%', fig.cap='Conceptual overview of meta-analysis functions.', message = F, echo = F, fig.align='center'}
library(OpenImageR)
knitr::include_graphics('images/meta_flow_sep.png')
```
If possible, it is preferable to use raw data in our meta-analysis. This makes it easier for others to understand how we calculated the effect sizes, and replicate the results. Yet, using raw data is often not possible in practice, because studies often report their results in a different way (Chapter \@ref(es-formats-different)).
This leaves us no other choice than to pre-calculate the desired effect size for each study right away so that all have the same format. Chapter \@ref(es-calc) in the “Helpful Tools” part of this book presents a few formulas which can help you to convert a reported effect size into the desired metric.
The function of choice for pre-calculated effect sizes is `metagen`. Its name stands for **gen**eric inverse variance meta-analysis. If we use `metagen` with binary data (e.g. proportions, risk ratios, odds ratios), it is important, as we covered in Chapter \@ref(ratios), that the effect sizes are log-transformed before the function is used.
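As a brief sketch of what this means in practice: suppose a study reports an odds ratio of 1.50 with a 95% confidence interval of [1.20; 1.90] (all values are made up). Before pooling with `metagen`, the effect is log-transformed, and its standard error can be approximated from the width of the log-transformed confidence interval.
```{r, eval=F}
TE <- log(1.50)                                     # log-odds ratio
seTE <- (log(1.90) - log(1.20)) / (2*qnorm(0.975))  # SE from the CI width
```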
When we can resort to raw effect size data, **{meta}** provides us with a specialized function for each effect size type. We can use the `metamean`, `metacont` and `metacor` function for means, (standardized) mean differences and correlations, respectively. We can pool (incidence) rates, proportions and incidence rate ratios using the `metarate`, `metaprop` and `metainc` functions. The `metabin` function can be employed when we are dealing with risk or odds ratios.
All meta-analysis functions in **{meta}** follow the same structure. We have to provide the functions with the (raw or pre-calculated) effect size data, as well as further arguments, which control the specifics of the analysis. There are several core arguments which can be specified in each function:
* **`studlab`**. This argument associates each effect size with a **study label**. If we have the name or authors of our studies stored in our data set, we simply have to specify the name of the respective column (e.g. `studlab = author`).
* **`sm`**. This argument controls the **summary measure**, the effect size metric we want to use in our meta-analysis. This option is particularly important for functions using raw effect size data. The **{meta}** package uses codes for different effect size formats, for example `"SMD"` or `"OR"`. The available summary measures are not the same in each function, and we will discuss the most common options in each case in the following sections.
* **`fixed`**. We need to provide this argument with a logical (`TRUE` or `FALSE`), indicating whether a fixed-effect model meta-analysis should be calculated^[In older versions of **{meta}** (before version 5.0-0), this argument is called `comb.fixed`.].
* **`random`**. In a similar fashion, this argument controls whether a random-effects model should be used. If both `fixed` and `random` are set to `TRUE`, both models are calculated and displayed^[In older versions of **{meta}** (before version 5.0-0), this argument is called `comb.random`.].
* **`method.tau`**. This argument defines the $\tau^2$ estimator. All functions use the codes for different estimators that we already presented in the previous chapter (e.g. for the DerSimonian-Laird method: `method.tau = "DL"`).
* **`method.random.ci`**. This argument controls how confidence intervals should be calculated when using the random-effects model. When setting `method.random.ci = "HK"`, the Knapp-Hartung adjustment will be applied. In older versions of **{meta}**, this feature was controlled by setting the `hakn` argument to `TRUE` instead. Using the Knapp-Hartung adjustment is advisable in most cases (see Chapter \@ref(knapp-hartung)).
* **`data`**. In this argument, we provide **{meta}** with the name of our meta-analysis data set.
* **`title`** (**not mandatory**). This argument takes a character string with the name of the analysis. While it is not essential to provide input for this argument, it can help us to identify the analysis later on.
There are also a few additional arguments which we will get to know in later chapters. In this guide, we will not be able to discuss **all** arguments of the **{meta}** functions: there are more than 100.
Thankfully, most of these arguments are rarely needed or have sensible defaults. When in doubt, you can always run the name of the function, preceded by a question mark (e.g. `?metagen`) in the _R_ console; this will open the function documentation.
\index{Function Argument}
\index{Position Matching}
\index{Documentation, _R_}
```{block, type='boxinfo'}
**Default Arguments & Position Matching**
\vspace{2mm}
For _R_ beginners, it is often helpful to learn about **default arguments** and **position-based matching** in functions.
\vspace{2mm}
Default arguments are specified by the person who wrote the function. They set a function argument to a predefined value, which is automatically used unless we explicitly provide a different value. In **{meta}** many, but not all arguments have default values.
\vspace{2mm}
Default values are displayed in the "usage" section of the function documentation. If a function has defined a default value for an argument, it is not necessary to include it in our function call, unless we are not satisfied with the default behavior.
\vspace{2mm}
Arguments **without** default values always need to be specified in our function call. The **{meta}** package has a convenience function called `gs` which we can use to check the default value used for a specific argument. For example, try running `gs("method.tau")`. If there is no default value, `gs` will return `NULL`.
\vspace{2mm}
Another interesting detail about _R_ functions is position matching. Usually, we have to write down the name of an argument and its value in a function call. Through position matching, however, we can leave out the name of the argument, and only have to type in the argument value. We can do this if we specify the argument in the same **position** in which it appears in the documentation.
Take the `sqrt` function. A written out call of this function would be `sqrt(x = 4)`. However, because we know that `x`, the number, is the first argument, we can simply type in `sqrt(4)` with the same result.
```
<br></br>
### Pre-Calculated Effect Size Data {#pre-calculated-es}
---
Let us begin our tour of meta-analysis functions with `metagen`. As we learned, this function can be used for pre-calculated effect size data. In our first example, we will use the function to perform a meta-analysis of the `ThirdWave` data set.
\index{Hedges' \textit{g}}
\index{Standardized Mean Difference}
This data set contains studies examining the effect of so-called "third wave" psychotherapies on perceived stress in college students. For each study, the standardized mean difference between a treatment and control group at post-test was calculated, and a small sample correction was applied. The effect size measure used in this meta-analysis, therefore, is Hedges' $g$. Let us have a look at the data.
\index{dmetar Package}
```{block, type='boxdmetar'}
**The "ThirdWave" Data Set**
\vspace{2mm}
The `ThirdWave` data set is included directly in the **{dmetar}** package. If you have installed **{dmetar}**, and loaded it from your library, running `data(ThirdWave)` automatically saves the data set in your _R_ environment. The data set is then ready to be used. If you do not have **{dmetar}** installed, you can download the data set as an _.rda_ file from the [Internet](https://www.protectlab.org/meta-analysis-in-r/data/thirdwave.rda), save it in your working directory, and then click on it in your R Studio window to import it.
```
```{r, message=F}
library(tidyverse) # needed for 'glimpse'
library(dmetar)
library(meta)
data(ThirdWave)
glimpse(ThirdWave)
```
We see that the data set has eight columns, the most important of which are `Author`, `TE` and `seTE`. The `TE` column contains the $g$ value of each study, and `seTE` is the standard error of $g$. The other columns represent variables describing the subgroup categories that each study falls into. These variables are not relevant for now.
We can now start to think about the type of meta-analysis we want to perform. Looking at the subgroup columns, we see that studies vary at least with respect to their risk of bias, control group, intervention duration, intervention type, and mode of delivery.
This makes it quite clear that some between-study heterogeneity can be expected, and that it makes no sense to assume that all studies have a fixed true effect. We may therefore use the random-effects model for pooling. Given its robust performance in continuous outcome data, we choose the restricted maximum likelihood (`"REML"`) estimator in this example. We will also use the Knapp-Hartung adjustments to reduce the risk of a false positive result.
Now that we have these fundamental questions settled, the specification of our call to `metagen` becomes fairly straightforward. There are two function-specific arguments which we always have to specify when using the function:
* **`TE`**. The name of the column in our data set which contains the calculated effect sizes.
* **`seTE`**. The name of the column in which the standard error of the effect size is stored.
The rest are generic **{meta}** arguments that we already covered in the last chapter. Since the analysis deals with standardized mean differences, we also specify `sm = "SMD"`. However, in this example, this has no actual effect on the results, since effect sizes are already calculated for each study. It will only tell the function to label effect sizes as SMDs in the output.
This gives us all the information we need to set up our first call to `metagen`. We will store the results of the function in an object called `m.gen`.
```{r}
m.gen <- metagen(TE = TE,
seTE = seTE,
studlab = Author,
data = ThirdWave,
sm = "SMD",
fixed = FALSE,
random = TRUE,
method.tau = "REML",
method.random.ci = "HK",
title = "Third Wave Psychotherapies")
```
Our `m.gen` object now contains all the meta-analysis results. An easy way to get an overview is to use the `summary` function^[In older versions of **{meta}** (before version 5.0-0), this overview can be printed without using `summary`. Simply call your newly created meta-analysis object directly in the _R_ console.].
```{r, eval=F}
summary(m.gen)
```
```
## Review: Third Wave Psychotherapies
## SMD 95%-CI %W(random)
## Call et al. 0.7091 [ 0.1979; 1.2203] 5.0
## Cavanagh et al. 0.3549 [-0.0300; 0.7397] 6.3
## DanitzOrsillo 1.7912 [ 1.1139; 2.4685] 3.8
## de Vibe et al. 0.1825 [-0.0484; 0.4133] 7.9
## Frazier et al. 0.4219 [ 0.1380; 0.7057] 7.3
## Frogeli et al. 0.6300 [ 0.2458; 1.0142] 6.3
## Gallego et al. 0.7249 [ 0.2846; 1.1652] 5.7
## Hazlett-Steve… 0.5287 [ 0.1162; 0.9412] 6.0
## Hintz et al. 0.2840 [-0.0453; 0.6133] 6.9
## Kang et al. 1.2751 [ 0.6142; 1.9360] 3.9
## Kuhlmann et al. 0.1036 [-0.2781; 0.4853] 6.3
## Lever Taylor… 0.3884 [-0.0639; 0.8407] 5.6
## Phang et al. 0.5407 [ 0.0619; 1.0196] 5.3
## Rasanen et al. 0.4262 [-0.0794; 0.9317] 5.1
## Ratanasiripong 0.5154 [-0.1731; 1.2039] 3.7
## Shapiro et al. 1.4797 [ 0.8618; 2.0977] 4.2
## Song & Lindquist 0.6126 [ 0.1683; 1.0569] 5.7
## Warnecke et al. 0.6000 [ 0.1120; 1.0880] 5.2
##
## Number of studies combined: k = 18
##
## SMD 95%-CI t p-value
## Random effects model 0.5771 [0.3782; 0.7760] 6.12 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.0820 [0.0295; 0.3533]; tau = 0.2863 [0.1717; 0.5944];
## I^2 = 62.6% [37.9%; 77.5%]; H = 1.64 [1.27; 2.11]
##
## Test of heterogeneity:
## Q d.f. p-value
## 45.50 17 0.0002
##
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
```
Here we go, the results of our first meta-analysis using _R_. There is a lot to unpack, so let us go through the output step by step.
\index{Weight}
* The first part of the output contains the individual studies, along with their effect sizes and confidence intervals. Since the effects were pre-calculated, there is not much new to be seen here. The `%W(random)` column contains the weight (in percent) that the random-effects model attributed to each study. We can see that, with 7.9%, the greatest weight in our meta-analysis has been given to the study by de Vibe. The smallest weight has been given to the study by Ratanasiripong. Looking at the confidence interval of this study, we can see why this is the case: the interval around its observed effect is extremely wide, meaning that the standard error is very high, and that the study's effect size estimate is therefore not very precise.
* Furthermore, the output tells us the total number of studies in our meta-analysis. We see that $K=$ 18 studies were combined.
* The next section provides us with the core result: the pooled effect size. We see that the estimate is $g \approx$ 0.58 and that the 95% confidence interval ranges from $g \approx$ 0.38 to 0.78. We are also presented with the results of a test determining if the effect size is significant. This is the case ($p<$ 0.001). Importantly, we also see the associated test statistic, which is denoted with `t`. This is because we applied the Knapp-Hartung adjustment, which is based on a $t$-distribution.
* Underneath, we see results concerning the between-study heterogeneity. We will learn more about some of the results displayed here in later chapters, so let us only focus on $\tau^2$. Next to `tau^2`, we see an estimate of the variance in true effects: $\tau^2$ = 0.08. We see that the confidence interval of `tau^2` does not include zero (0.03--0.35), meaning that $\tau^2$ is significantly greater than zero. All of this indicates that between-study heterogeneity exists in our data and that the random-effects model was a good choice.
* The last section provides us with details about the meta-analysis. We see that effects were pooled using the inverse variance method, that the restricted maximum-likelihood estimator was used, and that the Knapp-Hartung adjustment was applied.
We can also access information stored in `m.gen` directly. Plenty of objects are stored by default in the meta-analysis results produced by **{meta}**, and a look into the "value" section of the documentation reveals what they mean. We can use the `$` operator to print specific results of our analyses. The pooled effect, for example, is stored as `TE.random`.
```{r}
m.gen$TE.random
```
\index{Fixed-Effect Model}
Even when we specify `fixed = FALSE`, **{meta}**'s functions always also calculate results for the fixed-effect model internally. Thus, we can also access the pooled effect assuming a fixed-effect model.
```{r}
m.gen$TE.fixed
```
We see that this estimate deviates considerably from the random-effects model result.
When we want to adapt some details of our analyses, the `update` function can be helpful. This function needs the **{meta}** object as input, and the argument we want to change. Let us say that we want to check if results differ substantially if we use the Paule-Mandel instead of the restricted maximum likelihood estimator. We can do that using this code:
```{r}
m.gen_update <- update(m.gen, method.tau = "PM")
# Get pooled effect
m.gen_update$TE.random
# Get tau^2 estimate
m.gen_update$tau2
```
We see that while the pooled effect does not differ much, the Paule-Mandel estimator gives us a somewhat larger approximation of $\tau^2$.
Lastly, it is always helpful to save the results for later. Objects generated by **{meta}** can easily be saved as _.rda_ (_R_ data) files, using the `save` function.
```{r, eval=F}
save(m.gen, file = "path/to/my/meta-analysis.rda") # example path
```
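The saved object can then be restored in a later _R_ session using `load` (the path below is only an example):
```{r, eval=F}
load("path/to/my/meta-analysis.rda") # restores the object "m.gen"
```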
<br></br>
### (Standardized) Mean Differences {#pooling-smd}
---
\index{Standardized Mean Difference}
\index{Glass' Delta}
Raw effect size data in the form of means and standard deviations of two groups can be pooled using `metacont`. This function can be used for both standardized and unstandardized between-group mean differences. These can be obtained by either specifying `sm = "SMD"` or `sm = "MD"`. Otherwise, there are seven function-specific arguments we have to provide:
* **`n.e`**. The number of observations in the treatment/experimental group.
* **`mean.e`**. The mean in the treatment/experimental group.
* **`sd.e`**. The standard deviation in the treatment/experimental group.
* **`n.c`**. The number of observations in the control group.
* **`mean.c`**. The mean in the control group.
* **`sd.c`**. The standard deviation in the control group.
* **`method.smd`**. This is only relevant when `sm = "SMD"`. The `metacont` function allows us to calculate three different types of standardized mean differences. When we set `method.smd = "Cohen"`, the uncorrected standardized mean difference (Cohen's $d$) is used as the effect size metric. The two other options are `"Hedges"` (default and recommended), which calculates Hedges' $g$, and `"Glass"`, which will calculate Glass' $\Delta$ (_delta_). Glass' $\Delta$ uses the control group standard deviation instead of the pooled standard deviation to standardize the mean difference. This effect size is sometimes used in primary studies when there is more than one treatment group, but usually not the preferred metric for meta-analyses.
\index{Knapp-Hartung Adjustment}
For our example analysis, we will recycle the `SuicidePrevention` data set we already worked with in Chapters \@ref(data-prep-R) and \@ref(fem). Not all studies in our sample are absolutely identical, so using a random-effects model is warranted. We will also use Knapp-Hartung adjustments again, as well as the restricted maximum likelihood estimator for $\tau^2$. We tell `metacont` to correct for small-sample bias, producing Hedges' $g$ as the effect size metric. Results are saved in an object that we name `m.cont`.
Overall, our code looks like this:
```{r}
# Make sure meta and dmetar are already loaded
library(meta)
library(dmetar)
# Load data set from dmetar (or download and open manually)
data(SuicidePrevention)
# Use metacont to pool results.
m.cont <- metacont(n.e = n.e,
                   mean.e = mean.e,
                   sd.e = sd.e,
                   n.c = n.c,
                   mean.c = mean.c,
                   sd.c = sd.c,
                   studlab = author,
                   data = SuicidePrevention,
                   sm = "SMD",
                   method.smd = "Hedges",
                   fixed = FALSE,
                   random = TRUE,
                   method.tau = "REML",
                   method.random.ci = "HK",
                   title = "Suicide Prevention")
```
Let us see what the results are:
```{r}
summary(m.cont)
```
Looking at the output and comparing it to the one we received in Chapter \@ref(pre-calculated-es), we already see one of **{meta}**'s greatest assets. Although `metagen` and `metacont` are different functions requiring different data types, the structure of the output looks nearly identical. This makes interpreting the results quite easy. We see that the pooled effect according to the random-effects model is $g=$ -0.23, with the 95% confidence interval ranging from -0.37 to -0.09. The effect is significant ($p=$ 0.006).
We see that the effect sizes have a negative sign. In the context of our meta-analysis, this represents a favorable outcome, because it means that suicidal ideation was lower in the treatment groups compared to the control groups. To make this clearer to others, we may also consistently reverse the sign of the effect sizes (e.g. write $g=$ 0.23 instead), so that positive effect sizes always represent "positive" results.
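If we only want to reverse the sign when reporting the results, a minimal sketch is to multiply the pooled estimate and its confidence limits by -1. Note that the lower and upper limit swap their roles in the process; the labels below are only added for readability.
```{r, eval=F}
# Reverse the sign of the pooled effect and its confidence limits
-1 * c(g = m.cont$TE.random,
       lower = m.cont$upper.random,
       upper = m.cont$lower.random)
```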
The restricted maximum likelihood method estimated a between-study heterogeneity variance of $\tau^2$ = 0.004. Looking at the confidence interval of `tau^2` in the output, we see that it includes zero, meaning that the variance of true effect sizes is not significantly greater than zero.
In the details section, we are informed that Hedges' $g$ was used as the effect size metric--just as we requested.
<br></br>
### Binary Outcomes
---
#### Risk & Odds Ratios {#pooling-or-rr}
---
\index{Risk Ratio}
\index{Odds Ratio}
\index{Inverse-Variance Weighting}
\index{Sparse Data}
The `metabin` function can be used to pool effect sizes based on binary data, particularly risk and odds ratios. Before we start using the function, we first have to discuss a few particularities of meta-analyses based on these effect sizes.
It is possible to pool binary effect sizes using the generic inverse variance method we covered in Chapters \@ref(fem) and \@ref(tau-estimators). We simply calculate the log-risk ratio or log-odds ratio, as well as the standard error of each effect, and can then use the inverse of the effect size variance to determine the pooling weights.
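To illustrate the idea, here is a minimal sketch for a single study with hypothetical cell counts `a`, `b`, `c` and `d` (events and non-events in the treatment and control group), using the standard formulas for the log-risk ratio and its standard error (cf. Chapter \@ref(rr)).
```{r, eval=F}
# Hypothetical 2x2 table of a single study
a <- 10; b <- 90   # events/non-events in the treatment group
c <- 20; d <- 80   # events/non-events in the control group

log_rr <- log((a/(a+b)) / (c/(c+d)))              # log-risk ratio
se_log_rr <- sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d))  # its standard error
w_iv <- 1/se_log_rr^2                             # inverse-variance weight
```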
However, this approach is suboptimal for binary outcome data [@higgins2019cochrane, chapter 10.4.1]. When we are dealing with **sparse** data, meaning that the number of events or the total sample size of a study is small, the calculated standard error may not be a good estimator of the precision of the binary effect size.
<br></br>
##### The Mantel-Haenszel Method {#mantel-haenszel}
---
\index{Mantel-Haenszel Method}
The **Mantel-Haenszel** method [@mantel1959statistical; @robins1986general] is therefore sometimes used as an alternative to calculate the (fixed-effect) weights of studies with binary outcome data. It is also the default approach used in `metabin`. This method uses the number of events and non-events in the treatment and control group to determine a study's weight. There are different formulas depending on whether we want to calculate the risk or odds ratio.
\vspace{8mm}
**Risk Ratio:**
\begin{equation}
w_k = \frac{(a_k+b_k) c_k}{n_k}
(\#eq:pes9)
\end{equation}
**Odds Ratio:**
\begin{equation}
w_k = \frac{b_kc_k}{n_k}
(\#eq:pes10)
\end{equation}
In the formulas, we use the same notation as in Chapter \@ref(rr), with $a_k$ being the number of events in the treatment group, $c_k$ the number of events in the control group, $b_k$ the number of non-events in the treatment group, $d_k$ the number of non-events in the control group, and $n_k$ being the total sample size.
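Using the same hypothetical cell counts as before, these weights are easy to compute by hand. This is only a sketch to make the formulas concrete; `metabin` does this for us automatically.
```{r, eval=F}
# Mantel-Haenszel (fixed-effect) weight of a single study
a <- 10; b <- 90   # events/non-events in the treatment group
c <- 20; d <- 80   # events/non-events in the control group
n <- a + b + c + d # total sample size

w_rr <- (a + b) * c / n   # weight when pooling risk ratios
w_or <- b * c / n         # weight when pooling odds ratios
```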
\index{Peto Method}
\index{Peto Odds Ratio}
<br></br>
##### The Peto Method
---
A second approach is the **Peto** method [@yusuf1985beta]. In its essence, this approach is based on the inverse variance principle we already know. However, it uses a special kind of effect size, the **Peto odds ratio**, which we will denote with $\hat\psi_k$.
To calculate $\hat\psi_k$, we need to know $O_k$, the observed number of events in the treatment group, and calculate $E_k$, the **expected** number of events in the treatment group. The difference $O_k-E_k$ is then divided by the variance $V_k$ of the difference between $O_k$ and $E_k$, resulting in a log-transformed version of $\hat\psi_k$. Using the same cell notation as before, the formulas to calculate $O_k$, $E_k$ and $V_k$ are the following:
\begin{equation}
O_k = a_k
(\#eq:pes11)
\end{equation}
\begin{equation}
E_k = \frac{(a_k+b_k)(a_k+c_k)}{a_k+b_k+c_k+d_k}
(\#eq:pes12)
\end{equation}
\vspace{4mm}
\begin{equation}
V_k = \frac{(a_k+b_k)(c_k+d_k)(a_k+c_k)(b_k+d_k)}{{(a_k+b_k+c_k+d_k)}^2(a_k+b_k+c_k+d_k-1)}
(\#eq:pes13)
\end{equation}
\vspace{4mm}
\begin{equation}
\log\hat\psi_k = \frac{O_k-E_k}{V_k}
(\#eq:pes14)
\end{equation}
The inverse of the variance of $\log\hat\psi_k$ is then used as the weight when pooling the effect sizes^[The variance of the log Peto odds ratio $\log\hat\psi_k$ is defined as $\text{SE}_{\log\hat\psi_k}^2 = \widehat{\text{Var}}(\log\hat\psi_k)=V_k^{-1}$, so that the pooling weight using the Peto method is defined as $w^{\text{(Peto)}}_k=\widehat{\text{Var}}(\log\hat\psi_k)^{-1}$.].
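Translated into code, the calculation for a single study might look like the sketch below, again using hypothetical cell counts `a`, `b`, `c` and `d`.
```{r, eval=F}
# Peto log-odds ratio and pooling weight of a single study
a <- 10; b <- 90   # events/non-events in the treatment group
c <- 20; d <- 80   # events/non-events in the control group
n <- a + b + c + d

O <- a                          # observed events (treatment group)
E <- (a + b) * (a + c) / n      # expected events (treatment group)
V <- ((a + b) * (c + d) * (a + c) * (b + d)) / (n^2 * (n - 1))

log_psi <- (O - E) / V          # log Peto odds ratio
w_peto <- V                     # pooling weight, since Var(log_psi) = 1/V
```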
<br></br>
##### The Bakbergenuly-Sample Size Method
---
Recently, Bakbergenuly and colleagues [-@bakbergenuly2020methods] proposed another method in which the weight of effects is determined solely by a study's sample size, and showed that this approach may be preferable to the one by Mantel and Haenszel. We will call this the **sample size method**. The formula for this approach is fairly simple. We only need to know the sample sizes $n_{\text{treat}_k}$ and $n_{\text{control}_k}$ of the treatment and control group, respectively.
\begin{equation}
w_k = \frac{n_{\text{treat}_k}n_{\text{control}_k}}{n_{\text{treat}_k} + n_{\text{control}_k} }
(\#eq:pes15)
\end{equation}
When we implement this pooling method in `metabin`, the weights and overall effect of the fixed-effect and random-effects model will be identical, since the sample sizes will be used as weights in both cases. Only the $p$-value and confidence interval of the pooled effect will differ.
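As a quick illustration with hypothetical group sizes:
```{r, eval=F}
# Sample size ("SSW") weight of a single study
n_treat <- 105; n_control <- 98
w_ssw <- (n_treat * n_control) / (n_treat + n_control)
```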
\index{Continuity Correction}
\index{Zero Cell Problem}
```{block2, type='boxinfo'}
**Which Pooling Method Should I Use?**
\vspace{4mm}
In Chapter \@ref(rr), we already talked extensively about the problem of **zero-cells** and **continuity correction**. While both the Peto and sample size method can be used without modification when there are zero cells, it is common to add 0.5 to zero cells when using the Mantel-Haenszel method. This is also the default behavior in `metabin`.
The use of continuity corrections, however, has been discouraged [@efthimiou2018practical], as they can lead to biased results. The Mantel-Haenszel method only **really** requires a continuity correction when one specific cell is zero in **all** included studies, which is rarely the case. Usually, it is therefore advisable to use the **exact** Mantel-Haenszel method without continuity corrections by setting `MH.exact = TRUE` in `metabin`.
\vspace{4mm}
The Peto method also has its limitations. First of all, it can only be used for odds ratios. Simulation studies also showed that the approach only works well when (1) the number of observations in the treatment and control group is similar, (2) the observed event is rare (<1%), and (3) the treatment effect is not overly large [@bradburn2007much; @j2004add].
The Bakbergenuly-sample size method, lastly, is a fairly new approach, meaning that it is not as well studied as the other two methods.
\vspace{2mm}
All in all, it may be advisable in most cases to follow Cochrane's general assessment [@higgins2019cochrane, chapter 10.4], and use the Mantel-Haenszel method (without continuity correction). The Peto method may be used when the odds ratio is the desired effect size metric, and when the event of interest is expected to be rare.
It is also important to keep in mind that these alternative methods are primarily relevant for the **fixed-effect model**. Pooled estimates using the random-effects model will still essentially be based on the (generic) inverse-variance method. For example, even if the Mantel-Haenszel method is specified in `metabin`, the pooled effect is still derived using the same weighting approach we discussed in Chapter \@ref(tau-estimators). For the Peto method, the inverse-variance approach will be applied using the variance of the Peto log-odds ratio.
```
<br></br>
##### Pooling Binary Effect Sizes in _R_ {#ppoolbin}
---
There are eight important function-specific arguments in `metabin`:
* **`event.e`**. The number of events in the treatment/experimental group.
* **`n.e`**. The number of observations in the treatment/experimental group.
* **`event.c`**. The number of events in the control group.
* **`n.c`**. The number of observations in the control group.
* **`method`**. The pooling method to be used. This can either be `"Inverse"` (generic inverse-variance pooling), `"MH"` (Mantel-Haenszel; default and recommended for fixed-effect models), `"Peto"` (Peto method), or `"SSW"` (Bakbergenuly-sample size method; only when `sm = "OR"`).
* **`sm`**. The summary measure (i.e. effect size metric) to be calculated. We can use `"RR"` for the risk ratio and `"OR"` for the odds ratio.
* **`incr`**. The increment to be added for continuity correction of zero cells. If we specify `incr = 0.5`, an increment of 0.5 is added. If we set `incr = "TACC"`, the treatment arm continuity correction method is used (see Chapter \@ref(rr)). As mentioned before, it is usually recommended to leave out this argument and not apply continuity corrections.
* **`MH.exact`**. If `method = "MH"`, we can set this argument to `TRUE`, indicating that we do not want that a continuity correction is used for the Mantel-Haenszel method.
For our hands-on example, we will use the `DepressionMortality` data set. This data set is based on a meta-analysis by Cuijpers and Smit [-@cuijpers2002excess], which examined the effect of suffering from depression on all-cause mortality. The data set contains the number of individuals with and without depression, and how many individuals in both groups had died after several years.
\index{dmetar Package}
```{block, type='boxdmetar'}
**The "DepressionMortality" Data Set**
\vspace{2mm}
The `DepressionMortality` data set is included directly in the **{dmetar}** package. If you have installed **{dmetar}**, and loaded it from your library, running `data(DepressionMortality)` automatically saves the data set in your _R_ environment. The data set is then ready to be used. If you do not have **{dmetar}** installed, you can download the data set as an _.rda_ file from the [Internet](https://www.protectlab.org/meta-analysis-in-r/data/depressionmortality.rda), save it in your working directory, and then click on it in your R Studio window to import it.
```
Let us have a look at the data set first:
```{r, message = F}
library(dmetar)
library(tidyverse)
library(meta)
data(DepressionMortality)
glimpse(DepressionMortality)
```
\index{Paule-Mandel Estimator}
\index{Mantel-Haenszel Method}
In this example, we will calculate the risk ratio as the effect size metric, as was done by Cuijpers and Smit. We will use a random-effects pooling model, and, since we are dealing with binary outcome data, we will use the Paule-Mandel estimator for $\tau^2$.
Looking at the data, we see that the sample sizes vary considerably from study to study, a scenario in which the Paule-Mandel method may be slightly biased (see Chapter \@ref(tau-estimators)). Keeping this in mind, we can also try out another $\tau^2$ estimator as a sensitivity analysis to check whether the results change considerably.
The data set contains no zero cells, so we do not have to worry about continuity correction, and can use the exact Mantel-Haenszel method right away. We save the meta-analysis results in an object called `m.bin`.
```{r, eval=F}
m.bin <- metabin(event.e = event.e,
                 n.e = n.e,
                 event.c = event.c,
                 n.c = n.c,
                 studlab = author,
                 data = DepressionMortality,
                 sm = "RR",
                 method = "MH",
                 MH.exact = TRUE,
                 fixed = FALSE,
                 random = TRUE,
                 method.tau = "PM",
                 method.random.ci = "HK",
                 title = "Depression and Mortality")
summary(m.bin)
```
```{r, echo=F, message=F, warning=F}
m.bin <- metabin(event.e = event.e,
n.e = n.e,
event.c = event.c,
n.c = n.c,
studlab = author,
data = DepressionMortality,
sm = "RR",
method = "MH",
MH.exact = TRUE,
fixed = FALSE,
random = TRUE,