---
title: "Publication Bias"
format: gfm
editor: visual
---
## Question 1.
Patterns in `filedrawer.csv`.
```{r}
# Loading libraries
library(tidyverse)
library(modelsummary)
```
**Part a)** Storing the data in a tibble called `filedrawer` using `read_csv()`.
```{r}
# Question 1 Part a)
# Loading Data
filedrawer <- read_csv("https://ditraglia.com/data/filedrawer.csv")
filedrawer
```
**Part b)** Using `datasummary_crosstab()` to count the number of papers in each combination of the statistical significance of the main findings (`IV`) and publication status (`DV`).
```{r}
# Question 1 Part b)
# Creating count table
datasummary_crosstab(IV ~ DV, data = filedrawer, statistic = ~N)
```
**Part c)** We first reorganize the data so that we can run the regression, dropping papers whose results are still unwritten. A paper is coded as 1 if it is published in either a top or a non-top journal, and 0 otherwise. Similarly, the statistical significance of the main findings is coded as 0 if it is `Weak` or `Null`, and 1 otherwise. We then run a logit regression of publication status on significance.
```{r}
# Question 1 Part c)
filedrawer <- filedrawer |>
filter(DV != "Unwritten") |> # Filter out rows where DV is "Unwritten"
mutate(
published = ifelse(DV %in% c("Published, non top", "Published, top"),
1, 0), # Create binary indicator for published
significance = ifelse(IV == "Weak" | IV == "Null", 0, 1)
# Create binary indicator for significance
)
# Running the regression
model_1 <- glm(published ~ significance, data = filedrawer,
family = binomial(link = 'logit'))
# Model summary
modelsummary(list(Significance = model_1),
gof_omit = 'Log.Lik|R2 Adj.|AIC|BIC|F',
# Omit specified goodness-of-fit statistics
fmt = 2, # Format coefficients to 2 decimal places
stars = TRUE) # Include significance stars
```
**Part d)** To account for the possibility that better researchers are more likely to think up experimental interventions with large treatment effects and to write better papers, we add the `max.h` variable, which proxies for research quality, to our regression.
```{r}
# Question 1 Part d)
model_2 <- glm(published ~ significance + max.h, data = filedrawer,
               family = binomial(link = 'logit'))
# Model summary
modelsummary(
list(Significance = model_1, Significance_ResearchQuality = model_2),
# List of models to summarize
gof_omit = 'Log.Lik|R2 Adj.|AIC|BIC|F',
# Omit specified goodness-of-fit statistics
fmt = 2, # Format coefficients to 2 decimal places
stars = TRUE # Include significance stars
)
```
**Part e)** The coefficient on `significance` is not statistically significant in either model. This means we cannot reject the null hypothesis that publication rates (in any journal) are the same for projects with `Null` or `Weak` results as for projects with `Strong` results. Given publication/file drawer bias, we would have expected studies with non-significant results to be less likely to be published, but this does not appear to be the case in these data.
The coefficient on `max.h` (our proxy for research quality) is, however, statistically significant, so research quality does seem to play a role in whether a paper gets published.
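To make the logit coefficients easier to interpret, we can exponentiate them into odds ratios. This is a small optional sketch, not part of the original question; it assumes `model_1` and `model_2` estimated above are still in memory.
```{r}
# Optional: express the logit coefficients as odds ratios
exp(coef(model_1))  # odds ratio for significance alone
exp(coef(model_2))  # odds ratios for significance and max.h jointly
```
An odds ratio near one for `significance` would be consistent with the conclusion above that result strength does little to predict publication, while an odds ratio above one for `max.h` would reinforce the role of research quality.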
## Question 2.
Patterns in `published.csv`.
**Part a)** Loading the data into a tibble called `published` using `read_csv()`.
```{r}
# Question 2 Part a)
# Loading published data
published <- read_csv("https://ditraglia.com/data/published.csv")
published
```
**Part b)** Using `ggplot()`, we plot a scatterplot with `cond.s` on the x-axis and `cond.p` on the y-axis. The points are jittered because both variables are counts, so many observations would otherwise sit exactly on top of one another.
```{r}
# Question 2 Part b)
# Create scatterplot
ggplot(published, aes(x = cond.s, y = cond.p)) + # Define data and aesthetics
geom_jitter( # Add jittered points
width = 0.5, height = 0.5, size = 3, color = "black",
fill = "orange", shape = 21, stroke = 1, alpha = 0.8
) +
labs( # Add labels and title
title = "Scatterplot of Experimental Conditions",
subtitle = "Number of Experimental Conditions in Study vs. Published Paper",
x = "Number of Conditions in Study",
y = "Number of Conditions in Published Paper",
color = "Conditions in Study"
) +
theme_minimal() # Apply minimal theme
```
**Part c)** As in part b), we plot a scatterplot, this time with `out.s` on the x-axis and `out.p` on the y-axis.
```{r}
# Question 2 Part c)
# Create scatterplot
ggplot(published, aes(x = out.s, y = out.p)) + # Define data and aesthetics
geom_jitter( # Add jittered points
width = 0.5, height = 0.5, size = 3, color = "black",
fill = "lightblue", shape = 21, stroke = 1, alpha = 0.9
) +
labs( # Add labels and title
title = "Scatterplot of Outcome Variables",
subtitle = "Number of Outcome Variables in Study vs. Published Paper",
x = "Number of Outcome Variables in Study",
y = "Number of Outcome Variables in Published Paper",
color = "Outcome Variables in Study"
) +
theme_minimal() # Apply minimal theme
```
**Part d)** In both the scatterplots, we observe a upward sloping positive trend. In the `cond.s` vs `cond.p` , this suggests that studies with more experimental conditions tend to report more of those conditions in their published papers. Looking at it the other way around, the studies that have fewer conditions may thus be less likely to report them if they didn't have statistically significant findings. This could possibly be because of the file drawer bias where researchers don't submit projects with statistically insignificant results for publication and/or because of publication bias where journal editors and referees are more willing to accept papers with statistically significant results.
Similarly, the positive trend in the `out.s` vs `out.p` scatterplot suggests a tendency for selectively reporting those outcome variables that were found to be statistiscally siginificant or more interesting while excluding the variables that were not significant or non-compelling.
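As a rough numerical complement to the scatterplots, we can count how often a published paper reports fewer conditions or fewer outcome variables than the underlying study contained. This is an optional addition, not part of the original question, and simply summarizes the same columns plotted above.
```{r}
# Optional: share of papers that report fewer conditions / outcomes
# than the underlying study contained
published |>
  summarise(
    share_fewer_conditions = mean(cond.p < cond.s, na.rm = TRUE),
    share_fewer_outcomes   = mean(out.p < out.s, na.rm = TRUE)
  )
```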
## Question 3.
**Part a)** For each study, let $n$ denote the number of tests, where $n = \text{cond.s} \times \text{out.s}$. At the significance level $\alpha = 0.05$, each test has a 5% chance of incorrectly rejecting a true null hypothesis. Assuming the tests are independent and that all of the null hypotheses are true, the probability of rejecting none of them is $(1 - \alpha)^n$, so the probability of rejecting at least one is $1 - (1 - \alpha)^n$.
The code below first calculates the number of tests in each study, then uses it to compute this probability for each study, and finally averages across studies to give the average (per paper) probability of 0.621.
```{r}
# Question 3 Part a)
alpha <- 0.05 # Significance Level
published <- published |>
mutate(num_tests = cond.s * out.s) |>
# Calculate the number of tests for each study
mutate(prob_reject_at_least_one = 1 - (1 - alpha) ^ num_tests)
# Calculate the probability of rejecting at least one
# null hypothesis for each study
average_prob <- mean(published$prob_reject_at_least_one)
# Calculate the average probability
average_prob
```
**Part b)** Following on from the previous part, we use the binomial distribution to calculate the required probability. First note that the probability of rejecting at least two null hypotheses is given by:
$$
1 - P(\text{rejecting 0 null hypotheses}) - P(\text{rejecting exactly 1 null hypothesis})
$$
From the previous part we know that $P(\text{rejecting 0 null hypotheses}) = (1 - \alpha)^n$. Using the binomial distribution, the probability of rejecting exactly one null hypothesis is $P(\text{rejecting exactly 1 null hypothesis}) = n * \alpha * (1 - \alpha)^{n - 1}$. Substituting these into the formula above gives the required probability for each study, which is then averaged to obtain the average (per paper) probability. This comes out to be 0.34.
```{r}
# Question 3 Part b)
alpha <- 0.05 # Significance Level
published <- published |>
mutate(
prob_reject_zero = (1 - alpha) ^ num_tests,
# Probability of rejecting zero null hypotheses
prob_reject_one = num_tests * alpha * (1 - alpha) ^ (num_tests - 1),
# Probability of rejecting exactly one null hypothesis
prob_reject_at_least_two = 1 - prob_reject_zero - prob_reject_one
# Probability of rejecting at least two null hypotheses
)
# Calculate the average probability of rejecting at least two null hypotheses
average_prob_at_least_two <- mean(published$prob_reject_at_least_two)
average_prob_at_least_two
```
**Part c)** The process for this is exactly the same as the previous two parts. The probability of rejecting at least 3 null hypotheses is given by:
$$
1 - P(\text{rejecting 0 null hypotheses}) - P(\text{rejecting exactly 1 null hypothesis}) - P(\text{rejecting exactly 2 null hypotheses})
$$
We know $P(\text{rejecting 0 null hypotheses})$ and $P(\text{rejecting exactly 1 null hypothesis})$ from the previous part. The probability of rejecting exactly two null hypotheses is ${n \choose 2} * \alpha^2 * (1 - \alpha)^{n-2}$. We can then calculate the required average (per paper) probability, which comes out to be 0.177.
```{r}
# Question 3 Part c)
published <- published |>
mutate(
prob_reject_zero = (1 - alpha) ^ num_tests,
# Probability of rejecting zero null hypotheses
prob_reject_one = num_tests * alpha * (1 - alpha) ^ (num_tests - 1),
# Probability of rejecting exactly one null hypothesis
prob_reject_two = choose(num_tests, 2) * alpha^2 *
(1 - alpha) ^ (num_tests - 2),
# Probability of rejecting exactly two null hypotheses
prob_reject_at_least_three = 1 - prob_reject_zero -
prob_reject_one - prob_reject_two
# Probability of rejecting at least three null hypotheses
)
# Calculate the average probability of rejecting at
# least three null hypotheses
average_prob_at_least_three <- mean(published$prob_reject_at_least_three)
average_prob_at_least_three
```
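Equivalently, the quantities in parts a) to c) are upper-tail probabilities of a $\text{Binomial}(n, \alpha)$ distribution, so they can be reproduced directly with `pbinom()`. This is an optional cross-check, not part of the original question; it assumes the `num_tests` column and `alpha` defined above are still available.
```{r}
# Optional cross-check of parts a) to c):
# P(at least k rejections) = P(X > k - 1), where X ~ Binomial(num_tests, alpha)
published |>
  summarise(
    avg_at_least_one   = mean(pbinom(0, num_tests, alpha, lower.tail = FALSE)),
    avg_at_least_two   = mean(pbinom(1, num_tests, alpha, lower.tail = FALSE)),
    avg_at_least_three = mean(pbinom(2, num_tests, alpha, lower.tail = FALSE))
  )
```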
**Part d)** From part a), we can conclude that when a researcher tests multiple conditions and outcomes within a study, there is on average a 62.1% probability of finding at least one statistically significant result purely by chance (at the 5% significance level), even when no true effects exist. Parts b) and c) show that finding two or even three spuriously significant results is also fairly likely (average probabilities of 0.34 and 0.18, respectively), and these probabilities only grow as the number of tests increases.
This rightly raises concerns about the reliability of findings in the scientific literature. The high probability of rejecting at least one null hypothesis (roughly 62%) implies that, even if none of the effects being studied were real, more than half of these studies would still be expected to produce at least one "significant" result that reflects chance rather than a true effect.
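As an optional sanity check on this interpretation (an addition, not part of the original exercise), a short simulation in which every null hypothesis is true reproduces the figure from part a). The seed and the number of replications are arbitrary choices.
```{r}
# Optional simulation sketch: if every null hypothesis is true, each test
# rejects independently with probability alpha. For each paper, simulate the
# number of rejections and record how often at least one occurs.
set.seed(1234)   # arbitrary seed, chosen only for reproducibility
n_sims <- 10000  # arbitrary number of simulated replications per paper
sim_share <- sapply(published$num_tests, function(n) {
  mean(rbinom(n_sims, size = n, prob = alpha) >= 1)
})
mean(sim_share)  # should be close to the 0.621 obtained in part a)
```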