agile-2020-papers.Rmd

---
title: "Reproducibility Review AGILE 2020"
author: "Daniel Nüst"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  html_document:
    toc: yes
    self_contained: true
  #pdf_document:
  #  toc: yes
params:
  private_info: yes
---

## Introduction

This document includes scripts and text analysis to support the reproducibility review at the [AGILE conference 2020](https://agile-online.org/conference-2020), in Chania, Greece - _"Geospatial Technologies: seeding the future"_.
The physical conference was cancelled due to the [COVID-19 crisis](https://en.wikipedia.org/wiki/Coronavirus_disease_2019).
The full papers were already reviewed at the time the decision to cancel was taken, so these are still published. However, short papers and posters were not sent out to review but are included here in the summary statistics.

Find out more online [about reproducible publications at AGILE](https://doi.org/10.17605/OSF.IO/PHMCE) and the [review process](https://osf.io/eg4qx/), and visit the Reproducible AGILE website: [https://reproducible-agile.github.io/](https://reproducible-agile.github.io/).
The code of this document is published on GitHub in the repository [reproducible-agile/reviews-2020](https://github.com/reproducible-agile/reviews-2020), where you can inspect the R code in the file `agile-2020-papers.Rmd` and find instructions for reproducing the workflow.
The [report parameter](https://bookdown.org/yihui/rmarkdown/parameterized-reports.html) `private_info` can be set to `yes` to show information which cannot be shared publicly, such as author names, titles, or excerpts of not accepted submissions, and to upload review files to private shares, which requires authentication.

```{r load_libraries, message=FALSE, warning=FALSE, include=FALSE}
library("pdftools")
library("stringr")
library("tidyverse")
library("tidytext")
library("wordcloud")
library("RColorBrewer")
library("here")
library("quanteda")
library("googledrive")
library("kableExtra")
library("httr")
library("xml2")
library("rvest")
library("tidyr")
library("ggplot2")
library("httr")
```

```{r seed, echo=FALSE}
set.seed(23) # 23rd AGILE!
```

## Submitted papers

```{r local_paths, echo=FALSE}
submissions_path <- here::here("submissions")
dir.create(submissions_path, recursive = TRUE, showWarnings = FALSE)

review_files_path <- here::here("review-material")
dir.create(review_files_path, recursive = TRUE, showWarnings = FALSE)

cr_path <- here::here("camera-ready-full-papers")
```

### Submission metadata

Retrieve all information about submissions from the EasyChair submissions system.
The full submission information is not included in the public rendering of this report.

```{r easychair_login}
if(is.na(Sys.getenv("easychair_username", unset = NA)) || is.na(Sys.getenv("easychair_password", unset = NA))) {
  stop("Provide login details for EasyChair (e.g., in a file `.Renviron`) in environment variables",
       "`easychair_username` and `easychair_password`")
}

# with help from https://github.com/kaytwo/easierchair/blob/master/scrape_easychair.py
# https://stackoverflow.com/questions/23202522/r-httr-post-request-for-signing-in
login_response <- NULL
if(is.null(login_response)) {
  login_response <- httr::POST(url = "https://easychair.org/account/verify",
                         body = list(name = Sys.getenv("easychair_username"),
                                     password = Sys.getenv("easychair_password")),
                         encode = "form")
}
```

```{r submissions, echo=FALSE}
submission_page <- httr::GET(url = "https://easychair.org/conferences/submissions?a=24268236")
submission_html <- xml2::read_html(submission_page)
submission_table_full <- html_node(submission_html, "#ec\\:table1")
# remove first header row and set names manually later, because the vertical table headers are miages anyway
xml_remove(xml_child(html_node(submission_table_full, "thead")))
submission_table <- rvest::html_table(x = submission_table_full,
                                      fill = TRUE)
names(submission_table) <- c("id", "authors", "title", "information", "paper", "assigment", "update", "type", "time", "decision")
submission_table$id <- str_pad(submission_table$id, width = 3, side = "left", pad = "0")
submission_table <- submission_table  %>%
  mutate_if(is.character, list(~na_if(.,"")))

links <- html_nodes(submission_table_full, "a[href]") %>% html_attr("href")
submission_table$information <- paste("https://easychair.org",
                                      links[str_detect(links, pattern = "submission_view")], sep = "")

submission_table$submission_id <- str_match(submission_table$information, pattern = "submission=([[:digit:]]+)")[,2]
# FIXME no. 033 has no file
#submission_table$paper <- R.utils::insert(paste("https://easychair.org",
#                                                links[str_detect(links, pattern = "download")],
#                                                sep = ""),
#                                          ats = which(submission_table$id == "033"),
#                                         values = NA)
submission_table$paper <- paste("https://easychair.org",
                                links[str_detect(links, pattern = "download")],
                                sep = "")

submission_table %>%
  group_by(type) %>%
  tally() %>%
  kable() %>%
  kable_styling("striped")
```

```{r submissions_full_metadata, echo=FALSE}
if(params$private_info) {
  submission_table %>%
    arrange(id) %>%
    kable() %>%
    kable_styling("striped") %>%
    scroll_box(height = "480px")
}
```

### Load texts

The paper PDFs are downloaded from EasyChair directly using the links provided in the submission overview table.

```{r download_easychair, echo=FALSE}
for (i in 1:nrow(submission_table)) {
  if(is.na(submission_table[i,]$paper)) {
    cat("No paper URL for ", i, "\n")
    next
  }
  
  current <- submission_table[i,]
  filename <- file.path(submissions_path, paste0(current$id, ".pdf"))
  if(!file.exists(filename)) {
    httr::GET(url = current$paper,
              httr::write_disk(path = filename,
                               overwrite = TRUE))
  }
}

submission_files <- dir(path = submissions_path, pattern = ".pdf$", full.names = TRUE)

submission_table <- left_join(submission_table,
                              tibble("id" = str_match(submission_files,
                                                      pattern = "([:digit:]*)\\.pdf")[,2],
                                     "file" = submission_files),
                              by = "id")
```

The text is extracted from PDFs and it is processed to create a [tidy](https://www.jstatsoft.org/article/view/v059i10) data structure without [stop words](https://en.wikipedia.org/wiki/Stop_words).
The stop words include specific words, which might be included in the page header, abbreviations, and terms particular to scientific articles, such as `figure`.

```{r tidy_data, echo=FALSE}
texts <- list()
for (i in 1:nrow(submission_table)) {
  current <- submission_table[i,]
  #cat("Reading ", current$file, "\n")
  the_text <- NA
  if(!is.na(current$file)) {
    the_text <- pdf_text(current$file)
    the_text <- str_c(the_text, collapse = TRUE)
  }
  
  names(the_text) <- current$id
  texts <- c(texts, the_text)
}

pages <- lapply(submission_table$file, function(f) {
  if(is.na(f))
    NA
  else pdf_info(f)$pages
})

tidy_texts <- tibble(id = submission_table$id,
                     path = submission_table$file,
                     type = submission_table$type,
                     text = unlist(texts),
                     pages = pages)

# create a table of all words
all_words <- tidy_texts %>%
  select(id,
         type,
         text) %>%
  unnest_tokens(word, text)

# remove stop words and remove numbers
my_stop_words <- tibble(
  word = c(
    "et",
    "al",
    "fig",
    "e.g",
    "i.e",
    "http",
    "ing",
    "pp",
    "figure",
    "based",
    "conference",
    "university",
    "table"
  ),
  lexicon = "agile"
)

all_stop_words <- stop_words %>%
  bind_rows(my_stop_words)
suppressWarnings({
  no_numbers <- all_words %>%
    filter(is.na(as.numeric(word)))
})

no_stop_words <- no_numbers %>%
  anti_join(all_stop_words, by = "word")

total_words = nrow(all_words)
after_cleanup = nrow(no_stop_words)
```

About `r round(after_cleanup/total_words * 100)`&nbsp;% of the words are considered stop words.

The following table shows how many words and non-stop words each document has, sorted by number of non-stop words.
The `id` is built from the file name plus a prefix:
for full papers, it is the left-padded submission number and the prefix `fp_`;
<!--for short papers and posters, it is the submission number included in the file name and the prefixes `sp_` and `po_` respectively.-->

```{r stop_words, echo=FALSE, message=FALSE, warning=FALSE}
nsw_per_doc <- no_stop_words %>%
  group_by(id) %>%
  summarise(words = n()) %>%
  rename(`non-stop words` = words)

words_per_doc <- all_words %>%
  group_by(id, type) %>%
  summarise(words = n())

type_counts_totals <- submission_table %>%
  group_by(type) %>%
  tally()
type_counts_totals$type <- c("FP", "PO", "SP")
type_counts_totals <- paste(
  paste(type_counts_totals$type, type_counts_totals$n, sep = ":"),
  collapse = "|")


words_joined <- as.data.frame(inner_join(words_per_doc, nsw_per_doc))
summary_row <- tibble(id = "Total",
                      type = type_counts_totals,
                      words = sum(words_per_doc$words),
                      `non-stop words` = sum(nsw_per_doc$`non-stop words`))
if(!params$private_info) {
  words_joined$id <- NULL
  summary_row$id <- NULL
}

bind_rows(words_joined, summary_row) %>%
  kable() %>%
  kable_styling("striped", full_width = FALSE) %>%
  row_spec(nrow(words_joined) + 1, bold = TRUE) %>%
  scroll_box(height = "240px")
```

### Which papers include a "Data and Software Availability" section?

According the the [AGILE Reproducible Paper Guidelines](https://osf.io/c8gtq/), all authors must add a _Data and Software Availability_ section to their paper.
The guidelines themselves are not mandatory yet in 2020, but let's see how many authors did include the statement.
This detection naturally relies on the loaded texts _with_ stop words.

```{r pdfgrep, echo=FALSE, eval=FALSE}
# Quick version with `pdfgrep`
cmd <- paste("pdfgrep", "-e 'Data and Software Availability'", "-i", "-A 3", "2020/*/*")
output <- system(cmd, intern = TRUE)
print(cmd)
print(output)
```

```{r dasa_section, echo=FALSE}
dasa_pattern <- regex("Data and Software Availability", ignore_case = TRUE)
tidy_texts <- tidy_texts %>%
  mutate(has_dasa = str_detect(tidy_texts$text, pattern = dasa_pattern))

dasa_count <- tidy_texts %>% filter(has_dasa) %>% nrow()

excerpt_length <- 800
dasa_texts <- tidy_texts %>%
  filter(has_dasa) %>%
  mutate(dasa_start = str_locate(.data$text, pattern = dasa_pattern)[,1]) %>%
  mutate(dasa_text = str_sub(.data$text, start = dasa_start, end = dasa_start + excerpt_length)) %>%
  select(id, type, dasa_text)
```

`r dasa_count` papers have the section in question, that is `r round(dasa_count/nrow(submission_table) * 100)`&nbsp;% of all submissions.
Here are the statistics per submission type:

```{r dasa_statistics, echo=FALSE}
dasa_stats <- tidy_texts %>%
  filter(has_dasa) %>%
  group_by(type, .drop = FALSE) %>%
  summarise(n = n())

dasa_stats <- left_join(tidy_texts %>%
                          group_by(type, .drop = FALSE) %>%
                          summarise(submissions = n()),
                        dasa_stats,
                        by = "type")

dasa_stats <- dasa_stats %>%
  mutate(`%` = round(n/submissions*100, digits = 1))

dasa_stats %>%
  arrange(desc(n)) %>%
  rename(`with DASA` =)
  kable() %>%
  kable_styling("striped")
```

`r if(!params$private_info) {"<!--"}`
The following table shows the first `r excerpt_length` characters of these sections.
`r if(!params$private_info) {"-->"}`

```{r dasa_section_table, echo=FALSE}
if(params$private_info) {
  dasa_texts %>%
    arrange(id) %>%
    kable() %>%
    kable_styling("striped") %>%
    scroll_box(height = "320px")
}
```

### Wordstem analysis

```{r wordstem_data, include=FALSE}
wordstems <- no_stop_words %>%
  mutate(wordstem = quanteda::char_wordstem(no_stop_words$word))

countPapersUsingWordstem <- function(the_word) {
  sapply(the_word, function(w) {
    wordstems %>%
      filter(wordstem == w) %>%
      group_by(id) %>%
      count %>%
      nrow
  })
}

top_wordstems <- wordstems %>%
  group_by(wordstem) %>%
  tally %>%
  arrange(desc(n)) %>%
  head(20) %>%
  mutate(`# papers` = countPapersUsingWordstem(wordstem)) %>%
  mutate(`% papers` = round(countPapersUsingWordstem(wordstem)/nrow(submission_table) * 100)) %>%
  add_column(place = c(1:nrow(.)), .before = 0)

minimum_occurence <- 100
cloud_wordstems <- wordstems %>%
  group_by(wordstem) %>%
  tally %>%
  filter(n >= minimum_occurence) %>%
  arrange(desc(n))
```

For the following table and figure, the word stems were extracted based on a stemming algorithm from package [`quanteda`](https://cran.r-project.org/package=quanteda).
The word cloud is based on `r length(unique(cloud_wordstems$wordstem))` unique words occuring each at least `r minimum_occurence` times, all in all occuring `r sum(cloud_wordstems$n)` times which comprises `r round(sum(cloud_wordstems$n)/ nrow(no_stop_words) * 100)`&nbsp;% of non-stop words.

```{r top_wordstems, echo=FALSE}
top_wordstems %>%
  kable() %>%
  kable_styling("striped") %>%
  scroll_box(height = "320px")
```

```{r wordstemcloud, dpi=150, echo=FALSE, fig.cap="Wordstem cloud of AGILE 2020 full paper submissions"}
wordcloud(cloud_wordstems$wordstem, cloud_wordstems$n,
          max.words = Inf,
          random.order = FALSE,
          fixed.asp = FALSE,
          rot.per = 0,
          color = brewer.pal(8,"Dark2"))
```

## Reproducible research-related keywords of all submissions

The following tables lists how often terms related to reproducible research appear in each document.
The detection matches full words using regex option `\b`.

- reproduc (`reproduc.*`, reproducibility, reproducible, reproduce, reproduction)
- replic (`replicat.*`, i.e. replication, replicate)
- repeatab (`repeatab.*`, i.e. repeatability, repeatable)
- software
- (pseudo) code/script(s) [column name _code_]
- algorithm (`algorithm.*`, i.e. algorithms, algorithmic)
- process (`process.*`, i.e. processing, processes, preprocessing)
- data (`data.*`, i.e. dataset(s), database(s))
- result(s) (`results?`)
- repository(ies) (`repositor(y|ies)`)
- collaboration platforms (`git(hub|lab)`)

The following table highlights papers with the Data and Software Availability Section with italic font and grey background.
The entries are sorted by descending sum of all keywords per paper.

```{r keywords_per_paper, echo=FALSE, warning=FALSE}
tidy_texts_lower <- str_to_lower(tidy_texts$text)
word_counts <- tibble(
  id = tidy_texts$id,
  type = tidy_texts$type,
  DASA = tidy_texts$has_dasa,
  `reproduc..` = str_count(tidy_texts_lower, "\\breproduc.*\\b"),
  `replic..` = str_count(tidy_texts_lower, "\\breplicat.*\\b"),
  `repeatab..` = str_count(tidy_texts_lower, "\\brepeatab.*\\b"),
  `code` = str_count(tidy_texts_lower,
    "(\\bcode\\b|\\bscript.*\\b|\\bpseudo\ code\\b)"),
  software = str_count(tidy_texts_lower, "\\bsoftware\\b"),
  `algorithm(s)` = str_count(tidy_texts_lower, "\\balgorithm.*\\b"),
  `(pre)process..` = str_count(tidy_texts_lower, 
                "(\\bprocess.*\\b|\\bpreprocess.*\\b|\\bpre-process.*\\b)"),
  `data.*` = str_count(tidy_texts_lower, "\\bdata.*\\b"),
  `result(s)` = str_count(tidy_texts_lower, "\\bresults?\\b"),
  `repository/ies` = str_count(tidy_texts_lower, "\\brepositor(y|ies)\\b"),
  `github/lab` = str_count(tidy_texts_lower, "\\bgit(hub|lab)\\b")
)

# https://stackoverflow.com/a/32827260/261210
sumColsInARow <- function(df, list_of_cols, new_col) {
  df %>% 
    mutate_(.dots = ~Reduce(`+`, .[list_of_cols])) %>% 
    setNames(c(names(df), new_col))
}

word_counts_sums <- sumColsInARow(
  word_counts, 
  names(word_counts)[!(names(word_counts) %in% c("id", "type"))], "all") %>%
  arrange(desc(all))

DASA_counts <- word_counts_sums %>%
  group_by(DASA) %>%
  tally()

word_counts_sums_total <- word_counts_sums %>% 
  summarise_if(is.numeric, funs(sum)) %>%
  add_column(id = "Total",
             type = "",
             DASA = paste0("T:", DASA_counts[2,2], "|F:", DASA_counts[1,2]),
             .before = 0)
word_counts_sums <- rbind(word_counts_sums, word_counts_sums_total)

if(!params$private_info) {
  word_counts_sums$id <- NULL
}

word_counts_sums %>%
  kable() %>%
  kable_styling("striped", font_size = 12, bootstrap_options = "condensed")  %>%
  row_spec(0, font_size = "x-small", bold = T)  %>%
  row_spec(word_counts_sums %>% rownames_to_column() %>%
             filter(DASA == TRUE, .preserve = TRUE) %>%
             select(rowname) %>% unlist() %>% as.numeric(),
           italic = TRUE, background = "#eeeeee") %>%
  row_spec(nrow(word_counts_sums), bold = T) %>%
  scroll_box(height = "480px")
```

------

## Accepted full papers

### Full paper decisions

There is "accept" and "conditionally accept" (after second review)!

```{r scrape_accepted, echo=FALSE}
submission_table %>%
  filter(type == "Full-paper") %>%
  group_by(decision) %>%
  summarise(count = n()) %>%
  kable() %>%
  kable_styling("striped")
```

```{r compile review data, echo=FALSE}
#page <- httr::GET(url = "https://easychair.org/conferences/status?a=24268236")
#review_status_page <- xml2::read_html(page)
#review_table <- rvest::html_table(html_nodes(review_status_page, ".paperTable")[[1]], header = TRUE)
#names(review_table)[4] <- "average"
#names(review_table)[1] <- "id"
#review_table$id <- str_pad(review_table$id, width = 3, side = "left", pad = "0")
#
## IMPORTANT: "Show paper authors" must be _un_ticked for the following code to work
#review_table <- review_table %>%
#  tidyr::separate(col = title, into = c("authors","title"), sep = "\\.+?", extra = "merge")
#
## the tr element of the review table has the internal paper ID in format "r4789577"
#review_table$internal_id <- sapply(X = html_nodes(review_status_page, css = ".paperTable tr[id]"), FUN = function(row) {
#  substr(html_attr(row, "id"), 2, 999)
#})

review_data <- left_join(submission_table, dasa_texts %>% select(-type),
                         by = "id")

if(params$private_info) {
  review_data %>%
    dplyr::filter(decision == "ACCEPT" | decision == "accept?") %>%
    filter(type == "Full-paper") %>%
    arrange(id) %>%
    kable() %>%
    kable_styling("striped") %>%
    scroll_box(height = "480px")
}
```

### How does acceptance relate to DASA section availability for full papers?

```{r accepted_dasa, echo=FALSE}
dasa_vs_accepted <- tibble(papers = c("submitted",
                  "submitted with DASA",
                  "accepted",
                  "accepted with DASA",
                  "rejected with DASA"),
       n = c(review_data %>%
               filter(type == "Full-paper") %>%
               nrow(.),
             review_data %>%
               filter(type == "Full-paper") %>%
               filter(!is.na(dasa_text)) %>% 
               nrow(.),
             review_data %>%
               dplyr::filter(decision == "ACCEPT") %>%
               filter(type == "Full-paper") %>%
               nrow(.),
             review_data %>%
               dplyr::filter(decision == "ACCEPT") %>%
               filter(type == "Full-paper") %>%
               filter(!is.na(dasa_text)) %>% 
               nrow(.),
             review_data %>%
               dplyr::filter(decision == "REJECT") %>% 
               filter(!is.na(dasa_text)) %>% 
               nrow(.))
)
dasa_vs_accepted %>%
  kable() %>%
  kable_styling("striped")
```

```{r accepted_dasa_barplot_label}
review_data_figure_label <- if(params$private_info) {
  "Barplot of Data and Software Availability sections across accepted and rejected full paper submissions"
} else {
  "Barplot of Data and Software Availability sections across accepted full paper submissions"
}
```

```{r accepted_dasa_barplot, echo=FALSE, fig.cap=review_data_figure_label, fig.height=4}
if(params$private_info) {
  review_data_figure_data <- review_data %>%
    mutate(`has DASA` = !is.na(dasa_text)) %>%
    filter(!is.na(decision)) %>%
    filter(type == "Full-paper") %>%
    group_by(type, `has DASA`, decision) %>%
    summarise(n = n())
} else {
  review_data_figure_data <- review_data %>%
    mutate(`has DASA` = !is.na(dasa_text)) %>%
    filter(!is.na(decision)) %>%
    filter(type == "Full-paper", decision == "ACCEPT") %>%
    group_by(type, `has DASA`, decision) %>%
    summarise(n = n())
}

review_data_figure <- review_data_figure_data  %>%
    ggplot(aes(x = decision, y = n, fill = `has DASA`)) +
    geom_bar(stat="identity", width = 0.5) + 
    scale_fill_brewer(palette="Paired") +
    labs(title = "AGILE 2020 Submissions: Transparency & Reproducibility",
         subtitle = "Data and Software Availability Sections (DASA) across full papers") +
    theme_minimal() + 
    theme(axis.title.y = element_text(angle = 360, hjust = 1, vjust = 0.5))

review_data_figure
```

`r if(!params$private_info) {"<!--"}`

## Reproducibility reviews

### About

The assignment of reviews is done via a privately shared spreadsheet, to handle potential non-public comments.
The main outcome of the reviews is a _report_, which is published in individual OSF projects as components of the [OSF project for the reproducibility reviews 2020](https://osf.io/6k5fh/).
The report should be based on a template from this repository in [`report-template`](report-template).
Find the details on the review process in [this document]https://docs.google.com/document/d/1JHCQV7GP3YkKwp0Nii3dt3p3Y45hU56Xz2cr-xJVz34/edit#).

### Prepare data for reviewers

The review comments for each paper are stored in an HTML file next to the PDF of the submission and uploaded to [a private share](https://drive.google.com/drive/folders/1FMsRlvouWPkpnHU4mJ6dIMDeTwSRgpUP) so that reproducibility reviewers can access them.
This file includes the full reviews, including the names of the original reviewers, and cannot be publicly shared.

```{r upload_settings}
review_data_csv_file <- file.path(review_files_path, "review_data.csv")
```

1. ~~Download PDFs to directory `r review_files_path`~~ Skipped: The camera-ready PDFs are managed outsdie of EasyChair and must be available locall in the directory `r cr_path`, from where they are copied to `r review_files_path`
2. Create HTML from review page and safe to `r review_files_path`
3. Write paper metadata (ID, decision, title) to a CSV file `r review_data_csv_file`
4. **Manually** [import the paper metadata into the spreadsheet](https://www.tillerhq.com/how-to-import-csv-into-a-google-spreadsheet/) (Select cell `A1` then "File" > "Import" > "Import File" > "My Drive" then search for `review_data` and find `review_data.csv` then select the file > "Replace data at selected cell" and click "Import data")

In the future, we could do some text mining to answer questions such as

- Do reviewers findings match the automatic detection of DASA?
- How many comments are there regarding reproducibility?

```{r accepted_fp_files_download_easychair, echo=FALSE, eval=FALSE}
# DO NOT RUN - latest PDFs are not on EasyChair, use next chunk
#if (params$private_info) {
#  review_files <- review_data %>%
#    filter(type == "Full-paper") %>%
#    dplyr::filter(decision != "REJECT") %>%
#    select(id, submission_id, decision, title, file, paper)
#  
#  # first, re-download all accepted full paper PDFs to make sure we have latest copies
#  for (i in 1:nrow(review_files)) {
#    current <- review_files[i,]
#    filename <- file.path(review_files_path, paste0(current$id, ".pdf"))
#    httr::GET(url = current$paper,
#                httr::write_disk(path = filename,
#                                 overwrite = TRUE))
#  }
#}
```

```{r update_camera_ready_file_paths, echo=FALSE}
review_files <- review_data %>%
  filter(type == "Full-paper") %>%
  dplyr::filter(decision == "ACCEPT") %>%
  dplyr::mutate(file = stringr::str_replace(.$file, pattern = "submissions", replacement = "camera-ready-full-papers")) %>%
  select(id, submission_id, decision, title, file, paper)

# copy camera ready files to upload directory
for (i in 1:nrow(review_files)) {
  current <- review_files[i,]
  file.copy(from = file.path(cr_path, paste0(current$id, ".pdf")),
            to = file.path(review_files_path, paste0(current$id, ".pdf")))
}
```

```{r review_data_csv, echo=FALSE}
readr::write_csv(review_files %>%
                   select(ID = id, Decision = decision, Title = title),
                 path = review_data_csv_file,
                 append = FALSE)
```

```{r reviewer_comments, echo=FALSE}
# get review contents for each paper
conference_id <- "24268236"

# https://easychair.org/conferences/submission_reviews?submission=4788883;a=24268236#{fr:EZIcStd1tVrf}
retrieve_review <- function(id, submission_id) {
  url <- parse_url("https://easychair.org/conferences/submission_reviews")
  url$query <- list(submission = submission_id, a = conference_id)
  response <- httr::GET(url = build_url(url))
  content <- content(response)
  page_title <- as.character(
    xml_contents(
      html_node(
        content(response), "title")))
  if(grepl("Log in", page_title))
     stop("You must (re)login to EasyChair")
  
  # check if id matches
  title_id <- str_pad(str_extract(page_title,
    "[[:digit:]]"),
    width = 3, side = "left", pad = "0")
  
  if(id != title_id)
    warning(paste("Ids mismatch, id: ", id, " id in reponse: ", title_id))
  
  # TODO replace lines with <td>PC member:</td> in the reviews for anonymity
  review_doc <- xml_new_root(xml_dtd(name = "html", external_id = "-//W3C//DTD XHTML 1.0 Transitional//EN", system_id = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"))
  review_head <- xml2::xml_add_child(review_doc, "head")
  review_style <- xml2::xml_add_child(review_head, "style")
  xml_text(review_style) <- "
  table, th, td {
    border: 1px solid black;
    padding: 5px;
  }
  table {
    margin-bottom: 20px;
  }"
  
  review_body <- xml2::xml_add_child(review_doc, "body")
  
  xml2::xml_add_child(review_body,
                      xml2::xml_find_first(content,
                                           xpath = "//h3[contains(., 'Submission')]/following-sibling::div"))
  
  review_content <- xml2::xml_find_all(content,
                                       xpath = "//h3[contains(., 'Reviews and Comments')]/following-sibling::div")
  for (i in c(1:length(review_content))) {
    xml2::xml_add_child(review_body, review_content[[i]])
  }
  
  reviews_html_path <- file.path(review_files_path, paste0(id, "_reviews.html"))
  xml2::write_html(review_doc, reviews_html_path)
  reviews_html_path
}

#retrieve_review(review_files[1,]$id, review_files[1,]$submission_id)

for(i in c(1:nrow(review_files))) {
  retrieve_review(review_files[i,]$id, review_files[i,]$submission_id)
}
```

```{r review_files_upload_to_share, echo=FALSE}
if (params$private_info) {
  # put accepted full papers to private share
  library("googledrive")
  googledrive::drive_auth(use_oob = TRUE)
  
  sapply(list.files(review_files_path, full.names = TRUE), function(the_file) {
     googledrive::drive_put(media = the_file,
                            name = basename(the_file),
                            path = "https://drive.google.com/drive/folders/1FMsRlvouWPkpnHU4mJ6dIMDeTwSRgpUP")
  })
  
  googledrive::drive_put(media = review_data_csv_file,
                         name = basename(review_data_csv_file),
                         path = "https://drive.google.com/drive/folders/1FMsRlvouWPkpnHU4mJ6dIMDeTwSRgpUP")
}
```

`r if(!params$private_info) {"-->"}`

### Reproducibility reviewer instructions

1. Familiarise yourself with the [AGILE Reproducibility Review Process](https://docs.google.com/document/d/1JHCQV7GP3YkKwp0Nii3dt3p3Y45hU56Xz2cr-xJVz34/edit#heading=h.oheeg2s92zdm); the following steps are just tl;dr version
2. Take a look at the [review report template](https://github.com/reproducible-agile/reviews-2020/blob/master/report-template/reproreview-template.Rmd)
3. Go to the [master spreadsheet](https://docs.google.com/spreadsheets/d/1K6_8NqDfXH5uI07LBmBmgnapVP2fYlvU2QcE84UeqD0/) and find your assignments
4. Conduct your reproducibility review and write the report
    - Don't forget to take a look at the scientific reviews for comments on reproducibility; do _not_ worry about the science or read the full paper, unless it really interests you
    - If code is available on GitHub, you can fork the project into the [Reproducible AGILE organisation](https://github.com/reproducible-agile/), same [on GitLab in the subgroup "reviews"](https://gitlab.com/reproducible-agile/reviews)
    - If need be, limit the review scope, e.g. reproduce only a specific figure; the reproducibility review should not take you longer (not counting computation time) than a scientific review
5. Send the report to the original authors of the paper and add the reproducibility chair in CC, see template below; _only if the authors agree_ with a publication of the report, proceed with the following steps
6. Add a new component to the [OSF project for 2020 reproducibility reviews](https://osf.io/6k5fh/)
    - Use the European storage location, "Frankfurt"
    - Name the component `Reproducibility review of: FULL PAPER TITLE`
    - Add all contributors to the project
    - Publish the component and mint a DOI
    - Add the DOI to your report
    - Upload a PDF of your report and any useful supplemental files
7. Add link to the OSF project in the master spreadsheet

#### Author contact template

```
Dear AUTHORS,

congratulations to the acceptance of your submission "TITLE" as a full paper at the AGILE conference 2020. As part of the Reproducible AGILE initative (https://reproducible-agile.github.io/) I attempted to reproduce the results from your paper. Attached to this email you find my report on your results.

As part of the AGILE reproducibility review (see https://docs.google.com/document/d/1JHCQV7GP3YkKwp0Nii3dt3p3Y45hU56Xz2cr-xJVz34/edit#heading=h.oheeg2s92zdm for details) I hereby ask you if you agree to the report being published on OSF (see https://osf.io/6k5fh/). A publication of the reproduction attempt gives credit to my work as a reproducibility reviewer and helps to improve future work.

[OPTIONAL:] Alongside the report I would like to publish an archive of the used data and script files, and the output files generated by myself. Note these would be published under a CC-BY license on OSF, though the original source and license are noted in the report.

Please don't hesitate to get in touch with me and Daniel Nüst (CC'ed), AGILE conference's Reproducibility Chair, if you have any questions.  Please also include your coauthors in any further communication as you see fit.

YOUR SALUTATION OF CHOICE,
NAME
```

```
Dear AUTHORS,

thank you for your participation in a real open science endeavour!

The reproducibility review report on your paper is now published at DOI URL HERE.

Please don't hesitate to get in touch with Daniel Nüst (CC'ed), AGILE conference's Reproducibility Chair, if you have any questions.

YOUR SALUTATION OF CHOICE,
NAME
```

## AGILE General Meeting

```{r general_meeting_figures, fig.height=4, fig.width=4}
data <- tibble(`reproduction attempts` = c(0, 8))

review_data_2019 <- tibble(#id = c(1:19),
                           `has DASA` = factor("FALSE", levels = c("FALSE", "TRUE")),
                           decision = "Accept",
                           n = 19)
review_data_2019 %>%
  ggplot(aes(x = decision, y = n, fill = `has DASA`)) +
  geom_bar(stat="identity", width = 0.5) + 
  scale_fill_brewer(palette="Paired", drop = FALSE) +
  labs(title = "AGILE 2019 Submissions",
       subtitle = "DASA sections in full paper submissions") +
  ylim(ggplot_build(review_data_figure)$layout$panel_scales_y[[1]]$range$range) +
  theme_minimal() + 
  theme(axis.title.y = element_text(angle = 360, hjust = 1, vjust = 0.5))
```


## Colophon

This document is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).
All contained code is licensed under the [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/).

**Runtime environment description:**

```{r session_info, echo=FALSE}
sessionInfo()
```

**The used MRAN snapshot is `r paste(options("repos"))`**.

```{r upload_to_drive, eval=FALSE, include=FALSE}
# upload the HTML file and source code to the Reproducibility Committee shared folder
drive_put(media = "agile-2020-papers.html",
          name = paste0("agile-2020-papers_",
                 ifelse(params$private_info, "PRIVATE", "public"),
                 ".html"),
          path = "https://drive.google.com/drive/folders/1jyYj1hFqbR74D9ljjjcScR4lD3aWCO2U/")
drive_put("agile-2020-papers.Rmd", path = "https://drive.google.com/drive/folders/1jyYj1hFqbR74D9ljjjcScR4lD3aWCO2U/")
```

```{r render_public_version, eval=FALSE, include=FALSE}
rmarkdown::render(input = "agile-2020-papers.Rmd",
                  params = list(private_info = FALSE),
                  output_dir = here::here("docs/"),
                  output_format = rmarkdown::html_document(toc = TRUE, self_contained = FALSE))
```