Commit

Correct all docs [no ci]
chainsawriot committed Jun 11, 2023
1 parent e60d5ac commit aba6e82
Showing 11 changed files with 33 additions and 32 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -40,3 +40,4 @@
^CODE_OF_CONDUCT\.md$
^\.github$
^rawdata/
+^CRAN-SUBMISSION$
4 changes: 2 additions & 2 deletions R/oolong.R
@@ -42,7 +42,7 @@ Oolong_test <- R6::R6Class(
#' @section Usage:
#'
#' Use \code{wi}, \code{ti}, \code{witi}, \code{wsi} or \code{gs} to generate an oolong test of your choice. It is recommended to supply also \code{userid} (current coder).
-#' The names of the tests (word intrusion test and topic intrusion test) follow Chang et al (2009). In Ying et al. (forthcoming), topic intrusion test is named "T8WSI" (Top 8 Word Set Intrusion). Word set intrusion test in this package is actually the "R4WSI" (Random 4 Word Set Intrusion) in Lu et al (forthcoming). The default settings of \code{wi}, \code{witi}, and \code{ti} follow Chang et al (2009), e.g. \code{n_top_terms} = 5; instead of \code{n_top_terms} = 4 as in Lu et al (forthcoming). The default setting of \code{wsi} follows Ying et al. (forthcoming), e.g. \code{n_topiclabel_words} = 4.
+#' The names of the tests (word intrusion test and topic intrusion test) follow Chang et al (2009). In Ying et al. (2021), topic intrusion test is named "T8WSI" (Top 8 Word Set Intrusion). Word set intrusion test in this package is actually the "R4WSI" (Random 4 Word Set Intrusion) in Ying et al. The default settings of \code{wi}, \code{witi}, and \code{ti} follow Chang et al (2009), e.g. \code{n_top_terms} = 5; instead of \code{n_top_terms} = 4 as in Ying et al. The default setting of \code{wsi} follows Ying et al., e.g. \code{n_topiclabel_words} = 4.
#' As suggested by Song et al. (2020), 1% of the articles from \code{input_corpus} are randomly selected as the test cases of both \code{ti} and \code{gs}, i.e. \code{frac} = 0.01. However, it is generally believed that this proportion is dependent of the size of \code{input_corpus}, e.g. it does not make sense to draw 1% of the articles from only 100 articles. Use \code{exact_n} in these cases.
#' @section About create_oolong:
#'
@@ -102,7 +102,7 @@ Oolong_test <- R6::R6Class(
#'
#' Song et al. (2020) In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication.
#'
-#' Ying, L., Montgomery, J. M., & Stewart, B. M. (Forthcoming). Inferring concepts from topics: Towards procedures for validating topics as measures. Political Analysis
+#' Ying, L., Montgomery, J. M., & Stewart, B. M. (2021). Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures. Political Analysis
#'
#' @export
create_oolong <- function(input_model = NULL, input_corpus = NULL, n_top_terms = 5, bottom_terms_percentile = 0.6, exact_n = NULL, frac = 0.01, n_top_topics = 3, n_topiclabel_words = 8, use_frex_words = FALSE, difficulty = 1, input_dfm = NULL, construct = "positive", btm_dataframe = NULL, n_correct_ws = 3, wsi_n_top_terms = 20, userid = NA, type = "witi") {
2 changes: 1 addition & 1 deletion R/oolong_summary.R
@@ -130,7 +130,7 @@ plot.oolong_summary <- function(x, ...) {
#'
#' Song et al. (2020) In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication.
#'
-#' Ying, L., Montgomery, J. M., & Stewart, B. M. (Forthcoming). Inferring concepts from topics: Towards procedures for validating topics as measures. Political Analysis.
+#' Ying, L., Montgomery, J. M., & Stewart, B. M. (2021). Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures. Political Analysis.
#' @export
summarize_oolong <- function(..., target_value = NULL, n_iter = 1500) {
obj_list <- list(...)
4 changes: 2 additions & 2 deletions btm_gh.md
@@ -118,7 +118,7 @@ with other topic models.
oolong <- create_oolong(trump_btm)
oolong
#>
#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────
#> ✔ WI ✖ TI ✖ WSI
#> ℹ WI: k = 8, 0 coded.
#>
@@ -136,7 +136,7 @@ frame you used for training (in this case `trump_dat`). Your
oolong <- create_oolong(trump_btm, trump_corpus, btm_dataframe = trump_dat)
oolong
#>
#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────
#> ✔ WI ✔ TI ✖ WSI
#> ℹ WI: k = 8, 0 coded.
#> ℹ TI: n = 20, 0 coded.
4 changes: 2 additions & 2 deletions deploy_gh.md
@@ -27,7 +27,7 @@ library(oolong)
wsi_test <- wsi(abstracts_keyatm)
wsi_test
#>
#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────
#> ✖ WI ✖ TI ✔ WSI
#> ℹ WSI: n = 10, 0 coded.
#>
@@ -116,7 +116,7 @@ revert_oolong(wsi_test, "oolong_2021-05-22 20 51 26 Hadley Wickham.RDS")
```

#>
#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────
#> ✖ WI ✖ TI ✔ WSI
#> ☺ Hadley Wickham
#> ℹ WSI: n = 10, 10 coded.
4 changes: 2 additions & 2 deletions man/create_oolong.Rd

2 changes: 1 addition & 1 deletion man/summarize_oolong.Rd

4 changes: 2 additions & 2 deletions overview_gh.Rmd
@@ -341,7 +341,7 @@ summarize_oolong(oolong_test, target_value = all_afinn_score)

### Suggested workflow

-Create an oolong object, clone it for another coder. According to Song et al. (Forthcoming), you should at least draw 1% of your data.
+Create an oolong object, clone it for another coder. According to Song et al. (2020), you should at least draw 1% of your data.

```{r}
trump <- gs(input_corpus = trump2k, exact_n = 40, userid = "JJ")
@@ -439,7 +439,7 @@ Read the results. The diagnostic plot consists of 4 subplots. It is a good idea
* Subplot (bottom left): Raw correlation between target value and content length. One should want to have no correlation, as an indication of good reliability against the influence of content length. (See Chan et al. 2020)
* Subplot (bottom right): Cook's distance of all data point. One should want to have no dot (or at least very few dots) above the threshold. It is an indication of how the raw correlation between human judgement and target value can or cannot be influenced by extreme values in your data.

-The textual output contains the Krippendorff's alpha of the codings by your raters. In order to claim validity of your target value, you must first establish the reliability of your gold standard. Song et al. [Forthcoming] suggest Krippendorff's Alpha > 0.7 as an acceptable cut-off.
+The textual output contains the Krippendorff's alpha of the codings by your raters. In order to claim validity of your target value, you must first establish the reliability of your gold standard. Song et al. (2020) suggest Krippendorff's Alpha > 0.7 as an acceptable cut-off.

```{r}
res