diff --git a/.Rbuildignore b/.Rbuildignore
index 38500c6..c97d6d7 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -42,3 +42,6 @@
 ^rawdata/
 ^CRAN-SUBMISSION$
 ^\.editorconfig$
+^_pkgdown\.yml$
+^docs$
+^pkgdown$
diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml
new file mode 100644
index 0000000..85315dd
--- /dev/null
+++ b/.github/workflows/pkgdown.yaml
@@ -0,0 +1,51 @@
+# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
+# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
+
+on:
+  push:
+    branches:
+      - v*
+  pull_request:
+    branches:
+      - v*
+  release:
+    types: [published]
+  workflow_dispatch:
+
+name: pkgdown
+
+jobs:
+  pkgdown:
+    runs-on: ubuntu-latest
+    # Only restrict concurrency for non-PR jobs
+    concurrency:
+      group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
+    env:
+      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+    permissions:
+      contents: write
+    steps:
+      - uses: actions/checkout@v3
+
+      - uses: r-lib/actions/setup-pandoc@v2
+
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          use-public-rspm: true
+
+      - uses: r-lib/actions/setup-r-dependencies@v2
+        with:
+          extra-packages: any::pkgdown, local::.
+          needs: website
+
+      - name: Build site
+        run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
+        shell: Rscript {0}
+
+      - name: Deploy to GitHub pages πŸš€
+        if: github.event_name != 'pull_request'
+        uses: JamesIves/github-pages-deploy-action@v4.4.1
+        with:
+          clean: false
+          branch: gh-pages
+          folder: docs
diff --git a/.gitignore b/.gitignore
index eef8e90..8150162 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
 doc
 Meta
 ^cran-comments\.md$
+docs
diff --git a/DESCRIPTION b/DESCRIPTION
index bf739d1..f5982fa 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -2,11 +2,12 @@
 Package: oolong
 Title: Create Validation Tests for Automated Content Analysis
 Version: 0.5.1
 Authors@R:
-    c(person(given = "Chung-hong", family = "Chan", role = c("aut", "cre"), email = "chainsawtiney@gmail.com", comment = c(ORCID = "0000-0002-6232-7530")), person(given = "Marius", family = "SΓ€ltzer", role = c("aut"), email = "msaeltze@mail.uni-mannheim.de", comment = c(ORCID = "0000-0002-8604-4666")))
+    c(person(given = "Chung-hong", family = "Chan", role = c("aut", "cre"), email = "chainsawtiney@gmail.com", comment = c(ORCID = "0000-0002-6232-7530")),
+      person(given = "Marius", family = "SΓ€ltzer", role = c("aut"), email = "msaeltze@mail.uni-mannheim.de", comment = c(ORCID = "0000-0002-8604-4666")))
 Description: Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, ) and word set intrusion (Ying et al. 2021) tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al.
(2009) and Song et al. (2020) .
 License: LGPL (>= 2.1)
 Encoding: UTF-8
-URL: https://github.com/gesistsa/oolong
+URL: https://gesistsa.github.io/oolong, https://github.com/gesistsa/oolong
 LazyData: true
 Depends:
     R (>= 4.0)
@@ -43,3 +44,4 @@ Suggests:
 BugReports: https://github.com/gesistsa/oolong/issues
 VignetteBuilder: knitr
 Config/testthat/edition: 3
+Config/Needs/website: gesistsa/tsatemplate
diff --git a/Makefile b/Makefile
index 14cd1f7..c1b9cea 100644
--- a/Makefile
+++ b/Makefile
@@ -1,37 +1,8 @@
-all: vignettes overviewgh build ghdocs
+all: build
 
 .FORCE:
 
-vignettes: .FORCE
-	cat vig_head.Rmd | sed 's/{title}/Overview/g' > vig_temp.Rmd
-	cat vig_temp.Rmd vig_body.Rmd > vignettes/overview.Rmd
-	rm vig_temp.Rmd
-	cat vig_head.Rmd | sed 's/{title}/BTM/g' > vig_temp.Rmd
-	cat vig_temp.Rmd btm.Rmd > vignettes/btm.Rmd
-	rm vig_temp.Rmd
-	cat vig_head.Rmd | sed 's/{title}/Deploy/g' > vig_temp.Rmd
-	cat vig_temp.Rmd deploy.Rmd > vignettes/deploy.Rmd
-	rm vig_temp.Rmd
-
-overviewgh: vignettes
-	cat gh_head.Rmd | sed 's/{title}/Overview/g' > gh_temp.Rmd
-	cat gh_temp.Rmd vig_body.Rmd > overview_gh.Rmd
-	rm gh_temp.Rmd
-	cat gh_head.Rmd | sed 's/{title}/BTM/g' > gh_temp.Rmd
-	cat gh_temp.Rmd btm.Rmd > btm_gh.Rmd
-	rm gh_temp.Rmd
-	cat gh_head.Rmd | sed 's/{title}/Deploy/g' > gh_temp.Rmd
-	cat gh_temp.Rmd deploy.Rmd > gh_temp2.Rmd
-	cat gh_temp2.Rmd | sed 's;figures;vignettes/figures;g' > deploy_gh.Rmd
-	rm gh_temp.Rmd gh_temp2.Rmd
-
-build: vignettes
+build:
 	Rscript -e "devtools::document()"
 	Rscript -e "devtools::install(quick = TRUE, upgrade = 'never')"
 	Rscript -e "devtools::check()"
-
-ghdocs: overviewgh
-	Rscript -e "rmarkdown::render('README.Rmd')"
-	Rscript -e "rmarkdown::render('overview_gh.Rmd')"
-	Rscript -e "rmarkdown::render('btm_gh.Rmd')"
-	Rscript -e "rmarkdown::render('deploy_gh.Rmd')"
diff --git a/README.Rmd b/README.Rmd
index 37dfa73..a09166e 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -14,7 +14,7 @@ knitr::opts_chunk$set(
 set.seed(42)
 ```
 
-# oolong
+# 
oolong 
 
 [![Codecov test coverage](https://codecov.io/gh/gesistsa/oolong/branch/v0.5/graph/badge.svg)](https://codecov.io/gh/gesistsa/oolong?branch=v0.5)
@@ -26,7 +26,7 @@ set.seed(42)
 ```
 
 The goal of oolong ^[/ˈuːlΚŠΕ‹/ 烏龍, literally means "Dark Dragon", is a semi-oxidized tea from Asia. It is very popular in Taiwan, Japan and Hong Kong. In Cantonese and Taiwanese Mandarin, the same word can also mean "confused". It perfectly captures the spirit of human-in-the-loop validation. ] is to generate and administrate validation tests easily for typical automated content analysis tools such as topic models and dictionary-based tools.
 
-Please refer to the [overview](overview_gh.md) for an introduction to this package. If you need to deploy the test online, please refer to the [Deployment Vignette](deploy_gh.md). If you use BTM, please refer to the [BTM Vignette](btm_gh.md).
+Please refer to the [overview](https://gesistsa.github.io/oolong/articles/overview.html) for an introduction to this package. If you need to deploy the test online, please refer to the [Deployment Vignette](https://gesistsa.github.io/oolong/articles/deploy.html). If you use BTM, please refer to the [BTM Vignette](https://gesistsa.github.io/oolong/articles/btm.html).
 
 ## Citation
diff --git a/README.md b/README.md
index 9dd23cf..16ff262 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
-# oolong
+# oolong 
@@ -19,10 +19,13 @@
 The goal of oolong \[1\] is to generate and administrate validation
 tests easily for typical automated content analysis tools such as topic
 models and dictionary-based tools.
 
-Please refer to the [overview](overview_gh.md) for an introduction to
-this package. If you need to deploy the test online, please refer to the
-[Deployment Vignette](deploy_gh.md). If you use BTM, please refer to the
-[BTM Vignette](btm_gh.md).
+Please refer to the
+[overview](https://gesistsa.github.io/oolong/articles/overview.html) for
+an introduction to this package. 
If you need to deploy the test online,
+please refer to the [Deployment
+Vignette](https://gesistsa.github.io/oolong/articles/deploy.html). If
+you use BTM, please refer to the [BTM
+Vignette](https://gesistsa.github.io/oolong/articles/btm.html).
 
 ## Citation
diff --git a/_pkgdown.yml b/_pkgdown.yml
new file mode 100644
index 0000000..da1fee5
--- /dev/null
+++ b/_pkgdown.yml
@@ -0,0 +1,3 @@
+url: https://gesistsa.github.io/oolong/
+template:
+  package: tsatemplate
diff --git a/btm.Rmd b/btm.Rmd
deleted file mode 100644
index 3a0f05b..0000000
--- a/btm.Rmd
+++ /dev/null
@@ -1,94 +0,0 @@
-
-The package BTM by Jan Wijffels et al. finds "topics in collections of short text". Compared to other topic model packages, BTM requires a special data format for training. Oolong has no problem generating word intrusion tests with BTM. However, that special data format can make creation of topic intrusion tests very tricky.
-
-This guide provides our recommendations on how to use BTM, so that the model can be used for generating topic intrusion tests.
-
-# Requirement #1: Keep your quanteda corpus
-
-It is because every document has a unique document id.
-
-```{r}
-require(BTM)
-require(quanteda)
-require(oolong)
-trump_corpus <- corpus(trump2k)
-```
-
-And then you can do regular text cleaning, stemming procedure with `quanteda`. Instead of making the product a `DFM` object, make it a `token` object. You may read [this issue](https://github.com/quanteda/quanteda/issues/1404) by Benoit et al.
-
-```{r}
-tokens(trump_corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE, split_hyphens = TRUE, remove_url = TRUE) %>% tokens_tolower() %>% tokens_remove(stopwords("en")) %>% tokens_remove("@*") -> trump_toks
-```
-
-# Requirement #2: Keep your data frame
-
-Use this function to convert the `token` object to a data frame.
- -```{r} -as.data.frame.tokens <- function(x) { - data.frame( - doc_id = rep(names(x), lengths(x)), - tokens = unlist(x, use.names = FALSE) - ) -} - -trump_dat <- as.data.frame.tokens(trump_toks) -``` - -Train a BTM model - -```{r, message = FALSE, results = 'hide', warning = FALSE} -trump_btm <- BTM(trump_dat, k = 8, iter = 500, trace = 10) -``` - -## Pecularities of BTM - -This is how you should generate $\theta_{t}$ . However, there are many NaN and there are only 1994 rows (`trump2k` has 2000 tweets) due to empty documents. - -```{r} -theta <- predict(trump_btm, newdata = trump_dat) -dim(theta) -``` - -```{r} -setdiff(docid(trump_corpus), row.names(theta)) -``` - -```{r} -trump_corpus[604] -``` - -Also, the row order is messed up. - -```{r} -head(row.names(theta), 100) -``` - - -# Oolong's support for BTM - -Oolong has no problem generating word intrusion test for BTM like you do with other topic models. - -```{r} -oolong <- create_oolong(trump_btm) -oolong -``` - -For generating topic intrusion tests, however, you must provide the data frame you used for training (in this case `trump_dat`). Your `input_corpus` must be a quanteda corpus too. - -```{r} -oolong <- create_oolong(trump_btm, trump_corpus, btm_dataframe = trump_dat) -oolong -``` - -`btm_dataframe` must not be NULL. - -```{r, error = TRUE} -oolong <- create_oolong(trump_btm, trump_corpus) -``` - -`input_corpus` must be a quanteda corpus. - -```{r, error = TRUE} -oolong <- create_oolong(trump_btm, trump2k, btm_dataframe = trump_dat) -``` diff --git a/btm_gh.Rmd b/btm_gh.Rmd deleted file mode 100644 index 416379a..0000000 --- a/btm_gh.Rmd +++ /dev/null @@ -1,110 +0,0 @@ ---- -title: "BTM" -output: github_document -author: - - Chung-hong Chan ^[GESIS] ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.path = "man/figures/README-", - out.width = "100%" - ) -set.seed(42) -``` - -The package BTM by Jan Wijffels et al. 
finds "topics in collections of short text". Compared to other topic model packages, BTM requires a special data format for training. Oolong has no problem generating word intrusion tests with BTM. However, that special data format can make creation of topic intrusion tests very tricky. - -This guide provides our recommendations on how to use BTM, so that the model can be used for generating topic intrusion tests. - -# Requirement #1: Keep your quanteda corpus - -It is because every document has a unique document id. - -```{r} -require(BTM) -require(quanteda) -require(oolong) -trump_corpus <- corpus(trump2k) -``` - -And then you can do regular text cleaning, stemming procedure with `quanteda`. Instead of making the product a `DFM` object, make it a `token` object. You may read [this issue](https://github.com/quanteda/quanteda/issues/1404) by Benoit et al. - -```{r} -tokens(trump_corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE, split_hyphens = TRUE, remove_url = TRUE) %>% tokens_tolower() %>% tokens_remove(stopwords("en")) %>% tokens_remove("@*") -> trump_toks -``` - -# Requirement #2: Keep your data frame - -Use this function to convert the `token` object to a data frame. - -```{r} -as.data.frame.tokens <- function(x) { - data.frame( - doc_id = rep(names(x), lengths(x)), - tokens = unlist(x, use.names = FALSE) - ) -} - -trump_dat <- as.data.frame.tokens(trump_toks) -``` - -Train a BTM model - -```{r, message = FALSE, results = 'hide', warning = FALSE} -trump_btm <- BTM(trump_dat, k = 8, iter = 500, trace = 10) -``` - -## Pecularities of BTM - -This is how you should generate $\theta_{t}$ . However, there are many NaN and there are only 1994 rows (`trump2k` has 2000 tweets) due to empty documents. - -```{r} -theta <- predict(trump_btm, newdata = trump_dat) -dim(theta) -``` - -```{r} -setdiff(docid(trump_corpus), row.names(theta)) -``` - -```{r} -trump_corpus[604] -``` - -Also, the row order is messed up. 
- -```{r} -head(row.names(theta), 100) -``` - - -# Oolong's support for BTM - -Oolong has no problem generating word intrusion test for BTM like you do with other topic models. - -```{r} -oolong <- create_oolong(trump_btm) -oolong -``` - -For generating topic intrusion tests, however, you must provide the data frame you used for training (in this case `trump_dat`). Your `input_corpus` must be a quanteda corpus too. - -```{r} -oolong <- create_oolong(trump_btm, trump_corpus, btm_dataframe = trump_dat) -oolong -``` - -`btm_dataframe` must not be NULL. - -```{r, error = TRUE} -oolong <- create_oolong(trump_btm, trump_corpus) -``` - -`input_corpus` must be a quanteda corpus. - -```{r, error = TRUE} -oolong <- create_oolong(trump_btm, trump2k, btm_dataframe = trump_dat) -``` diff --git a/btm_gh.md b/btm_gh.md deleted file mode 100644 index 31374fe..0000000 --- a/btm_gh.md +++ /dev/null @@ -1,163 +0,0 @@ -BTM -================ -Chung-hong Chan - -The package BTM by Jan Wijffels et al.Β finds β€œtopics in collections of -short text”. Compared to other topic model packages, BTM requires a -special data format for training. Oolong has no problem generating word -intrusion tests with BTM. However, that special data format can make -creation of topic intrusion tests very tricky. - -This guide provides our recommendations on how to use BTM, so that the -model can be used for generating topic intrusion tests. - -# Requirement \#1: Keep your quanteda corpus - -It is because every document has a unique document id. - -``` r -require(BTM) -#> Loading required package: BTM -require(quanteda) -#> Loading required package: quanteda -#> Package version: 3.3.1 -#> Unicode version: 14.0 -#> ICU version: 70.1 -#> Parallel computing: 8 of 8 threads used. -#> See https://quanteda.io for tutorials and examples. -require(oolong) -#> Loading required package: oolong -trump_corpus <- corpus(trump2k) -``` - -And then you can do regular text cleaning, stemming procedure with -`quanteda`. 
Instead of making the product a `DFM` object, make it a -`token` object. You may read [this -issue](https://github.com/quanteda/quanteda/issues/1404) by Benoit et -al. - -``` r -tokens(trump_corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE, split_hyphens = TRUE, remove_url = TRUE) %>% tokens_tolower() %>% tokens_remove(stopwords("en")) %>% tokens_remove("@*") -> trump_toks -``` - -# Requirement \#2: Keep your data frame - -Use this function to convert the `token` object to a data frame. - -``` r -as.data.frame.tokens <- function(x) { - data.frame( - doc_id = rep(names(x), lengths(x)), - tokens = unlist(x, use.names = FALSE) - ) -} - -trump_dat <- as.data.frame.tokens(trump_toks) -``` - -Train a BTM model - -``` r -trump_btm <- BTM(trump_dat, k = 8, iter = 500, trace = 10) -``` - -## Pecularities of BTM - -This is how you should generate -![\\theta\_{t}](https://latex.codecogs.com/png.image?%5Cdpi%7B110%7D&space;%5Cbg_white&space;%5Ctheta_%7Bt%7D -"\\theta_{t}") . However, there are many NaN and there are only 1994 -rows (`trump2k` has 2000 tweets) due to empty documents. - -``` r -theta <- predict(trump_btm, newdata = trump_dat) -dim(theta) -#> [1] 1993 8 -``` - -``` r -setdiff(docid(trump_corpus), row.names(theta)) -#> [1] "text604" "text624" "text633" "text659" "text1586" "text1587" "text1761" -``` - -``` r -trump_corpus[604] -#> Corpus consisting of 1 document. -#> text604 : -#> "http://t.co/PtViAyrO4A" -``` - -Also, the row order is messed up. 
- -``` r -head(row.names(theta), 100) -#> [1] "text1" "text10" "text100" "text1000" "text1001" "text1002" -#> [7] "text1003" "text1004" "text1005" "text1006" "text1007" "text1008" -#> [13] "text1009" "text101" "text1010" "text1011" "text1012" "text1013" -#> [19] "text1014" "text1015" "text1016" "text1017" "text1018" "text1019" -#> [25] "text102" "text1020" "text1021" "text1022" "text1023" "text1024" -#> [31] "text1025" "text1026" "text1027" "text1028" "text1029" "text103" -#> [37] "text1030" "text1031" "text1032" "text1033" "text1034" "text1035" -#> [43] "text1036" "text1037" "text1038" "text1039" "text104" "text1040" -#> [49] "text1041" "text1042" "text1043" "text1044" "text1045" "text1046" -#> [55] "text1047" "text1048" "text1049" "text105" "text1050" "text1051" -#> [61] "text1052" "text1053" "text1054" "text1055" "text1056" "text1057" -#> [67] "text1058" "text1059" "text106" "text1060" "text1061" "text1062" -#> [73] "text1063" "text1064" "text1065" "text1066" "text1067" "text1068" -#> [79] "text1069" "text107" "text1070" "text1071" "text1072" "text1073" -#> [85] "text1074" "text1075" "text1076" "text1077" "text1078" "text1079" -#> [91] "text108" "text1080" "text1081" "text1082" "text1083" "text1084" -#> [97] "text1085" "text1086" "text1087" "text1088" -``` - -# Oolong’s support for BTM - -Oolong has no problem generating word intrusion test for BTM like you do -with other topic models. - -``` r -oolong <- create_oolong(trump_btm) -oolong -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ” WI βœ– TI βœ– WSI -#> β„Ή WI: k = 8, 0 coded. -#> -#> ── Methods ── -#> -#> β€’ <$do_word_intrusion_test()>: do word intrusion test -#> β€’ <$lock()>: finalize and see the results -``` - -For generating topic intrusion tests, however, you must provide the data -frame you used for training (in this case `trump_dat`). Your -`input_corpus` must be a quanteda corpus too. 
- -``` r -oolong <- create_oolong(trump_btm, trump_corpus, btm_dataframe = trump_dat) -oolong -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ” WI βœ” TI βœ– WSI -#> β„Ή WI: k = 8, 0 coded. -#> β„Ή TI: n = 20, 0 coded. -#> -#> ── Methods ── -#> -#> β€’ <$do_word_intrusion_test()>: do word intrusion test -#> β€’ <$do_topic_intrusion_test()>: do topic intrusion test -#> β€’ <$lock()>: finalize and see the results -``` - -`btm_dataframe` must not be NULL. - -``` r -oolong <- create_oolong(trump_btm, trump_corpus) -#> Error: You need to provide input_corpus (in quanteda format) and btm_dataframe for generating topic intrusion tests. -``` - -`input_corpus` must be a quanteda corpus. - -``` r -oolong <- create_oolong(trump_btm, trump2k, btm_dataframe = trump_dat) -#> Error: You need to provide input_corpus (in quanteda format) and btm_dataframe for generating topic intrusion tests. -``` diff --git a/deploy.Rmd b/deploy.Rmd deleted file mode 100644 index 7835342..0000000 --- a/deploy.Rmd +++ /dev/null @@ -1,81 +0,0 @@ -In oolong 0.3.22, functions for deploying oolong tests were added (`export_oolong`, `revert_oolong` etc.). These functions make it possible for the coders to conduct validation tests online using their browser, rather than having to install R on their computer. - -The basic workflow is simple: 1) create the oolong test object as usual; 2) deploy the test online and obtain the URL to the test; 3) ask your coders to conduct the test online and send back the data file; 4) revert back from the data file to an oolong object. - -# Create an oolong test - -Please note that one cannot deploy oolong test objects with *both* word and topic intrusion tests, i.e. those created using `witi()` online. If you need to do both tests, you need to deploy them as two separate instances: one created using `wi()` and another created using `ti()`. 
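[Editor's sketch] The two-instance split described above can be written out as follows. This is a non-authoritative sketch: it assumes the `abstracts_keyatm` example model and `abstracts` data bundled with oolong (as used elsewhere in these vignettes), and the output directory names are hypothetical.

```r
library(oolong)

## Create the two tests separately instead of a single witi() object,
## since witi() objects cannot be deployed online.
wi_test <- wi(abstracts_keyatm, userid = "Hadley")
ti_test <- ti(abstracts_keyatm, abstracts$text, userid = "Hadley")

## Export each test as its own standalone Shiny app for deployment.
export_oolong(wi_test, dir = "./wi_test", use_full_path = FALSE)
export_oolong(ti_test, dir = "./ti_test", use_full_path = FALSE)
```

Each exported directory can then be deployed (e.g. to shinyapps.io) under its own URL, giving the coders two separate tests.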
- -In this guide, we assume you want to deploy a word set intrusion test online. - -```{r} -library(oolong) -wsi_test <- wsi(abstracts_keyatm) -wsi_test -``` - -# Deploy the test online - -First, you need to export the oolong test object as a stand alone Shiny app. This stand alone Shiny app will be in a directory. - -```{r} -export_oolong(wsi_test, dir = "./wsi_test", use_full_path = FALSE) -``` - -The directory has only two files - -```{r} -fs::dir_tree("./wsi_test") -``` - -This structure is called ["Single-file Shiny app."](https://shiny.rstudio.com/articles/app-formats.html) Experienced Shiny users might have their preferred method of deploying this app to whatever Shiny server they can master. - -For less experienced users, the simplest way to deploy this app online is to use [shinyapps.io](https://www.shinyapps.io/) (free tier available with 25 hours of computational time per month). Please register for an account at shinyapps.io and configure rsconnect. Please refer to [this guide](https://shiny.rstudio.com/articles/shinyapps.html) for more information. Please remember to configure the tokens. - -```r -## replace , , with the information from your profile on Shinyapps.io: click Your name -> Tokens -rsconnect::setAccountInfo(name="", token="", secret="") -``` - -For RStudio users, the simplest way to deploy the app to shinyapps.io is to first launch the app. - -```{r, eval = FALSE} -library(shiny) -runApp("./wsi_test") -``` - -And then click the **Publish** button at the right corner of the launched window. - -You will be asked for the title of the app, just give it a name, e.g. *wsi_test*. You probably can keep other default settings and push the **Publish** button to initialize the deployment process. - - - -If there is no hiccup, you will get a URL to your deployed oolong test. Something like: *https://yourname.shinyapps.io/wsi_test/* - -# Conduct the test - -You can give the URL to your coders and they conduct the test with their browser online. 
The only difference of the deployed version is that, there will be a userid prompt and download button after the coding. - - - -You should instruct your coders to download the data file after coding and return it to you. ^[Future versions might provide permanent storage] - -# Revert - -You can then obtain a locked oolong object from the original oolong and the downloaded data file. `revert_oolong` will do verifications with the original oolong object to make sure no error and no cheating. - -```{r, include = FALSE} -wsi_test <- readRDS(system.file("extdata", "wsi_test.RDS", package = "oolong")) -``` - -```r -revert_oolong(wsi_test, "oolong_2021-05-22 20 51 26 Hadley Wickham.RDS") -``` - -```{r, echo = FALSE} -revert_oolong(wsi_test, system.file("extdata", "hadley.RDS", package = "oolong")) -``` - -```{r, include = FALSE} -unlink("./wsi_test", recursive = TRUE) -``` diff --git a/deploy_gh.Rmd b/deploy_gh.Rmd deleted file mode 100644 index 3b364d5..0000000 --- a/deploy_gh.Rmd +++ /dev/null @@ -1,97 +0,0 @@ ---- -title: "Deploy" -output: github_document -author: - - Chung-hong Chan ^[GESIS] ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.path = "man/vignettes/figures/README-", - out.width = "100%" - ) -set.seed(42) -``` -In oolong 0.3.22, functions for deploying oolong tests were added (`export_oolong`, `revert_oolong` etc.). These functions make it possible for the coders to conduct validation tests online using their browser, rather than having to install R on their computer. - -The basic workflow is simple: 1) create the oolong test object as usual; 2) deploy the test online and obtain the URL to the test; 3) ask your coders to conduct the test online and send back the data file; 4) revert back from the data file to an oolong object. - -# Create an oolong test - -Please note that one cannot deploy oolong test objects with *both* word and topic intrusion tests, i.e. those created using `witi()` online. 
If you need to do both tests, you need to deploy them as two separate instances: one created using `wi()` and another created using `ti()`. - -In this guide, we assume you want to deploy a word set intrusion test online. - -```{r} -library(oolong) -wsi_test <- wsi(abstracts_keyatm) -wsi_test -``` - -# Deploy the test online - -First, you need to export the oolong test object as a stand alone Shiny app. This stand alone Shiny app will be in a directory. - -```{r} -export_oolong(wsi_test, dir = "./wsi_test", use_full_path = FALSE) -``` - -The directory has only two files - -```{r} -fs::dir_tree("./wsi_test") -``` - -This structure is called ["Single-file Shiny app."](https://shiny.rstudio.com/articles/app-formats.html) Experienced Shiny users might have their preferred method of deploying this app to whatever Shiny server they can master. - -For less experienced users, the simplest way to deploy this app online is to use [shinyapps.io](https://www.shinyapps.io/) (free tier available with 25 hours of computational time per month). Please register for an account at shinyapps.io and configure rsconnect. Please refer to [this guide](https://shiny.rstudio.com/articles/shinyapps.html) for more information. Please remember to configure the tokens. - -```r -## replace , , with the information from your profile on Shinyapps.io: click Your name -> Tokens -rsconnect::setAccountInfo(name="", token="", secret="") -``` - -For RStudio users, the simplest way to deploy the app to shinyapps.io is to first launch the app. - -```{r, eval = FALSE} -library(shiny) -runApp("./wsi_test") -``` - -And then click the **Publish** button at the right corner of the launched window. - -You will be asked for the title of the app, just give it a name, e.g. *wsi_test*. You probably can keep other default settings and push the **Publish** button to initialize the deployment process. - - - -If there is no hiccup, you will get a URL to your deployed oolong test. 
Something like: *https://yourname.shinyapps.io/wsi_test/* - -# Conduct the test - -You can give the URL to your coders and they conduct the test with their browser online. The only difference of the deployed version is that, there will be a userid prompt and download button after the coding. - - - -You should instruct your coders to download the data file after coding and return it to you. ^[Future versions might provide permanent storage] - -# Revert - -You can then obtain a locked oolong object from the original oolong and the downloaded data file. `revert_oolong` will do verifications with the original oolong object to make sure no error and no cheating. - -```{r, include = FALSE} -wsi_test <- readRDS(system.file("extdata", "wsi_test.RDS", package = "oolong")) -``` - -```r -revert_oolong(wsi_test, "oolong_2021-05-22 20 51 26 Hadley Wickham.RDS") -``` - -```{r, echo = FALSE} -revert_oolong(wsi_test, system.file("extdata", "hadley.RDS", package = "oolong")) -``` - -```{r, include = FALSE} -unlink("./wsi_test", recursive = TRUE) -``` diff --git a/deploy_gh.md b/deploy_gh.md deleted file mode 100644 index 2769eab..0000000 --- a/deploy_gh.md +++ /dev/null @@ -1,128 +0,0 @@ -Deploy -================ -Chung-hong Chan - -In oolong 0.3.22, functions for deploying oolong tests were added -(`export_oolong`, `revert_oolong` etc.). These functions make it -possible for the coders to conduct validation tests online using their -browser, rather than having to install R on their computer. - -The basic workflow is simple: 1) create the oolong test object as usual; -2) deploy the test online and obtain the URL to the test; 3) ask your -coders to conduct the test online and send back the data file; 4) revert -back from the data file to an oolong object. - -# Create an oolong test - -Please note that one cannot deploy oolong test objects with *both* word -and topic intrusion tests, i.e.Β those created using `witi()` online. 
If -you need to do both tests, you need to deploy them as two separate -instances: one created using `wi()` and another created using `ti()`. - -In this guide, we assume you want to deploy a word set intrusion test -online. - -``` r -library(oolong) -wsi_test <- wsi(abstracts_keyatm) -wsi_test -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ– WI βœ– TI βœ” WSI -#> β„Ή WSI: n = 10, 0 coded. -#> -#> ── Methods ── -#> -#> β€’ <$do_word_set_intrusion_test()>: do word set intrusion test -#> β€’ <$lock()>: finalize and see the results -``` - -# Deploy the test online - -First, you need to export the oolong test object as a stand alone Shiny -app. This stand alone Shiny app will be in a directory. - -``` r -export_oolong(wsi_test, dir = "./wsi_test", use_full_path = FALSE) -#> β„Ή The Shiny has been written to the directory: ./wsi_test -#> β„Ή You can test the app with: shiny::runApp("./wsi_test") -``` - -The directory has only two files - -``` r -fs::dir_tree("./wsi_test") -#> ./wsi_test -#> β”œβ”€β”€ app.R -#> └── oolong.RDS -``` - -This structure is called [β€œSingle-file Shiny -app.”](https://shiny.rstudio.com/articles/app-formats.html) -Experienced Shiny users might have their preferred method of deploying -this app to whatever Shiny server they can master. - -For less experienced users, the simplest way to deploy this app online -is to use [shinyapps.io](https://www.shinyapps.io/) (free tier available -with 25 hours of computational time per month). Please register for an -account at shinyapps.io and configure rsconnect. Please refer to [this -guide](https://shiny.rstudio.com/articles/shinyapps.html) for more -information. Please remember to configure the tokens. 
- -``` r -## replace , , with the information from your profile on Shinyapps.io: click Your name -> Tokens -rsconnect::setAccountInfo(name="", token="", secret="") -``` - -For RStudio users, the simplest way to deploy the app to shinyapps.io is -to first launch the app. - -``` r -library(shiny) -runApp("./wsi_test") -``` - -And then click the **Publish** button at the right corner of the -launched window. - -You will be asked for the title of the app, just give it a name, -e.g.Β *wsi\_test*. You probably can keep other default settings and push -the **Publish** button to initialize the deployment process. - - - -If there is no hiccup, you will get a URL to your deployed oolong test. -Something like: ** - -# Conduct the test - -You can give the URL to your coders and they conduct the test with their -browser online. The only difference of the deployed version is that, -there will be a userid prompt and download button after the coding. - - - -You should instruct your coders to download the data file after coding -and return it to you. \[1\] - -# Revert - -You can then obtain a locked oolong object from the original oolong and -the downloaded data file. `revert_oolong` will do verifications with the -original oolong object to make sure no error and no cheating. - -``` r -revert_oolong(wsi_test, "oolong_2021-05-22 20 51 26 Hadley Wickham.RDS") -``` - - #> - #> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── - #> βœ– WI βœ– TI βœ” WSI - #> ☺ Hadley Wickham - #> β„Ή WSI: n = 10, 10 coded. - #> - #> ── Results: ── - #> - #> β„Ή 80% precision (WSI) - -1. 
Future versions might provide permanent storage diff --git a/gh_head.Rmd b/gh_head.Rmd deleted file mode 100644 index 4e2c04d..0000000 --- a/gh_head.Rmd +++ /dev/null @@ -1,16 +0,0 @@ ---- -title: "{title}" -output: github_document -author: - - Chung-hong Chan ^[GESIS] ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.path = "man/figures/README-", - out.width = "100%" - ) -set.seed(42) -``` diff --git a/man/figures/oolong_logo.png b/man/figures/oolong_logo.png new file mode 100644 index 0000000..9579a22 Binary files /dev/null and b/man/figures/oolong_logo.png differ diff --git a/overview_gh.Rmd b/overview_gh.Rmd deleted file mode 100644 index f419760..0000000 --- a/overview_gh.Rmd +++ /dev/null @@ -1,473 +0,0 @@ ---- -title: "Overview" -output: github_document -author: - - Chung-hong Chan ^[GESIS] ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.path = "man/figures/README-", - out.width = "100%" - ) -set.seed(42) -``` -The validation test is called "oolong test" (for reading tea leaves). This package provides several functions for generating different types of oolong test. - -| function | purpose | -|---------:|:----------------------------------------------------------------------------------------------------------------------------------| -| `wi()` | validating a topic model with [word intrusion test](#word-intrusion-test) (Chang et al., 2008) | -| `ti()` | validating a topic model with [topic intrusion test](#topic-intrusion-test) (Chang et al., 2008; aka "T8WSI" in Ying et al. 2021) | -| `witi()` | validating a topic model with [word intrusion test](#word-intrusion-test) and [topic intrusion test](#topic-intrusion-test) | -| `wsi()` | validating a topic model with [word set intrusion test](#word-set-intrusion-test) (Ying et al. 
2021) | -| `gs()` | oolong test for [creating gold standard](#creating-gold-standard) (see Song et al., 2020) | - -All of these tests can also be generated with the function [`create_oolong`](#backward-compatibility). As of version 0.3.20, however, this is no longer recommended. - -## Installation - -Because the package is constantly changing, we suggest using the development version from [GitHub](https://github.com/): - -``` r -# install.packages("devtools") -devtools::install_github("chainsawriot/oolong") -``` - -You can also install the "stable" (but slightly older) version from CRAN: - -```r -install.packages("oolong") -``` - -## Validating Topic Models - -#### Word intrusion test - -`abstracts_keyatm` is an example topic model trained with the data `abstracts` using the `keyATM` package. Currently, this package supports structural topic models / correlated topic models from `stm`, Warp LDA models from `text2vec`, LDA/CTM models from `topicmodels`, Biterm Topic Models from `BTM`, Keyword Assisted Topic Models from `keyATM`, and seeded LDA models from `seededlda`. Although not strictly a topic model, Naive Bayes models from `quanteda.textmodels` are also supported. See the section on [Naive Bayes](#about-naive-bayes) for more information. - -```{r} -library(oolong) -library(keyATM) -library(quanteda) -library(dplyr) -``` - -```{r example} -abstracts_keyatm -``` - -To create an oolong test with a word intrusion test, use the function `wi`. It is recommended to provide a user id for the coder who is going to do the test. - -```{r createtest} -oolong_test <- wi(abstracts_keyatm, userid = "Hadley") -oolong_test -``` - -As instructed, use the method `$do_word_intrusion_test()` to start coding. - -```{r, eval = FALSE} -oolong_test$do_word_intrusion_test() -``` - -You can pause the test by clicking the "Exit" button. Your progress will be recorded in the object. If you want to save your progress, just save the object (e.g. `saveRDS(oolong_test, "oolong_test.RDS")`). 
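A minimal save-and-restore round trip might look like this (the file name is illustrative):

``` r
# Save the partially coded test object to disk
saveRDS(oolong_test, "oolong_test.RDS")

# In a later session: restore the object and relaunch the coding interface
oolong_test <- readRDS("oolong_test.RDS")
oolong_test$do_word_intrusion_test()
```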
To resume the test, launch the test again. - -After the coding (i.e. when all items are coded), you need to press the "Exit" button to quit the coding interface and then lock the test. Then, you can look at the model precision by printing the oolong test. - -```{r, include = FALSE} -### Mock this process -oolong_test$.__enclos_env__$private$test_content$wi$answer <- oolong_test$.__enclos_env__$private$test_content$wi$intruder -oolong_test$.__enclos_env__$private$test_content$wi$answer[1] <- "wronganswer" -``` - -```{r lock} -oolong_test$lock() -oolong_test -``` - -#### Word set intrusion test - -The word set intrusion test is a variant of the word intrusion test (Ying et al., 2021), in which multiple word sets generated from the top terms of one topic are juxtaposed with one intruder word set generated similarly from another topic. In Ying et al., this test is called "R4WSI" because 4 word sets are displayed. By default, oolong also generates R4WSI. However, it is also possible to generate R(N)WSI by setting the parameter `n_correct_ws` to N - 1. - -```{r wsi1} -oolong_test <- wsi(abstracts_keyatm, userid = "Garrett") -oolong_test -``` - -Use the method `$do_word_set_intrusion_test()` to start coding. - -```{r wsi2, eval = FALSE} -oolong_test$do_word_set_intrusion_test() -``` - -```{r, include = FALSE} -### Mock this process -oolong_test$.__enclos_env__$private$test_content$wsi$answer <- oolong_test$.__enclos_env__$private$test_content$wsi$intruder -oolong_test$.__enclos_env__$private$test_content$wsi$answer[1] <- "wronganswer" -``` - -```{r wsi3} -oolong_test$lock() -oolong_test -``` - -#### Topic intrusion test - -For example, `abstracts_keyatm` was generated with the corpus `abstracts$text`. - -```{r newgroup5} -library(tibble) -abstracts -``` - -Creating the oolong test object with the corpus used for training the topic model will generate topic intrusion test cases. 
- -```{r createtest2} -oolong_test <- ti(abstracts_keyatm, abstracts$text, userid = "Julia") -oolong_test -``` - -Similarly, use the `$do_topic_intrusion_test` to code the test cases, lock the test with `$lock()` and then you can look at the TLO (topic log odds) value by printing the oolong test. - -```{r, eval = FALSE} -oolong_test$do_topic_intrusion_test() -oolong_test$lock() -``` - -```{r, include = FALSE} -genius_topic <- function(obj1) { - obj1$.__enclos_env__$private$test_content$ti$answer <- obj1$.__enclos_env__$private$test_content$ti$intruder - return(obj1) -} -genius_word <- function(obj1) { - obj1$.__enclos_env__$private$test_content$wi$answer <- obj1$.__enclos_env__$private$test_content$wi$intruder - return(obj1) -} -oolong_test <- genius_word(genius_topic(oolong_test)) -oolong_test$.__enclos_env__$private$test_content$ti$answer[2] <- sample(oolong_test$.__enclos_env__$private$test_content$ti$candidates[[2]], 1) -oolong_test$lock() -``` - -```{r topic_res} -oolong_test -``` - -### Suggested workflow - -The test makes more sense if more than one coder is involved. A suggested workflow is to create the test, then clone the oolong object. Ask multiple coders to do the test(s) and then summarize the results. - -Preprocess and create a document-feature matrix - -```{r, eval = FALSE} -dfm(abstracts$text, tolower = TRUE, stem = TRUE, remove = stopwords('english'), remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE, remove_hyphens = TRUE) %>% dfm_trim(min_docfreq = 3, max_docfreq = 500) %>% dfm_select(min_nchar = 3, pattern = "^[a-zA-Z]+$", valuetype = "regex") -> abstracts_dfm -``` - -Train a topic model. - -```{r step0, eval = FALSE} -require(keyATM) -abstracts_keyatm <- keyATM(keyATM_read(abstracts_dfm), no_keyword_topics = 0, keywords = abstracts_dictionary, model = "base", options = list(seed = 46709394)) -``` - -Create a new oolong object. 
- -```{r step1} -oolong_test_rater1 <- witi(abstracts_keyatm, abstracts$text, userid = "Yihui") -``` - -Clone the oolong object to be used by other raters. - -```{r step2} -oolong_test_rater2 <- clone_oolong(oolong_test_rater1, userid = "Jenny") -``` - -Ask different coders to code each object and then lock the objects. - -```{r, eval = FALSE} -## Let Yihui do the test. -oolong_test_rater1$do_word_intrusion_test() -oolong_test_rater1$do_topic_intrusion_test() -oolong_test_rater1$lock() - -## Let Jenny do the test. -oolong_test_rater2$do_word_intrusion_test() -oolong_test_rater2$do_topic_intrusion_test() -oolong_test_rater2$lock() -``` - -```{r, include = FALSE} -### Mock this process -set.seed(46709394) -oolong_test_rater1 <- oolong:::.monkey_test(oolong_test_rater1, intelligent = 0.3) -oolong_test_rater2 <- oolong:::.monkey_test(oolong_test_rater2, intelligent = 0) -oolong_test_rater1$lock() -oolong_test_rater2$lock() -``` - -Get a summary of the two objects. - -```{r, step3} -summarize_oolong(oolong_test_rater1, oolong_test_rater2) -``` - -### About the p-values - -The test for model precision (MP) is based on a one-tailed, one-sample binomial test for each rater. In a multiple-rater situation, the p-values from all raters are combined using Fisher's method (a.k.a. Fisher's omnibus test). - -H0: MP is not better than 1/ (n\_top\_terms + 1) - -H1: MP is better than 1/ (n\_top\_terms + 1) - - -The test for the median of TLO is based on a permutation test. - -H0: Median TLO is not better than random guess. - -H1: Median TLO is better than random guess. - -Note that the two statistical tests are testing the bare minimum. A significant test only indicates that the topic model can make the rater(s) perform better than random guess. It is not an indication of good topic interpretability. Also, one should use a very conservative significance level, e.g. $\alpha < 0.001$. - -## About Biterm Topic Model - -Please refer to the vignette about BTM. 
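As a back-of-the-envelope illustration of the MP test described in the p-values section, suppose a rater answered 8 of 10 word intrusion items correctly, with 5 top terms plus 1 intruder shown per item (so the chance level is 1/6); all numbers here are hypothetical:

``` r
# One-tailed, one-sample binomial test against chance (hypothetical numbers):
# 8 correct out of 10 items; 5 top terms + 1 intruder means chance = 1/6
binom.test(x = 8, n = 10, p = 1 / 6, alternative = "greater")
```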
- -## About Naive Bayes - -The Naive Bayes model is a supervised machine learning model. This package supports Naive Bayes models trained using `quanteda.textmodels`. - -Suppose `newsgroup_nb` is a Naive Bayes model trained on a subset of the classic [20 newsgroups] dataset. - -```r -tokens(newsgroup5$text, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, split_hyphens = TRUE) %>% tokens_wordstem %>% tokens_remove(stopwords("en")) %>% dfm(tolower = TRUE) %>% dfm_trim(min_termfreq = 3, max_docfreq = 0.06, docfreq_type = "prop") -> newsgroup_dfm -docvars(newsgroup_dfm, "group") <- newsgroup5$title -newsgroup_nb <- textmodel_nb(newsgroup_dfm, docvars(newsgroup_dfm, "group"), distribution = "Bernoulli") -``` - -You can still generate word intrusion and word set intrusion tests. - -```{r} -wi(newsgroup_nb) -``` - -```{r} -wsi(newsgroup_nb) -``` - -## Validating Dictionary-based Methods - -### Creating gold standard - -`trump2k` is a dataset of 2,000 tweets from \@realdonaldtrump. - -```{r trump2k} -tibble(text = trump2k) -``` - -Suppose you are interested in studying the sentiment of these tweets. One can use tools such as AFINN to automatically extract sentiment in these tweets. However, oolong recommends generating a gold standard by human coding first, using a subset. By default, oolong selects 1% of the original corpus as test cases. The parameter `construct` should be an adjective, e.g. positive, liberal, populistic, etc. - -```{r goldstandard} -oolong_test <- gs(input_corpus = trump2k, construct = "positive", userid = "Joe") -oolong_test -``` - -As instructed, use the method `$do_gold_standard_test()` to start coding. - -```{r, eval = FALSE} -oolong_test$do_gold_standard_test() -``` - -After the coding, you need to first lock the test and then the `$turn_gold()` method is available. 
- -```{r, include = FALSE} -oolong_test$.__enclos_env__$private$test_content$gs <- -structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC", -"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!", -"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"", -"Thank you for a great afternoon Birmingham, Alabama! #Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD", -"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"", -"People believe CNN these days almost as little as they believe Hillary....that's really saying something!", -"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!", -"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\" Thank you.", -"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?", -"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly, Mr Trump, you are very inspiring\"", -"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! CHANGE.", -".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! Debates https://t.co/3Mz4T7qTTR", -"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"", -"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"", -"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. 
\nhttps://t.co/hxdfy0mjVw", -"\"@Lisa_Milicaj: Truth be told, I never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"", -"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463", -"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.", -"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! http://t.co/pYtkbyKwt5", -"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!" -), answer = c(4L, 4L, 2L, 5L, 3L, 2L, 4L, 5L, 2L, 4L, 1L, 1L, -4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA, -NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame" -)) -``` - -```{r gs_locking} -oolong_test$lock() -oolong_test -``` - -### Example: Validating AFINN using the gold standard - -A locked oolong test can be converted into a quanteda-compatible corpus for further analysis. The corpus contains two `docvars`, 'answer'. - -```{r} -oolong_test$turn_gold() -``` - -In this example, we calculate the AFINN score for each tweet using quanteda. The dictionary `afinn` is bundle with this package. 
- -```{r} -gold_standard <- oolong_test$turn_gold() -dfm(gold_standard, remove_punct = TRUE) %>% dfm_lookup(afinn) %>% quanteda::convert(to = "data.frame") %>% - mutate(matching_word_valence = (neg5 * -5) + (neg4 * -4) + (neg3 * -3) + (neg2 * -2) + (neg1 * -1) - + (zero * 0) + (pos1 * 1) + (pos2 * 2) + (pos3 * 3) + (pos4 * 4) + (pos5 * 5), - base = ntoken(gold_standard, remove_punct = TRUE), afinn_score = matching_word_valence / base) %>% - pull(afinn_score) -> all_afinn_score -all_afinn_score -``` - -Put back the vector of AFINN score into the respective `docvars` and study the correlation between the gold standard and AFINN. - -```{r} -summarize_oolong(oolong_test, target_value = all_afinn_score) -``` - -### Suggested workflow - -Create an oolong object, clone it for another coder. According to Song et al. (2020), you should at least draw 1% of your data. - -```{r} -trump <- gs(input_corpus = trump2k, exact_n = 40, userid = "JJ") -trump2 <- clone_oolong(trump, userid = "Winston") -``` - -Instruct two coders to code the tweets and lock the objects. - -```{r, eval = FALSE} -trump$do_gold_standard_test() -trump2$do_gold_standard_test() -trump$lock() -trump2$lock() -``` - -```{r, include = FALSE} -trump$.__enclos_env__$private$test_content$gs <- -structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC", -"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!", -"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"", -"Thank you for a great afternoon Birmingham, Alabama! 
#Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD", -"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"", -"People believe CNN these days almost as little as they believe Hillary....that's really saying something!", -"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!", -"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\" Thank you.", -"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?", -"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly, Mr Trump, you are very inspiring\"", -"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! CHANGE.", -".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! Debates https://t.co/3Mz4T7qTTR", -"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"", -"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"", -"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. \nhttps://t.co/hxdfy0mjVw", -"\"@Lisa_Milicaj: Truth be told, I never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"", -"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463", -"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.", -"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! 
http://t.co/pYtkbyKwt5", -"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!" -), answer = c(4L, 4L, 2L, 5L, 3L, 2L, 4L, 5L, 2L, 4L, 1L, 1L, -4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA, -NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame" - )) - -trump2$.__enclos_env__$private$test_content$gs <- -structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC", -"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!", -"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"", -"Thank you for a great afternoon Birmingham, Alabama! #Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD", -"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"", -"People believe CNN these days almost as little as they believe Hillary....that's really saying something!", -"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!", -"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\" Thank you.", -"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?", -"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly, Mr Trump, you are very inspiring\"", -"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! CHANGE.", -".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! 
Debates https://t.co/3Mz4T7qTTR", -"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"", -"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"", -"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. \nhttps://t.co/hxdfy0mjVw", -"\"@Lisa_Milicaj: Truth be told, I never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"", -"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463", -"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.", -"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! http://t.co/pYtkbyKwt5", -"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!" -), answer = c(5L, 3L, 2L, 5L, 3L, 1L, 4L, 5L, 2L, 4L, 1L, 1L, -4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA, -NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame" - )) -trump$lock() -trump2$lock() -``` - -Calculate the target value (in this case, the AFINN score) by turning one object into a corpus. - -```{r} -gold_standard <- trump$turn_gold() -dfm(gold_standard, remove_punct = TRUE) %>% dfm_lookup(afinn) %>% quanteda::convert(to = "data.frame") %>% - mutate(matching_word_valence = (neg5 * -5) + (neg4 * -4) + (neg3 * -3) + (neg2 * -2) + (neg1 * -1) - + (zero * 0) + (pos1 * 1) + (pos2 * 2) + (pos3 * 3) + (pos4 * 4) + (pos5 * 5), - base = ntoken(gold_standard, remove_punct = TRUE), afinn_score = matching_word_valence / base) %>% - pull(afinn_score) -> target_value -``` - -Summarize all oolong objects with the target value. 
- -```{r} -res <- summarize_oolong(trump, trump2, target_value = target_value) -``` - -Read the results. The diagnostic plot consists of 4 subplots. It is a good idea to read Bland & Altman (1986) on the difference between correlation and agreement. - -* Subplot (top left): Raw correlation between human judgement and target value. One should want to have a good correlation between the two. -* Subplot (top right): Bland-Altman plot. One should want to have no correlation. Also, the dots should scatter randomly around the mean value. If so, the two measurements (human judgement and target value) are in good agreement. -* Subplot (bottom left): Raw correlation between target value and content length. One should want to have no correlation, as an indication of good reliability against the influence of content length. (See Chan et al. 2020) -* Subplot (bottom right): Cook's distance of all data points. One should want to have no dots (or at least very few dots) above the threshold. It indicates whether the raw correlation between human judgement and target value is influenced by extreme values in your data. - -The textual output contains Krippendorff's alpha of the codings by your raters. To claim validity of your target value, you must first establish the reliability of your gold standard. Song et al. (2020) suggest Krippendorff's alpha > 0.7 as an acceptable cut-off. - -```{r} -res -``` - -```{r diagnosis} -plot(res) -``` - -## Backward compatibility - -Historically, oolong test objects could only be generated with a single function: `create_oolong`. This is no longer the case, and using it is no longer recommended; it is retained only for backward compatibility. If you still need to use `create_oolong()`, the most important parameters are `input_model` and `input_corpus`. Setting each of them to `NULL` generates different tests. 
- -| input\_model | input\_corpus | output | -|--------------|:-------------:|---------------------------------------------------------------------------------------------------------------------------------------------| -| Not NULL | NULL | oolong test for validating a topic model with [word intrusion test](#word-intrusion-test) | -| Not NULL | Not NULL | oolong test for validating a topic model with [word intrusion test](#word-intrusion-test) and [topic intrusion test](#topic-intrusion-test) | -| NULL | Not NULL | oolong test for [creating gold standard](#creating-gold-standard) | -| NULL | NULL | error | - - -## References - -1. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in neural information processing systems (pp. 288-296). [link](https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models) -2. Ying, L., Montgomery, J. M., & Stewart, B. M. (2021). Inferring concepts from topics: Towards procedures for validating topics as measures. Political Analysis. [link](https://doi.org/10.1017/pan.2021.33) -3. Song et al. (2020). In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication. [link](https://doi.org/10.1080/10584609.2020.1723752) -4. Bland, J. M., & Altman, D. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476), 307-310. -5. Chan et al. (2020). Four best practices for measuring news sentiment using β€˜off-the-shelf’ dictionaries: a large-scale p-hacking experiment. Computational Communication Research. [link](https://osf.io/preprints/socarxiv/np5wa/) -6. Nielsen, F. Γ…. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903. 
[link](https://arxiv.org/abs/1103.2903) - ---- diff --git a/overview_gh.md b/overview_gh.md deleted file mode 100644 index b641483..0000000 --- a/overview_gh.md +++ /dev/null @@ -1,645 +0,0 @@ -Overview -================ -Chung-hong Chan - -The validation test is called β€œoolong test” (for reading tea leaves). -This package provides several functions for generating different types -of oolong test. - -| function | purpose | -| -------: | :-------------------------------------------------------------------------------------------------------------------------------- | -| `wi()` | validating a topic model with [word intrusion test](#word-intrusion-test) (Chang et al., 2008) | -| `ti()` | validating a topic model with [topic intrusion test](#topic-intrusion-test) (Chang et al., 2008; aka β€œT8WSI” in Ying et al.Β 2021) | -| `witi()` | validating a topic model with [word intrusion test](#word-intrusion-test) and [topic intrusion test](#topic-intrusion-test) | -| `wsi()` | validating a topic model with [word set intrusion test](#word-set-intrusion-test) (Ying et al.Β 2021) | -| `gs()` | oolong test for [creating gold standard](#creating-gold-standard) (see Song et al., 2020) | - -All of these tests can also be generated with the function -[`create_oolong`](#backward-compatibility). As of version 0.3.20, it is -no longer recommended. - -## Installation - -Because the package is constantly changing, we suggest using the -development version from [GitHub](https://github.com/): - -``` r -# install.packages("devtools") -devtools::install_github("chainsawriot/oolong") -``` - -You can also install the β€œstable” (but slightly older) version from -CRAN: - -``` r -install.packages("oolong") -``` - -## Validating Topic Models - -#### Word intrusion test - -`abstracts_keyatm` is an example topic model trained with the data -`abstracts` using the `keyATM` package. 
Currently, this package supports -structural topic models / correlated topic models from `stm`, Warp LDA -models from `text2vec` , LDA/CTM models from `topicmodels`, Biterm Topic -Models from `BTM`, Keyword Assisted Topic Models from `keyATM`, and -seeded LDA models from `seededlda`. Although not strictly a topic model, -Naive Bayes models from `quanteda.textmodels` are also supported. See -the section on [Naive Bayes](#about-naive-bayes) for more information. - -``` r -library(oolong) -library(keyATM) -#> keyATM 0.5.0 successfully loaded. -#> Papers, examples, resources, and other materials are at -#> https://keyatm.github.io/keyATM/ -library(quanteda) -#> Package version: 3.3.1 -#> Unicode version: 14.0 -#> ICU version: 70.1 -#> Parallel computing: 8 of 8 threads used. -#> See https://quanteda.io for tutorials and examples. -library(dplyr) -#> -#> Attaching package: 'dplyr' -#> The following objects are masked from 'package:stats': -#> -#> filter, lag -#> The following objects are masked from 'package:base': -#> -#> intersect, setdiff, setequal, union -``` - -``` r -abstracts_keyatm -#> keyATM_output object for the base model. -``` - -To create an oolong test with word intrusion test, use the function -`wi`. It is recommended to provide a user id of coder who are going to -be doing the test. - -``` r -oolong_test <- wi(abstracts_keyatm, userid = "Hadley") -oolong_test -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ” WI βœ– TI βœ– WSI -#> ☺ Hadley -#> β„Ή WI: k = 10, 0 coded. -#> -#> ── Methods ── -#> -#> β€’ <$do_word_intrusion_test()>: do word intrusion test -#> β€’ <$lock()>: finalize and see the results -``` - -As instructed, use the method `$do_word_intrusion_test()` to start -coding. - -``` r -oolong_test$do_word_intrusion_test() -``` - -You can pause the test by clicking the β€œExit” button. Your progress will -be recorded in the object. 
If you want to save your progress, just save -the object (e.g.Β `saveRDS(oolong_test, "oolong_test.RDS")`). To resume -the test, launch the test again. - -After the coding (all items are coded), you need to press the β€œExit” -button to quit the coding interface and then lock the test. Then, you -can look at the model precision by printing the oolong test. - -``` r -oolong_test$lock() -oolong_test -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ” WI βœ– TI βœ– WSI -#> ☺ Hadley -#> β„Ή WI: k = 10, 10 coded. -#> -#> ── Results: ── -#> -#> β„Ή 90% precision -``` - -#### Word set intrusion test - -Word set intrusion test is a variant of word intrusion test (Ying et -al., 2021), in which multiple word sets generated from top terms of one -topic are juxtaposed with one intruder word set generated similarly from -another topic. In Ying et al., this test is called β€œR4WSI” because 4 -word sets are displayed. By default, oolong generates also R4WSI. -However, it is also possible to generate R(N)WSI by setting the -parameter `n_correct_ws` to N - 1. - -``` r -oolong_test <- wsi(abstracts_keyatm, userid = "Garrett") -oolong_test -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ– WI βœ– TI βœ” WSI -#> ☺ Garrett -#> β„Ή WSI: n = 10, 0 coded. -#> -#> ── Methods ── -#> -#> β€’ <$do_word_set_intrusion_test()>: do word set intrusion test -#> β€’ <$lock()>: finalize and see the results -``` - -Use the method `$do_word_set_intrusion_test()` to start coding. 
- -``` r -oolong_test$do_word_set_intrusion_test() -``` - -``` r -oolong_test$lock() -oolong_test -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ– WI βœ– TI βœ” WSI -#> ☺ Garrett -#> β„Ή WSI: n = 10, 10 coded. -#> -#> ── Results: ── -#> -#> β„Ή 90% precision (WSI) -``` - -#### Topic intrusion test - -For example, `abstracts_keyatm` was generated with the corpus -`abstracts$text` - -``` r -library(tibble) -abstracts -#> # A tibble: 2,500 Γ— 1 -#> text -#> -#> 1 This study explores the benefits and risks featured in medical tourism broke… -#> 2 This article puts forth the argument that with the transfer of stock trading… -#> 3 The purpose of this study was to evaluate the effect the visual fidelity of … -#> 4 Among the many health issues relevant to college students, overconsumption o… -#> 5 This address, delivered at ICA's 50th anniversary conference, calls on the a… -#> 6 The Internet has often been used to reach men who have sex with men (MSMs) i… -#> 7 This article argues that the literature describing the internet revolution i… -#> 8 This research study examined Bud Goodall's online health narrative as a case… -#> 9 Information technology and new media allow for collecting and sharing person… -#> 10 Using a national, telephone survey of 1,762 adolescents aged 12-17 years, th… -#> # β„Ή 2,490 more rows -``` - -Creating the oolong test object with the corpus used for training the -topic model will generate topic intrusion test cases. - -``` r -oolong_test <- ti(abstracts_keyatm, abstracts$text, userid = "Julia") -oolong_test -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ– WI βœ” TI βœ– WSI -#> ☺ Julia -#> β„Ή TI: n = 25, 0 coded. 
-#> -#> ── Methods ── -#> -#> β€’ <$do_topic_intrusion_test()>: do topic intrusion test -#> β€’ <$lock()>: finalize and see the results -``` - -Similarly, use the `$do_topic_intrusion_test` to code the test cases, -lock the test with `$lock()` and then you can look at the TLO (topic log -odds) value by printing the oolong test. - -``` r -oolong_test$do_topic_intrusion_test() -oolong_test$lock() -``` - -``` r -oolong_test -#> -#> ── oolong (topic model) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── -#> βœ– WI βœ” TI βœ– WSI -#> ☺ Julia -#> β„Ή TI: n = 25, 25 coded. -#> -#> ── Results: ── -#> -#> β„Ή TLO: -0.009 -``` - -### Suggested workflow - -The test makes more sense if more than one coder is involved. A -suggested workflow is to create the test, then clone the oolong object. -Ask multiple coders to do the test(s) and then summarize the results. - -Preprocess and create a document-feature matrix - -``` r -dfm(abstracts$text, tolower = TRUE, stem = TRUE, remove = stopwords('english'), remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE, remove_hyphens = TRUE) %>% dfm_trim(min_docfreq = 3, max_docfreq = 500) %>% dfm_select(min_nchar = 3, pattern = "^[a-zA-Z]+$", valuetype = "regex") -> abstracts_dfm -``` - -Train a topic model. - -``` r -require(keyATM) -abstracts_keyatm <- keyATM(keyATM_read(abstracts_dfm), no_keyword_topics = 0, keywords = abstracts_dictionary, model = "base", options = list(seed = 46709394)) -``` - -Create a new oolong object. - -``` r -oolong_test_rater1 <- witi(abstracts_keyatm, abstracts$text, userid = "Yihui") -``` - -Clone the oolong object to be used by other raters. - -``` r -oolong_test_rater2 <- clone_oolong(oolong_test_rater1, userid = "Jenny") -``` - -Ask different coders to code each object and then lock the object. - -``` r -## Let Yihui do the test. 
-oolong_test_rater1$do_word_intrusion_test()
-oolong_test_rater1$do_topic_intrusion_test()
-oolong_test_rater1$lock()
-
-## Let Jenny do the test.
-oolong_test_rater2$do_word_intrusion_test()
-oolong_test_rater2$do_topic_intrusion_test()
-oolong_test_rater2$lock()
-```
-
-Get a summary of the two objects.
-
-``` r
-summarize_oolong(oolong_test_rater1, oolong_test_rater2)
-#> 
-#> ── Summary (topic model): ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-#> 
-#> ── Word intrusion test ──
-#> 
-#> β„Ή Mean model precision: 0.3
-#> β„Ή Quantiles of model precision: 0.1, 0.2, 0.3, 0.4, 0.5
-#> β„Ή P-value of the model precision
-#> (H0: Model precision is not better than random guess): 0.0693
-#> β„Ή Krippendorff's alpha: 0.095
-#> β„Ή K Precision:
-#> 0, 0.5, 1, 0, 0, 0.5, 0, 0.5, 0, 0.5
-#> 
-#> ── Topic intrusion test ──
-#> 
-#> β„Ή Mean TLO: -3.2
-#> β„Ή Median TLO: -4.07
-#> β„Ή Quantiles of TLO: -8.26, -6.09, -4.07, 0, 0
-#> β„Ή P-Value of the median TLO
-#> (H0: Median TLO is not better than random guess): 0.096
-```
-
-### About the p-values
-
-The test for model precision (MP) is based on a one-tailed, one-sample
-binomial test for each rater. In a multiple-rater situation, the
-p-values from all raters are combined using Fisher's method (a.k.a.
-Fisher's omnibus test).
-
-H0: MP is not better than 1/ (n\_top\_terms + 1)
-
-H1: MP is better than 1/ (n\_top\_terms + 1)
-
-The test for the median of TLO is based on a permutation test.
-
-H0: Median TLO is not better than random guess.
-
-H1: Median TLO is better than random guess.
-
-Note that these two statistical tests test only the bare minimum. A
-significant result indicates only that the topic model enables the
-rater(s) to perform better than random guessing; it is not an
-indication of good topic interpretability.
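-
-The combination step can be sketched in base R. The helper below is
-purely illustrative and is not part of oolong's API; the number of top
-terms, test items, and correct answers are hypothetical.
-
-``` r
-## Hypothetical sketch: one-tailed, one-sample binomial test per rater,
-## assuming 5 top terms per topic (chance level = 1/6) and 10 test items.
-mp_pvalue <- function(n_correct, n_items, n_top_terms = 5) {
-  binom.test(n_correct, n_items, p = 1 / (n_top_terms + 1),
-             alternative = "greater")$p.value
-}
-pvals <- c(mp_pvalue(5, 10), mp_pvalue(6, 10))
-## Fisher's omnibus test: -2 * sum(log(p)) follows a chi-squared
-## distribution with 2k degrees of freedom under H0.
-pchisq(-2 * sum(log(pvals)), df = 2 * length(pvals), lower.tail = FALSE)
-```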
Also, one should use a very conservative
-significance level, e.g. Ξ± \< 0.001.
-
-## About Biterm Topic Model
-
-Please refer to the vignette about BTM.
-
-## About Naive Bayes
-
-A Naive Bayes model is a supervised machine learning model. This package
-supports Naive Bayes models trained using `quanteda.textmodels`.
-
-Suppose `newsgroup_nb` is a Naive Bayes model trained on a subset of the
-classic 20 newsgroups dataset.
-
-``` r
-tokens(newsgroup5$text, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, split_hyphens = TRUE) %>% tokens_wordstem %>% tokens_remove(stopwords("en")) %>% dfm(tolower = TRUE) %>% dfm_trim(min_termfreq = 3, max_docfreq = 0.06, docfreq_type = "prop") -> newsgroup_dfm
-docvars(newsgroup_dfm, "group") <- newsgroup5$title
-newsgroup_nb <- textmodel_nb(newsgroup_dfm, docvars(newsgroup_dfm, "group"), distribution = "Bernoulli")
-```
-
-You can still generate word intrusion and word set intrusion tests.
-
-``` r
-wi(newsgroup_nb)
-#> 
-#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-#> βœ” WI βœ– TI βœ– WSI
-#> β„Ή WI: k = 20, 0 coded.
-#> 
-#> ── Methods ──
-#> 
-#> β€’ <$do_word_intrusion_test()>: do word intrusion test
-#> β€’ <$lock()>: finalize and see the results
-```
-
-``` r
-wsi(newsgroup_nb)
-#> 
-#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-#> βœ– WI βœ– TI βœ” WSI
-#> β„Ή WSI: n = 20, 0 coded.
-#> 
-#> ── Methods ──
-#> 
-#> β€’ <$do_word_set_intrusion_test()>: do word set intrusion test
-#> β€’ <$lock()>: finalize and see the results
-```
-
-## Validating Dictionary-based Methods
-
-### Creating gold standard
-
-`trump2k` is a dataset of 2,000 tweets from @realdonaldtrump.
-
-``` r
-tibble(text = trump2k)
-#> # A tibble: 2,000 Γ— 1
-#>    text                                                                        
-#>    <chr>                                                                       
-#>  1 "In just out book, Secret Service Agent Gary Byrne doesn't believe that Croo…
-#>  2 "Hillary Clinton has announced that she is letting her husband out to campai…
-#>  3 "\"@TheBrodyFile: Always great to visit with @TheBrodyFile one-on-one with \…
-#>  4 "Explain to @brithume and @megynkelly, who know nothing, that I will beat Hi…
-#>  5 "Nobody beats me on National Security. https://t.co/sCrj4Ha1I5"             
-#>  6 "\"@realbill2016: @realDonaldTrump @Brainykid2010 @shl Trump leading LA Time…
-#>  7 "\"@teapartynews: Trump Wins Tea Party Group's 'Nashville Straw Poll' - News…
-#>  8 "Big Republican Dinner tonight at Mar-a-Lago in Palm Beach. I will be there!"
-#>  9 ".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It …
-#> 10 "\"@brianstoya: @realDonaldTrump For POTUS #2016\""                          
-#> # β„Ή 1,990 more rows
-```
-
-Suppose you are interested in studying the sentiment of these tweets.
-One can use tools such as AFINN to extract the sentiment of these
-tweets automatically. However, oolong recommends first generating a
-gold standard by human coding on a subset. By default, oolong selects
-1% of the original corpus as test cases. The parameter `construct`
-should be an adjective, e.g. positive, liberal, populistic, etc.
-
-``` r
-oolong_test <- gs(input_corpus = trump2k, construct = "positive", userid = "Joe")
-oolong_test
-#> 
-#> ── oolong (gold standard generation) ───────────────────────────────────────────────────────────────────────────────────────────────────────────
-#> ☺ Joe
-#> β„Ή GS: n = 20, 0 coded.
-#> β„Ή Construct: positive.
-#> 
-#> ── Methods ──
-#> 
-#> β€’ <$do_gold_standard_test()>: generate gold standard
-#> β€’ <$lock()>: finalize this object and see the results
-```
-
-As instructed, use the method `$do_gold_standard_test()` to start
-coding.
-
-``` r
-oolong_test$do_gold_standard_test()
-```
-
-After the coding, you need to lock the test first; then the
-`$turn_gold()` method becomes available.
-
-``` r
-oolong_test$lock()
-oolong_test
-#> 
-#> ── oolong (gold standard generation) ───────────────────────────────────────────────────────────────────────────────────────────────────────────
-#> ☺ Joe
-#> β„Ή GS: n = 20, 20 coded.
-#> β„Ή Construct: positive.
-#> 
-#> ── Methods ──
-#> 
-#> β€’ <$turn_gold()>: convert the test results into a quanteda corpus
-```
-
-### Example: Validating AFINN using the gold standard
-
-A locked oolong test can be converted into a quanteda-compatible corpus
-for further analysis. The corpus contains one `docvar`, β€˜answer’.
-
-``` r
-oolong_test$turn_gold()
-#> Corpus consisting of 20 documents and 1 docvar.
-#> text1 :
-#> "Thank you Eau Claire, Wisconsin.  #VoteTrump on Tuesday, Apr..."
-#> 
-#> text2 :
-#> ""@bobby990r_1: @realDonaldTrump would lead polls the second ..."
-#> 
-#> text3 :
-#> ""@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump..."
-#> 
-#> text4 :
-#> "Thank you for a great afternoon Birmingham, Alabama! #Trump2..."
-#> 
-#> text5 :
-#> ""@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 ht..."
-#> 
-#> text6 :
-#> "People believe CNN these days almost as little as they belie..."
-#> 
-#> [ reached max_ndoc ... 14 more documents ]
-#> β„Ή Access the answer from the coding with quanteda::docvars(obj, 'answer')
-```
-
-In this example, we calculate the AFINN score for each tweet using
-quanteda. The dictionary `afinn` is bundled with this package.
-
-``` r
-gold_standard <- oolong_test$turn_gold()
-dfm(gold_standard, remove_punct = TRUE) %>% dfm_lookup(afinn) %>% quanteda::convert(to = "data.frame") %>%
-    mutate(matching_word_valence = (neg5 * -5) + (neg4 * -4) + (neg3 * -3) + (neg2 * -2) + (neg1 * -1)
-           + (zero * 0) + (pos1 * 1) + (pos2 * 2) + (pos3 * 3) + (pos4 * 4) + (pos5 * 5),
-           base = ntoken(gold_standard, remove_punct = TRUE), afinn_score = matching_word_valence / base) %>%
-    pull(afinn_score) -> all_afinn_score
-#> Warning: 'dfm.corpus()' is deprecated. Use 'tokens()' first.
-#> Warning: '...' should not be used for tokens() arguments; use 'tokens()' first.
-all_afinn_score
-#>       text1       text2       text3       text4       text5       text6 
-#>  0.33333333 -0.09090909 -0.16666667  0.45454545  0.00000000  0.00000000 
-#>       text7       text8       text9      text10      text11      text12 
-#>  0.16666667  0.38461538  0.00000000  0.38461538 -0.29166667  0.00000000 
-#>      text13      text14      text15      text16      text17      text18 
-#>  0.50000000  0.07142857  0.00000000 -0.12000000  0.28571429  0.16000000 
-#>      text19      text20 
-#>  0.36842105  0.38888889
-```
-
-Put the vector of AFINN scores back into the respective `docvars` and
-study the correlation between the gold standard and AFINN.
-
-``` r
-summarize_oolong(oolong_test, target_value = all_afinn_score)
-#> New names:
-#> `geom_smooth()` using formula = 'y ~ x'
-#> `geom_smooth()` using formula = 'y ~ x'
-#> 
-#> ── Summary (gold standard generation):
-#> ─────────────────────────────────────────────────────────────────────────────────────────────────────────
-#> β„Ή Correlation: 0.718 (p = 4e-04)
-#> β„Ή Effect of content length: -0.319 (p = 0.1709)
-#> β€’ `` -> `...1`
-```
-
-### Suggested workflow
-
-Create an oolong object, clone it for another coder. According to Song
-et al. (2020), you should draw at least 1% of your data.
-
-``` r
-trump <- gs(input_corpus = trump2k, exact_n = 40, userid = "JJ")
-trump2 <- clone_oolong(trump, userid = "Winston")
-```
-
-Instruct two coders to code the tweets and lock the objects.
-
-``` r
-trump$do_gold_standard_test()
-trump2$do_gold_standard_test()
-trump$lock()
-trump2$lock()
-```
-
-Calculate the target value (in this case, the AFINN score) by turning
-one object into a corpus.
-
-``` r
-gold_standard <- trump$turn_gold()
-dfm(gold_standard, remove_punct = TRUE) %>% dfm_lookup(afinn) %>% quanteda::convert(to = "data.frame") %>%
-    mutate(matching_word_valence = (neg5 * -5) + (neg4 * -4) + (neg3 * -3) + (neg2 * -2) + (neg1 * -1)
-           + (zero * 0) + (pos1 * 1) + (pos2 * 2) + (pos3 * 3) + (pos4 * 4) + (pos5 * 5),
-           base = ntoken(gold_standard, remove_punct = TRUE), afinn_score = matching_word_valence / base) %>%
-    pull(afinn_score) -> target_value
-#> Warning: 'dfm.corpus()' is deprecated. Use 'tokens()' first.
-#> Warning: '...' should not be used for tokens() arguments; use 'tokens()' first.
-```
-
-Summarize all oolong objects with the target value.
-
-``` r
-res <- summarize_oolong(trump, trump2, target_value = target_value)
-#> New names:
-#> `geom_smooth()` using formula = 'y ~ x'
-#> `geom_smooth()` using formula = 'y ~ x'
-#> β€’ `` -> `...1`
-#> β€’ `` -> `...2`
-```
-
-Read the results. The diagnostic plot consists of four subplots. It is
-a good idea to read Bland & Altman (1986) on the difference between
-correlation and agreement.
-
-  - Subplot (top left): Raw correlation between human judgement and
-    target value. Ideally, the two should be highly correlated.
-  - Subplot (top right): Bland-Altman plot. Ideally, there should be no
-    correlation, and the dots should be scattered randomly around the
-    mean value. If so, the two measurements (human judgement and target
-    value) are in good agreement.
-  - Subplot (bottom left): Raw correlation between target value and
-    content length. Ideally, there should be no correlation, as an
-    indication of good reliability against the influence of content
-    length. (See Chan et al. 2020)
-  - Subplot (bottom right): Cook's distance of all data points.
Ideally,
-    there should be no dot (or at least very few dots) above the
-    threshold. This indicates whether the raw correlation between
-    human judgement and target value is influenced by extreme values
-    in your data.
-
-The textual output contains the Krippendorff's alpha of the codings by
-your raters. In order to claim validity of your target value, you must
-first establish the reliability of your gold standard. Song et
-al. (2020) suggest Krippendorff's alpha \> 0.7 as an acceptable
-cut-off.
-
-``` r
-res
-#> 
-#> ── Summary (gold standard generation): ─────────────────────────────────────────────────────────────────────────────────────────────────────────
-#> β„Ή Krippendorff's Alpha: 0.931
-#> β„Ή Correlation: 0.744 (p = 2e-04)
-#> β„Ή Effect of content length: -0.319 (p = 0.1709)
-```
-
-``` r
-plot(res)
-```
-
-
-
-## Backward compatibility
-
-Historically, oolong test objects could be generated with only one
-function: `create_oolong`. That is no longer the case, and its use is
-no longer recommended; it is retained only for backward compatibility.
-If you still need to use `create_oolong()`, the most important
-parameters are `input_model` and `input_corpus`. Setting each of them
-to `NULL` generates different tests.
-
-| input\_model | input\_corpus | output |
-| ------------ | :-----------: | ------------------------------------------------------------------------------------------------------------------------------------------- |
-| Not NULL | NULL | oolong test for validating a topic model with [word intrusion test](#word-intrusion-test) |
-| Not NULL | Not NULL | oolong test for validating a topic model with [word intrusion test](#word-intrusion-test) and [topic intrusion test](#topic-intrusion-test) |
-| NULL | Not NULL | oolong test for [creating gold standard](#creating-gold-standard) |
-| NULL | NULL | error |
-
-## References
-
-1.  Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M.
-    (2009).
Reading tea leaves: How humans interpret topic models. In
-    Advances in neural information processing systems (pp. 288-296).
-    [link](https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models)
-2.  Ying, L., Montgomery, J. M., & Stewart, B. M. (2021). Inferring
-    concepts from topics: Towards procedures for validating topics as
-    measures. Political Analysis.
-    [link](https://doi.org/10.1017/pan.2021.33)
-3.  Song et al. (2020). In validations we trust? The impact of imperfect
-    human annotations as a gold standard on the quality of validation of
-    automated content analysis. Political Communication.
-    [link](https://doi.org/10.1080/10584609.2020.1723752)
-4.  Bland, J. M., & Altman, D. (1986). Statistical methods for assessing
-    agreement between two methods of clinical measurement. The Lancet,
-    327(8476), 307-310.
-5.  Chan et al. (2020). Four best practices for measuring news sentiment
-    using β€˜off-the-shelf’ dictionaries: a large-scale p-hacking
-    experiment. Computational Communication Research.
-    [link](https://osf.io/preprints/socarxiv/np5wa/)
-6.  Nielsen, F. Γ…. (2011). A new ANEW: Evaluation of a word list for
-    sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903.
-    [link](https://arxiv.org/abs/1103.2903)
-
-----
diff --git a/vig_body.Rmd b/vig_body.Rmd
deleted file mode 100644
index 32ec04e..0000000
--- a/vig_body.Rmd
+++ /dev/null
@@ -1,457 +0,0 @@
-The validation test is called "oolong test" (for reading tea leaves). This package provides several functions for generating different types of oolong tests.
-
-| function | purpose                                                                                                                             |
-|---------:|:----------------------------------------------------------------------------------------------------------------------------------|
-| `wi()`   | validating a topic model with [word intrusion test](#word-intrusion-test) (Chang et al., 2009)                                      |
-| `ti()`   | validating a topic model with [topic intrusion test](#topic-intrusion-test) (Chang et al., 2009; aka "T8WSI" in Ying et al.
2021) |
-| `witi()` | validating a topic model with [word intrusion test](#word-intrusion-test) and [topic intrusion test](#topic-intrusion-test)         |
-| `wsi()`  | validating a topic model with [word set intrusion test](#word-set-intrusion-test) (Ying et al. 2021)                                |
-| `gs()`   | oolong test for [creating gold standard](#creating-gold-standard) (see Song et al., 2020)                                           |
-
-All of these tests can also be generated with the function [`create_oolong`](#backward-compatibility). As of version 0.3.20, it is no longer recommended.
-
-## Installation
-
-Because the package is constantly changing, we suggest using the development version from [GitHub](https://github.com/):
-
-``` r
-# install.packages("devtools")
-devtools::install_github("chainsawriot/oolong")
-```
-
-You can also install the "stable" (but slightly older) version from CRAN:
-
-```r
-install.packages("oolong")
-```
-
-## Validating Topic Models
-
-#### Word intrusion test
-
-`abstracts_keyatm` is an example topic model trained with the data `abstracts` using the `keyATM` package. Currently, this package supports structural topic models / correlated topic models from `stm`, Warp LDA models from `text2vec`, LDA/CTM models from `topicmodels`, Biterm Topic Models from `BTM`, Keyword Assisted Topic Models from `keyATM`, and seeded LDA models from `seededlda`. Although not strictly a topic model, Naive Bayes models from `quanteda.textmodels` are also supported. See the section on [Naive Bayes](#about-naive-bayes) for more information.
-
-```{r}
-library(oolong)
-library(keyATM)
-library(quanteda)
-library(dplyr)
-```
-
-```{r example}
-abstracts_keyatm
-```
-
-To create an oolong test with a word intrusion test, use the function `wi`. It is recommended to provide a userid of the coder who is going to do the test.
-
-```{r createtest}
-oolong_test <- wi(abstracts_keyatm, userid = "Hadley")
-oolong_test
-```
-
-As instructed, use the method `$do_word_intrusion_test()` to start coding.
-
-```{r, eval = FALSE}
-oolong_test$do_word_intrusion_test()
-```
-
-You can pause the test by clicking the "Exit" button. Your progress will be recorded in the object. If you want to save your progress, just save the object (e.g. `saveRDS(oolong_test, "oolong_test.RDS")`). To resume the test, launch the test again.
-
-After the coding (when all items are coded), you need to press the "Exit" button to quit the coding interface and then lock the test. Then, you can look at the model precision by printing the oolong test.
-
-```{r, include = FALSE}
-### Mock this process
-oolong_test$.__enclos_env__$private$test_content$wi$answer <- oolong_test$.__enclos_env__$private$test_content$wi$intruder
-oolong_test$.__enclos_env__$private$test_content$wi$answer[1] <- "wronganswer"
-```
-
-```{r lock}
-oolong_test$lock()
-oolong_test
-```
-
-#### Word set intrusion test
-
-The word set intrusion test is a variant of the word intrusion test (Ying et al., 2021), in which multiple word sets generated from the top terms of one topic are juxtaposed with one intruder word set generated similarly from another topic. In Ying et al., this test is called "R4WSI" because 4 word sets are displayed. By default, oolong also generates R4WSI. However, it is also possible to generate R(N)WSI by setting the parameter `n_correct_ws` to N - 1.
-
-```{r wsi1}
-oolong_test <- wsi(abstracts_keyatm, userid = "Garrett")
-oolong_test
-```
-
-Use the method `$do_word_set_intrusion_test()` to start coding.
- -```{r wsi2, eval = FALSE} -oolong_test$do_word_set_intrusion_test() -``` - -```{r, include = FALSE} -### Mock this process -oolong_test$.__enclos_env__$private$test_content$wsi$answer <- oolong_test$.__enclos_env__$private$test_content$wsi$intruder -oolong_test$.__enclos_env__$private$test_content$wsi$answer[1] <- "wronganswer" -``` - -```{r wsi3} -oolong_test$lock() -oolong_test -``` - -#### Topic intrusion test - -For example, `abstracts_keyatm` was generated with the corpus `abstracts$text` - -```{r newgroup5} -library(tibble) -abstracts -``` - -Creating the oolong test object with the corpus used for training the topic model will generate topic intrusion test cases. - -```{r createtest2} -oolong_test <- ti(abstracts_keyatm, abstracts$text, userid = "Julia") -oolong_test -``` - -Similarly, use the `$do_topic_intrusion_test` to code the test cases, lock the test with `$lock()` and then you can look at the TLO (topic log odds) value by printing the oolong test. - -```{r, eval = FALSE} -oolong_test$do_topic_intrusion_test() -oolong_test$lock() -``` - -```{r, include = FALSE} -genius_topic <- function(obj1) { - obj1$.__enclos_env__$private$test_content$ti$answer <- obj1$.__enclos_env__$private$test_content$ti$intruder - return(obj1) -} -genius_word <- function(obj1) { - obj1$.__enclos_env__$private$test_content$wi$answer <- obj1$.__enclos_env__$private$test_content$wi$intruder - return(obj1) -} -oolong_test <- genius_word(genius_topic(oolong_test)) -oolong_test$.__enclos_env__$private$test_content$ti$answer[2] <- sample(oolong_test$.__enclos_env__$private$test_content$ti$candidates[[2]], 1) -oolong_test$lock() -``` - -```{r topic_res} -oolong_test -``` - -### Suggested workflow - -The test makes more sense if more than one coder is involved. A suggested workflow is to create the test, then clone the oolong object. Ask multiple coders to do the test(s) and then summarize the results. 
-
-Preprocess and create a document-feature matrix.
-
-```{r, eval = FALSE}
-dfm(abstracts$text, tolower = TRUE, stem = TRUE, remove = stopwords('english'), remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE, remove_hyphens = TRUE) %>% dfm_trim(min_docfreq = 3, max_docfreq = 500) %>% dfm_select(min_nchar = 3, pattern = "^[a-zA-Z]+$", valuetype = "regex") -> abstracts_dfm
-```
-
-Train a topic model.
-
-```{r step0, eval = FALSE}
-require(keyATM)
-abstracts_keyatm <- keyATM(keyATM_read(abstracts_dfm), no_keyword_topics = 0, keywords = abstracts_dictionary, model = "base", options = list(seed = 46709394))
-```
-
-Create a new oolong object.
-
-```{r step1}
-oolong_test_rater1 <- witi(abstracts_keyatm, abstracts$text, userid = "Yihui")
-```
-
-Clone the oolong object to be used by other raters.
-
-```{r step2}
-oolong_test_rater2 <- clone_oolong(oolong_test_rater1, userid = "Jenny")
-```
-
-Ask different coders to code each object and then lock the object.
-
-```{r, eval = FALSE}
-## Let Yihui do the test.
-oolong_test_rater1$do_word_intrusion_test()
-oolong_test_rater1$do_topic_intrusion_test()
-oolong_test_rater1$lock()
-
-## Let Jenny do the test.
-oolong_test_rater2$do_word_intrusion_test()
-oolong_test_rater2$do_topic_intrusion_test()
-oolong_test_rater2$lock()
-```
-
-```{r, include = FALSE}
-### Mock this process
-set.seed(46709394)
-oolong_test_rater1 <- oolong:::.monkey_test(oolong_test_rater1, intelligent = 0.3)
-oolong_test_rater2 <- oolong:::.monkey_test(oolong_test_rater2, intelligent = 0)
-oolong_test_rater1$lock()
-oolong_test_rater2$lock()
-```
-
-Get a summary of the two objects.
-
-```{r, step3}
-summarize_oolong(oolong_test_rater1, oolong_test_rater2)
-```
-
-### About the p-values
-
-The test for model precision (MP) is based on a one-tailed, one-sample binomial test for each rater. In a multiple-rater situation, the p-values from all raters are combined using Fisher's method (a.k.a. Fisher's omnibus test).
-
-H0: MP is not better than 1/ (n\_top\_terms + 1)
-
-H1: MP is better than 1/ (n\_top\_terms + 1)
-
-
-The test for the median of TLO is based on a permutation test.
-
-H0: Median TLO is not better than random guess.
-
-H1: Median TLO is better than random guess.
-
-Note that these two statistical tests test only the bare minimum. A significant result indicates only that the topic model enables the rater(s) to perform better than random guessing; it is not an indication of good topic interpretability. Also, one should use a very conservative significance level, e.g. $\alpha < 0.001$.
-
-## About Biterm Topic Model
-
-Please refer to the vignette about BTM.
-
-## About Naive Bayes
-
-A Naive Bayes model is a supervised machine learning model. This package supports Naive Bayes models trained using `quanteda.textmodels`.
-
-Suppose `newsgroup_nb` is a Naive Bayes model trained on a subset of the classic 20 newsgroups dataset.
-
-```r
-tokens(newsgroup5$text, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, split_hyphens = TRUE) %>% tokens_wordstem %>% tokens_remove(stopwords("en")) %>% dfm(tolower = TRUE) %>% dfm_trim(min_termfreq = 3, max_docfreq = 0.06, docfreq_type = "prop") -> newsgroup_dfm
-docvars(newsgroup_dfm, "group") <- newsgroup5$title
-newsgroup_nb <- textmodel_nb(newsgroup_dfm, docvars(newsgroup_dfm, "group"), distribution = "Bernoulli")
-```
-
-You can still generate word intrusion and word set intrusion tests.
-
-```{r}
-wi(newsgroup_nb)
-```
-
-```{r}
-wsi(newsgroup_nb)
-```
-
-## Validating Dictionary-based Methods
-
-### Creating gold standard
-
-`trump2k` is a dataset of 2,000 tweets from \@realdonaldtrump.
-
-```{r trump2k}
-tibble(text = trump2k)
-```
-
-Suppose you are interested in studying the sentiment of these tweets. One can use tools such as AFINN to extract the sentiment of these tweets automatically. However, oolong recommends first generating a gold standard by human coding on a subset.
By default, oolong selects 1% of the origin corpus as test cases. The parameter `construct` should be an adjective, e.g. positive, liberal, populistic, etc. - -```{r goldstandard} -oolong_test <- gs(input_corpus = trump2k, construct = "positive", userid = "Joe") -oolong_test -``` - -As instructed, use the method `$do_gold_standard_test()` to start coding. - -```{r, eval = FALSE} -oolong_test$do_gold_standard_test() -``` - -After the coding, you need to first lock the test and then the `$turn_gold()` method is available. - -```{r, include = FALSE} -oolong_test$.__enclos_env__$private$test_content$gs <- -structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC", -"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!", -"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"", -"Thank you for a great afternoon Birmingham, Alabama! #Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD", -"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"", -"People believe CNN these days almost as little as they believe Hillary....that's really saying something!", -"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!", -"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\" Thank you.", -"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?", -"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly, Mr Trump, you are very inspiring\"", -"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! 
CHANGE.", -".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! Debates https://t.co/3Mz4T7qTTR", -"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"", -"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"", -"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. \nhttps://t.co/hxdfy0mjVw", -"\"@Lisa_Milicaj: Truth be told, I never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"", -"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463", -"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.", -"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! http://t.co/pYtkbyKwt5", -"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!" -), answer = c(4L, 4L, 2L, 5L, 3L, 2L, 4L, 5L, 2L, 4L, 1L, 1L, -4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA, -NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame" -)) -``` - -```{r gs_locking} -oolong_test$lock() -oolong_test -``` - -### Example: Validating AFINN using the gold standard - -A locked oolong test can be converted into a quanteda-compatible corpus for further analysis. The corpus contains two `docvars`, 'answer'. - -```{r} -oolong_test$turn_gold() -``` - -In this example, we calculate the AFINN score for each tweet using quanteda. The dictionary `afinn` is bundle with this package. 
- -```{r} -gold_standard <- oolong_test$turn_gold() -dfm(gold_standard, remove_punct = TRUE) %>% dfm_lookup(afinn) %>% quanteda::convert(to = "data.frame") %>% - mutate(matching_word_valence = (neg5 * -5) + (neg4 * -4) + (neg3 * -3) + (neg2 * -2) + (neg1 * -1) - + (zero * 0) + (pos1 * 1) + (pos2 * 2) + (pos3 * 3) + (pos4 * 4) + (pos5 * 5), - base = ntoken(gold_standard, remove_punct = TRUE), afinn_score = matching_word_valence / base) %>% - pull(afinn_score) -> all_afinn_score -all_afinn_score -``` - -Put back the vector of AFINN score into the respective `docvars` and study the correlation between the gold standard and AFINN. - -```{r} -summarize_oolong(oolong_test, target_value = all_afinn_score) -``` - -### Suggested workflow - -Create an oolong object, clone it for another coder. According to Song et al. (2020), you should at least draw 1% of your data. - -```{r} -trump <- gs(input_corpus = trump2k, exact_n = 40, userid = "JJ") -trump2 <- clone_oolong(trump, userid = "Winston") -``` - -Instruct two coders to code the tweets and lock the objects. - -```{r, eval = FALSE} -trump$do_gold_standard_test() -trump2$do_gold_standard_test() -trump$lock() -trump2$lock() -``` - -```{r, include = FALSE} -trump$.__enclos_env__$private$test_content$gs <- -structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC", -"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!", -"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"", -"Thank you for a great afternoon Birmingham, Alabama! 
#Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD", -"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"", -"People believe CNN these days almost as little as they believe Hillary....that's really saying something!", -"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!", -"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\" Thank you.", -"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?", -"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly, Mr Trump, you are very inspiring\"", -"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! CHANGE.", -".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! Debates https://t.co/3Mz4T7qTTR", -"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"", -"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"", -"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. \nhttps://t.co/hxdfy0mjVw", -"\"@Lisa_Milicaj: Truth be told, I never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"", -"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463", -"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.", -"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! 
http://t.co/pYtkbyKwt5", -"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!" -), answer = c(4L, 4L, 2L, 5L, 3L, 2L, 4L, 5L, 2L, 4L, 1L, 1L, -4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA, -NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame" - )) - -trump2$.__enclos_env__$private$test_content$gs <- -structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC", -"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!", -"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"", -"Thank you for a great afternoon Birmingham, Alabama! #Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD", -"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"", -"People believe CNN these days almost as little as they believe Hillary....that's really saying something!", -"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!", -"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\" Thank you.", -"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?", -"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly, Mr Trump, you are very inspiring\"", -"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! CHANGE.", -".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! 
Debates https://t.co/3Mz4T7qTTR", -"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"", -"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"", -"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. \nhttps://t.co/hxdfy0mjVw", -"\"@Lisa_Milicaj: Truth be told, I never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"", -"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463", -"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.", -"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! http://t.co/pYtkbyKwt5", -"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!" -), answer = c(5L, 3L, 2L, 5L, 3L, 1L, 4L, 5L, 2L, 4L, 1L, 1L, -4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA, -NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame" - )) -trump$lock() -trump2$lock() -``` - -Calculate the target value (in this case, the AFINN score) by turning one object into a corpus. - -```{r} -gold_standard <- trump$turn_gold() -dfm(gold_standard, remove_punct = TRUE) %>% dfm_lookup(afinn) %>% quanteda::convert(to = "data.frame") %>% - mutate(matching_word_valence = (neg5 * -5) + (neg4 * -4) + (neg3 * -3) + (neg2 * -2) + (neg1 * -1) - + (zero * 0) + (pos1 * 1) + (pos2 * 2) + (pos3 * 3) + (pos4 * 4) + (pos5 * 5), - base = ntoken(gold_standard, remove_punct = TRUE), afinn_score = matching_word_valence / base) %>% - pull(afinn_score) -> target_value -``` - -Summarize all oolong objects with the target value. 
-
-```{r}
-res <- summarize_oolong(trump, trump2, target_value = target_value)
-```
-
-Read the results. The diagnostic plot consists of four subplots. It is a good idea to read Bland & Altman (1986) on the difference between correlation and agreement.
-
-* Subplot (top left): Raw correlation between human judgement and target value. A strong correlation between the two is desirable.
-* Subplot (top right): Bland-Altman plot. There should be no correlation, and the dots should scatter randomly around the mean value. If so, the two measurements (human judgement and target value) are in good agreement.
-* Subplot (bottom left): Raw correlation between target value and content length. There should be no correlation, indicating that the measurement is robust to the influence of content length (see Chan et al. 2020).
-* Subplot (bottom right): Cook's distance of all data points. No dot (or at most very few dots) should lie above the threshold; dots above it flag extreme values that may unduly influence the raw correlation between human judgement and target value.
-
-The textual output contains Krippendorff's alpha of the codings by your raters. In order to claim validity of your target value, you must first establish the reliability of your gold standard. Song et al. (2020) suggest Krippendorff's alpha > 0.7 as an acceptable cut-off.
-
-```{r}
-res
-```
-
-```{r diagnosis}
-plot(res)
-```
-
-## Backward compatibility
-
-Historically, oolong test objects could only be generated with a single function: `create_oolong()`. That is no longer the case, and its use is no longer recommended; it is retained only for backward compatibility. If you still need to use `create_oolong()`, the most important parameters are `input_model` and `input_corpus`. Setting each of them to `NULL` generates different tests. 
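-
-As a sketch of the four parameter combinations (not run; `my_model`, a fitted topic model, and `my_corpus`, a character vector of documents, are hypothetical placeholders, not objects shipped with the package):
-
-```{r, eval = FALSE}
-## Hypothetical placeholders: replace with your own model and corpus
-create_oolong(input_model = my_model)                           # word intrusion test
-create_oolong(input_model = my_model, input_corpus = my_corpus) # word and topic intrusion tests
-create_oolong(input_corpus = my_corpus)                         # gold standard generation
-create_oolong()                                                 # error
-```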
-
-| input\_model | input\_corpus | output |
-|--------------|:-------------:|---------------------------------------------------------------------------------------------------------------------------------------------|
-| Not NULL | NULL | oolong test for validating a topic model with [word intrusion test](#word-intrusion-test) |
-| Not NULL | Not NULL | oolong test for validating a topic model with [word intrusion test](#word-intrusion-test) and [topic intrusion test](#topic-intrusion-test) |
-| NULL | Not NULL | oolong test for [creating gold standard](#creating-gold-standard) |
-| NULL | NULL | error |
-
-
-## References
-
-1. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in neural information processing systems (pp. 288-296). [link](https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models)
-2. Ying, L., Montgomery, J. M., & Stewart, B. M. (2021). Inferring concepts from topics: Towards procedures for validating topics as measures. Political Analysis. [link](https://doi.org/10.1017/pan.2021.33)
-3. Song et al. (2020). In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication. [link](https://doi.org/10.1080/10584609.2020.1723752)
-4. Bland, J. M., & Altman, D. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476), 307-310.
-5. Chan et al. (2020). Four best practices for measuring news sentiment using β€˜off-the-shelf’ dictionaries: a large-scale p-hacking experiment. Computational Communication Research. [link](https://osf.io/preprints/socarxiv/np5wa/)
-6. Nielsen, F. Γ…. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903. 
[link](https://arxiv.org/abs/1103.2903) - ---- diff --git a/vig_head.Rmd b/vig_head.Rmd deleted file mode 100644 index 4b689e7..0000000 --- a/vig_head.Rmd +++ /dev/null @@ -1,18 +0,0 @@ ---- -title: "{title}" -output: rmarkdown::html_vignette -author: - - Chung-hong Chan ^[GESIS] -vignette: > - %\VignetteIndexEntry{{title}} - %\VignetteEngine{knitr::rmarkdown} - %\usepackage[utf8]{inputenc} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>" - ) -set.seed(46709394) -```