Fix #67 #68

Merged: 1 commit, merged on Jun 28, 2022
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -31,6 +31,7 @@
^data/abstracts_topicmodels\.rda$
^data/abstracts_seededlda\.rda$
^data/abstracts_unseededlda\.rda$
+^data/abstracts_warplda\.rda$
^data/abstracts_stm\.rda$
^tests/testthat/apps/
^tests/testdata/downloaded$
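The `.Rbuildignore` entries in the hunk above are anchored, escaped regular expressions matched against file paths by `R CMD build`. A minimal sketch of the escaping (a hypothetical helper written for illustration, not part of this PR or of `oolong`):

``` r
# Hypothetical helper: derives an .Rbuildignore entry such as
# ^data/abstracts_warplda\.rda$ from a plain file path.  The "." is the
# only regex metacharacter in these paths, so it is escaped, and the
# pattern is anchored at both ends to exclude only that exact file.
as_build_ignore_entry <- function(path) {
  paste0("^", gsub(".", "\\.", path, fixed = TRUE), "$")
}

cat(as_build_ignore_entry("data/abstracts_warplda.rda"))
# ^data/abstracts_warplda\.rda$
```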
3 changes: 0 additions & 3 deletions R/oolong_data_misc.R
@@ -17,9 +17,6 @@
#' These are topic models trained with different topic model packages.
"abstracts_keyatm"

-#' @rdname abstracts_keyatm
-"abstracts_warplda"
-
#' @rdname abstracts_keyatm
"abstracts_btm"

13 changes: 7 additions & 6 deletions btm_gh.md
@@ -20,7 +20,7 @@ require(BTM)
#> Loading required package: BTM
require(quanteda)
#> Loading required package: quanteda
-#> Package version: 3.2.0
+#> Package version: 3.2.1
#> Unicode version: 13.0
#> ICU version: 66.1
#> Parallel computing: 8 of 8 threads used.
@@ -63,9 +63,10 @@ trump_btm <- BTM(trump_dat, k = 8, iter = 500, trace = 10)

## Peculiarities of BTM

-This is how you should generate \(\theta_{t}\) . However, there are many
-NaN and there are only 1994 rows (`trump2k` has 2000 tweets) due to
-empty documents.
+This is how you should generate
+![\\theta\_{t}](https://latex.codecogs.com/png.image?%5Cdpi%7B110%7D&space;%5Cbg_white&space;%5Ctheta_%7Bt%7D
+"\\theta_{t}") . However, there are many NaN and there are only 1994
+rows (`trump2k` has 2000 tweets) due to empty documents.
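Those NaN rows can be screened out before any downstream use. A minimal sketch (illustrative, not from this PR), assuming only that `predict()` returns a document-by-topic matrix with NaN rows for empty documents:

``` r
# Toy theta matrix standing in for predict(trump_btm, newdata = trump_dat):
# one row per document; the middle "document" is empty (illustrative values).
theta <- matrix(c(0.5, 0.5,
                  NaN, NaN,
                  0.2, 0.8), ncol = 2, byrow = TRUE)

# is.na() is TRUE for NaN, so complete.cases() flags the rows to keep.
theta_clean <- theta[stats::complete.cases(theta), , drop = FALSE]
nrow(theta_clean)  # 2
```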

``` r
theta <- predict(trump_btm, newdata = trump_dat)
@@ -117,7 +118,7 @@ with other topic models.
oolong <- create_oolong(trump_btm)
oolong
#>
-#> ── oolong (topic model) ────────────────────────────────────────────────────────
+#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✔ WI ✖ TI ✖ WSI
#> ℹ WI: k = 8, 0 coded.
#>
@@ -135,7 +136,7 @@ frame you used for training (in this case `trump_dat`). Your
oolong <- create_oolong(trump_btm, trump_corpus, btm_dataframe = trump_dat)
oolong
#>
-#> ── oolong (topic model) ────────────────────────────────────────────────────────
+#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✔ WI ✔ TI ✖ WSI
#> ℹ WI: k = 8, 0 coded.
#> ℹ TI: n = 20, 0 coded.
4 changes: 2 additions & 2 deletions deploy_gh.md
@@ -27,7 +27,7 @@ library(oolong)
wsi_test <- wsi(abstracts_keyatm)
wsi_test
#>
-#> ── oolong (topic model) ────────────────────────────────────────────────────────
+#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✖ WI ✖ TI ✔ WSI
#> ℹ WSI: n = 10, 0 coded.
#>
@@ -116,7 +116,7 @@ revert_oolong(wsi_test, "oolong_2021-05-22 20 51 26 Hadley Wickham.RDS")
```

#>
-#> ── oolong (topic model) ────────────────────────────────────────────────────────
+#> ── oolong (topic model) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✖ WI ✖ TI ✔ WSI
#> ☺ Hadley Wickham
#> ℹ WSI: n = 10, 10 coded.
5 changes: 0 additions & 5 deletions man/abstracts_keyatm.Rd

Some generated files are not rendered by default.

31 changes: 0 additions & 31 deletions overview_gh.Rmd
@@ -226,37 +226,6 @@ H1: Median TLO is better than random guess.

One must notice that the two statistical tests are testing the bare minimum. A significant test only indicates that the topic model can make the rater(s) perform better than random guessing. It is not an indication of good topic interpretability. Also, one should use a very conservative significance level, e.g. $\alpha < 0.001$.
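To make the "better than random guessing" claim concrete for a word intrusion test, such a comparison can be sketched with an exact binomial test. The numbers below are made up, and the chance rate of 1/5 assumes five candidate words per item:

``` r
# Hypothetical coding result: 14 intruder words correctly identified
# out of 20 items, against a guessing rate of 1/5.
hits <- 14
items <- 20

# Exact one-sided binomial test: is the hit rate above chance?
res <- binom.test(hits, items, p = 1/5, alternative = "greater")
res$p.value < 0.001  # comfortably below a conservative alpha
```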

### About Warp LDA

There is a subtle difference between the support for `stm` and for `text2vec`.

`abstracts_warplda` is a Warp LDA object trained with the same dataset as `abstracts_stm`.

```{r warplda}
abstracts_warplda
```

All the API endpoints are the same, except the one for the creation of topic intrusion test cases. You must also supply the `input_dfm`.

```{r warplda2}
### Just word intrusion test.
oolong_test <- wi(abstracts_warplda, userid = "Lionel")
oolong_test
```


```{r warplda3}
abstracts_dfm
```

```{r warplda4, message = FALSE, results = 'hide', warning = FALSE}
oolong_test <- witi(abstracts_warplda, abstracts$text, input_dfm = abstracts_dfm, userid = "Mara")
```

```{r warplda5}
oolong_test
```

## About Biterm Topic Model

Please refer to the vignette about BTM.