trollR - Online Troll Detection using R

LSE Hackathon Challenge: Detecting Online Trolling Behaviour

Data source: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/

Data description

The data consists of a large number of Wikipedia comments that human raters have labeled for toxic behavior. The types of toxicity are:

toxic
severe_toxic
obscene
threat
insult
identity_hate

Usage

To install the package, use:

# install.packages("devtools")
devtools::install_github("schliebs/trollR")
library(trollR)
library(xgboost)
library(dplyr)  # provides %>% and arrange(), which are used below

predict_troll("Hello World - this is an example of trollR - Identifying trolling comments using R")
#> [1] 0.0722369

# take some text
text <- c(
  "I would like to point out that your comment was substandard!",
  "YOU SHOULD DIE!!!!",
  "YOU SHOULD DIE",
  "you should die!!!!",
  "you should die",
  "Go rot in hell",
  "I can also write something non-toxic -- really",
  "COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK",
  "bloody hell, i forgot my purse at the pub yesterday"
)

# and find how likely each one is to be trolling
tibble(text = text, troll = predict_troll(text)) %>% arrange(desc(troll))
#> # A tibble: 9 x 2
#>   text                                                          troll
#>   <chr>                                                         <dbl>
#> 1 COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK                 0.972 
#> 2 bloody hell, i forgot my purse at the pub yesterday          0.958 
#> 3 Go rot in hell                                               0.796 
#> 4 you should die!!!!                                           0.729 
#> 5 YOU SHOULD DIE!!!!                                           0.714 
#> 6 YOU SHOULD DIE                                               0.667 
#> 7 you should die                                               0.543 
#> 8 I would like to point out that your comment was substandard! 0.0739
#> 9 I can also write something non-toxic -- really               0.0281
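The scores above are probabilities, so turning them into a yes/no decision needs a cutoff. The following is a minimal sketch of that step in base R. The `threshold` value of 0.5 and the `flagged` column are assumptions made for illustration and are not part of trollR; the scores are copied from the output above rather than recomputed.

```r
# Four of the scores printed above, copied by hand (not recomputed here).
scores <- data.frame(
  text = c(
    "COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK",
    "bloody hell, i forgot my purse at the pub yesterday",
    "you should die",
    "I would like to point out that your comment was substandard!"
  ),
  troll = c(0.972, 0.958, 0.543, 0.0739),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff; choose one that matches your tolerance for false alarms.
threshold <- 0.5
scores$flagged <- scores$troll >= threshold

# Keep only the comments at or above the cutoff.
scores[scores$flagged, c("text", "troll")]
```

Note that the harmless comment about a forgotten purse scores 0.958 and would be flagged at this cutoff, so a human review step is worth considering before acting on the result.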

That's all?

Of course not. Start the prediction API from R:

run_api()

With the API running, query it from a terminal (spaces in the text must be percent-encoded):

curl "http://localhost:8000/trollR?text=You%20suck%20you%20cocksucker"

{"text":["You suck you cocksucker"],"troll_certainty":[0.9746]}
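The same request can be prepared and its response read from R. The sketch below assumes the jsonlite package is installed. It builds the query URL with percent-encoding and parses the response body shown above, which is pasted in as a literal string so that the example runs without the API being up.

```r
library(jsonlite)

# Percent-encode the comment so it is safe to place in a query string.
comment <- "You suck you cocksucker"
url <- paste0(
  "http://localhost:8000/trollR?text=",
  utils::URLencode(comment, reserved = TRUE)
)

# Response body copied from the example above. With the API running,
# fromJSON(url) could read the response directly instead (assumption:
# the endpoint answers GET requests as in the curl call).
body <- '{"text":["You suck you cocksucker"],"troll_certainty":[0.9746]}'
res <- fromJSON(body)

# The score for the submitted comment.
res$troll_certainty
```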

But wait, there is more

run_shiny()

Understanding the model

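library(xgboost)
library(dplyr)
library(ggplot2)

# load the trained model shipped with the package
model <- xgb.load(system.file("xgboost_model.buffer", package = "trollR"))

# feature importance, named after the columns of the model matrix
# (mdl_data is not created in this snippet; it must already be available)
df <- xgb.importance(feature_names = colnames(mdl_data$model_matrix),
                     model = model) %>%
  as_tibble()

# features in this list get their own fill colour in the plot
vars <- c("length", "ncap", "ncap_len", "nsen", "nexcl", "nquest", "npunct",
          "nword", "nsymb", "nsmile", "nslur")

# plot the 20 features with the highest gain
df %>%
  arrange(desc(Gain)) %>%
  top_n(20, Gain) %>%
  mutate(Feature = reorder(Feature, Gain),
         Vartype = Feature %in% vars) %>%
  ggplot(aes(x = Feature, y = Gain, fill = Vartype)) +
  geom_col() +
  coord_flip() +
  labs(y = "Feature Importance in the XGBoost Model", x = "", title = "") +
  theme(axis.text.y = element_text(size = 15, face = "bold")) +
  scale_fill_brewer(palette = "Set1", guide = "none")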
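The names in `vars` suggest simple count-based text features. The sketch below shows how a few of them could be computed in base R. The definitions are guesses from the names alone (for example, `ncap` as the number of capital letters) and are not taken from the trollR source; `count_matches()` and `text_features()` are hypothetical helpers.

```r
# Count non-overlapping regex matches in each element of a character vector.
count_matches <- function(x, pattern) {
  hits <- gregexpr(pattern, x, perl = TRUE)
  # gregexpr() returns -1 when nothing matches, so count positive positions.
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Assumed meanings: length = characters, ncap = capital letters,
# nexcl = exclamation marks, nquest = question marks, nword = words.
text_features <- function(x) {
  data.frame(
    length = nchar(x),
    ncap   = count_matches(x, "[A-Z]"),
    nexcl  = count_matches(x, "!"),
    nquest = count_matches(x, "\\?"),
    nword  = lengths(strsplit(trimws(x), "\\s+"))
  )
}

features <- text_features(c("YOU SHOULD DIE!!!!", "you should die"))
features
```

Under these assumed definitions the two comments differ only in `length`, `ncap` and `nexcl`, which is one way capitalisation and punctuation could shift the score between otherwise identical texts.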
