playful_iconicity_paper.Rmd

---
title: "Playful iconicity: data & analyses"
author: "Mark Dingemanse & Bill Thompson"
date: "(this version: `r format(Sys.Date())`)"
output:
  github_document:
    toc: true
    toc_depth: 3
always_allow_html: yes
editor_options: 
  chunk_output_type: console
---

## Introduction

This code notebook provides a fully reproducible workflow for the paper *Playful iconicity: Structural markedness underlies the relation between funniness and iconicity*. To increase readability, not all code chunks present in the .Rmd source are shown in the output. A separate code notebook has the [supplementary analyses](./playful_iconicity_supplements.md).

```{r global_options, include=F}
knitr::opts_chunk$set(fig.width=8, fig.height=6, fig.path='out/',
                      echo=TRUE, warning=FALSE, message=FALSE)
options(knitr.kable.NA = '') # set NA values in knitr tables as blank

```

```{r preliminaries, include=F}

# Packages and useful functions
list.of.packages <- c("knitr","kableExtra","stringr","tidyverse","GGally","ggthemes","readxl","ggrepel","ppcor","car","cowplot","flextable","effsize")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)>0) install.packages(new.packages)
lapply(list.of.packages, require, character.only=T)
rm(list.of.packages,new.packages)

`%notin%` <- function(x,y) !(x %in% y) 

# with thanks to Bodo Winter:
mean.na <- function(x) mean(x, na.rm = T)
sd.na <- function(x) sd(x, na.rm = T)

# print model outputs using kable
print_table <- function(lm.input) {
  lm.anova <- anova(lm.input) %>%
    mutate(predictor = row.names(.),
           pes = c(`Sum Sq`[-nrow(.)],NA)/(`Sum Sq` + `Sum Sq`[nrow(.)])) %>%
    setNames(., c("df", "SS","MS", "$F$", "$p$","predictor", "partial $\\eta^2$")) %>%
    dplyr::select(c(predictor,1:2,4:5,7))
  kable(lm.anova,digits=3)
}


```
### Data sources

Primary data sources:

* *iconicity ratings*: Perry, Lynn K. et al. Iconicity in the Speech of Children and Adults. Developmental Science. doi:10.1111/desc.12572
* *funniness ratings*: Engelthaler, Tomas, and Thomas T. Hills. 2017. Humor Norms for 4,997 English Words. Behavior Research Methods, July, 1-9. doi:10.3758/s13428-017-0930-6

We use these ratings in our analyses, but we also feed them to our [imputation method](/benchmark-prediction.py), which regresses the human ratings against semantic vectors in order to generate imputed ratings for an additional 63.680 words.

Secondary data sources:

* *number of morphemes*: Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., … Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459. doi: 10.3758/BF03193014
* *word frequency*: Brysbaert, M., New, B., & Keuleers, E. (2012). Adding part-of-speech information to the SUBTLEX-US word frequencies. Behavior Research Methods, 44(4), 991–997. doi: 10.3758/s13428-012-0190-4 (for *word frequency*)
* *lexical decision times*: Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287-304. doi: 10.3758/s13428-011-0118-4
* *phonotactic measures*: Vaden, K.I., Halpin, H.R., Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0. [Data file]. Available from http://www.iphod.com.

Secondary data sources used in supplementary analyses:

* *valence, arousal and dominance*: Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191-1207
* *age of acquisition*: Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978-990. doi: 10.3758/s13428-012-0210-4


```{r data, include=F}

words <- read_csv("data/experimental-and-imputed-norms-for-funniness-iconicity-logletterfreq.csv") %>%
  plyr::rename(c("iconicity" = "ico","iconicity_imputed" = "ico_imputed")) %>%
  distinct()

# deduplicate
words <- words[!duplicated(words$word),]

# add sets
words <- words %>%
  mutate(set = ifelse(is.na(ico) & is.na(fun), "C",
                      ifelse(is.na(ico) & !is.na(fun), "B", 
                             ifelse(!is.na(fun) & !is.na(ico),"A","D"))))


subtlex <- read_excel(path="data/SUBTLEX-US frequency list with PoS and Zipf information.xlsx") %>%
  plyr::rename(c("Word" = "word","FREQcount" = "freq_count","Lg10WF" = "logfreq","Dom_PoS_SUBTLEX" = "POS")) %>%
  dplyr::select(word,logfreq,POS) %>%
  filter(word %in% words$word)

# which words are in the norms, but not in subtlex? some pretty colourful ones, but
# not too many (62) - we'll exclude them so we have frequency data for every word
unique(words$word)[unique(words$word) %notin% unique(subtlex$word)]

words <- words %>%
  left_join(subtlex) %>%
  drop_na(logfreq,POS)
rm(subtlex)


# add RT data from Keuleers et al. 2012
RT <- read_tsv("data/blp-items-RT.txt") %>%
  plyr::rename(c("spelling" = "word")) %>%
  dplyr::select(word,rt)

words <- words %>%
  left_join(RT)
rm(RT)

# center variables for use in models
words <- words %>%
  mutate(fun_z = (fun - mean.na(fun)) / sd.na(fun),
         ico_z = (ico - mean.na(ico)) / sd.na(ico))

# add frequency residuals so we can present plots with frequency residualised out 
# (but see Wurm & Fisicaro, 2014 for details and caveats on using residualised measures in regression analyses)

words$fun_resid <- NA
words[which(!is.na(words$logfreq) & !is.na(words$fun)),]$fun_resid <- residuals(lm(fun ~ logfreq,words))

words$fun_imputed_resid <- NA
words[which(!is.na(words$logfreq) & !is.na(words$fun_imputed)),]$fun_imputed_resid <- residuals(lm(fun_imputed ~ logfreq,words))

```


```{r data_2, include=F}
iphod.raw <- read_delim("data/IPhOD2_Words.txt",delim="\t") %>%
  plyr::rename(c("Word" = "word")) 

# have a peek at homographs
# iphod.raw %>% 
#   filter(word %in% iphod.raw[duplicated(iphod.raw$word),]$word) %>% 
#   arrange(desc(SFreq)) %>% 
#   dplyr::select(word,StTrn,SFreq) %>% 
#   slice(1:20)

# keep only unique orthographic words and the most informative non-redundant columns: number of syllables (NSyll), number of phonemes (NPhon), transciption (UnTrn), phonological neighbourhood density (unsDENS), average biphone probability (unsBPAV), average positional probability (unsPOSPAV)
iphod <- iphod.raw[!duplicated(iphod.raw$word),] %>%
  dplyr::select(word,NSyll,UnTrn,NPhon,unsDENS,unsBPAV,unsTPAV,unsPOSPAV,unsLCPOSPAV)

# add to words
words <- words %>% 
  left_join(iphod)
rm(iphod,iphod.raw)

# and we add number of morphemes from the British Lexicon Project
morph <- read_csv("data/EnglishLexiconProject_nmorph_Items.csv") %>% 
  plyr::rename(c("Word" = "word", "NMorph" = "nmorph"))

words <- words %>% 
  left_join(morph)
rm(morph)

words %>% drop_na(nmorph) %>%
  group_by(set) %>%
  summarise(n=n())

```

After collating these data sources we add a range of summary variables, mainly for easy plotting and subset selection.

```{r data_prep, results='hide'}

words <- words %>%
  mutate(fun_perc = ntile(fun,10),
         fun_resid_perc = ntile(fun_resid,10),
         ico_perc = ntile(ico,10),
         diff_rank = fun_perc + ico_perc,
         ico_imputed_perc = ntile(ico_imputed,10),
         fun_imputed_perc = ntile(fun_imputed,10),
         fun_imputed_resid_perc = ntile(fun_imputed_resid,10),
         diff_rank_setB = fun_perc + ico_imputed_perc,
         diff_rank_setC = fun_imputed_perc + ico_imputed_perc,
         diff_rank_setD = fun_imputed_perc + ico_perc,
         logletterfreq_perc = ntile(logletterfreq,10),
         dens_perc = ntile(unsDENS,10),
         biphone_perc = ntile(unsBPAV,10),
         triphone_perc = ntile(unsTPAV,10),
         posprob_perc = ntile(unsPOSPAV,10),
         valence_perc = ntile(valence,10))


```

### Descriptive data

We have **4.996** words rated for funniness, **2.945** rated for iconicity, and **1.419** in the intersection (set A). We have **3.577** words with human funniness ratings and imputed iconicity ratings (set B). We have imputed data for a total of **70.202** words, and we're venturing outside the realm of rated words for **63.680** of them (set C). 

(We also have 1.526 words with human iconicity ratings and imputed funniness ratings in set D, the mirror image of set B; this is not used in the paper but reported on in Supplementary Analyses below.)

```{r numbers, echo=F}
words %>%
  drop_na(set) %>% group_by(set) %>% summarise(n=n()) %>% kable()

# waggle example
examples <- c("wiggle","wobble","waggle") 

words %>%
  filter(word %in% examples) %>%
  dplyr::select(word,ico,fun,ico_perc,fun_perc,ico_imputed,fun_imputed,ico_imputed_perc,fun_imputed_perc)


```

The most important columns in the data are shown below for set A. Sets B and C feature `ico_imputed` and `fun_imputed` instead of or in addition to the human ratings. The field `diff_rank` is the sum of `fun` and `ico` deciles for a given word: a word with `diff_rank` 2 occurs in the first decile (lowest 10%) of both funniness and iconicity ratings, and a word with `diff_rank` 20 occurs in the 10th decile (highest 10%) of both.

```{r prelim_desc,echo=F}

words %>% 
  filter(set == "A") %>%
  group_by(diff_rank) %>%
  sample_n(1) %>% ungroup() %>%
  arrange(-diff_rank) %>%
  dplyr::select(word,ico,fun,logletterfreq,logfreq,rt,nmorph,diff_rank) %>%
  slice(1:10) %>% kable(caption="Structure of the data")

```
### Figures

For a quick impression of the main findings, this section reproduces the figures from the paper.

**Figure 1: Overview**
```{r figures, echo=F, fig.width=8,fig.height=5}

words.setA <- words %>% filter(set == "A")
words.setB <- words %>% filter(set == "B")
words.setC <- words %>% filter(set == "C")

# use plotly to interactively select words to display on overview plot
# ggplot(words.setA,aes(ico,fun,label=word)) +
#   theme_tufte() + ggtitle("Iconicity and funniness") +
#   labs(y = "funniness (residuals)") +
#   geom_smooth(method="loess")
# library(plotly)
# ggplotly()

these_words <- c("baboon","jiggle","giggle","smooch","zigzag","murmur","roar","scratch","victim","grade","grenade","business","canoe","magpie","deuce","buttocks","plush","grain","mud","tender","waddle","fluff","sound")

pA <- ggplot(words.setA,aes(ico,fun,label=word)) +
  theme_tufte() + ggtitle("Iconicity and funniness (n = 1.419)") +
  labs(x="iconicity", y = "funniness") +
  stat_smooth(method="loess",colour="grey",span=0.8) +
  geom_point(alpha=0.5,na.rm=T) +
    geom_label_repel(
    data=subset(words.setA, word %in% these_words),
    aes(label=word),
    size=4,
    alpha=0.8,
    label.size=NA,
    label.r=unit(0,"lines"),
    box.padding=unit(0.35, "lines"),
    point.padding=unit(0.3,"lines"),
    min.segment.length = unit(1.5,"lines")
  ) +
  NULL

pB <- ggplot(words.setA,aes(ico)) +
  theme_tufte() + ggtitle("Iconicity ratings (n = 2.945)") +
  labs(x = "iconicity") + scale_x_continuous(limits=c(-5,5)) +
  stat_density(geom="line") + geom_rug()

pC <- ggplot(words.setA,aes(fun)) +
  theme_tufte() + ggtitle("Funniness ratings (n = 4.996)") +
  labs(x = "funniness") + scale_x_continuous(limits=c(1,5)) +
  stat_density(geom="line") + geom_rug()

right_panel <- plot_grid(pB, pC,ncol = 1,labels=c("B","C"))

plot_grid(pA, right_panel,labels=c("A",NA,NA), label_size=14, rel_widths = c(2,1))

```

**Figure 3: Funniness and iconicity**

```{r fig_3_panel, fig.width=10.2,fig.height=4, echo=F}


p3A <- ggplot(words.setA,aes(ico,fun_resid)) +
  theme_tufte() + ggtitle("Funniness & iconicity", subtitle="(n = 1.419)") +
  labs(x="iconicity", y="funniness (residuals)") +
  geom_point(alpha=0.5,na.rm=T) +
  stat_smooth(method="lm",se=T,colour="grey",fill="white",alpha=0.9)

p3B <- ggplot(words.setB,aes(ico_imputed,fun_resid)) +
  theme_tufte() + ggtitle("Funniness & imputed iconicity",subtitle="(n = 3.577)") +
  labs(x="imputed iconicity",y="funniness (residuals)") +
  geom_point(alpha=0.5,na.rm=T) +
  stat_smooth(method="lm",se=T,colour="grey",fill="white",alpha=0.9)

p3C <- ggplot(words.setC,aes(ico_imputed,fun_imputed_resid)) +
  theme_tufte() + ggtitle("Imputed funniness & imputed iconicity", subtitle="(n = 63.680)") +
  labs(x="imputed iconicity",y="imputed funniness (residuals)") +
  geom_point(alpha=0.5,na.rm=T) +
  stat_smooth(method="lm",se=T,colour="grey",fill="white",alpha=0.9)

plot_grid(p3A, p3B, p3C, labels="AUTO", label_size=14,nrow=1)


```

**Figure 4: Highest rated words**

```{r figure_upper, fig.width=10.2,fig.height=6, echo=F}

ggplot(words.setA,aes(ico,fun)) +
  theme_tufte() + ggtitle("Funniness and iconicity: highest rated words") +
  labs(x="iconicity",y="funniness") +
  geom_point(alpha=0.5,na.rm=T) +
  geom_label_repel(
    data=sample_n(subset(words.setA,diff_rank > 18),40),
    aes(label=word),
    size=4,
    alpha=0.8,
    segment.colour="grey50",
    label.size=NA,
    label.r=unit(0,"lines"),
    box.padding=unit(0.35, "lines"),
    point.padding=unit(0.3,"lines")
  )

```

**Figure 5: Structural markedness**

```{r figure_foregrounding, fig.width=10.2,fig.height=4, echo=F}

onsets <- "^(bl|cl|cr|dr|fl|sc|sl|sn|sp|spl|sw|tr|pr|sq)"
codas <- "(nch|mp|nk|rt|rl|rr|sh|wk)$"
verbdim <- "([b-df-hj-np-tv-xz]le)$" # i.e., look for -le after a consonant

# tag words for these patterns, applying verbdim only to verbs
# add a cumulative measure of complexity
words <- words %>%
  mutate(group = ifelse(diff_rank > 18,"highest","other")) %>%
  mutate(complex.coda = ifelse(str_detect(word,pattern=codas),1,0),
         complex.onset = ifelse(str_detect(word,pattern=onsets),1,0),
         complex.verbdim = ifelse(str_detect(word,pattern=verbdim),
                                  ifelse(POS == "Verb",1,0),0)) %>%
  mutate(cumulative = rowSums(.[c("complex.coda","complex.onset","complex.verbdim")])) 

# define snippets to minimise repetition
markedness_layers <- list(
  stat_smooth(method="loess", span=0.8,color="black",se=T),
  stat_smooth(method="loess", span=0.7,se=F, color="black",show.legend = T,linetype="longdash",aes(y=onset)),
  stat_smooth(method="loess", span=0.7,se=F, color="black",show.legend = T,linetype="dashed",aes(y=coda)),
  stat_smooth(method="loess", span=0.7,se=F, color="black",show.legend = T,linetype="dotted",aes(y=verbdim))
)

# there are many other avoidable redundancies here but okay
p4A <- words %>%
  drop_na(diff_rank) %>%
  mutate(fun_perc = ntile(fun,100)) %>%
  group_by(fun_perc) %>%
  summarise(n=n(),
            onset=mean.na(complex.onset),
            coda=mean.na(complex.coda),
            verbdim=mean.na(complex.verbdim),
            complexity=mean.na(cumulative)) %>%
  ggplot(aes(fun_perc,complexity)) +
  theme_tufte() +
  ggtitle("Structural markedness") +
  labs(y="cumulative markedness",x="funniness (percentile)") +
  scale_y_continuous(limits=c(0,1)) +
  geom_point(shape=1) +
  markedness_layers +
  annotate("segment",x=5,xend=15,y=0.96,yend=0.96,colour="black",linetype="longdash",size=0.8) +
  annotate("segment",x=5,xend=15,y=0.88,yend=0.88,colour="black",linetype="dashed",size=0.8) +
  annotate("segment",x=5,xend=15,y=0.80,yend=0.80,colour="black",linetype="dotted",size=0.8) +
  annotate("segment",x=5,xend=15,y=0.72,yend=0.72,colour="black",linetype="solid",size=0.8) +
  annotate("text",x=20,y=c(0.97,0.89,0.81,0.73),
           label=c("onset","coda","'-le' suffix","cumulative"),
           hjust=0,size=3.8,family="serif")

p4B <- words %>%
  drop_na(diff_rank) %>%
  mutate(ico_perc = ntile(ico,100)) %>%
  group_by(ico_perc) %>%
  summarise(n=n(),
            onset=mean.na(complex.onset),
            coda=mean.na(complex.coda),
            verbdim=mean.na(complex.verbdim),
            complexity=mean.na(cumulative)) %>%
  ggplot(aes(ico_perc,complexity)) +
  theme_tufte() +
  ggtitle("") +
  labs(y="cumulative markedness",x="iconicity (percentile)") +
  scale_y_continuous(limits=c(0,1)) +
  geom_point(shape=1) +
  markedness_layers 

p4C <- words %>%
  drop_na(diff_rank) %>%
  mutate(funico_perc = ntile(ico_z + fun_z,100)) %>%
  group_by(funico_perc) %>%
  summarise(n=n(),
            onset=mean.na(complex.onset),
            coda=mean.na(complex.coda),
            verbdim=mean.na(complex.verbdim),
            complexity=mean.na(cumulative)) %>%
  ggplot(aes(funico_perc,complexity)) +
  theme_tufte() +
  ggtitle("") +
  labs(y="cumulative markedness",x="funniness + iconicity (percentile)") +
  scale_y_continuous(limits=c(0,1)) +
  geom_point(shape=1) +
  markedness_layers

plot_grid(p4A, p4B, p4C, labels="AUTO", label_size=14,nrow=1)

```


```{r export_words, include=F}

words.export <- words %>%
  drop_na(set)
write.csv(words.export, file="data/words.csv")

```


## Main analyses

### Funniness and iconicity

#### Reproducing prior results
Engelthaler & Hills report frequency as the strongest correlate with funniness (less frequent words are rated as more funny), and lexical decision RT as the second strongest (words with slower RTs are rated as more funny). By way of sanity check let's replicate their analysis.

Raw correlations hover around 28%, as reported (without corrections or controls) in their paper. A linear model with funniness as dependent variable and frequency and RT as predictors shows a role for both, though frequency accounts for a much larger portion of the variance (15%) than rt (0.6%).

```{r frequency_1, include=F}

# raw correlations at ~28% as reported (without corrections or controls) in E&T
cor.test(words$fun,words$logfreq)
cor.test(words$fun,words$rt)
``` 

To what extent do frequency and RT predict funniness?

```{r freq_2, include=F}
m0 <- lm(fun ~ logfreq + rt, words %>% drop_na(fun))
summary(m0)

# model validation
plot(fitted(m0),
     residuals(m0))   # no obvious nonlinearity
qqnorm(residuals(m0)) # looks OK
qqline(residuals(m0)) # looks OK, slight right skew
vif(m0)               # below 2 so no indication of multicollinearity


```

**Model m0:** `r format(m0$call)`

`r print_table(m0)`

#### Known knowns

If frequency and RT explain some of the variance in funniness ratings, how much is left for iconicity? We'll do this analysis on the core set of 1419 words for which we have funniness and iconicity ratings.

Turns out that the magnitude estimate of iconicity is about half that of frequency, and with positive sign instead of a negative one (higher funniness ratings go with higher iconicity ratings). The effect of iconicity ratings is much larger than RT, the second most important correlate reported by Engelthaler & Hill.

```{r lm_1, include=F}

m1.1 <- lm(fun ~ logfreq + rt, words %>% filter(set=="A"))
summary(m1.1)

m1.2 <- lm(fun ~ logfreq + rt + ico, words %>% filter(set=="A"))
plot(fitted(m1.2),residuals(m1.2))  # no obvious linearity
qqnorm(residuals(m1.2))
qqline(residuals(m1.2))           # looks OK, slight right skew or light tailed as above
vif(m1.2)                         # all below 2 so no indications of multicollinearity

anova(m1.1,m1.2)
summary(m1.2)


```

**Model m1.1**: `r format(m1.1$call)`
`r print_table(m1.1)`

**Model m1.2**: `r  format(m1.2$call)` 
`r print_table(m1.2)`

`r kable(anova(m1.1,m1.2),caption="model comparison of m1.1 and m1.2")`

Partial correlations show 20.6% covariance between funniness and iconicity, partialing out log frequency as a mediator. This shows the effects of iconicity and funniness are not reducible to frequency alone. 

```{r partial_correlations,echo=F}

words.setA <- words %>% filter(set=="A")

pcor.test(x=words.setA$fun,y=words.setA$ico,z=words.setA$logfreq) %>% kable(caption="funniness and iconicity controlling for word frequency")

# the other two:
# pcor.test(x=words.setA$fun,y=words.setA$logfreq,z=words.setA$ico)
# pcor.test(x=words.setA$ico,y=words.setA$logfreq,z=words.setA$fun)

```

**Example words**

Both high: *`r words %>% filter(diff_rank > 19) %>% arrange(-ico) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`* 

Both low: *`r words %>% filter(diff_rank <= 2) %>% arrange(ico) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

High funniness, low iconicity: *`r words %>% filter(fun_perc > 9, ico_perc < 4) %>% arrange(-ico) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

High iconicity, low funniness: *`r words %>% filter(ico_perc > 9, fun_perc < 4) %>% arrange(-ico) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

N.B. controlling for frequency in these lists (by using `fun_resid` instead of `fun`) does not make a difference in ranking, so not done here and elsewhere.

What about compound nouns among high iconicity words? From eyeballing, it seems to be about 10% in a set of the highest rated 200 nouns. Many probable examples can be found by looking at highly rated nouns with multiple morphemes: *`r words %>% filter(set == "A", ico_perc > 8, POS == "Noun", nmorph > 1) %>% arrange(-ico) %>% slice(1:200) %>% dplyr::select(word) %>% unlist %>% unname()`* (*zigzag*,  one of the few reduplicative words in English, is included here because the Balota et al. database lists it as having 2 morphemes).

### Funniness and imputed iconicity

Here we study the link between funniness ratings and imputed iconicity ratings.

```{r lm_2, include=F}

words.setB <- words %>%
  filter(!is.na(fun),
         is.na(ico))

m2.1 <- lm(fun ~ logfreq + rt, words.setB)
summary(m2.1)

m2.2 <- lm(fun ~ logfreq + rt + ico_imputed, words.setB)
plot(fitted(m2.2),residuals(m2.2))  # no obvious linearity
qqnorm(residuals(m2.2))
qqline(residuals(m2.2))           # looks OK, slight right skew or light tailed as above
vif(m2.2)                         # all below 2 so no indications of multicollinearity

summary(m2.2)
anova(m2.1,m2.2)

```

Compared to model m2.1 with just log frequency and lexical decision time as predictors, model m2.2 including imputed iconicity as predictor provides a significantly better fit and explains a larger portion of the variance.

**Model m2.1**: `r format(m2.1$call)`
`r print_table(m2.1)`

**Model m2.2**: `r format(m2.2$call)`
`r print_table(m2.2)`

`r kable(anova(m2.1,m2.2),caption="model comparison")`

A partial correlations analysis shows that imputed iconicity values correlate with funniness ratings at at least the same level as actual iconicity ratings, controlling for frequency (r = 0.32, p < 0.0001).

```{r pcor_2, include=F}
# partial correlation
pcor.test(x=words.setB$fun,y=words.setB$ico_imputed,z=words.setB$logfreq) %>% kable(caption="funniness and imputed iconicity controlling for word frequency")


```

**Example words**

High imputed funniness and high imputed iconicity: *`r words.setB %>% filter(diff_rank_setB > 19) %>% arrange(-ico_imputed) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

Low imputed funniness and low imputed iconicity: *`r words.setB %>% filter(diff_rank_setB <= 2) %>% arrange(-desc(ico_imputed)) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

High funniness and low imputed iconicity: *`r words.setB %>% filter(fun_perc > 9, ico_imputed_perc < 4) %>% arrange(desc(fun)) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

Low funniness and high imputed iconicity: *`r words.setB %>% filter(ico_imputed_perc > 9, fun_perc < 3) %>% arrange(-ico_imputed) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

What about analysable compounds among high iconicity nouns? Here too about 10%, with examples like *heartbeat, mouthful, handshake, bellboy, comeback, catchphrase*.

```{r analysable_compounds, include=F}
words.setB %>% 
  filter(ico_imputed_perc > 9,
         POS == "Noun") %>%
  arrange(-ico_imputed) %>%
  slice(1:200) %>%
  dplyr::select(word) %>% unlist %>% unname() 

```

### Imputed funniness and imputed iconicity

```{r lm_3, include=F}

words.setC <- words %>%
  filter(is.na(fun),
         is.na(ico))

m3.1 <- lm(fun_imputed ~ logfreq + rt, words.setC)
summary(m3.1)

m3.2 <- lm(fun_imputed ~ logfreq + rt + ico_imputed, words.setC)
plot(fitted(m3.2),residuals(m3.2))  # no obvious linearity
qqnorm(residuals(m3.2))
qqline(residuals(m3.2))           # looks OK, slight right skew or light tailed as above
vif(m3.2)                         # all below 2 so no indications of multicollinearity

summary(m3.2)
anova(m3.1,m3.2)

```

Model 3.1: `r format(m3.1$call)`
`r print_table(m3.1)`

Model 3.2: `r format(m3.2$call)`
`r print_table(m3.2)`

`r kable(anova(m3.1,m3.2),caption="model comparison")`

Partial correlations show that imputed iconicity and imputed funniness share 43% covariance not explained by word frequency.

```{r both_imputed_ppcor,echo=F}
# partial correlation
pcor.test(x=words.setC$fun_imputed,y=words.setC$ico_imputed,z=words.setC$logfreq)  %>% kable(caption="imputed funniness and imputed iconicity controlling for word frequency")

```

**Example words**

High imputed funniness and high imputed iconicity: *`r words.setC %>% filter(diff_rank_setC > 18) %>% arrange(desc(ico_imputed)) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

Low imputed funniness and low imputed iconicity: *`r words.setC %>% filter(diff_rank_setC <= 2) %>% arrange(-desc(ico_imputed)) %>%   dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

High imputed funniness and low imputed iconicity: *`r words.setC %>% filter(fun_imputed_perc > 9, ico_imputed_perc < 4) %>% arrange(desc(fun)) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

Low imputed funniness and high imputed iconicity: *`r words.setC %>% filter(ico_imputed_perc > 9, fun_imputed_perc < 3) %>% arrange(-ico_imputed) %>% dplyr::select(word) %>% slice(1:20) %>% unlist() %>% unname()`*

What about compound nouns here? In the top 200 nouns we can spot ~5 (*shockwave, doodlebug, flashbulb, backflip, footstep*) but that is of course a tiny tail end of a much larger dataset than the earlier two.

A better way is to sample 200 random nouns from a proportionate slice of the data, i.e. 200 * 17.8 = 3560 top nouns in imputed iconicity. In this subset we find at least 30 non-iconic analysable compounds: *fireworm, deadbolt, footstep, pockmark, uppercut, woodwork, biotech, notepad, spellbinder, henchmen, quicksands, blowgun, heartbreaks, moonbeams, sketchpad*, et cetera.

```{r compound_nouns, results='hide'}

words.setC %>% 
  filter(ico_imputed_perc > 9,
         POS == "Noun") %>%
  arrange(-ico_imputed) %>%
  slice(1:200) %>%
  dplyr::select(word) %>% unlist %>% unname() 

set.seed(1983)
words.setC %>% 
  filter(ico_imputed_perc > 9,
         POS == "Noun") %>%
  arrange(-ico_imputed) %>%
  slice(1:3560) %>%
  sample_n(200) %>%
  dplyr::select(word) %>% unlist %>% unname() 

```

### Structural properties of highly rated words

#### Log letter frequency

Mean iconicity and mean funniness are higher for lower log letter frequency quantiles:

```{r logletterfreq_1, echo=F}
words %>%
  filter(set == "A") %>%
  group_by(logletterfreq_perc) %>%
  summarise(mean_ico = mean.na(ico),mean_fun = mean.na(fun)) %>%
  kable(caption="Mean funniness and iconicity by log letter frequency quantiles")
```

High-iconicity high-funniness words tend to have lower log letter frequencies:

```{r logletterfreq_2, echo=F}
words %>%
  filter(set == "A",
         diff_rank > 18) %>%
  arrange(desc(ico)) %>%
  dplyr::select(word,fun,ico,diff_rank,logletterfreq_perc) %>%
  slice(1:20) %>%
  kable(caption="Log letter frequency percentiles for upper quantiles of funniness + iconicity")
```

Model comparison with funniness as the DV and log letter frequency as an additional predictor shows that a model including log letter frequency provides a significantly better fit. 

```{r markedness,include=F}

# m4.1 = m1.2, reproduced here for clarity
m4.1 <- lm(fun ~ logfreq + rt + ico,data=words %>% filter(set=="A"))

# add logletterfreq as predictor
m4.2 <- lm(fun ~ logfreq + rt + ico + logletterfreq, data=words %>% filter(set=="A"))

plot(fitted(m4.2),residuals(m4.2))  # no obvious linearity
qqnorm(residuals(m4.2))
qqline(residuals(m4.2))           # looks OK, slight right skew or light tailedness
vif(m4.2)                         # all below 2 so no indications of multicollinearity

summary(m4.2)
anova(m4.1,m4.2)

```

**Model m4.1**: `r format(m4.1$call)`
`r print_table(m4.1)`

**Model m4.2**: `r format(m4.2$call)`
`r print_table(m4.2)`

`r kable(anova(m4.1,m4.2),caption="model comparison")`

Partial correlations show that funniness rating and log letter frequency have a covariance of -15.7% controlling for iconicity, and that iconicity and log letter frequency have a covariance of -16.3% controlling for funniness ratings (all p < 0.0001 correcting for multiple comparisons). 

```{r pcor_34, echo=F}
#partial correlations 
pcor.test(x=words.setA$fun,y=words.setA$logletterfreq,z=words.setA$ico) %>% kable(caption="funniness and log letter frequency controlling for iconicity")

pcor.test(x=words.setA$ico,y=words.setA$logletterfreq,z=words.setA$fun) %>% kable(caption="iconicity and log letter frequency controlling for funniness")
```

Model comparison for combined funniness and iconicity scores suggests that having log letter frequency as a predictor significantly improves fit over and above word frequency and lexical decision time. 

```{r lm_combined_scores,include=F}
words <- words %>%
  mutate(funico = ico_z + fun_z,
         funico_perc = ntile(funico,100))

words.setA <- words %>% filter(set == "A")
m5.1 <- lm(funico ~ logfreq + rt,data=words.setA)
summary(m5.1)

m5.2 <- lm(funico ~ logfreq + rt + logletterfreq,data=words.setA)
summary(m5.2)
anova(m5.1,m5.2)

```

**Model m5.1**: `r format(m5.1$call)`
`r print_table(m5.1)`

**Model m5.2**: `r format(m5.2$call)`
`r print_table(m5.2)`

`r kable(anova(m5.1,m5.2),caption="model comparison")`

#### Structural analysis
We carry out a qualitative analysis of the 80 highest ranked words (top deciles for funniness+iconicity) to see if there are formal cues of foregrounding and structural markedness that can help predict funniness and iconicity ratings. Then we find these cues in the larger dataset and see if the patterns hold up.

This analysis reveals the following sets of complex onsets, codas, and verbal diminutive suffixes that are likely structural cues of markedness (given here in the form of regular expressions):

* onsets: `^(bl|cl|cr|dr|fl|sc|sl|sn|sp|spl|sw|tr|pr|sq)`
* codas: `(nch|mp|nk|rt|rl|rr|sh|wk)$`
* verbal suffix: `[b-df-hj-np-tv-xz]le)$`" (i.e., look for -le after a consonant)

We tag these cues across the whole dataset (looking for the *-le* suffix only in verbs because words like *mutable, unnameable, scalable, manacle* are not the same phenomenon) in order to see how they relate to funniness and iconicity.

```{r qualitative, include=F}

# words.high <- words %>%
#   filter(set=="A",
#          diff_rank > 18) %>%
#   dplyr::select(word,ico,fun,POS,logletterfreq,UnTrn,NSyll,NPhon,diff_rank) %>%
#   write_excel_csv(path="data/words_highest.csv",delim = ",")

# BTW, some versions of excel cope better with output from csv2
# write_excel_csv2(path="data/words_highest.csv",delim = ",")

# qualitative analysis of the top 80 words reveals the following sets of complex onsets, codas, and verbal diminutive suffixes that are likely structural cues of markedness:

onsets <- "^(bl|cl|cr|dr|fl|sc|sl|sn|sp|spl|sw|tr|pr|sq)"
codas <- "(nch|mp|nk|rt|rl|rr|sh|wk)$"
verbdim <- "([b-df-hj-np-tv-xz]le)$" # i.e., look for -le after a consonant

# tag words for these patterns, applying verbdim only to verbs
# add a cumulative measure of complexity
words <- words %>%
  mutate(group = ifelse(diff_rank > 18,"highest","other")) %>%
  mutate(complex.coda = ifelse(str_detect(word,pattern=codas),1,0),
         complex.onset = ifelse(str_detect(word,pattern=onsets),1,0),
         complex.verbdim = ifelse(str_detect(word,pattern=verbdim),
                                  ifelse(POS == "Verb",1,0),0)) %>%
  mutate(cumulative = rowSums(.[c("complex.coda","complex.onset","complex.verbdim")])) 

# give a few examples of each
words %>% 
  filter(complex.onset == 1 & diff_rank > 18) %>% dplyr::select(word) %>%
  slice(1:10) %>% unlist() %>% unname() %>% kable(caption="10 words with complex onsets")
  
words %>% 
  filter(complex.coda == 1 & diff_rank > 18) %>% dplyr::select(word) %>%
  slice(1:10) %>% unlist() %>% unname() %>% kable(caption="10 words with complex codas")

words %>% 
  filter(complex.verbdim == 1 & diff_rank > 18) %>% dplyr::select(word) %>%
  slice(1:10) %>% unlist() %>% unname() %>% kable(caption="10 words with the verbal -le suffix")

# sanity check: all words ending in -le vs all verbs
#words[str_detect(words$word,pattern=verbdim),]$word
#words[which(words$complex.verbdim == 1),]$word

# compare these cues in the 80 highest rated words versus the rest
words %>%
  group_by(group) %>% drop_na(group) %>% drop_na(cumulative) %>%
  summarise(n=n(),
            ico=mean.na(ico),
            fun=mean.na(fun),
            onset=mean.na(complex.onset),
            coda=mean.na(complex.coda),
            verbdim=mean.na(complex.verbdim),
            cumul=mean.na(cumulative),
            sd=sd.na(cumulative)) %>%
  kable(caption="Markedness cues in highest rated words versus the rest")

# t-test & effect size
t.test(words[which(words$group == "highest"),]$cumulative,
       words[which(words$group == "other"),]$cumulative)

cohen.d(words[which(words$group == "highest"),]$cumulative,
       words[which(words$group == "other"),]$cumulative)

```

Model the contribution of markedness relative to logletter frequency. Model comparison shows that a model including the measure of cumulative markedness as predictor provides a significantly better fit (F = 52.78, p < 0.0001) and explains a larger portion of the variance (adjusted R2 = 0.21 vs. 0.18) than a model with just word frequency, lexical decision time and log letter frequency.
  
```{r model_markedness, include=F}

# predicting fun+ico from markedness

words.setA <- words %>% filter(set=="A")

m5.3 <- lm(funico ~ logfreq + rt + logletterfreq + cumulative,data=words.setA)
anova(m5.2,m5.3)
summary(m5.3)

``` 

**Model m5.3**: `r format(m5.3$call)`
`r print_table(m5.3)`

`r kable(anova(m5.2,m5.3),caption="Model comparison of m5.2 and m5.3")`

Now we trace cumulative markedness in the imputed portions of the dataset, and do the same model comparison as above. 

First have a look at a random sample of top imputed words and their markedness:

```{r markedness_in_imputed, echo=F}
words %>%
  filter(is.na(ico),
         cumulative > 0,
         ico_imputed_perc > 9) %>%
  group_by(cumulative) %>%
  sample_n(10) %>%
  arrange(-ico_imputed) %>%
  dplyr::select(word,ico_imputed_perc,ico_imputed,cumulative) %>%
  kable(caption="Cumulative markedness in a random sample of words from the highest quantile of imputed iconicity")

```

And at a random sample of words from lower quadrants and their markedness:

```{r markedness_in_imputed_1, echo=F}
# have a look at a random sample of words from lower quadrants and their markedness
words %>%
  filter(is.na(ico),
         cumulative > 0,
         ico_imputed_perc < 8) %>%
  group_by(cumulative) %>%
  sample_n(10) %>%
  arrange(-ico_imputed) %>%
  dplyr::select(word,ico_imputed_perc,ico_imputed,cumulative) %>%
  kable(caption="Cumulative markedness in a random sample of words from lower quantiles of imputed iconicity")

```

Looks like random samples of 20 high-complexity words always feature a majority of high iconicity words:

```{r markedness_in_imputed_2, echo=F}
# random samples of 20 high-complexity words always feature a majority of high iconicity words
words %>%
  filter(is.na(ico),
         cumulative == 2) %>%
  sample_n(20) %>%
  arrange(-ico_imputed) %>%
  dplyr::select(word,ico_imputed_perc,ico_imputed,fun_imputed,cumulative) %>%
  kable(caption="Imputed ratings for 20 random words high in cumulative markedness")

```

Let's have a closer look at subsets. First quadrants, then deciles.
```{r markedness_in_imputed_subsets, echo=F}
# closer look at subsets
words.setB <- words %>% filter(set=="B")

words.setC <- words %>% filter(set=="C")

# compare four quadrants
words.setC %>%
  mutate(target_perc = ntile(ico_imputed,4)) %>%
  group_by(target_perc) %>%
  summarise(n=n(),
            onset=mean.na(complex.onset),
            coda=mean.na(complex.coda),
            verbdim=mean.na(complex.verbdim),
            complexity=mean.na(cumulative)) %>%
    kable(caption="Markedness cues across quartiles of imputed iconicity")

# or deciles
words.setC %>%
  mutate(target_perc = ntile(ico_imputed,10)) %>%
  group_by(target_perc) %>%
  summarise(n=n(),
            onset=mean.na(complex.onset),
            coda=mean.na(complex.coda),
            verbdim=mean.na(complex.verbdim),
            complexity=mean.na(cumulative)) %>%
  kable(caption="Markedness cues across deciles of imputed iconicity")

```

```{r markedness_in_imputed_lm, include=F}
# model comparison

# create funico_imputed measure
words.setC <- words.setC %>%
  mutate(fun_imputed_z = (fun_imputed - mean.na(fun_imputed)) / sd.na(fun_imputed),
         ico_imputed_z = (ico_imputed - mean.na(ico_imputed)) / sd.na(ico_imputed),
         funico_imputed = fun_imputed_z + ico_imputed_z)


m5.4 <- lm(funico_imputed ~ logfreq + rt + logletterfreq,data=words.setC)
summary(m5.4)

m5.5 <- lm(funico_imputed ~ logfreq + rt + logletterfreq + cumulative,data=words.setC)
summary(m5.5)
anova(m5.4,m5.5)
```

Comparison of models with combined imputed funniness and iconicity as a dependent variable shows that a linear model including cumulative markedness as predictor provides a significantly better fit (F1,19230 = 337.3, p < 0.0001) and explains a little bit more the variance (adjusted R2 = 0.124 vs. 0.109) than a model with just word frequency, lexical decision time and log letter frequency.

**Model m5.4**: `r format(m5.4$call)`
`r print_table(m5.4)`

**Model m5.5**: `r format(m5.5$call)`
`r print_table(m5.5)`

`r kable(anova(m5.4,m5.5),caption="model comparison")`


## End
Thanks for your interest. Also see the separate code notebook with[supplementary analyses](./playful_iconicity_supplements.md).


If you find this useful, consider checking out the following resources that have been helpful in preparing this Rmarkdown document:

* Two of my own past projects (remember, the person most grateful for your well-documented past code is future you):
  * [Expressiveness and grammatical integration](http://ideophone.org/collab/expint/) (by Mark Dingemanse) 
  * [Coloured vowels: open data and code](https://github.com/mdingemanse/colouredvowels/blob/master/BRM_colouredvowels_opendata.md) (by Mark Dingemanse & Christine Cuskley)
* [Formatting ANOVA tables in R](http://www.understandingdata.net/2017/05/11/anova-tables-in-r/) (by Rose Hartman, Understanding Data)
* [Iconicity in the speech of children and adults](https://github.com/bodowinter/iconicity_acquisition) (by Bodo Winter)
* [English letter frequencies](http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/english-letter-frequencies/)

And of course have a look at the paper itself — latest preprint here: [Playful iconicity](https://psyarxiv.com/9ak7e/)