---
title: "TakeUp Analysis Notebook"
author:
- Anne Karing^[University of California Berkeley]
- Karim Naguib^[Evidence Action]
output:
  html_notebook:
    fig_align: "center"
    fig_caption: yes
    fig_height: 5
    fig_width: 8
    number_sections: yes
    theme: flatly
    toc: yes
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 5
header-includes:
   - \usepackage{bbm}
date: "`r format(Sys.time(), '%B %d, %Y')`"
---

```{r, echo=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

```{r setup, include=FALSE}
library(magrittr)
library(plyr)
library(tidyverse)
# library(multidplyr)
library(lubridate)
library(forcats)
library(haven)
library(lmtest)
library(car)
library(mlogit)
library(broom)
library(ggrepel)
library(sp)
library(rgeos)
library(ggmap)
library(knitr)

library(econometr)

source("takeup_rct_assign_clusters.R")
source("analysis_util.R")

knitr::read_chunk("analysis_util.R", labels = "analysis-util")

config <- yaml::yaml.load_file("../local_config.yaml")

doParallel::registerDoParallel(cores = config$cores)

wgs.84 <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
kenya.proj4 <- "+proj=utm +zone=36 +south +ellps=clrk80 +units=m +no_defs"

do.neyman.analysis <- TRUE

options(dplyr.show_progress = FALSE, digits = 4)
```

# Data

```{r load-data, include=FALSE}
load(file.path("data", "analysis.RData"))
load(file.path("data", "mht_results.RData"))
```

These are the _monitored_ individuals: baseline and endline survey respondents, as well as a sample of people from whom we received reconsent without surveying them.

```{r, results='asis', eval=FALSE}
census.data %>% 
  filter(!is.na(wave), # Actually in the study (not in one of the dropped clusters)
         monitored) %>% {
    count(., endline.type) %>% kable %>% print
    
    filter(., is.na(endline.type)) %>% count(hh.baseline.sample.pool)
  } %>% 
  kable
```

~~Below is the name matching code. We are using this function to identify individuals in a census who received treatment in their cluster's point-of-treatment and were not recorded by enumerators during their take-up monitoring. This is necessary to estimate take-up for non-phone owners, who were not included in the monitoring list provided to enumerators, in four of the six experiment strata (wave 1).~~ (Moved to external script).

We consider two individuals (in the take-up and census data) to be matched if the sum of edit (Levenshtein) distances of the first and last names is less than or equal to 1.
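
As a minimal sketch of this matching rule (the actual matching code lives in the external script and may differ), the example below uses base R's `adist()`, which computes Levenshtein distances. The `takeup` and `census` data frames and their names are made up for illustration.

```r
# Hypothetical toy records; not taken from the study data.
takeup <- data.frame(first = c("Jon", "Mary", "Peter"),
                     last = c("Otieno", "Wanjiru", "Kamau"),
                     stringsAsFactors = FALSE)
census <- data.frame(first = c("John", "Mary", "Petra"),
                     last = c("Otieno", "Wanjiru", "Kamaa"),
                     stringsAsFactors = FALSE)

# adist() returns the Levenshtein distance for every pair of strings:
# rows index take-up records, columns index census records.
name.dist <- adist(takeup$first, census$first) + adist(takeup$last, census$last)

# A pair is a match when the summed first- and last-name distance is at most 1.
matched <- which(name.dist <= 1, arr.ind = TRUE)
data.frame(takeup.name = paste(takeup$first, takeup$last)[matched[, "row"]],
           census.name = paste(census$first, census$last)[matched[, "col"]])
```

Here "Jon Otieno" and "John Otieno" differ by one insertion and are matched, while "Peter Kamau" and "Petra Kamaa" exceed the threshold and are not.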

We link the consent, census (including take-up status), and endline data to generate the analysis data set. At this stage, we include everyone in the census, that is, the entire cluster populations. For our analysis, we will filter on consent. (Code moved to a separate script.)

```{r outliers-plot, echo=FALSE, fig.width=12, fig.height=10}
# bind_rows(monitored = cluster.takeup.data, all = unmonitored.cluster.takeup.data, .id = "monitored.sample") %>% 
#   ggplot(aes(stratum, takeup.prop, color = monitored.sample)) +
#   geom_boxplot() + #aes(color = assigned.treatment)) + 
#   geom_text_repel(aes(label = cluster.id), data = . %>% filter(outlier)) +
#   xlab("Strata") +
#   scale_y_continuous("Cluster Take-up Proportion", breaks = seq(0, 1, 0.1)) +
#   # scale_color_discrete("Treatment") +
#   facet_grid(sms.treatment ~ assigned.treatment) +
#   theme(legend.position = "bottom", axis.text.x = element_blank())
```

Outlier cells plot:

```{r outlier-cells-plot, echo=FALSE, fig.width=12, fig.height=10}
cell.takeup.data %>% 
  unite(stratum, county_dist_stratum, mon_status, remove = FALSE) %>% 
  ggplot(aes(stratum, takeup.prop)) +
  geom_boxplot() + #aes(color = assigned.treatment)) + 
  geom_text_repel(aes(label = cluster.id), data = . %>% filter(outlier)) +
  xlab("Strata") +
  scale_y_continuous("Cluster Take-up Proportion", breaks = seq(0, 1, 0.1)) +
  # scale_color_discrete("Treatment") +
  facet_grid(sms.treatment ~ assigned.treatment + mon_status, scales = "free_x") +
  theme(legend.position = "bottom", axis.text.x = element_blank())
```

The outlier ~~clusters~~ cells are:

```{r}
# bind_rows(monitored = outlier.clusters, all = unmonitored.outlier.clusters, .id = "monitored.sample") %>% 
outlier.cells %>% 
  select(cluster.id, sms.treatment, mon_status) %>% 
  arrange(mon_status, sms.treatment) %>% 
  kable
```

# Knowledge and Beliefs

## Knowledge

```{r}
know.bel.cat.plot("who_worms") + labs(title = "Who is at risk of worms?")
```

```{r}
know.bel.cat.plot("effect_worms") + labs(title = "What are the effects of worms?")
```

```{r}
know.bel.cat.plot("spread_worms") +
  labs(title = "Can a worm-infected person spread worms to others?")
```

```{r}
know.bel.cat.plot("how_spread") +
  labs(title = "How are worms spread?")
```

```{r}
know.bel.cat.plot("stop_worms") +
  labs(title = "How to stop worms?")
```

Only a low proportion of respondents know that drugs are effective.

```{r}
know.bel.cat.plot("when_treat") +
  labs(title = "When should people deworm?")
```


## Externalities

```{r}
know.bel.cat.plot("worms_affect", na.rm = TRUE) +
  labs(title = "If infected, can you affect others' health?")
```


```{r}
know.bel.cat.plot("neighbours_worms_affect", na.rm = TRUE) +
  labs(title = "Can a neighbor's or relative's worm infection affect your health?", x = "")
```

### Baseline Only

```{r, fig.height=3, fig.width=8}
baseline.data %>% 
  select(few_deworm) %>% 
  ggplot(aes(few_deworm)) +
  geom_bar(aes(y = ..count../sum(..count..))) +
  coord_flip() +
  labs(title = "Deworm if less than half of others deworm?", x = "", y = "Proportion")
```

```{r}
baseline.data %>% 
  select(many_deworm) %>% 
  ggplot(aes(many_deworm)) +
  geom_bar(aes(y = ..count../sum(..count..))) +
  coord_flip() +
  labs(title = "Deworm if more than half of others deworm?", x = "", y = "Proportion")
```


```{r}
baseline.data %>% 
  select(more_less) %>% 
  ggplot(aes(more_less)) +
  geom_bar(aes(y = ..count../sum(..count..))) +
  coord_flip() +
  labs(title = "Would you be more likely to deworm if few/many others get dewormed?", x = "", y = "Proportion")
```

## Baseline Social Image Questions

I'm dropping information about the question group: _A-D_.

```{r praise-stigma}
praise.stigma.plot <- baseline.data %>% 
  select(matches("^(praise|stigma)_[^_]+$")) %>% 
  gather(key = key, value = response) %>% 
  separate(key, c("praise.stigma", "topic"), "_") %>% 
  separate(topic, c("topic", "question.group"), -2) %>% 
  filter(!is.na(response)) %>% 
  count(praise.stigma, topic, response) %>% 
  group_by(praise.stigma, topic) %>% 
  mutate(n = n/sum(n)) %>% 
  ungroup %>%
  mutate_at(vars(praise.stigma, response), funs(fct_relabel(factor(.), str_to_title))) %>% 
  mutate(topic = fct_recode(factor(topic), 
                            "Wearing/not wearing nice clothes to church" = "clothe",
                            "Use Latrine/open defecation" = "defecat",
                            "Deworming/not deworming during MDA" = "dewor",
                            "Immunize/not immunize children" = "immuniz")) %>% 
  ggplot(aes(response)) +
  geom_col(aes(y = n), alpha = 0.5) +
  labs(y = "Proportion", x = "") +
  scale_y_continuous(breaks = seq(0.25, 1, 0.25)) +
  coord_flip() +
  facet_grid(topic ~ praise.stigma, labeller = label_wrap_gen(width = 20)) +
  theme_bw() +
  theme(legend.position = "bottom",
      strip.text.y = element_text(angle = 0), 
      strip.background = element_rect(colour = NA), 
      panel.border = element_blank()) 
```

```{r}
plot(praise.stigma.plot)
```

```{r, fig.width=10}
baseline.data %>% 
  select(matches("^(praise|stigma)_[^_]+_scale")) %>% 
  gather(key = key, value = response) %>% 
  separate(key, c("praise.stigma", "topic"), "_", extra = "merge") %>% 
  separate(topic, c("topic", "question.group"), "_scale") %>% 
  filter(!is.na(response)) %>% 
  ggplot(aes(response)) +
  geom_freqpoly(aes(y = ..density.., color = topic), binwidth = 1) +
  scale_x_continuous("Scale", breaks = 1:10) +
  scale_color_discrete("") +
  # facet_grid(topic ~ praise.stigma) +
  facet_wrap(~ praise.stigma) +
  labs(title = "Praise and Stigma Scale", y = "Density") +
  theme(legend.position = "bottom")
```

## Baseline Pre-RCT Experience with Deworming Questions

```{r, fig.width=10, fig.height=5}
baseline.data %>% 
  select(treated, family_treated) %>% 
  gather(key = who.treated) %>% 
  mutate(who.treated = if_else(who.treated == "treated", "self", "family")) %>% 
  count(who.treated, value) %>% 
  group_by(who.treated) %>% 
  mutate(n = n/sum(n)) %>%
  ungroup %>% 
  ggplot(aes(value)) +
  geom_col(aes(y = n)) +
  coord_flip() +
  facet_wrap(~ who.treated) +
  labs(title = "Ever been dewormed before?", x = "", y = "Proportion")
```

```{r}
baseline.data %>% 
  select(family_treated, who_treated) %>% 
  filter(family_treated == "yes") %>% 
  ggplot(aes(who_treated)) +
  geom_bar(aes(y = ..count../sum(..count..))) +
  coord_flip() +
  labs(title = "Who in family got dewormed?", x = "", y = "Proportion")
```

```{r}
baseline.data %>% 
  select(treated, treated_when) %>% 
  filter(treated == "yes") %>% 
  ggplot(aes(treated_when)) +
  geom_bar(aes(y = ..count../sum(..count..))) +
  coord_flip() +
  labs(title = "When did you last get dewormed?", x = "", y = "Proportion")
```

```{r, fig.width=10, fig.height=5}
baseline.data %>% 
  select(treated, family_treated, treated_where, where_family_treated) %>% 
  gather(key = who.treated, value = value, -c(treated, family_treated)) %>% 
  mutate(who.treated = if_else(who.treated == "treated_where", "self", "family")) %>% 
  filter((who.treated == "self" & treated == "yes") | (who.treated == "family" & family_treated == "yes")) %>% 
  count(who.treated, value) %>% 
  group_by(who.treated) %>% 
  mutate(n = n/sum(n)) %>% 
  ungroup %>% 
  ggplot(aes(value)) +
  geom_col(aes(y = n)) +
  coord_flip() +
  facet_wrap(~ who.treated) +
  labs(title = "Where did you last get dewormed?", x = "", y = "Proportion")
```

## Endline Treatment Questions

```{r}
endline.data %>% 
  select(know_deworm) %>% 
  ggplot(aes(know_deworm)) +
  geom_bar(aes(y = ..count../sum(..count..))) +
  coord_flip() +
  labs(x = "", y = "Proportion", title = "Know of community-based MDA?")
```

```{r}
endline.data %>% 
  select(treat_begin, treat_end, days_available) %>% 
  gather %>%
  mutate(key = case_when(.$key == "treat_begin" ~ "begin date", 
                         .$key == "treat_end" ~ "end date", 
                         TRUE ~ "days available")) %>% 
  count(key, value) %>% 
  group_by(key) %>% 
  mutate(n = n/sum(n)) %>% 
  ungroup %>% 
  ggplot(aes(value)) +
  geom_col(aes(y = n)) +
  # geom_bar(aes(y = ..count../sum(..count..), group = key)) +
  coord_flip() +
  facet_wrap(~ key, ncol = 1) +
  labs(x = "", y = "Proportion", title = "Know ...?")
```

```{r mda-dates-dist}
day1.wave1 <- as_date("2016-10-03")
day12.wave1 <- day1.wave1 + days(11)
day1.wave2 <- as_date("2016-10-24")
day12.wave2 <- day1.wave2 + days(11)

wave.dates <- tribble(~ wave, ~ begin.end, ~ day,
                      1,      "begin",     day1.wave1,
                      1,      "end",       day12.wave1,
                      2,      "begin",     day1.wave2,
                      2,      "end",       day12.wave2)

mda.dates.dist.plot <- endline.data %>% 
  select(wave, treat_begin, treat_end, treat_begin_date, treat_end_date) %>% 
  gather(key = begin.end, value = day, -c(treat_begin, treat_end, wave)) %>% 
  mutate(begin.end = if_else(begin.end == "treat_begin_date", "begin", "end")) %>% 
  filter((treat_begin == "knows" & begin.end == "begin") | (treat_end == "knows" & begin.end == "end")) %>% 
  ggplot(aes(day)) +
  geom_histogram(aes(fill = begin.end), alpha = 0.75,  binwidth = 2, position = "identity") +
  geom_vline(aes(xintercept = as.numeric(day)), linetype = "dotted", data = wave.dates) +
  # scale_x_date(breaks = c(day1.wave1, day12.wave1, day1.wave2, day12.wave2)) + 
  scale_x_date("", date_breaks = "4 weeks", date_minor_breaks = "1 week", limits = c(as_date("2016-09-05"), as_date("2016-11-28"))) +
  scale_y_continuous("") +
  scale_fill_discrete("", labels = c("Begin Date", "End Date")) +
  facet_wrap(~ wave, scales = "free_x", labeller = as_labeller(. %>% sprintf("Wave %s", .))) +
  labs(title = "When did the MDA begin and end?", caption = "Dotted vertical lines identify correct MDA start and end days.") +
  theme(legend.position = "bottom")
```

```{r, fig.width=10, fig.height=6}
plot(mda.dates.dist.plot)
```

```{r}
endline.data %>% 
  select(treat_days) %>% 
  ggplot(aes(treat_days)) +
  geom_bar(aes(y = ..count../sum(..count..))) +
  scale_x_continuous(breaks = seq(0, 40, 2)) +
  labs(x = "", y = "Proportion", title = "How many deworming days?")
```

```{r}
know.bel.cat.plot("find_out", .baseline.data = NULL) + labs(title = "From whom did you hear about MDA?")
```

```{r}
know.bel.cat.plot("chv_visit", .baseline.data = NULL) + labs(title = "Did a CHV visit you to tell you about MDA?")
```

## Beliefs

```{r survey-of-10-beliefs, warning=FALSE}
survey.takeup.beliefs.10 <- list(Baseline = baseline.data, Endline = endline.data) %>% 
  compact %>% 
  map_df(select, one_of("dworm_rate", "ink_dworm_rate"), .id = "survey.type") %>% 
  gather(key = incentive, value = value, -survey.type) %>% 
  mutate(incentive = if_else(incentive == "dworm_rate", "None", "Ink")) %>% 
  ggplot(aes(value)) +
  geom_freqpoly(aes(y = ..density.., color = survey.type, linetype = incentive), binwidth = 1) +
  scale_x_continuous("Reported Rate", breaks = 0:10) +
  scale_y_continuous("Density") +
  scale_color_discrete("") +
  scale_linetype_manual("Signal", values = c("dashed", "solid")) +
  theme(legend.position = "bottom") +
  labs(title = "How many out of 10 will come for deworming?")
```

```{r}
plot(survey.takeup.beliefs.10)
```

```{r survey-of-10-beliefs-boxplot, warning=FALSE}
survey.takeup.beliefs.10.boxplot <- list(Baseline = baseline.data, Endline = endline.data) %>% 
  compact %>% 
  map_df(select, one_of("dworm_rate", "ink_dworm_rate"), .id = "survey.type") %>% 
  gather(key = incentive, value = value, -survey.type) %>% 
  mutate(incentive = if_else(incentive == "dworm_rate", "None", "Ink")) %>% 
  ggplot(aes(survey.type)) +
  geom_boxplot(aes(y = value, color = survey.type, linetype = incentive), position = position_dodge(width = 1.1)) +
  scale_y_continuous("Reported Rate", breaks = 0:10) +
  scale_color_discrete("") +
  scale_x_discrete("") +
  coord_flip() +
  scale_linetype_manual("Signal", values = c("dotdash", "solid")) +
  theme(legend.position = "bottom", axis.text.y = element_blank(), axis.ticks.y = element_blank()) 
```

```{r}
plot(survey.takeup.beliefs.10.boxplot)
```

```{r, fig.width=10}
endline.data %>% 
  filter(sms.treatment == "sms.control", sms.ctrl.subpop == "non.phone.owner") %>% 
  ggplot(aes(x = dworm_rate)) + 
  geom_freqpoly(aes(y = ..density.., color = dist.pot.group), binwidth = 1) +
  scale_x_continuous("Rate", breaks = 1:10) +
  scale_y_continuous("Density") +
  scale_color_discrete("") +
  facet_wrap(~ assigned.treatment) +
  theme(legend.position = "bottom") +
  labs(title = "How many out of 10 will come for deworming?")
```

```{r, fig.width=10}
endline.data %>% 
  filter(sms.treatment != "sms.control" | sms.ctrl.subpop == "phone.owner") %>% 
  mutate(sms.treated = if_else(sms.treatment != "sms.control", "treated", "control")) %>% 
  ggplot(aes(x = assigned.treatment)) +
  geom_boxplot(aes(y = dworm_rate, color = dist.pot.group)) +
  facet_wrap(~ sms.treated) +
  scale_color_discrete("") +
  theme(legend.position = "bottom") 
```

```{r}
reg.endline.beliefs <- endline.data %>% 
  filter(sms.treatment != "sms.control" | sms.ctrl.subpop == "phone.owner", sms.treatment != "reminder.only") %>% 
  mutate(sms.treated = if_else(sms.treatment != "sms.control", "treated", "control"),
         sms.treatment = factor(sms.treatment) %>% relevel(ref = "social.info")) %>% 
  group_by(sms.treated, assigned.treatment, dist.pot.group) %>% 
  do(filter(., !is_outlier(.$dworm_rate))) %>% 
  ungroup %>% 
  run_strat_reg(dworm_rate ~ assigned.treatment * dist.pot.group * sms.treatment, .strat.by = "county", .cluster = "cluster.id", .covariates = c("school", "floor", "ethnicity"))

reg.endline.beliefs %>% 
  tidy %>% 
  kable(digits = 4)
```

```{r, echo=TRUE}
linear_tester(reg.endline.beliefs, 
              # Effect of incentive
              c("far",
                "far + far:sms.control",
                "ink",
                "ink + ink:sms.control",
                "ink + ink:far",
                "ink + ink:far + ink:far:sms.control + ink:sms.control",
                "calendar",
                "calendar + calendar:sms.control",
                "calendar + calendar:far",
                "calendar + calendar:far + calendar:far:sms.control + calendar:sms.control",
                "bracelet",
                "bracelet + bracelet:sms.control",
                "bracelet + bracelet:far",
                "bracelet + bracelet:far + bracelet:far:sms.control + bracelet:sms.control",
                # Effect of distance
                "calendar:far + calendar:far:sms.control + far:sms.control + far",
                "calendar:far + far",
                "bracelet:far + bracelet:far:sms.control + far:sms.control + far",
                "bracelet:far + far",
                # Effect of SMS
                "- sms.control",
                "- sms.control - far:sms.control",
                "- ink:sms.control - sms.control",
                "- ink:sms.control - ink:far:sms.control - far:sms.control - sms.control",
                "- calendar:sms.control - sms.control",
                "- calendar:sms.control - calendar:far:sms.control - far:sms.control - sms.control",
                "- bracelet:sms.control - sms.control",
                "- bracelet:sms.control - bracelet:far:sms.control - far:sms.control - sms.control")) %>% 
  kable(digits = 4)
```

```{r}
baseline.data %>% 
  select(dworm_proportion) %>% 
  ggplot(aes(dworm_proportion)) +
  geom_bar() + 
  labs(title = "How many will come for deworming?", x = "")
```

```{r}
baseline.data %>% 
  select(ink_more_less) %>% 
  ggplot(aes(ink_more_less)) +
  geom_bar() +
  coord_flip() +
  labs(title = "More/less will come with ink?", x = "")
```

# Treatment Assignment

## Balance

```{r, echo=TRUE}
balance.test.covar <- c(setdiff(reg.covar, "ethnicity"), "ethnicity2", "dist.pot.group")

balance.data <- analysis.data %>%  
  filter(!sms.treated, #!hh.baseline.sample, 
         !is.na(floor), !is.na(school), !is.na(dist.pot.group), !is.na(ethnicity)) %>% 
  # anti_join(outlier.cells, c("assigned.treatment", "sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  select_(.dots = c("assigned.treatment", balance.test.covar)) %>%
  mlogit.data(choice = "assigned.treatment", shape = "wide") 

# Had to remove "floor" so the covariates are non-singular
unrestricted.reg <- balance.data %>% 
  mlogit::mlogit(as.formula(sprintf("assigned.treatment ~ 0 | %s", paste(balance.test.covar, collapse = " + "))), data = .)

restricted.reg <- balance.data %>% 
  na.omit %>% 
  mlogit::mlogit(assigned.treatment ~ 0 | 1, data = .) 

waldtest(unrestricted.reg, restricted.reg) 
```

## Distance Assignment

```{r actual-distance-distribution}
analysis.data %>%  
  # anti_join(unmonitored.outlier.clusters, c("sms.treatment", "cluster.id")) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id")) %>% 
  mutate(assigned.treatment = fct_relabel(assigned.treatment, str_to_title)) %>% 
  ggplot(aes(dist.to.pot)) +
  geom_density(aes(color = dist.pot.group, linetype = "Household")) +
  geom_density(aes(color = dist.pot.group, linetype = "Cluster Center"), 
               data = mutate(village.centers, assigned.treatment = fct_relabel(assigned.treatment, str_to_title))) +
  geom_vline(xintercept = c(1250), linetype = "dashed") +
  labs(y = "Density", 
       caption = "Cluster centers were calculated as the centroid location of all households in the cluster.") +
  scale_x_continuous("Distance to Treatment Location (meters)", breaks = seq(0, 10000, 2500/4)) +
  scale_color_discrete("Cluster Distance Assignment", labels = c("Close", "Far")) +
  scale_linetype_discrete("Distance From") +
  theme(legend.position = "bottom") +
  facet_wrap(~ assigned.treatment)
```

# Incentive Preferences

## Self-reported Preferences (Endline)

```{r reported-gift-pref}
reported.gift.pref.plot <- analysis.data %>% 
  plot.pref.unfaceted() +
  facet_wrap(~ assigned.treatment, nrow = 1) 
```

```{r, fig.width=8}
plot(reported.gift.pref.plot)
```
 
```{r reported-gift-pref-deworm}
reported.gift.pref.dewormed.plot <- analysis.data %>% 
  mutate(dewormed.any = if_else(dewormed.any, "Dewormed", "Not Dewormed")) %>% 
  plot.pref.unfaceted("dewormed.any") +
  facet_grid(dewormed.any ~ assigned.treatment)
```

```{r, fig.width=8}
plot(reported.gift.pref.dewormed.plot)
```

```{r other-incentive-plot}
other.have.incentive.plot <- analysis.data %>% 
  # filter(!is.na(gift_choice), monitored, !hh.baseline.sample, !is.na(hh_cal) | !is.na(hh_bracelet), 
  filter(!is.na(gift_choice), monitored, !is.na(hh_cal) | !is.na(hh_bracelet), 
         assigned.treatment %in% c("calendar", "bracelet")) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id")) %>% 
  gather(hh.incentive, hh_switch, hh_cal, hh_bracelet) %>% 
  filter(!is.na(hh_switch)) %>% 
  mutate(dewormed.any = if_else(dewormed.any, "Dewormed", "Not Dewormed"),
         assigned.treatment = fct_relabel(assigned.treatment, . %>% paste("arm")),
         hh_switch = fct_recode(hh_switch, "Other in household got calendar" = "yes", "No one else got calendar" = "no")) %>%
  group_by(dewormed.any, assigned.treatment, hh_switch) %>% 
  mutate(arm.size = n()) %>% 
  group_by(gift_choice, add = TRUE) %>%
  summarize(pref.prop = n() / first(arm.size)) %>% 
  ungroup %>%
  mutate_at(vars(gift_choice, assigned.treatment), funs(fct_relabel(., str_to_title))) %>% 
  ggplot(aes(gift_choice, pref.prop)) +
  geom_col(aes(fill = assigned.treatment), position = "dodge", alpha = 0.5, color = alpha("black", 0.5)) +
  scale_fill_discrete("") +
  labs(x = "Gift", y = "Preferred Proportion", title = "Endline Gift Preference", 
       subtitle = "Split by whether household has other calendar(s) and deworming take-up") +
  facet_grid(dewormed.any ~ assigned.treatment + hh_switch) +
  theme(legend.position = "bottom")
```

```{r, fig.width=8}
plot(other.have.incentive.plot)
```

The graph above suggests that, conditional on someone in my household having a calendar, my desire for a calendar is the same regardless of whether I already have a calendar (i.e., dewormed) or do not (i.e., not dewormed). In other words, a household member obtaining a calendar is equivalent to my having one. However, the calendar is still preferred to the bracelet overall (see the earlier graph that combines households), since in most households (70$\%$ in the data) no one came for deworming, so I think we are fine. Ideally, we would find a way to control for these negatively correlated preferences for calendars, but it is not clear exactly how.

Irrespective of whether you came for deworming, or whether someone else in your household came for deworming and received a bracelet, you still prefer the calendar. People simply did not like the bracelets! If you did not come for deworming, you value a bracelet the same whether or not someone else in your household has one. There is no evidence of complementarity or substitution here.

```{r, fig.width=8}
analysis.data %>% 
  filter(monitored, !is.na(cal_value), assigned.treatment == "calendar") %>% # !hh.baseline.sample, 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id")) %>% 
  mutate(dewormed.any = if_else(dewormed.any, "Dewormed", "Not Dewormed"),
         cal_value = fct_recode(cal_value, "Would still want a calendar" = "yes", "Would not want another calendar" = "no")) %>% 
  group_by(dewormed.any) %>% 
  mutate(arm.size = n()) %>% 
  group_by(cal_value, add = TRUE) %>%
  summarize(pref.prop = n() / first(arm.size)) %>% 
  ungroup %>% 
  ggplot(aes(cal_value, pref.prop)) +
  geom_col(alpha = 0.5, color = "black") +
  labs(x = "", y = "Proportion", title = "Would Still Value a Calendar When Household Already Has One", 
       subtitle = "Split by deworming take-up", 
       caption = "This is only for the calendar arm of the experiment and limited to those who reported other calendar(s) in their household") +
  facet_wrap(~ dewormed.any, ncol = 1) + 
  coord_flip()
```

```{r, fig.width=8}
analysis.data %>% 
  filter(monitored, !is.na(cal_value), !is.na(gift_choice), assigned.treatment == "calendar") %>% #  !hh.baseline.sample,
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id")) %>% 
  mutate(dewormed.any = if_else(dewormed.any, "Dewormed", "Not Dewormed"),
         cal_value = fct_recode(cal_value, "Would still want a calendar" = "yes", "Would not want another calendar" = "no")) %>% 
  group_by(dewormed.any, cal_value) %>% 
  mutate(arm.size = n()) %>% 
  group_by(gift_choice, add = TRUE) %>%
  summarize(pref.prop = n() / first(arm.size)) %>% 
  ungroup %>% 
  ggplot(aes(gift_choice, pref.prop)) +
  geom_col(alpha = 0.5, color = "black") +
  labs(x = "Gift", y = "Preferred Proportion", title = "Endline Gift Preference", 
       subtitle = "Split by stated desire for another calendar if household already has one and deworming take-up", 
       caption = "This is only for the calendar arm of the experiment and limited to those who reported other calendar(s) in their household") +
  facet_grid(dewormed.any ~ cal_value)
```

## Willingness-to-Pay Survey

```{r first-choice-wtp}
first.choice.wtp.plot <- wtp.data %>% 
  filter(!is.na(first_choice)) %>% 
  ggplot() +
  geom_bar(aes(fct_relabel(first_choice, str_to_title), y = ..count../sum(..count..)), alpha = 0.5) +
  labs(x = "First Choice", title = "First Choice When Offered Calendars or Bracelets", 
       caption = "This data is for the control arm only.") +
  scale_y_continuous("Proportion", breaks = seq(0, 1, 0.1)) +
  coord_flip()
```

```{r, fig.height=2, fig.width=8}
plot(first.choice.wtp.plot)
```

```{r}
wtp.data %>% 
  ggplot() +
  geom_bar(aes(factor(price), y = ..count../sum(..count..)), alpha = 0.5, color = "black") +
  labs(x = "Offered Price (KSh)", y = "Proportion", title = "Prices Offered to Switch Gift Choice")
```

```{r switch-price-wtp}
switch.price.plot <- wtp.data %>% 
  filter(!is.na(first_choice)) %>%
  mutate(first_choice = fct_relabel(first_choice, str_to_title) %>% fct_relevel("Calendar")) %>% 
  group_by(first_choice, price) %>% 
  summarize(prop.switch = sum(second_choice == "switch")/n()) %>% 
  ungroup %>% 
  ggplot(aes(factor(price), prop.switch, group = first_choice)) +
  geom_point(aes(color = first_choice)) +
  geom_line(aes(color = first_choice)) +
  scale_color_discrete("First Choice") +
  labs(x = "Offered Price (KSh)", y = "Proportion", title = "Proportion Switching From First Choice of Gift When Offered Cash",
       caption = "This data is for the control arm only.")
```

```{r}
plot(switch.price.plot)
```

## Estimating the Utility Difference Between Calendar and Bracelet

```{r}
wtp.cdf <- wtp.data %>%
  select(first_choice,second_choice,price) %>%
  filter(!is.na(first_choice))

table(wtp.cdf$first_choice)
n_bra <- sum(wtp.cdf$first_choice == "bracelet") 
n_cal <- sum(wtp.cdf$first_choice == "calendar") 
N <- nrow(wtp.cdf)
bra_share <- n_bra/N
cal_share <- n_cal/N  

wtp.cdf$price[wtp.cdf$first_choice == "bracelet"] <-  - wtp.cdf$price[wtp.cdf$first_choice == "bracelet"]

wtp.cdf <- wtp.cdf %>% 
  group_by(price) %>% 
  summarise(N = n(), Pr = sum(second_choice == "switch")/N)
wtp.cdf$tot_pr <- NA
wtp.cdf$tot_pr[wtp.cdf$price >0] <- bra_share + cal_share*wtp.cdf$Pr[wtp.cdf$price >0]
wtp.cdf$tot_pr[wtp.cdf$price <0] <- bra_share - bra_share*wtp.cdf$Pr[wtp.cdf$price <0]

ggplot(wtp.cdf, aes(x=price,y=tot_pr)) + 
  geom_point() + 
  geom_line() +
  scale_x_continuous(breaks = seq(-100, 100, 10)) +
  theme_bw() 
```

Note: the point at -60 KSh reflects the small sample in the bracelet-choice group; we should not observe 100 percent switching there. The median utility difference is approximately KSh 40. Next step: fit the CDF with a linear probability model (LPM), possibly including interaction terms.
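
One simple way to read a median off such a curve, before fitting any model, is linear interpolation. The sketch below uses made-up points (not study estimates) in a hypothetical `toy.cdf` data frame with the same `price` and `tot_pr` columns as above, and assumes the curve is strictly increasing.

```r
# Made-up points on a pooled curve (NOT study estimates): signed price in KSh
# and the corresponding cumulative proportion.
toy.cdf <- data.frame(price = c(-60, -40, -20, 20, 40, 60, 80),
                      tot_pr = c(0.02, 0.05, 0.10, 0.30, 0.45, 0.65, 0.80))

# Swapping the axes in approx() reads off the price at which the curve
# crosses 0.5. This only makes sense if tot_pr is strictly increasing.
stopifnot(all(diff(toy.cdf$tot_pr) > 0))
median.price <- approx(x = toy.cdf$tot_pr, y = toy.cdf$price, xout = 0.5)$y
median.price
```

A non-monotone point, such as the one flagged at -60 KSh, would trip the monotonicity check, which is one reason to smooth the curve with a fitted model instead.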

```{r, eval=FALSE}
wtp.data %>%
  select(first_choice, second_choice, price) %>%
  filter(!is.na(first_choice)) %>% 
  mutate(choice.calendar = first_choice == if_else(second_choice == "keep", "calendar", "bracelet"),
         price = (1 - 2 * (first_choice == "calendar")) * price) %>% 
  bind_rows(mutate(., 
                      choice.calendar = first_choice == "calendar",
                      price = 0)) %>% 
  group_by(price) %>% 
  summarize(prop.pref.calendar = mean(choice.calendar),
            price.prop = n() / nrow(.)) %>% 
  ungroup %>%
  arrange(price) %>% 
  mutate(cumul.prop.pref.calendar = cumsum(prop.pref.calendar * price.prop) / cumsum(price.prop)) %>% 
  arrange(desc(price)) %>% 
  mutate(cumul.prop.pref.bracelet = cumsum((1 - prop.pref.calendar) * price.prop) / cumsum(price.prop)) %>% 
  gather(ref.incentive, cumul.prop.pref, cumul.prop.pref.bracelet, cumul.prop.pref.calendar) %>% 
  # mutate(#price = (1 - 2 * (ref.incentive == "cumul.prop.pref.bracelet")) * price,
  #        cumul.prop.pref = if_else(ref.incentive == "cumul.prop.pref.bracelet", 1 - cumul.prop.pref, cumul.prop.pref)) %>% 
  ggplot(aes(factor(price), cumul.prop.pref, group = ref.incentive)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(breaks = seq(0, 1, 0.05)) +
  labs(x = "Offered Price (KSh)", y = "Proportion", title = "Proportion Switching From First Choice of Gift When Offered Cash",
       caption = "This data is for the control arm only.")
```


# Reduced Form Analysis

Below is the main regression specification, where $m(\cdot)$ and $m_1(\cdot)\dots m_{K-1}(\cdot)$ define the causal effects to estimate. $l(\cdot)$ and $l_1(\cdot)\dots l_{K-1}(\cdot)$ define the regression intercepts.

$$
\begin{equation}
Y_{ij} = \left(m(Z_j, B_j;\theta) + l(B_j;\theta^c) + X_{ij} \cdot \beta \right) \cdot \frac{B_{j}(K)}{N(K)/N} + \sum_{k=1}^{K-1} \left(m_k(Z_j, B_j;\theta_k) + l_k(B_j;\theta^c_k) + X_{ij} \cdot \beta_k \right) \cdot \left( B_{j}(k) - B_{j}(K)\cdot\frac{N(k)}{N(K)} \right) + \varepsilon_{ij} 
\end{equation}
$$

* $i$ indexes individuals, $j$ clusters
* $Z_j \in \mathcal{Z} = \{ control, ink, calendar, bracelet \}$
* $Z_{j}(z) = \mathbf{1}\{Z_j = z\}$ is an indicator for whether cluster $j$ was assigned treatment $z$.
* $B_{j}(k) = \mathbf{1}\{B_j = k\}$ is an indicator for whether cluster $j$ is in stratum $k$.
* $X_{ij}$ is a vector of covariates
* $N(k)$ is the size of stratum $k$.
* $N$ is the total number of observations

To clarify what this stratified regression is doing, we are estimating
$$
\tau(z) = E[Y_{ij}(z) - Y_{ij}(control)] = \sum_{k = 1}^{K} \frac{N(k)}{N} E[Y_{ij}(z) - Y_{ij}(control)|B_j = k]
$$
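
The weighting in this expression can be checked on simulated data. The sketch below is illustrative only: it uses a single binary treatment, three made-up strata, and plain `lm()` rather than the `run_strat_reg()` helper. It computes $\tau$ first as the share-weighted average of within-stratum differences in means, and then from a fully interacted regression.

```r
set.seed(1)

# Simulated data (all names and numbers are illustrative): 3 strata, with
# treatment assigned at random to half of each stratum.
n.k <- c(200, 300, 500)
sim <- data.frame(stratum = rep(1:3, times = n.k))
sim$treated <- unlist(lapply(n.k, function(n) sample(rep(c(0, 1), n / 2))))
sim$y <- rbinom(nrow(sim), 1, 0.2 + 0.1 * sim$stratum / 3 + 0.15 * sim$treated)

# Within-stratum difference in means, E[Y(z) - Y(control) | B = k]
tau.k <- sapply(split(sim, sim$stratum),
                function(d) mean(d$y[d$treated == 1]) - mean(d$y[d$treated == 0]))

# Weight each stratum by its share of observations, N(k) / N
stratum.share <- n.k / sum(n.k)
tau <- sum(stratum.share * tau.k)

# The same quantity from a regression saturated in treatment and stratum
sim$stratum.f <- factor(sim$stratum)
b <- coef(lm(y ~ treated * stratum.f, data = sim))
tau.reg <- unname(b["treated"] +
                    stratum.share[2] * b["treated:stratum.f2"] +
                    stratum.share[3] * b["treated:stratum.f3"])

c(weighted.mean.diff = tau, saturated.regression = tau.reg)
```

The two numbers agree because a regression saturated in treatment and stratum reproduces the within-stratum differences in means exactly.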

## Basic Treatment Effect

Here we estimate the following models:

$$
\begin{align}
m(Z_j, B_j; \theta) &= \sum_{z \in \mathcal{Z}\setminus \{control\}} \tau(z) \cdot Z_{j}(z)  \\
m_k(Z_j, B_j; \theta_k) &= \sum_{z\in \mathcal{Z}\setminus\{control\}} \tau_k(z) \cdot Z_{j}(z) \\
l(B_j; \theta^c) &= \alpha \\
l_k(B_j; \theta^c_k) &= \alpha_k
\end{align}
$$
where $E[Y_{ij}(z) - Y_{ij}(control)] = \tau(z)$ and $E[Y_{ij}(control)] = \alpha$.

### Regression Without Controls

Without controls (using cluster robust standard errors). ~~Also including block bootstrapped standard errors and p-values.~~
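
For readers unfamiliar with cluster-robust standard errors, the sketch below computes the basic sandwich estimator by hand on simulated data. It is an assumption-laden illustration: it uses the simplest form with no finite-sample correction, and the estimator used inside `run_strat_reg()` is not shown here and may differ.

```r
set.seed(2)

# Simulated clustered data (illustrative only): 40 clusters of 25 people,
# treatment assigned at the cluster level, plus a cluster-level shock.
n.clusters <- 40
cluster.size <- 25
sim <- data.frame(cluster.id = rep(seq_len(n.clusters), each = cluster.size))
sim$treated <- rep(rep(c(0, 1), n.clusters / 2), each = cluster.size)
sim$y <- 0.3 + 0.1 * sim$treated +
  rep(rnorm(n.clusters, sd = 0.1), each = cluster.size) +
  rnorm(nrow(sim), sd = 0.3)

fit <- lm(y ~ treated, data = sim)
X <- model.matrix(fit)
e <- residuals(fit)

# "Bread": (X'X)^{-1}; "meat": sum over clusters of (X_g' e_g)(X_g' e_g)'
bread <- solve(crossprod(X))
scores <- rowsum(X * e, group = sim$cluster.id)  # one row of X_g' e_g per cluster
meat <- crossprod(scores)

vcov.cluster <- bread %*% meat %*% bread
cbind(estimate = coef(fit),
      se.classical = sqrt(diag(vcov(fit))),
      se.cluster = sqrt(diag(vcov.cluster)))
```

Because treatment varies only at the cluster level and outcomes share a cluster shock, the clustered standard error on `treated` is typically larger than the classical one.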

```{r}
# analysis.data %>%
#   filter(monitored, monitor.consent, sms.treatment == "sms.control", !hh.baseline.sample) %>%
#   anti_join(outlier.clusters, c("sms.treatment", "cluster.id")) %>%
#   unite(stratum, county, dist.pot.group, sep = ".") %>%
#   select(dewormed.any, assigned.treatment, stratum, cluster.id) %>%
#   na.omit %>%
#   group_by(stratum) %>%
#   mutate(stratum.weight = n()) %>%
#   ungroup %>%
#   lm(dewormed.any ~ assigned.treatment, data = ., weights = stratum.weight)
```

```{r basic-reg}
reg.output.sms.ctrl <- analysis.data %>%  
  filter(monitored, sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment, 
                .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)
    
reg.output.sms.ctrl.calendar <- analysis.data %>%
  filter(monitored, sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>%
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate( assigned.treatment = relevel(assigned.treatment, ref = "calendar")) %>%
  run_strat_reg(dewormed.any ~ assigned.treatment, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)
```

```{r}
kable(tidy(reg.output.sms.ctrl), digits = 4)
```

With _calendar_ as the omitted category, to estimate $E[Y_{ij}(bracelet) - Y_{ij}(calendar)] = \tau(bracelet) - \tau(calendar)$:

```{r}
reg.output.sms.ctrl.calendar %>% 
  tidy %>% 
  filter(term %in% c("(intercept)", "bracelet")) %>% 
  kable(digits = 4)
```
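As a hedged illustration of why re-levelling the treatment factor gives $\tau(bracelet) - \tau(calendar)$ directly, the sketch below uses simulated data and base R `lm()` (not `run_strat_reg()`); the arm means are hypothetical.

```r
# Toy illustration: changing the omitted category with relevel()
set.seed(2)
arms <- c("control", "ink", "calendar", "bracelet")
d <- data.frame(arm = factor(rep(arms, each = 50), levels = arms))
true_means <- c(control = 0.30, ink = 0.32, calendar = 0.35, bracelet = 0.40)
d$y <- true_means[as.character(d$arm)] + rnorm(nrow(d), sd = 0.1)

# Control as the omitted category: coefficients are tau(z)
fit_ctrl <- lm(y ~ arm, data = d)

# Calendar as the omitted category: coefficients are tau(z) - tau(calendar)
d$arm_cal <- relevel(d$arm, ref = "calendar")
fit_cal   <- lm(y ~ arm_cal, data = d)

diff_from_ctrl <- unname(coef(fit_ctrl)["armbracelet"] - coef(fit_ctrl)["armcalendar"])
direct         <- unname(coef(fit_cal)["arm_calbracelet"])
print(c(difference_of_coefficients = diff_from_ctrl, releveled_coefficient = direct))
```

The point estimates coincide; the practical gain from re-levelling is that the standard error of the contrast is reported directly.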

```{r, fig.width=10}
prep.sms.ctrl.plot.data(reg.output.sms.ctrl) %>% {
  plot.sms.ctrl.takeup(.) +
    labs(subtitle = "Without controls")
}
```

Running the regression without controls on the same sample used in the with-controls regression:

```{r, warning=FALSE}
reg.output.sms.ctrl.lim <- analysis.data %>%  
  filter(monitored, sms.treatment == "sms.control", !is.na(floor)) %>% #  !hh.baseline.sample,
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)
    
reg.output.sms.ctrl.calendar.lim <- analysis.data %>%
  filter(monitored, sms.treatment == "sms.control", !is.na(floor)) %>% #  !hh.baseline.sample,
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate( assigned.treatment = relevel(assigned.treatment, ref = "calendar")) %>%
  run_strat_reg(dewormed.any ~ assigned.treatment, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)

kable(tidy(reg.output.sms.ctrl.lim), digits = 4)
```

With _calendar_ as the omitted category for the limited sample, to estimate $E[Y_{ij}(bracelet) - Y_{ij}(calendar)] = \tau(bracelet) - \tau(calendar)$:

```{r}
reg.output.sms.ctrl.calendar.lim %>% 
  tidy %>% 
  filter(term %in% c("(intercept)", "bracelet")) %>% 
  kable(digits = 4)
```

```{r, fig.width=10}
prep.sms.ctrl.plot.data(reg.output.sms.ctrl.lim) %>% {
  plot.sms.ctrl.takeup(.) +
    labs(subtitle = "Without control restricted sample")
}
```

We can see that the changes in coefficients (e.g. for ink) are not due to controlling for covariates but to restricting the sample to those individuals for whom we observe covariates. Let's discuss this and decide how best to move forward. As I recall from presentations, people commonly work with one sample: they first report estimates without controls and then report the same sample with controls, to show that the coefficient estimates do not change and that precision increases.
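The comparison described here can be sketched on simulated data as three fits: full sample without controls, restricted sample without controls, and restricted sample with controls. This is an illustration only, using base R `lm()`; the covariate `x` and its missingness pattern are hypothetical.

```r
# Toy illustration: separating the effect of the sample restriction from the
# effect of adding controls.
set.seed(3)
n <- 400
d <- data.frame(treated = rep(0:1, n / 2), x = rnorm(n))
d$y <- 0.3 + 0.1 * d$treated + 0.2 * d$x + rnorm(n, sd = 0.5)
d$x[sample(n, 100)] <- NA          # covariate unobserved for a quarter of the sample

complete <- !is.na(d$x)

fit_full  <- lm(y ~ treated, data = d)                     # full sample, no controls
fit_lim   <- lm(y ~ treated, data = d, subset = complete)  # restricted sample, no controls
fit_covar <- lm(y ~ treated + x, data = d)                 # lm() drops rows with NA covariates

print(rbind(
  full_no_controls         = coef(fit_full)["treated"],
  restricted_no_controls   = coef(fit_lim)["treated"],
  restricted_with_controls = coef(fit_covar)["treated"]
))
print(c(n_full = nobs(fit_full), n_restricted = nobs(fit_lim), n_with_controls = nobs(fit_covar)))
```

Any gap between the first and second rows is attributable to the change in sample; any gap between the second and third rows is attributable to the controls.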

```{r}
neyman.results <- analysis.data %>% 
  filter(monitored, sms.treatment == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id")) %>% 
  analyze.neyman.blk.bs(0) %>% 
  # analyze.neyman.blk.bs(2000) %>% 
  mutate(treatment.group = factor(treatment.group, levels = c("control", "ink", "calendar", "bracelet")))  
```

```{r}
# neyman.results %>% 
#   unnest(treatment.data) %>%
#   filter(rhs.treatment.group == "control" | (treatment.group == "bracelet" & rhs.treatment.group == "calendar")) %>% 
#   select(treatment.group, rhs.treatment.group, ate, ate_sd, ate_tstat, ate_pvalue) %>% 
#   kable(digits = 4)
```

Stratum-level deworming probability:

```{r}
neyman.results %>% 
  unnest(strata.data) %>% 
  select(county, dist.pot.group, treatment.group, stratum.assign.mean.dewormed) %>% 
  spread(treatment.group, stratum.assign.mean.dewormed) %>% 
  kable(digits = 4)
```

```{r}
neyman.results %>% 
  unnest(treatment.data) %>% 
  distinct(treatment.group, .keep_all = TRUE) %>% 
  ggplot() +
    geom_col(aes(treatment.group, treatment.mean.dewormed), alpha = 0.5, color = "black") +
    scale_x_discrete("Treatment") +
    scale_y_continuous("Proportion Dewormed", breaks = seq(0, 0.6, 0.05))
```

```{r}
neyman.results %>% 
  unnest(strata.data) %>% 
  ggplot(aes(factor(treatment.group))) +
  geom_col(aes(y = stratum.assign.mean.dewormed), color = "black", alpha = 0.5) +
  scale_x_discrete("") +
  scale_y_continuous("Take-up Proportion") +
  facet_grid(county ~ dist.pot.group)
```

Strata sizes:

```{r}
neyman.results %>% 
  unnest(strata.data) %>% 
  select(county, dist.pot.group, treatment.group, stratum.assign.size) %>% 
  spread(treatment.group, stratum.assign.size) %>% 
  kable()
```

#### Unmonitored

```{r unmon-basic-reg}
unmon.reg.output.sms.ctrl <- analysis.data %>%  
  filter(sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * mon_status, .strat.by = "county_dist_stratum", .cluster = "cluster.id")

restrict.unmon.reg.output.sms.ctrl <- analysis.data %>%  
  filter(sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment + mon_status, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)
```

```{r}
kable(tidy(unmon.reg.output.sms.ctrl), digits = 4)
```

```{r}
linear_tester(unmon.reg.output.sms.ctrl, str_c(c("ink", "calendar", "bracelet"), "unmonitored", sep = ":"), joint = TRUE) %>% 
  tidy() %>% 
  kable()
```

```{r}
kable(tidy(restrict.unmon.reg.output.sms.ctrl), digits = 4)
```

```{r, fig.width=10}
prep.sms.ctrl.plot.data(unmon.reg.output.sms.ctrl) %>% {
  plot.sms.ctrl.takeup(.) +
    labs(subtitle = "Entire census sample, without controls")
}
```

```{r, fig.width=10}
prep.sms.ctrl.plot.data(restrict.unmon.reg.output.sms.ctrl) %>% {
  plot.sms.ctrl.takeup(.) +
    labs(subtitle = "Entire census sample, without controls, restricted")
}
```

### Regression With Controls

These are the regressors we are controlling for:
```{r covariates}
reg.covar 
```

```{r basic-reg-covar}
reg.output.sms.ctrl.covar <- analysis.data %>%  
  filter(monitored, sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = reg.covar)

reg.output.sms.ctrl.calendar.covar <- analysis.data %>%  
  filter(monitored, sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate( assigned.treatment = relevel(assigned.treatment, ref = "calendar")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = reg.covar)
```

```{r}
kable(tidy(reg.output.sms.ctrl.covar, .include_covar = FALSE), digits = 4)
```

```{r, echo=FALSE}
reg.output.sms.ctrl.calendar.covar %>% 
  tidy %>% 
  filter(term %in% c("(intercept)", "bracelet")) %>% 
  kable(digits = 4)
```

Some linear hypothesis tests:
```{r}
c("bracelet - calendar - ink") %>% 
  linear_tester(reg.output.sms.ctrl.covar, .) %>% 
  kable(digits = 4)
```

```{r, fig.width=10}
prep.sms.ctrl.plot.data(reg.output.sms.ctrl.covar) %>% {
  plot.sms.ctrl.takeup(.) 
}
```

#### Unmonitored

```{r restrict-unmon-basic-reg-covar}
unmon.reg.output.sms.ctrl.covar <- analysis.data %>%  
  filter(sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * mon_status, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = reg.covar)

restrict.unmon.reg.output.sms.ctrl.covar <- analysis.data %>%  
  filter(sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment + mon_status, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = reg.covar)
```

```{r}
kable(tidy(restrict.unmon.reg.output.sms.ctrl.covar), digits = 4)
```

```{r, fig.width=10}
prep.sms.ctrl.plot.data(unmon.reg.output.sms.ctrl.covar) %>% {
  plot.sms.ctrl.takeup(.) +
    labs(subtitle = "Entire census sample, with controls")
}
```

```{r, fig.width=10}
prep.sms.ctrl.plot.data(restrict.unmon.reg.output.sms.ctrl.covar) %>% {
  plot.sms.ctrl.takeup(.) +
    labs(subtitle = "Entire census sample, with controls, restricted")
}
```

## Phone Ownership

### Regression Without Controls

```{r, echo=2:3}
reg.output.nonphone <- analysis.data %>%  
  filter(monitored, sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.ctrl.subpop, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)

kable(tidy(reg.output.nonphone), digits = 4)
```

Testing the joint hypothesis of difference in impact between phone and non-phone owners:
```{r}
linear_tester(reg.output.nonphone, c("ink:phone.owner", "calendar:phone.owner", "bracelet:phone.owner"), joint = TRUE) %>% 
  tidy %>% 
  kable(digits = 4)
```
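For intuition about what a joint test of the three interaction terms does, here is a minimal sketch on simulated data using a classical nested-model F-test in base R. It is not the method used by `linear_tester()`, which is defined outside this notebook and presumably relies on cluster-robust variances; the data and variable names are hypothetical.

```r
# Toy illustration: joint test that all treatment-by-phone interactions are zero
set.seed(5)
arms <- c("control", "ink", "calendar", "bracelet")
d <- expand.grid(rep = 1:60, arm = arms,
                 phone = c("non.phone.owner", "phone.owner"))
d$y <- rnorm(nrow(d), mean = 0.35, sd = 0.5)   # no true interaction in this simulation

fit_restricted   <- lm(y ~ arm + phone, data = d)  # imposes the null: no interactions
fit_unrestricted <- lm(y ~ arm * phone, data = d)  # adds three interaction terms

joint <- anova(fit_restricted, fit_unrestricted)
print(joint)
```

The second row of the table reports the F statistic and p-value for the three restrictions jointly, which is the classical analogue of the joint hypothesis tested here.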

```{r, fig.width=12}
prep.sms.ctrl.plot.data(reg.output.nonphone, .interact.with = "phone.owner") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Non Phone Owners", "Phone Owners"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "Without controls") 
}
```

### Regression With Controls

```{r}
reg.output.nonphone.covar <- analysis.data %>%  
  filter(monitored, sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% #, sms.ctrl.subpop == "non.phone.owner") %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.ctrl.subpop, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = setdiff(reg.covar, "sms.ctrl.subpop"))

kable(tidy(reg.output.nonphone.covar, .include_covar = FALSE), digits = 4)
```

Testing the joint hypothesis of difference in impact between phone and non-phone owners:
```{r}
linear_tester(reg.output.nonphone.covar, c("ink:phone.owner", "calendar:phone.owner", "bracelet:phone.owner"), joint = TRUE) %>% 
  tidy %>% 
  kable(digits = 4)
```

```{r, fig.width=12}
prep.sms.ctrl.plot.data(reg.output.nonphone.covar, .interact.with = "phone.owner") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Non Phone Owners", "Phone Owners"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "With controls") 
}
```

```{r}
# neyman.results.no.phone <- analysis.data %>% 
#   filter(monitored, monitor.consent, sms.treatment == "sms.control", sms.ctrl.subpop == "non.phone.owner") %>% 
#   anti_join(outlier.clusters, c("sms.treatment", "cluster.id")) %>% 
#   analyze.neyman.blk.bs(0) %>% 
#   mutate(treatment.group = factor(treatment.group, levels = c("control", "ink", "calendar", "bracelet")))  
```

```{r}
# neyman.results.no.phone %>% 
#   unnest(treatment.data) %>%
#   filter(rhs.treatment.group == "control") %>% 
#   select(treatment.group, ate, ate_sd, ate_tstat, ate_pvalue) %>% 
#   kable(digits = 4)
```

## Treatment Effect Heterogeneity by Distance to PoT

### Discrete Distance Categories

Here we estimate the following models:

$$
\begin{align}
m(Z_j, D_j, B_j; \theta) &= \sum_{z \in \mathcal{Z}\setminus \{control\}} Z_{j}(z) \cdot \left\{\tau(z) + \lambda(z) \cdot D_j \right\} + \delta \cdot D_j  \\
m_k(Z_j, D_j, B_j; \theta_k) &= \sum_{z \in \mathcal{Z}\setminus \{control\}}Z_{j}(z)  \cdot \left\{\tau_k(z) + \lambda_k(z) \cdot  D_j \right\} + \delta_k \cdot D_j 
\end{align}
$$

where $D_j$ is an indicator for whether cluster $j$ is in the _far_ group. The average treatment effect estimated here is
$$
E[Y_{ij}(z) - Y_{ij}(control) | D_j = d] = \tau(z) + \lambda(z)\cdot d 
$$

#### Regression Without Controls

```{r dist-reg}
reg.output.sms.ctrl.dist <- analysis.data %>%  
  filter(monitored, sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)
```

```{r}
kable(tidy(reg.output.sms.ctrl.dist), digits = 4)
```

The tests below evaluate the null hypotheses of no incentive effect in the _far_ strata, $E[Y_{ij}(z) - Y_{ij}(control) | D_j = 1] = \tau(z) + \lambda(z) = 0$.
```{r}
c("ink", "calendar", "bracelet") %>% 
  map(~ paste0(., c("", ":far"))) %>%
  map(~ paste(., collapse = " + ")) %>% 
  linear_tester(reg.output.sms.ctrl.dist, .) %>% 
  kable(digits = 4)
```
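A test of a linear combination such as $\tau(z) + \lambda(z)$ can be sketched with base R as follows. This is an illustration on simulated data with a single treatment arm and classical (not cluster-robust) variances; it is not the implementation of `linear_tester()`, and all names and values are hypothetical.

```r
# Toy illustration: estimate and test tau + lambda (treatment effect when far = 1)
set.seed(4)
d <- expand.grid(rep = 1:100, treated = 0:1, far = 0:1)
d$y <- 0.40 + 0.10 * d$treated - 0.15 * d$far +
  0.05 * d$treated * d$far + rnorm(nrow(d), sd = 0.5)

fit <- lm(y ~ treated * far, data = d)

# Contrast vector selecting the main effect plus the interaction
cvec <- c("(Intercept)" = 0, treated = 1, far = 0, "treated:far" = 1)
b <- coef(fit)[names(cvec)]
V <- vcov(fit)[names(cvec), names(cvec)]

est     <- sum(cvec * b)                          # point estimate of tau + lambda
se      <- sqrt(drop(t(cvec) %*% V %*% cvec))     # standard error via c' V c
t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), df = df.residual(fit))

print(round(c(estimate = est, std.error = se, statistic = t_stat, p.value = p_value), 4))
```

In this saturated toy model the estimate equals the raw difference in means between treated and untreated observations in the far group.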

The tests below evaluate the null hypotheses of no interaction between distance and incentives, $E[\tau(z)|D_j = 1] - E[\tau(z)| D_j = 0] = 0$. Since distance was exogenously assigned to clusters, this can also be stated as $E[\tilde{\tau}(z, 1) - \tilde{\tau}(z, 0)] = 0$.
```{r}
c("ink", "calendar", "bracelet") %>% 
  map(~ c(paste0(., ":far"), "far")) %>%
  map(~ paste(., collapse = " + ")) %>% 
  linear_tester(reg.output.sms.ctrl.dist, .) %>% 
  kable(digits = 4)
```
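To make explicit which hypothesis strings the two `map()` pipelines above pass to `linear_tester()`, the sketch below rebuilds them with base R only. It assumes nothing about `linear_tester()` itself beyond the strings it receives.

```r
arms <- c("ink", "calendar", "bracelet")

# Pattern 1: main effect plus its interaction with `far`
effect_in_far <- vapply(
  arms,
  function(a) paste(paste0(a, c("", ":far")), collapse = " + "),
  character(1), USE.NAMES = FALSE
)

# Pattern 2: interaction with `far` plus the `far` main effect
far_within_arm <- vapply(
  arms,
  function(a) paste(c(paste0(a, ":far"), "far"), collapse = " + "),
  character(1), USE.NAMES = FALSE
)

print(effect_in_far)
print(far_within_arm)
```

Assuming the term names map to the model parameters as their labels suggest, the first pattern corresponds to $\tau(z) + \lambda(z)$ and the second to $\lambda(z) + \delta$.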

```{r, fig.width=12}
prep.sms.ctrl.plot.data(reg.output.sms.ctrl.dist, .interact.with = "far") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Close", "Far"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "Without controls") 
}
```

Running the regression without controls on the same sample used in the with-controls regression:

```{r}
reg.output.sms.ctrl.dist.lim <- analysis.data %>%  
  filter(monitored, sms.treatment == "sms.control", !is.na(floor)) %>% #  !hh.baseline.sample,
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)

kable(tidy(reg.output.sms.ctrl.dist.lim), digits = 4)
```

```{r, fig.width=12}
prep.sms.ctrl.plot.data(reg.output.sms.ctrl.dist.lim, .interact.with = "far") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Close", "Far"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "Without controls restricted sample") 
}
```

We can see that most of the changes between the regressions without and with controls are due to the change in sample.

<!-- Below is the corresponding estimation using Neyman's approach: -->

```{r}
neyman.results.dist <- analysis.data %>% 
  filter(monitored, sms.treatment == "sms.control") %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  analyze.neyman.blk.bs(0, .interact.with = "dist.pot.group") %>% 
  mutate(treatment.group = factor(treatment.group, levels = c("control", "ink", "calendar", "bracelet"))) 
```


Stratum-level deworming probability:

```{r}
neyman.results.dist %>% 
  unnest(strata.data) %>% 
  select(county, dist.pot.group, treatment.group, stratum.assign.mean.dewormed) %>% 
  unite(incentive_dist, treatment.group, dist.pot.group) %>% 
  spread(incentive_dist, stratum.assign.mean.dewormed) %>% 
  kable(digits = 4)
```

```{r}
neyman.results.dist %>% 
  unnest(treatment.data) %>% 
  distinct(treatment.group, dist.pot.group, .keep_all = TRUE) %>% 
  ggplot(.) +
  geom_col(aes(treatment.group, treatment.mean.dewormed, fill = dist.pot.group), alpha = 0.5, color = "black", position = "dodge") +
  scale_x_discrete("Treatment") +
  scale_y_continuous("Proportion Dewormed", breaks = seq(0, 0.6, 0.05)) +
  scale_fill_discrete("Distance to PoT")
```

Strata sizes:

```{r}
neyman.results %>% 
  unnest(strata.data) %>% 
  select(county, dist.pot.group, treatment.group, stratum.assign.size) %>% 
  unite(incentive_dist, treatment.group, dist.pot.group) %>% 
  spread(incentive_dist, stratum.assign.size) %>% 
  kable()
```

##### Unmonitored

```{r unmon-dist-reg}
unmon.reg.output.sms.ctrl.dist <- analysis.data %>%  
  filter(sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * dist.pot.group * mon_status, .strat.by = "county", .cluster = "cluster.id")

restrict.unmon.reg.output.sms.ctrl.dist <- analysis.data %>%  
  filter(sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ (assigned.treatment + mon_status) * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)
```

```{r}
kable(tidy(unmon.reg.output.sms.ctrl.dist), digits = 4)
```

```{r}
linear_tester(unmon.reg.output.sms.ctrl.dist, 
              c("ink", "calendar", "bracelet") %>% 
                c(., str_c(., "far", sep = ":")) %>% 
                str_c("unmonitored", sep = ":"), 
              joint = TRUE) %>% 
  tidy() %>% 
  kable()
```

```{r}
kable(tidy(restrict.unmon.reg.output.sms.ctrl.dist), digits = 4)
```

```{r, fig.width=12}
prep.sms.ctrl.plot.data(unmon.reg.output.sms.ctrl.dist, .interact.with = "far") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Close", "Far"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "Entire census sample, without controls") 
}
```

```{r, fig.width=12}
prep.sms.ctrl.plot.data(restrict.unmon.reg.output.sms.ctrl.dist, .interact.with = "far") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Close", "Far"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "Entire census sample, without controls, restricted") 
}
```

```{r}
c("ink", "calendar", "bracelet") %>% 
  map(~ paste0(., c("", ":far"))) %>%
  map(~ paste(., collapse = " + ")) %>%
  c("(intercept) + far", .) %>% 
  linear_tester(unmon.reg.output.sms.ctrl.dist, .) %>% 
  kable(digits = 4)
```

#### Regression With Controls

```{r dist-reg-covar}
reg.output.sms.ctrl.dist.covar <- analysis.data %>%  
  filter(monitored, sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = reg.covar)
```

```{r}
kable(tidy(reg.output.sms.ctrl.dist.covar, .include_covar = FALSE), digits = 4)
```

```{r, fig.width=12}
prep.sms.ctrl.plot.data(reg.output.sms.ctrl.dist.covar, .interact.with = "far") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Close", "Far"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "With controls") 
}
```

##### Unmonitored

```{r unmon-dist-reg-covar}
unmon.reg.output.sms.ctrl.dist.covar <- analysis.data %>%  
  filter(sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * mon_status * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = reg.covar)

restrict.unmon.reg.output.sms.ctrl.dist.covar <- analysis.data %>%  
  filter(sms.treatment.2 == "sms.control") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ (assigned.treatment + mon_status) * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = reg.covar)
```

```{r}
kable(tidy(restrict.unmon.reg.output.sms.ctrl.dist.covar), digits = 4)
```

```{r, fig.width=12}
prep.sms.ctrl.plot.data(unmon.reg.output.sms.ctrl.dist.covar, .interact.with = "far") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Close", "Far"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "Entire census sample, with controls") 
}
```

```{r, fig.width=12}
prep.sms.ctrl.plot.data(restrict.unmon.reg.output.sms.ctrl.dist.covar, .interact.with = "far") %>% 
  mutate(grp = factor(grp, levels = c("ref.grp", "compare.grp"), labels = c("Close", "Far"))) %>% {
  plot.sms.ctrl.takeup(., .facet.formula = grp ~ ref.treatment) +
    labs(subtitle = "Entire census sample, with controls, restricted") 
}
```

### Continuous Distance Measure

```{r}
reg.output.sms.ctrl.cont.dist <- analysis.data %>%  
  filter(monitored, sms.treatment == "sms.control") %>% #, !hh.baseline.sample) %>% 
  mutate(dist.to.pot = dist.to.pot / 1000, # Convert to km
         dist.to.pot.2 = dist.to.pot^2) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * (dist.to.pot + dist.to.pot.2), .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)

kable(tidy(reg.output.sms.ctrl.cont.dist), digits = 4)
```

```{r}
reg.output.sms.ctrl.cont.dist %>% 
  # linear_tester(c("ink:dist.to.pot + ink",
  #                 "calendar:dist.to.pot + calendar",
  #                 "bracelet:dist.to.pot + bracelet",
  #                 "bracelet - ink",
  #                 "bracelet:dist.to.pot + bracelet - ink:dist.to.pot - ink",
  #                 "1.5 * bracelet:dist.to.pot + bracelet - 1.5 * ink:dist.to.pot - ink",
  #                 "2 * bracelet:dist.to.pot + bracelet - 2 * ink:dist.to.pot - ink",
  #                 "2.5 * bracelet:dist.to.pot + bracelet - 2.5 * ink:dist.to.pot - ink")) %>% 
  linear_tester(c("bracelet:dist.to.pot + bracelet:dist.to.pot.2 + bracelet - calendar:dist.to.pot - calendar:dist.to.pot.2 - calendar",
                  "2 * bracelet:dist.to.pot + 4 * bracelet:dist.to.pot.2 + bracelet - 2 * calendar:dist.to.pot - 4 * calendar:dist.to.pot.2 - calendar",
                  "2.5 * bracelet:dist.to.pot + 6.25 * bracelet:dist.to.pot.2 + bracelet - 2.5 * calendar:dist.to.pot - 6.25 * calendar:dist.to.pot.2 - calendar")) %>% 
  kable(digits = 4)
```
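The hypothesis strings above evaluate the bracelet versus calendar difference at distances of 1, 2, and 2.5 km under the quadratic specification. A small helper can generate the same contrast for any distance; this is a sketch, the function name `dist_contrast` is hypothetical, and it assumes `linear_tester()` accepts an explicit `1 *` multiplier the same way it accepts `2 *`.

```r
# Build the "arm minus ref" contrast at distance d (in km) for a model with
# terms <arm>, <arm>:dist.to.pot and <arm>:dist.to.pot.2.
dist_contrast <- function(d, arm = "bracelet", ref = "calendar") {
  sprintf(
    "%s * %s:dist.to.pot + %s * %s:dist.to.pot.2 + %s - %s * %s:dist.to.pot - %s * %s:dist.to.pot.2 - %s",
    d, arm, d^2, arm, arm,
    d, ref, d^2, ref, ref
  )
}

print(dist_contrast(c(2, 2.5)))
```

For `d = 2` and `d = 2.5` this reproduces the second and third strings used in the chunk above.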
```{r}
reg.output.sms.ctrl.cont.dist.lim <- analysis.data %>%  
  filter(monitored, sms.treatment == "sms.control", !is.na(floor)) %>% #  !hh.baseline.sample,
  mutate(dist.to.pot = dist.to.pot / 1000, # Convert to km
         dist.to.pot.2 = dist.to.pot^2) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * (dist.to.pot + dist.to.pot.2), .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)

kable(tidy(reg.output.sms.ctrl.cont.dist.lim), digits = 4)
```

## SMS Treatment

Here we estimate the following models:

$$
\begin{align}
m(Z_j, M_{ij}, B_j; \theta) &= \sum_{z \in \mathcal{Z}\setminus \{control\}}Z_{j}(z) \cdot \left\{\tau(z) + \sum_{s \in \mathcal{M}\setminus \{none\}} \gamma(z, s) \cdot M_{ij}(s) \right\} + \sum_{s\in\mathcal{M}\setminus \{none\}} \psi(s) \cdot M_{ij}(s)  \\
m_k(Z_j, M_{ij}, B_j; \theta_k) &= \sum_{z \in \mathcal{Z}\setminus \{control\}} Z_{j}(z) \cdot \left\{\tau_k(z) + \sum_{s \in \mathcal{M}\setminus \{none\}} \gamma_k(z, s)\cdot M_{ij}(s) \right\} + \sum_{s\in\mathcal{M}\setminus \{none\}} \psi_k(s) \cdot M_{ij}(s)  
\end{align}
$$
where $M_{ij}(s)$ is an indicator of whether individual $i$ received SMS treatment $s$.

The average treatment effect estimated here is
$$
E[Y_{ij}(z, s) - Y_{ij}(z', s')] = \tau(z) - \tau(z') + \gamma(z, s) - \gamma(z', s') + \psi(s) - \psi(s'),
$$
where 

* $\tau(control) = \psi(none) = 0$ 
* $z = control \lor s = none \implies \gamma(z, s) = 0$
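
As a worked example of the expression above, using only the two restrictions just listed, consider (i) the bracelet arm with the social-information SMS against the pure control, and (ii) bracelet versus calendar when both receive the social-information SMS. The label $social$ is shorthand assumed here for the Reminder + Peer Info SMS arm:

$$
\begin{align}
E[Y_{ij}(bracelet, social) - Y_{ij}(control, none)] &= \tau(bracelet) + \gamma(bracelet, social) + \psi(social) \\
E[Y_{ij}(bracelet, social) - Y_{ij}(calendar, social)] &= \tau(bracelet) - \tau(calendar) + \gamma(bracelet, social) - \gamma(calendar, social)
\end{align}
$$

In the second comparison the SMS main effect $\psi(social)$ cancels because both arms receive the same SMS treatment.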

The following table simplifies thinking about the analysis outlined in the pre-analysis plan:

|                             | Control  | Ink | Calendar | Bracelet |
|-----------------------------|:--------:|:---:|:--------:|:--------:|
| No SMS                      | 1        | 4   | 6        | 8        |
| Reminder SMS                | 2        | --- | ---      | ---      |
| Reminder + Peer Info SMS    | 3        | 5   | 7        | 9        |

:Cluster-level Treatment Arms\label{tab:cluster-level-treatment}

### Regression Without Controls

```{r sms-reg}
reg.output.sms.treat <- analysis.data %>%  
  filter(monitored, sms.treated | have_phone == "Yes") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  # mutate(sms.treatment = factor(sms.treatment, levels = c("sms.control", "reminder.only", "social.info"))) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)
```

```{r}
reg.output.sms.treat %>% tidy %>% kable(digits = 4)
```

Some linear hypothesis tests:
```{r}
c("ink", "calendar", "bracelet") %>% 
                  map(~ c(paste0(., ":social.info"), "social.info")) %>% 
                  map(~ paste(., collapse = " + ")) %>% 
                  c("bracelet - calendar",
                    "ink - social.info", 
                    "bracelet - calendar - calendar:social.info - social.info",
                    "ink + ink:social.info", 
                    "bracelet - calendar + bracelet:social.info - calendar:social.info") %>% 
  linear_tester(reg.output.sms.treat, .) %>% 
  kable(digits = 4)
```

Running the regression without controls on the same sample used in the with-controls regression:

```{r}
reg.output.sms.treat.lim <- analysis.data %>%  
  filter(monitored, sms.treated | have_phone == "Yes", !is.na(floor)) %>% # !hh.baseline.sample,
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  # mutate(sms.treatment = factor(sms.treatment, levels = c("sms.control", "reminder.only", "social.info"))) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)

reg.output.sms.treat.lim %>% tidy %>% kable(digits = 4)
```

Some linear hypothesis tests on the restricted sample:
```{r}
c("ink", "calendar", "bracelet") %>% 
                  map(~ c(paste0(., ":social.info"), "social.info")) %>% 
                  map(~ paste(., collapse = " + ")) %>% 
                  c("bracelet - calendar",
                    "ink - social.info", 
                    "bracelet - calendar - calendar:social.info - social.info",
                    "ink + ink:social.info", 
                    "bracelet - calendar + bracelet:social.info - calendar:social.info") %>% 
  linear_tester(reg.output.sms.treat.lim, .) %>% 
  kable(digits = 4)
```


```{r}
neyman.results.sms <- analysis.data %>% 
  filter(monitored, sms.treated | sms.ctrl.subpop == "phone.owner") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  analyze.neyman.blk.bs(0, .treatment = c("assigned.treatment", "sms.treatment")) %>% 
  separate(treatment.group, c("assigned.treatment", "sms.treatment"), sep = "_") %>% 
  mutate(assigned.treatment = factor(assigned.treatment, levels = c("control", "ink", "calendar", "bracelet")))  
```

```{r}
# neyman.results.sms %>% 
#   unnest(treatment.data) %>% 
#   filter(str_detect(rhs.treatment.group, paste0("^", assigned.treatment)) | 
#            (assigned.treatment == "ink" & str_detect(rhs.treatment.group, "^control")) |
#            (assigned.treatment == "bracelet" & str_detect(rhs.treatment.group, "^calendar"))) %>% 
#          # sms.treatment != "sms.control") %>% 
#   select(assigned.treatment, sms.treatment, rhs.treatment.group, ate, ate_sd, ate_tstat, ate_pvalue) %>% 
#   kable
```

```{r, fig.height=6}
prep.sms.treat.plot.data(reg.output.sms.treat) %>% { 
  plot.sms.treat.takeup(.) +
    labs(subtitle = "Without Controls") 
}
```

```{r, fig.height=6}
prep.sms.treat.plot.data(reg.output.sms.treat.lim) %>% { 
  plot.sms.treat.takeup(.) +
    labs(subtitle = "Without Controls restricted sample") 
}
```


```{r, echo=FALSE, fig.align="center"}
# neyman.results.sms %>% 
#   unnest(strata.data) %>% 
#   select(assigned.treatment, sms.treatment, county, dist.pot.group, treatment.mean.dewormed) %>% 
#   distinct(assigned.treatment, sms.treatment, .keep_all = TRUE) %>% 
#   mutate(sms.treatment = factor(sms.treatment, 
#                                 levels = c("sms.control", "reminder.only", "social.info"),
#                                 labels = c("None", "Reminder Only", "Social Information")),
#          assigned.treatment = factor(str_to_title(assigned.treatment))) %>% 
#   ggplot(aes(sms.treatment, treatment.mean.dewormed)) +
#   geom_point(aes(color = assigned.treatment, group = assigned.treatment)) +
#   geom_line(aes(color = assigned.treatment, group = assigned.treatment, linetype = "Including Reminder-Only"), 
#             data = . %>% filter(assigned.treatment == "Control")) +
#   geom_line(aes(color = assigned.treatment, group = assigned.treatment, linetype = "Excluding Reminder-Only"), 
#             data = . %>% filter(sms.treatment != "Reminder Only")) +
#   scale_x_discrete("Text Messaging Treatment") +
#   scale_y_continuous("Take-up Proportion", breaks = seq(0, 0.7, 0.025)) +
#   scale_color_discrete("Incentive Treatment") +
#   scale_linetype_manual("", values = c("solid", "dashed"))
```

Stratum-level deworming probability:

```{r}
neyman.results.sms %>% 
  unnest(strata.data) %>% 
  select(county, dist.pot.group, assigned.treatment, sms.treatment, stratum.assign.mean.dewormed) %>% 
  unite(incentive_sms, assigned.treatment, sms.treatment) %>% 
  spread(incentive_sms, stratum.assign.mean.dewormed) %>% 
  kable(digits = 4)
```

Strata sizes:

```{r}
neyman.results %>% 
  unnest(strata.data) %>% 
  select(county, dist.pot.group, treatment.group, stratum.assign.size) %>% 
  spread(treatment.group, stratum.assign.size) %>% 
  kable()
```

#### Unmonitored

```{r unmon-sms-reg}
unmon.reg.output.sms.treat <- analysis.data %>%  
  filter(sms.treated | have_phone == "Yes") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment * mon_status, 
                .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)

restrict.unmon.reg.output.sms.treat <- analysis.data %>%  
  filter(sms.treated | have_phone == "Yes") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment + mon_status, 
                .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = census.reg.covar)
```

```{r}
kable(tidy(restrict.unmon.reg.output.sms.treat), digits = 4)
```

```{r, fig.height=6}
prep.sms.treat.plot.data(unmon.reg.output.sms.treat) %>% { 
  plot.sms.treat.takeup(.) +
    labs(subtitle = "Without Controls, full census sample") 
}
```

```{r, fig.height=6}
prep.sms.treat.plot.data(restrict.unmon.reg.output.sms.treat) %>% { 
  plot.sms.treat.takeup(.) +
    labs(subtitle = "Without Controls, full census sample, restricted") 
}
```

### Regression With Controls

```{r sms-reg-covar}
reg.output.sms.treat.covar <- analysis.data %>%  
  filter(monitored, sms.treated | have_phone == "Yes") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment, .strat.by = "county_dist_stratum", .cluster = "cluster.id", .covariates = setdiff(reg.covar, "sms.ctrl.subpop"))
```

```{r}
reg.output.sms.treat.covar %>% 
  tidy(.include_covar = FALSE) %>% 
  kable(digits = 4)
```

Linear hypothesis tests on combinations of the estimated coefficients:

```{r}
c("ink", "calendar", "bracelet") %>% 
  map(~ c(paste0(., ":social.info"), "social.info")) %>% 
  map(~ paste(., collapse = " + ")) %>% 
  c("bracelet - calendar",
    "ink - social.info", # What's this?
    "reminder.only - social.info",
    "bracelet - calendar - calendar:social.info - social.info",
    "ink + ink:social.info", 
    "bracelet - calendar + bracelet:social.info - calendar:social.info",
    "calendar:social.info - ink:social.info",
    "bracelet:social.info - calendar:social.info") %>% 
  linear_tester(reg.output.sms.treat.covar, .) %>% 
  kable(digits = 4)
```
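
Each string above names a linear combination of coefficients; for example, `ink:social.info + social.info` is the effect of the social information SMS within the ink arm, relative to ink with no SMS. As a rough sketch of what such a test computes, the block below fits a toy interaction model in base R and forms the estimate and standard error of one combination by hand. The simulated data, the names `toy`, `arm` and `sms`, and the classical (non-clustered, non-stratified) standard errors are assumptions for illustration only; `linear_tester` is the notebook's own helper and may work differently.

```r
# Toy data: two incentive arms crossed with two SMS arms (all values simulated).
set.seed(1)
toy <- expand.grid(arm = c("control", "ink"),
                   sms = c("sms.control", "social.info"),
                   rep = 1:200,
                   stringsAsFactors = TRUE)
toy$y <- rbinom(nrow(toy), 1,
                0.3 + 0.05 * (toy$arm == "ink") + 0.04 * (toy$sms == "social.info"))

fit <- lm(y ~ arm * sms, data = toy)

# Weights on (Intercept), armink, smssocial.info, armink:smssocial.info
# for the combination "social.info + ink:social.info".
w <- c(0, 0, 1, 1)
names(w) <- names(coef(fit))

est    <- sum(w * coef(fit))
se     <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
t_stat <- est / se
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# In this saturated model the combination equals the difference in
# cell means between social.info and sms.control within the ink arm.
cell_means <- with(toy, tapply(y, list(arm, sms), mean))
cell_diff  <- cell_means["ink", "social.info"] - cell_means["ink", "sms.control"]
```

Because the toy model is saturated, `est` and `cell_diff` coincide, which is a quick way to check that a hypothesis string measures the intended comparison.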

```{r, fig.height=6}
prep.sms.treat.plot.data(reg.output.sms.treat.covar) %>% {
  plot.sms.treat.takeup(.) +
    labs(subtitle = "With Controls") 
} 
```

#### Unmonitored

No unmonitored phone owners have endline covariates, so this regression is not estimated for the unmonitored sample.

## Take-Up and SMS Treatment Over Time

```{r, echo=TRUE}
after.msg.days <- seq(3, 11, 2)

time.takeup.data <- analysis.data %>%  
  filter(monitored, !is.na(sms.treatment)) %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  group_by(assigned.treatment, sms.treatment) %>% 
  mutate(n.total = n()) %>% 
  group_by(dewormed.day.any, add = TRUE) %>% 
  summarize(takeup.prop = sum(dewormed.any) / first(n.total)) %>% 
  ungroup %>% 
  arrange(assigned.treatment, dewormed.day.any) %>% 
  group_by(assigned.treatment, sms.treatment) %>% 
  mutate(cumul.takeup.prop = cumsum(takeup.prop)) %>% 
  ungroup 
```
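
The chunk above divides each day's count of dewormed individuals by the size of the whole treatment cell, so the daily proportions sum to the cell's overall take-up rate and their running sum is cumulative take-up. A toy illustration in base R, with made-up numbers and hypothetical names (`toy`, `day`, `dewormed`):

```r
# One treatment cell of 10 people; `day` is NA for those never dewormed.
toy <- data.frame(
  dewormed = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  day      = c(1,    1,    2,    4,    NA,    NA,    NA,    NA,    NA,    NA)
)
n_total <- nrow(toy)

# Daily take-up proportion: dewormed on that day / everyone in the cell.
# tapply() drops the NA days, i.e. the people who never came.
daily <- tapply(toy$dewormed, toy$day, sum) / n_total

# Cumulative take-up by day.
cumul <- cumsum(daily)

# The last cumulative value equals the cell's overall take-up rate.
overall <- mean(toy$dewormed)
```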

```{r, fig.width=10}
time.takeup.data %>% 
  filter(sms.treatment == "sms.control") %>% {
    plot.takeup.dynamics(.) +
      labs(y = "Take-up Proportion", title = "Daily Take-up Proportions", subtitle = "No SMS treatment group")
  }
```


```{r, echo=TRUE, fig.width=14}
time.takeup.data %>% { 
  plot.takeup.dynamics(., .aes = aes(dewormed.day.any, takeup.prop, color = assigned.treatment, linetype = sms.treatment)) +
    scale_linetype_manual("", values = c("dotted", "solid", "dashed")) +
    labs(y = "Take-up Proportion", title = "Daily Take-up Proportions", subtitle = "Split by SMS treatment assignment") +
    facet_wrap(~ assigned.treatment, scales = "free_y")
}
```

```{r, fig.width=10, echo=TRUE}
social.info.data %>% {
  plot.takeup.dynamics(., .aes = aes(dewormed.day, cumul, color = assigned.treatment, linetype = dist.pot.group, shape = "social.info")) +
    # geom_ribbon(aes(ymin = qant.25, ymax = qant.75, fill = assigned.treatment, color = NULL), alpha = 0.15) +
    labs(y = "Reported Take-up", title = "Social Information Reported to Subjects", subtitle = "Split by incentive treatment") +
    scale_shape_manual(values = 1, guide = "none") +
    scale_linetype_discrete("Distance from PoT", labels = c("Close", "Far"))
}
```

```{r, fig.width=14, echo=TRUE}
time.takeup.data %>% 
  transmute(dewormed.day = dewormed.day.any, 
            cumul = cumul.takeup.prop, 
            assigned.treatment, 
            sms.treatment) %>% 
  bind_rows(takeup = ., 
            social.info = distinct(social.info.data, dewormed.day, assigned.treatment, .keep_all = TRUE) %>% 
              rename(cumul = all.dist.cumul), 
            .id = "cumul.type") %>% 
  mutate(cumul.type.sms = if_else(!is.na(sms.treatment), paste(cumul.type, sms.treatment, sep = "-"), cumul.type)) %>% {
    plot.takeup.dynamics(., .aes = aes(dewormed.day, cumul, color = assigned.treatment, linetype = cumul.type.sms, shape = cumul.type)) +
      scale_linetype_manual("", labels = c("Reported Take-up", "Reminders Only", "No SMS", "Social Info SMS"), values = c("dotdash", "dotted", "solid", "dashed")) +
      scale_shape_manual("", labels = c("Reported Take-up", "Take-up"), values = c(1, 16)) +
      labs(y = "", title = "Comparing Reported and Observed Take-up", subtitle = "Split by SMS treatment group") +
      facet_wrap(~ assigned.treatment)
}
```

## SMS Treatment And Distance to PoT

We estimate the following models:

$$
\begin{aligned}
m(Z_j, M_{ij}, B_j; \theta) &= \sum_{z \in \mathcal{Z}\setminus \{control\}} Z_{j}(z) \cdot \left\{\tau(z) + \lambda(z) \cdot D_j + \sum_{s \in \mathcal{M}\setminus \{none\}} M_{ij}(s) \cdot \left[ \gamma(z, s) + \rho(z, s) \cdot D_j \right] \right\} \\
&+ \sum_{s\in\mathcal{M}\setminus \{none\}}  M_{ij}(s) \left\{\psi(s) + \pi(s) \cdot D_j \right\}  + \delta \cdot D_j  \\
m_k(Z_j, M_{ij}, B_j; \theta) &= \sum_{z \in \mathcal{Z}\setminus \{control\}} Z_{j}(z) \cdot \left\{\tau_k(z) + \lambda_k(z) \cdot D_j + \sum_{s \in \mathcal{M}\setminus \{none\}} M_{ij}(s) \cdot \left[ \gamma_k(z, s) + \rho_k(z, s) \cdot D_j \right] \right\} \\
&+ \sum_{s\in\mathcal{M}\setminus \{none\}}  M_{ij}(s) \left\{\psi_k(s) + \pi_k(s) \cdot D_j \right\}  + \delta_k \cdot D_j
\end{aligned}
$$

The implied difference in expected outcomes between treatment pairs $(z, s)$ and $(z', s')$, conditional on $D_j = d$, is

$$
\begin{aligned}
E[Y_{ij}(z, s) - Y_{ij}(z', s')|D_j = d] &= \tau(z) - \tau(z') + \left\{\lambda(z) - \lambda(z')\right\} \cdot d + \left\{\rho(z,s) - \rho(z', s')\right\} \cdot d + \gamma(z, s) - \gamma(z', s') \\
&+ \psi(s) - \psi(s') + \left\{\pi(s) - \pi(s')\right\}\cdot d.
\end{aligned}
$$
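
As a worked special case, assume the omitted categories are normalized to zero, as the sums over $\mathcal{Z}\setminus \{control\}$ and $\mathcal{M}\setminus \{none\}$ suggest. Comparing incentive $z$ to control with no SMS in either arm leaves only the incentive terms, while adding SMS treatment $s$ within arm $z$ leaves only the SMS terms:

$$
\begin{aligned}
E[Y_{ij}(z, none) - Y_{ij}(control, none)|D_j = d] &= \tau(z) + \lambda(z) \cdot d \\
E[Y_{ij}(z, s) - Y_{ij}(z, none)|D_j = d] &= \gamma(z, s) + \psi(s) + \left\{\rho(z, s) + \pi(s)\right\} \cdot d
\end{aligned}
$$

In both cases the common term $\delta \cdot d$ cancels, since both potential outcomes are evaluated at the same $d$.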

### Regression Without Controls

```{r dist-sms-reg}
reg.output.sms.treat.dist <- analysis.data %>%  
  filter(monitored, sms.treated | have_phone == "Yes") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)
```

```{r}
reg.output.sms.treat.dist %>% tidy %>% kable(digits = 4)
```

```{r, fig.width=12}
prep.sms.treat.dist.plot.data(reg.output.sms.treat.dist) %>% {
  plot.sms.treat.takeup(.) +
    facet_wrap(~ dist) +
    labs(subtitle = "Split by distance to PoT, without controls")
}
```

The regression without controls, restricted to the same sample as the regression with controls:

```{r}
reg.output.sms.treat.dist.lim <- analysis.data %>%  
  filter(monitored, sms.treated | have_phone == "Yes", !is.na(floor)) %>% #  !hh.baseline.sample,
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)

reg.output.sms.treat.dist.lim %>% tidy %>% kable(digits = 4)
```

```{r, fig.width=12}
prep.sms.treat.dist.plot.data(reg.output.sms.treat.dist.lim) %>% {
  plot.sms.treat.takeup(.) +
    facet_wrap(~ dist) +
    labs(subtitle = "Split by distance to PoT, without controls restricted sample")
}
```

#### Unmonitored

```{r unmon-dist-sms-reg}
unmon.reg.output.sms.treat.dist <- analysis.data %>%  
  filter(sms.treated | have_phone == "Yes") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment * mon_status * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)

restrict.unmon.reg.output.sms.treat.dist <- analysis.data %>%  
  filter(sms.treated | have_phone == "Yes") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ (assigned.treatment * sms.treatment + mon_status) * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar)
```

```{r}
restrict.unmon.reg.output.sms.treat.dist %>% tidy %>% kable(digits = 4)
```

```{r, fig.width=12}
prep.sms.treat.dist.plot.data(unmon.reg.output.sms.treat.dist) %>% {
  plot.sms.treat.takeup(.) +
    facet_wrap(~ dist) +
    labs(subtitle = "Split by distance to PoT, without controls, entire census sample")
}
```

```{r, fig.width=12}
prep.sms.treat.dist.plot.data(restrict.unmon.reg.output.sms.treat.dist) %>% {
  plot.sms.treat.takeup(.) +
    facet_wrap(~ dist) +
    labs(subtitle = "Split by distance to PoT, without controls, entire census sample, restricted")
}
```

### Regression With Controls

```{r sms-dist-reg-covar, echo=2}
reg.output.sms.treat.dist.covar <- analysis.data %>%  
  filter(monitored, sms.treated | have_phone == "Yes") %>% #, !hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2) %>% 
  run_strat_reg(dewormed.any ~ assigned.treatment * sms.treatment * dist.pot.group, .strat.by = "county", .cluster = "cluster.id", .covariates = setdiff(reg.covar, "sms.ctrl.subpop"))
```

```{r}
reg.output.sms.treat.dist.covar %>% 
  tidy(.include_covar = FALSE) %>% 
  kable(digits = 4)
```

```{r, fig.width=12}
prep.sms.treat.dist.plot.data(reg.output.sms.treat.dist.covar) %>% {
  plot.sms.treat.takeup(.) +
    facet_wrap(~ dist) +
    labs(subtitle = "Split by distance to PoT, with controls")
}
```

Linear hypothesis tests on combinations of the estimated coefficients:

```{r}
  linear_tester(reg.output.sms.treat.dist.covar, 
                c("(intercept) + bracelet",
                  "(intercept) + bracelet + far + bracelet:far",
                  "(intercept) + bracelet +  social.info + bracelet:social.info",
                  "(intercept) + bracelet + far + bracelet:far + bracelet:social.info + social.info + bracelet:social.info:far + social.info:far")) %>% 
  kable(digits = 4)
```

#### Unmonitored

No unmonitored phone owners have endline covariates, so this regression is not estimated for the unmonitored sample.

# Multiple Hypothesis Adjustment (False Discovery Control)

```{r}
mht_analysis_formula <- dewormed.any ~ (assigned.treatment * sms.treatment + mon_status) * dist.pot.group * phone_owner

mht_analysis_data <- analysis.data %>% 
  # filter(!hh.baseline.sample) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate(sms.treatment = sms.treatment.2,
         phone_owner = factor(phone_owner, levels = c(TRUE, FALSE), labels = c("phone_owner", "non_phone_owner"))) 

mht_selected_treatment <- get_treatment_map(mht_analysis_data, mht_analysis_formula) %>% 
  filter(mon_status == "monitored") %>% 
  arrange(phone_owner, dist.pot.group, sms.treatment, assigned.treatment) 

mht_treatment_map_dm <- get_treatment_map_design_matrix(mht_analysis_data, mht_analysis_formula, mht_selected_treatment) %>% 
  select_if(~ n_distinct(.x) > 1)

mht_selected_treatment %<>% mutate(ate_id = seq_len(n()))

incentive_ate_hypo <- tribble(
  ~ left, ~ right, # Incentive ATE for all subgroups (phone-ownership + distance from PoT)
     2,      1, 
     3,      1, 
     4,      1,
     4,      3
  ) %>% 
  bind_rows(map_df(c(10, 19, 23), function(offset, original) original + offset - 1, original = .))

sms_ate_hypo <- tribble(
  ~ left, ~ right,
     5,       1,
     6,       1,
     7,       1,
     8,       1,
     9,       1,
     6,       5,
     7,       2,
     8,       3,
     9,       4
  ) %>% 
  bind_rows(. + 10 - 1)

sms_ate_between_hypo <- tribble(
  ~ left1, ~ right1, ~ left2, ~ right2,
     7,       2,        6,        1,
     8,       3,        6,        1,
     9,       4,        6,        1,
     9,       4,        8,        3
) %>% 
  bind_rows(. + 10 - 1)

mht_test_mat <- bind_rows(incentive_ate_hypo, sms_ate_hypo) %$% 
  subtract(mht_treatment_map_dm[left, ], mht_treatment_map_dm[right, ]) 

mht_test_mat <- sms_ate_between_hypo %$% 
  subtract(subtract(mht_treatment_map_dm[left1, ], mht_treatment_map_dm[right1, ]),
           subtract(mht_treatment_map_dm[left2, ], mht_treatment_map_dm[right2, ])) %>% 
  bind_rows(mht_test_mat, .) %>% 
  as.matrix()

# reg_res <- mht_analysis_data %>%
#   run_strat_reg(mht_analysis_formula,
#                 .strat.by = "county", .cluster = "cluster.id", .covariates = census.reg.covar) 
  # tidy() %>%
  # filter(term %in% colnames(test_mat))
```
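
The hypothesis tables above index rows of the treatment map, and the same block of comparisons is repeated for the other subgroups by shifting the indices by `offset - 1`. A minimal sketch of that expansion, and of how a pair of rows becomes a contrast, in base R; the 3-column matrix `toy_dm` is a made-up stand-in for `mht_treatment_map_dm`, and `shift_block` is a hypothetical helper name:

```r
# Base comparisons (left vs right), as in `incentive_ate_hypo`.
base_hypo <- data.frame(left = c(2, 3, 4, 4), right = c(1, 1, 1, 3))

# Shift both columns so the block starts at row `offset` instead of row 1.
shift_block <- function(offset, original) original + offset - 1
expanded <- rbind(
  base_hypo,
  do.call(rbind, lapply(c(10, 19, 23), shift_block, original = base_hypo))
)

# Toy design matrix: one row per treatment cell, one column per coefficient.
toy_dm <- rbind(c(1, 0, 0),   # cell 1: reference
                c(1, 1, 0),   # cell 2
                c(1, 0, 1))   # cell 3

# A hypothesis "cell 2 vs cell 1" is the difference of the two rows,
# i.e. the weights applied to the coefficient vector.
contrast_2_vs_1 <- toy_dm[2, ] - toy_dm[1, ]

# A "between" hypothesis compares two such differences.
contrast_between <- (toy_dm[3, ] - toy_dm[1, ]) - (toy_dm[2, ] - toy_dm[1, ])
```

With offset 10, the first block `(2, 1), (3, 1), (4, 1), (4, 3)` becomes `(11, 10), (12, 10), (13, 10), (13, 12)`, so each subgroup is compared against its own reference row.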

```{r, eval=FALSE}
mht_results <- mht_analysis_data %>%
  strat_mht(mht_analysis_formula,
            strat_by = "county", cluster = "cluster.id", covar = census.reg.covar,
            hypotheses = mht_test_mat, num_resample = 3500) %>% 
  mutate(hypo_id = seq_len(n())) 

sig_mht_results <- mht_results %>% 
  filter(adj_p_value <= 0.1)

mht_formatted_results <- bind_rows(incentive_ate_hypo, sms_ate_hypo) %>% 
  mutate(hypo_id = seq_len(n())) %>% 
  left_join(mht_selected_treatment, c("left" = "ate_id")) %>% 
  left_join(mht_selected_treatment, c("right" = "ate_id"), suffix = c("_l", "_r")) %>% 
  select(-starts_with("mon_status")) %>% 
  # select(-c(phone_owner_l, dist.pot.group_l, left, right)) 
  inner_join(mht_results, "hypo_id") 
  
mht_between_formatted_results <- sms_ate_between_hypo %>% 
  left_join(select(mht_formatted_results, left, right, hypo_id), c("left1" = "left", "right1" = "right")) %>% 
  left_join(select(mht_formatted_results, left, right, hypo_id), c("left2" = "left", "right2" = "right"), suffix = c("_left", "_right")) %>% 
  mutate(hypo_id = nrow(mht_formatted_results) + seq_len(n())) %>% 
  inner_join(mht_results, "hypo_id") 

save(mht_results, sig_mht_results, mht_formatted_results, mht_between_formatted_results, file = file.path("data", "mht_results.RData"))

# mht_analysis_data %>% 
#   strat_mht(mht_analysis_formula,
#             strat_by = "county", cluster = "cluster.id", covar = census.reg.covar,
#             hypotheses = c("ink", "calendar", "bracelet", "bracelet - calendar"), num_resample = 100)
```

```{r}
mht_formatted_results %>% 
  select(-c(phone_owner_l, dist.pot.group_l, left, right)) %>% 
  kable()
```

```{r}
mht_between_formatted_results %>% 
  select(-starts_with("left"), -starts_with("right")) %>% 
  kable()
```

```{r mht-ate-plot, fig.width=12, fig.height=12}
mht_formatted_results %>% 
  # filter(white_p_value <= 0.1 | hypo_id %in% c(mht_between_formatted_results$hypo_id_left, mht_between_formatted_results$hypo_id_right)) %>% 
  mutate(phone_owner = fct_recode(phone_owner_l, "Phone Owner" = "phone_owner", "Non Phone Owner" = "non_phone_owner"),
         dist.pot.group = fct_relabel(dist.pot.group_l, str_to_title)) %>% 
  mutate_at(vars(starts_with("assigned.treatment")), funs(fct_relabel(., str_to_title))) %>% 
  mutate_at(vars(starts_with("sms.treatment")), funs(fct_recode(., "No SMS" = "sms.control", "Reminder SMS" = "reminder.only", "Social Info SMS" = "social.info"))) %>% 
  mutate(hypo_text = if_else(phone_owner_l == "phone_owner", 
                             sprintf("%s with %s vs. %s with %s [%d]", assigned.treatment_l, sms.treatment_l, assigned.treatment_r, sms.treatment_r, hypo_id),
                             sprintf("%s vs. %s [%d]", assigned.treatment_l, assigned.treatment_r, hypo_id))) %>% 
  select(phone_owner, dist.pot.group, hypo_text, ends_with("p_value"), hypo_id, estimate) %>% 
  gather(p_value_type, p_value, ends_with("p_value")) %>%
  mutate(hypo_text = fct_reorder(hypo_text, hypo_id, .desc = TRUE)) %>% 
  mutate(p_value_type = factor(p_value_type, levels = paste0(c("adj", "unadj", "white"), "_p_value"), labels = c("Adjusted Bootstrap", "Unadjusted Bootstrap", "Unadjusted Cluster Robust"))) %>% 
  ggplot() +
  geom_segment(aes(x = hypo_text, xend = hypo_text, y = 0, yend = 1 - p_value, color = p_value_type)) +
  geom_point(aes(hypo_text, 1 - p_value, color = p_value_type), size = 2) +
  geom_text(aes(x = hypo_text, y = 1.1, label = sprintf("%.2f", estimate)), data = . %>% distinct(hypo_id, .keep_all = TRUE)) +
  # geom_line(aes(hypo_text, p_value, color = p_value_type, group = p_value_type)) +
  geom_hline(yintercept = c(0.9, 0.95), linetype = "dotted") +
  # geom_hline(yintercept = c(0.05, 0.1), linetype = "dotted") +
  scale_y_continuous("1 - p-value", breaks = seq(0, 1, 0.05)) +
  scale_color_discrete("") +
  scale_x_discrete("") +
  coord_flip() +
  theme(legend.position = "bottom") +
  # facet_grid(phone_owner ~ dist.pot.group, scales = "free_y")
  facet_wrap(~ phone_owner + dist.pot.group, scales = "free_y", ncol = 1)
```

```{r sms-ate-diff-plot, fig.width=12, fig.height=4}
mht_between_formatted_results %>% 
  # filter(white_p_value <= 0.1) %>% 
  mutate(hypo_text = sprintf("[%d] vs [%d]", hypo_id_left, hypo_id_right)) %>% 
  select(hypo_text, ends_with("p_value"), hypo_id, estimate) %>% 
  gather(p_value_type, p_value, ends_with("p_value")) %>%
  mutate(hypo_text = fct_reorder(hypo_text, hypo_id, .desc = TRUE)) %>% 
  mutate(p_value_type = factor(p_value_type, levels = paste0(c("adj", "unadj", "white"), "_p_value"), labels = c("Adjusted Bootstrap", "Unadjusted Bootstrap", "Unadjusted Cluster Robust"))) %>% 
  ggplot() +
  geom_segment(aes(x = hypo_text, xend = hypo_text, y = 0, yend = 1 - p_value, color = p_value_type)) +
  geom_point(aes(hypo_text, 1 - p_value, color = p_value_type), size = 2) +
  geom_text(aes(x = hypo_text, y = 1.1, label = sprintf("%.2f", estimate)), data = . %>% distinct(hypo_id, .keep_all = TRUE)) +
  # geom_line(aes(hypo_text, p_value, color = p_value_type, group = p_value_type)) +
  geom_hline(yintercept = c(0.9, 0.95), linetype = "dotted") +
  # geom_hline(yintercept = c(0.05, 0.1), linetype = "dotted") +
  scale_y_continuous("1 - p-value", breaks = seq(0, 1, 0.05)) +
  scale_color_discrete("") +
  scale_x_discrete("") +
  coord_flip() +
  theme(legend.position = "bottom") 
```

# Robustness Checks

## Impact of Surveying

We test whether being surveyed changes take-up or the treatment effects. We do this by comparing take-up rates, within the no-SMS group, between households in the baseline survey sample and households outside it.

```{r, echo=2, eval=FALSE}
reg.output.sms.ctrl.survey.effect <- analysis.data %>%  
  filter(monitored, sms.treatment == "sms.control") %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  mutate_at(vars(hh.baseline.sample, individ.baseline.sample), funs(factor(if_else(., "baseline", "not.baseline")) %>% relevel(ref = "not.baseline"))) %>% 
  mutate(sms.ctrl.subpop = relevel(factor(sms.ctrl.subpop), ref = "phone.owner")) %>% 
  run_strat_reg(dewormed.any ~ hh.baseline.sample * assigned.treatment, 
                .strat.by = "county_dist_stratum", 
                .cluster = "cluster.id",
                .covariates = setdiff(reg.covar, "sms.ctrl.subpop"))
```

```{r, eval=FALSE}
reg.output.sms.ctrl.survey.effect %>% 
  tidy %>% 
  kable(digits = 4)
```

```{r, eval=FALSE}
linear_tester(reg.output.sms.ctrl.survey.effect,
              paste0("baseline + baseline:", c("ink", "calendar", "bracelet"))) %>% 
  kable 
```

```{r, eval=FALSE}
linear_tester(reg.output.sms.ctrl.survey.effect,
              c("baseline", paste0("baseline + baseline:", c("ink", "calendar", "bracelet"))), joint = TRUE) %>% 
  tidy %>% 
  kable(digits = 4) 
```
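
The combinations tested above follow from the interaction specification. Writing $S_{ij}$ for the baseline-sample indicator and $\alpha$, $\beta$, $\beta(z)$ for the coefficients (notation introduced here for illustration only), and omitting strata and covariates, the model and the implied survey effect within incentive arm $z$ are:

$$
\begin{aligned}
E[Y_{ij}|S_{ij}, Z_j] &= \alpha + \beta \cdot S_{ij} + \sum_{z \in \mathcal{Z}\setminus \{control\}} Z_{j}(z) \cdot \left\{\tau(z) + \beta(z) \cdot S_{ij} \right\} \\
E[Y_{ij}|S_{ij} = 1, Z_j(z) = 1] - E[Y_{ij}|S_{ij} = 0, Z_j(z) = 1] &= \beta + \beta(z)
\end{aligned}
$$

So `baseline` alone is the survey effect in the control arm, and `baseline + baseline:ink` is the survey effect among those assigned to ink.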

## SMS Recruitment Effect

```{r}
sms.recruit.reg.res <- analysis.data %>% 
  filter(!is.na(any.sms.reported), sms.treatment %in% c("social.info")) %>% 
  anti_join(outlier.cells, c("assigned.treatment","sms.treatment", "mon_status", "cluster.id", "phone_owner")) %>% 
  run_strat_reg(dewormed.any ~ any.sms.reported * assigned.treatment, 
                .strat.by = "county_dist_stratum", 
                .cluster = "cluster.id",
                .covariates = setdiff(reg.covar, "sms.ctrl.subpop"))

sms.recruit.reg.res %>% 
  tidy %>% 
  kable(digits = 4)
```

```{r}
linear_tester(sms.recruit.reg.res,
              c("no + no:bracelet"))
```


<!-- *The below requires looking at the take-up rates for individuals not providing consent to use their data*. -->

```{r}
# reg.output.sms.ctrl.survey.effect.no.consent <- analysis.data %>%  
#   filter(baseline.sample | (monitored & monitor.consent & sms.treatment == "sms.control")) %>% 
#   anti_join(outlier.clusters, c("sms.treatment", "cluster.id")) %>% 
#   mutate(baseline.sample = factor(if_else(baseline.sample, "baseline", "not.baseline")) %>% relevel(ref = "not.baseline")) %>% 
#   run_strat_reg(dewormed.any ~ assigned.treatment * baseline.sample * sms.ctrl.subpop, .strat.by = c("county", "dist.pot.group"), .cluster = "cluster.id")
# 
# reg.output.sms.ctrl.survey.effect.no.consent %>% 
#   tidy %>% 
#   kable(digits = 4)
```

```{r}
# linear_tester(reg.output.sms.ctrl.survey.effect.no.consent, 
#               paste0(c("", "baseline + ink:", "baseline + calendar:", "baseline + bracelet:"), "baseline"), 
#               joint = TRUE) %>% 
#   tidy %>% 
#   kable
```

## Spillover

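We look for spillover of monitored individuals into neighboring clusters (village center and PoT within 5 km of each other) using name matching. Only census individuals who were neither name-matched nor recorded as dewormed in their own cluster are considered.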

```{r}
sp_target_villages <- village.centers %>% 
  convert.to.sp(~ lon + lat, wgs.84) %>% 
  spTransform(kenya.proj4)

sp_pot <- village.centers %>% 
  convert.to.sp(~ pot.lon + pot.lat, wgs.84) %>% 
  spTransform(kenya.proj4)

spillover_matches <- gDistance(sp_target_villages, sp_pot, byid = TRUE) %>% 
  is_weakly_less_than(5000) %>% 
  `diag<-`(FALSE) %>%
  as_tibble() %>% 
  set_names(village.centers$cluster.id) %>% 
  mutate(census_cluster_id = village.centers$cluster.id) %>% 
  gather(takeup_cluster_id, is_neighbor, -census_cluster_id) %>% 
  filter(is_neighbor) %>% 
  mutate(takeup_cluster_id = as.numeric(takeup_cluster_id)) %>% 
  group_by(census_cluster_id) %>% 
  do(name.match.monitored(filter(analysis.data, cluster.id == .$census_cluster_id[1], !name_matched, !dewormed), 
                          filter(takeup.data, cluster.id == .$takeup_cluster_id[1]), 
                          max.cost = 2)) %>% 
  ungroup() %>% 
  left_join(transmute(analysis.data, KEY.individ, assigned.treatment), "KEY.individ") %>% 
  left_join(transmute(takeup.data, KEY.survey.individ, assigned.treatment), c("which.min.name.match.dist" = "KEY.survey.individ"), suffix = c("_census", "_target"))

spillover_matches %>% 
  group_by(assigned.treatment_census, assigned.treatment_target) %>% 
  summarize(num_spillover = sum(dewormed.matched), num_census = n()) %>% 
  group_by(assigned.treatment_census) %>% 
  mutate(num_census = sum(num_census),
    mean_spillover = num_spillover / num_census) %>% 
  filter(!is.na(assigned.treatment_target))
```
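
The neighbor search above reduces to thresholding a distance matrix. A small base R sketch with made-up projected coordinates in meters; the names `villages`, `pots`, `cluster_ids` and the hand-written Euclidean distance are assumptions for illustration, whereas the notebook itself uses `gDistance` on projected points:

```r
# Made-up projected coordinates (meters) for three clusters.
villages <- cbind(x = c(0, 3000, 20000), y = c(0, 0, 0))
pots     <- cbind(x = c(500, 3500, 20500), y = c(0, 0, 0))
cluster_ids <- c(101, 102, 103)

# Euclidean distance from every village center (rows) to every PoT (columns).
dist_mat <- outer(seq_len(nrow(villages)), seq_len(nrow(pots)),
                  function(i, j) sqrt((villages[i, "x"] - pots[j, "x"])^2 +
                                      (villages[i, "y"] - pots[j, "y"])^2))

# Neighbors: another cluster's PoT within 5 km; a cluster's own PoT is excluded.
is_neighbor <- dist_mat <= 5000
diag(is_neighbor) <- FALSE

# Long format: one row per (village cluster, neighboring PoT cluster) pair.
pairs <- which(is_neighbor, arr.ind = TRUE)
neighbors <- data.frame(village_cluster = cluster_ids[pairs[, 1]],
                        pot_cluster     = cluster_ids[pairs[, 2]])
```

Here clusters 101 and 102 are neighbors of each other, while cluster 103 has no PoT other than its own within 5 km. In this sketch rows are villages and columns are PoTs by construction; when using a library distance function it is worth confirming which argument indexes the rows of the returned matrix.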