index.Rmd

---
title: "COVID-19 US and State Trends"
description: Wear a mask, save a life.
site: 
  distill::distill_website
---

```{r setup, include=FALSE}
# TODO: pull in rt.live, add cumulative CFR, pull pred data from true source
# TODO: reduce spacing between caption and plot
# TODO: Add bar for title + caption: https://stackoverflow.com/questions/42351011/customize-background-color-of-ggtitle
knitr::opts_chunk$set(echo = FALSE)
install.load::install_load(c('distill','dplyr', 'tidyverse', 'ggplot2', 'plotly', 'rmarkdown', 'magrittr', 'RCurl', 'lubridate', 'DT', 'ggthemes', 'zoo', 'glue', 'scales', 'cowplot'))
ggplot2::theme_set(theme_minimal())
```

**Last updated `r format(lubridate::now("US/Pacific"), '%a, %b %d %I:%M %p')`** PST.

See "Analyses" tab for past analyses performed.

# Plot Explanation
Decision makers monitor new cases as they're correlated with deaths 14 days later. Models often predict cumulative deaths, which I feel can hide the recent trends and make it harder to see how much cases are rising or falling. This site is created to show how new cases relates to deaths 14 days later and includes forecasts for deaths. The goal is for this to be a resource. As you see headlines reporting new cases, you can use these plots to put past trends in perspective with current and expected trends.

* **Title:** The chart title shows "current cumulative deaths and [predicted cumulative deaths over next 4 weeks]". CFR is reported as `# Dead/# Confirmed`.
* **Chart Points/Colors:** Blue = 14-day lag of new cases. Light blue correpsonds to the green (present day deaths deaths). Dark blue is just the last 14 days of new cases and doesn't have an associated green dot due to the lag shift. The dark blue can correspond to the dark green (predictions). This helps show how the current trends in new cases relate to the most recent predictions.
* **Chart Lines:** The lines represent a 7-day moving average. The dotted line represents forecasted trend.
* **Forecasts:** The shaded cone represents forecast uncertainty. The gray background shading represents forecast period. As time passes, some light-green dots appear in the forecast period which can help you assess the accuracy of predicted deaths as it relates to observed deaths.

Predictions were pulled from [CDC.gov](https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html#state-forecasts), which pulls them from [CovidHub](https://github.com/reichlab/covid19-forecast-hub). The forecast model used is the "Ensemble" model, which averages across various models in a wisdom-of-the-crowd type prediction.

As predictions are reported in cumulative terms, I subtracted cumulative deaths as of the day of prediction and then divided weekly cumulative predictions by 7 to get a daily prediction. E.g., if cumulative predictions are 1000 on a forecast date of 6/1, and 6/8 predictions are 1200, the weekly prediction is 200 and the daily predictions are 200/7=28.5. I plot the daily prediction. Forecast points are on every Saturday. Forecasts happen weekly on Mondays.

```{r}


read_covid <- function(url, var){
  df <- read_csv(url) %>%
    filter(Country_Region == 'US') %>%
    select(-UID, -FIPS, -iso2, -iso3, -code3, -Admin2, -Lat, -Long_, -Combined_Key) 
  
  # Dead has population, confirmed doesn't
  if (var == 'confirmed'){
    df %<>% gather(date_raw, value, -`Province_State`, -`Country_Region`)  %>%
      mutate(date = mdy(date_raw)) %>%
      group_by(Province_State, date) %>%
      summarize(value = sum(value))
  } else {
    df %<>% gather(date_raw, value, -Population, -`Province_State`, -`Country_Region`)  %>%
    mutate(date = mdy(date_raw)) %>%
    group_by(Province_State, date) %>%
    summarize(value = sum(value),
              Population = sum(Population))
  }
  df[var] = df$value
  df %<>% select(-value)
  
  # Clean column names
  colnames(df) %<>% 
    tolower() %>% 
    str_replace_all(fixed('/'), '_') %>%
    str_replace_all(fixed(' '), '_')
  
  return(df)
}

# Read in data
url_conf = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv'
url_dead = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv'
df <- read_covid(url_conf, 'confirmed') %>%
  left_join(read_covid(url_dead, 'dead')) %>% 
  group_by(province_state) %>%
  arrange(province_state, date) %>%
  mutate(new = (confirmed - lag(confirmed)),
         dead_cum = dead,
         dead = dead - lag(dead),
         new14 = lag(new, 14),
         new_ma = rollmean(new, k=7, fill = NA, align='right'),
         dead_ma = rollmean(dead, k=7, fill=NA, align='right'),
         new14_ma = rollmean(new14, k=7, fill=NA, align='right')) %>%
  filter(date > Sys.Date() - 45) %>%
  ungroup()

df_us = read_covid(url_conf, 'confirmed') %>%
  left_join(read_covid(url_dead, 'dead')) %>% 
  group_by(date) %>%
  summarize(confirmed = sum(confirmed),
            dead = sum(dead),
            population=sum(population)) %>%
  mutate(new = (confirmed - lag(confirmed)),
         dead_cum = dead,
         dead = dead - lag(dead),
         new14 = lag(new, 14),
         new_ma = rollmean(new, k=7, fill=NA, align='right'),
         dead_ma = rollmean(dead, k=7, fill=NA, align='right'),
         new14_ma = rollmean(new14, k=7, fill=NA, align='right')) %>%
  filter(date > Sys.Date() - 60) %>%
  ungroup()

# Rt.live predictions <- https://d14wlfuexuxgcm.cloudfront.net/covid/rt.csv
rt_url <- "https://d14wlfuexuxgcm.cloudfront.net/covid/rt.csv"
states = data.frame(list(state=state.name, region=state.abb)) %>%
  mutate(state = as.character(state), region = as.character(region))
df_rt <- read_csv(rt_url) %>%
  left_join(states, by='region') %>%
  select(date, state, lower_80, mean, upper_80)
# View(df_rt)


# PREDICTED DEATHS
# TODO: Update this to pull from Reichlab directly
# https://github.com/reichlab/covid19-forecast-hub/tree/master/data-processed
# forecast_date = '2020-07-06'
forecast_date = '2020-07-13'
# forecast_date = '2020-07-27'
# forecast_date = '2020-08-03'
# forecast_date = '2020-08-10'
forecast_date = '2020-08-24'
df_preds <-  read_csv(glue('https://www.cdc.gov/coronavirus/2019-ncov/covid-data/files/{forecast_date}-model-data.csv')) %>% 
  filter(model=='Ensemble') %>% #View()
  rename(date = target_week_end_date,
         state=location_name,
         ll = quantile_0.025,
         ul = quantile_0.975) %>%
  mutate(date = ymd(date)) %>%
  arrange(state, date)

process_preds <- function(df_preds, df_tmp, state, forecast_date){
  # subtract the dead_cum on the forecast date from the predictions
  cum_death = df_tmp %>% filter(date == !!forecast_date) %>% pull(dead_cum)
  day_death = df_tmp %>% filter(date == !!forecast_date) %>% pull(dead_ma)
  
  pred_cum_death <- df_preds %>% 
    filter(state == !!state) %>% 
    summarize(max(point)) %>% 
    pull()
  
  df_out <- df_preds %>% 
    filter(state == !!state) %>% 
    filter(str_detect(target, "wk ahead cum death")) %>% # could be inc death
    # data need to be re-calculated to be not in cumulative form
    # subtract the cum_death to get first week's pred adjusted
    # and then subtract pred from prior prediction bc it's predicting cum death
    # divide by 7 because it's predicting cum death for the week
    mutate(
      point = (point - cum_death)/7,
      point = point - lag(point, default=0),
      ll = (ll - cum_death)/7,
      ll = ll - lag(ll, default=0),
      ul = (ul - cum_death)/7,
      ul = ul - lag(ul, default=0)) %>%
    select(date, point, ll, ul) %>%
    # Join the 
    rbind(data.frame(
      list(
        date=forecast_date, 
        point=day_death, 
        ll=day_death, 
        ul=day_death)
    )) %>%
    arrange(date)
  return(list(df_pred=df_out, pred_cum_death=pred_cum_death))
}


df_tmp <- df_us
out <- process_preds(df_preds, df_tmp, state ='National', forecast_date = forecast_date)
df_pred <- out$df_pred
pred_cum_death <- out$pred_cum_death


# cor(df$new14, df$dead, use='complete.obs')


# PLOTLY GRAPH
# ay <- list(
#   tickfont = list(color = "red"),
#   overlaying = "y",
#   side = "right",
#   title = "second y axis"
# )
# 
# colors = palette_pander(2)
# p1 = plot_ly(df_tmp, x=~date) %>%
#   add_trace(y=~new_ma, name='new',
#             mode='lines', type='scatter',
#             line=list(color=colors[1])
#             ) %>%
#   add_trace(y=~new, name=' ', mode='markers', type='scatter', 
#             type='scatter', marker=list(color=colors[1])) %>%
#   subplot()
# p2 = plot_ly() %>%
#   add_trace(x=df_tmp$date, y=df_tmp$dead,
#             name='dead',  mode='lines+markers',
#             line=list(color=colors[2]),
#             marker=list(color=colors[2]),
#             type='scatter') %>%
#   subplot()
# subplot(p1, p2, nrows=2, shareX = T, margin=0.03, heights=c(0.6, 0.4)) %>%
#   layout(
#     title=list(text='Hello', xanchor='left')
#     # annotations = list(text = 'hi', showarrow=F, xref='paper', yref='paper',
#     #   xanchor='right', yanchor='auto', xshift=0, yshift=0, x = 1, y = -0.1)
#     )

```

```{r}
plot_trends <- function(df_tmp, state, df_pred, pred_cum_death, df_rt=NA) {
  colors = palette_pander(6) 
  # Dark green color is a shade of #009e73: https://www.color-hex.com/color/009e73
  dk_green = "#005e45"
  dk_red = "#9e002b"
  rt_red = "#dd5157"
  rt_green = "#3EB23A"
  
  cfr = percent(max(df_tmp$dead_cum)/max(df_tmp$confirmed), accuracy=0.1)
  
  df1 <- df_tmp %>% select(date, new14, new14_ma)
  df2 <- df_tmp %>% select(date, dead, dead_ma)
  df3 <- df_tmp %>% filter(date >= Sys.Date() - 14) %>% 
    select(date, new, new_ma) %>%
    mutate(date = date + 14)
  
  # Dates for plotting
  min_date = min(df1$date)
  pred_date = min(df_pred$date)
  min_pred_date = min(df_pred[2:nrow(df_pred),]$date)
  max_date = max(df_pred$date) + 10
  middle_date = (max_date - min_pred_date)/2 + min_pred_date
  forecast_title = glue('{month(pred_date)}/{day(pred_date)} FORECAST')
  
  # Title: deaths
  curr_dead = df_tmp %>% filter(date == max(df_tmp$date))
  curr_dead_cnt = curr_dead %>% pull(dead_cum)
  curr_date = curr_dead %>% pull(date) 
  
  max_pred = df_pred %>% filter(date == max(df_pred$date)) 
  dead_growth = pred_cum_death - curr_dead_cnt
  
  curr_date_str = glue('{month(curr_date)}/{day(curr_date)}')
  max_date_str = glue('{month(max_date)}/{day(max_date)}')
  title_str = glue('{state}\n{curr_date_str} = {comma(curr_dead_cnt)} [+{comma(dead_growth)} by {max_date_str} = {comma(pred_cum_death)}] CFR: {cfr}')
  
  # New cases
  p <- ggplot() + 
    geom_point(data=df1, aes(x=date, y=new14), alpha=.5, color=colors[1]) +
    geom_line(data=df1, aes(x=date, y=new14_ma), color=colors[1]) + 
    geom_point(data=df3, aes(x=date, y=new), alpha=.5, color=colors[4]) +
    geom_line(data=df3, aes(x=date, y=new_ma), color=colors[4]) +
    scale_x_date(limits = c(min_date, max_date)) + 
    scale_y_continuous(label=comma) + 
    theme(axis.text.x = element_blank(),
          plot.margin = unit(c(0, 0, 0, 0), "cm")) +
    annotate("label", x=min_date, y=max(df_tmp$new), 
             label='14-day Lag of New Cases', hjust=0, vjust=1) +
    labs(x='', y='')  +
    annotate('text', x=max(df3$date), 
             y=df3[df3$date == max(df3$date),]$new,
             hjust = -.3,
             vjust = .2,
             color=colors[4],
             label = comma(df3[df3$date == max(df3$date),]$new))
  
  
  # if (state == 'xxx') { # comment this if you want to have the rt chart
  if (state != 'US') {
    # RT CHART
    df_rt_tmp <- df_rt %>% filter(state == !!state)
    df_rt_tmp_below <- df_rt_tmp %>% 
      mutate(
        mean = ifelse(upper_80 < 1, mean, NA),
        lower_80 = ifelse(upper_80 < 1, lower_80, NA),
        upper_80 = ifelse(upper_80 < 1, upper_80, NA)
      )
    df_rt_tmp_mid <- df_rt_tmp %>% 
      mutate(
        mean = ifelse(lower_80 < 1 & upper_80 > 1, mean, NA),
        lower_80 = ifelse(lower_80 < 1 & upper_80 > 1, lower_80, NA),
        upper_80 = ifelse(lower_80 < 1 & upper_80 > 1, upper_80, NA)
      )
    
    df_rt_tmp_above <- df_rt_tmp %>% 
      mutate(
        mean = ifelse(lower_80 >= 1, mean, NA),
        lower_80 = ifelse(lower_80 >= 1, lower_80, NA),
        upper_80 = ifelse(lower_80 >= 1, upper_80, NA)
      )
    
    # only plot if not 100% missing
    not_missing <- function(df){
      # all of the rows for mean will be missing if conditions aren't met
      sum(is.na(df$mean)) != nrow(df)
    }
    
    # Rt plot start
    p_rt <- ggplot() +
      geom_hline(yintercept=1, color='gray') +
      geom_line(data=df_rt_tmp, aes(x=date, y=mean), color='gray') +
        geom_ribbon(data=df_rt_tmp, aes(x=date, ymin=lower_80, ymax=upper_80), 
                    color=NA, alpha=0.2, fill='gray') 
    
    # only plot if not 100% missing
    if (not_missing(df_rt_tmp_above)){
      p_rt = p_rt + 
        geom_line(data=df_rt_tmp_above, aes(x=date, y=mean), color=rt_red) +
        geom_ribbon(data=df_rt_tmp_above, aes(x=date, ymin=lower_80, ymax=upper_80), 
                    color=NA, alpha=0.2, fill=rt_red) 
    }
    
    if (not_missing(df_rt_tmp_below)){
      p_rt = p_rt + 
        geom_line(data=df_rt_tmp_below, aes(x=date, y=mean), color=rt_green) +
        geom_ribbon(data=df_rt_tmp_below, aes(x=date, ymin=lower_80, ymax=upper_80), 
                    color=NA, alpha=0.2, fill=rt_green) 
    }
    if (not_missing(df_rt_tmp_mid)) {
      p_rt = p_rt +
        geom_line(data=df_rt_tmp_mid, aes(x=date, y=mean), color=colors[3], size=1) +
        geom_ribbon(data=df_rt_tmp_mid, aes(x=date, ymin=lower_80, ymax=upper_80), 
                    color=NA, alpha=0.2, fill=colors[3]) 
    }
    
    last_rt_date = max(df_rt_tmp$date)
    last_rt = df_rt_tmp[df_rt_tmp$date == last_rt_date,]$mean %>% round(2)
    p_rt = p_rt +
      scale_x_date(limits = c(min_date, max_date)) + 
      scale_y_continuous(label=comma, limits = c(0.5, 1.5)) + 
      theme(axis.text.x = element_blank(),
            plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
      labs(y='', x='') + 
      annotate("label", x=min_date, y=1.4, 
               label='Rt', hjust=0, vjust=1) +
      annotate("label", x=last_rt_date, y=last_rt, 
               label=last_rt, hjust=0, vjust=0) 
  }
 
  
  p2 <- ggplot() +
    geom_rect(aes(xmin = min_pred_date, 
                  xmax = max_date, 
                  ymin = -Inf, 
                  ymax = Inf), fill='gray', alpha=0.2) + 
    geom_point(data=df2, aes(x=date, y=dead), alpha=.5, color=colors[2]) +
    geom_line(data=df2, aes(x=date, y=dead_ma), color=colors[2]) +
    scale_y_continuous(label=comma) + 
    scale_x_date(labels = scales::date_format("%m/%d"), 
                 limits=c(min_date, max_date)) +
    geom_line(data=df_pred, aes(x=date, y=point), color=dk_green, linetype='dashed') + 
    geom_point(data=df_pred, aes(x=date, y=point), color=dk_green) + 
    geom_ribbon(data=df_pred, aes(x=date, ymin=ll, ymax=ul), color=dk_green, alpha=0.1, fill=dk_green) +
    theme(plot.margin = unit(c(0, 0, 0, 0), "cm"), 
          plot.caption.position='plot',
          plot.caption = element_text(hjust=0, vjust=10)) + 
    annotate("text", x=middle_date, y=max(df_pred$ul, df_tmp$dead), 
             label=forecast_title) + 
    annotate("label", x=min_date, y=max(df_pred$ul, df_tmp$dead), 
             label='Deaths', hjust=0, vjust=1) + 
    annotate('text', x=max(df_pred$date), 
             y=df_pred[df_pred$date == max(df_pred$date),]$point,
             hjust = -.3,
             vjust = .2,
             color=dk_green,
             label = comma(df_pred[df_pred$date == max(df_pred$date),]$point)) +
    annotate('text', x=max(df2$date), 
             y=df2[df2$date == max(df2$date),]$dead,
             hjust = -.3,
             vjust = .2,
             color=colors[2],
             label = comma(df2[df2$date == max(df2$date),]$dead)) +
    labs(x='', y='', caption='Plot: bryanwhiting.github.io/covid19\nRt: rt.live; Forecasts: covid19forecasthub.org, cdc.gov; Data: Johns Hopkins') 
  # p2
  
  # title: https://wilkelab.org/cowplot/articles/plot_grid.html#joint-plot-titles
  p_title <- ggdraw() + 
    draw_label(
      title_str,
      fontface = 'bold',
      x = 0,
      hjust = 0
    ) +
    theme(plot.margin = margin(0, 0, 0, 7)) 
  
  # titles <- cowplot::plot_grid(title, subtitle, ncol=1, rel_heights=c(0.2, 1))
  # subtitle <- ggdraw() +
  #   draw_label(title_str, x = 0, hjust = 0) + 
  #    theme(plot.margin = margin(0, 0, 0, 0)) 
  
  # TODO: add rt.live on bottom 
  # TODO: Fix subtitle
  # plots <- cowplot::plot_grid(p, p2, p2, align='v', ncol=1, rel_heights = c(0.3, 0.3, 0.2))
  
  
  # if (state != 'US'){ 
  if (state == 'US'){# uncomment for rt.live
    plots <- plot_grid(p, p2, align='v', ncol=1, rel_heights=c(0.4, 0.6))
  } else {
    plots <- plot_grid(p_rt, p, p2, align='v', ncol=1,
                       rel_heights=c(0.15, 0.35, 0.45))
  }
  plot_grid(p_title, plots, ncol=1, rel_heights = c(0.15, .9))
    
  # Attempt to add legend
  # scale_color_identity(guide='legend')
  # scale_color_identity(name='Points', guide='legend',
                       # # labels=c('a', 'b'), 
                       # values=c('red'=colors[1], 'black'=colors[2]))
                       # breaks=colors)
                     # values=c(colors[1], colors[2]),
                     # labels=c('a', 'b'))
  # return(p)
}


```
# US
```{r 'us', fig.height=7, fig.width=6}
plot_trends(df_us, state='US', df_pred=df_pred, pred_cum_death = pred_cum_death) 
  
```

# States
```{r 'states', results='asis', fig.height=9, fig.width=7} 
#  fig.show="hold", out.width="50%",
#, layout="l-screen-inset"}
states = state.name
# states = c('California', 'Texas', 'Utah', 'Colorado', 'Virginia')
for (state in states){
  df_tmp <- df %>% filter(province_state == !!state) 
  
  out <- process_preds(df_preds, df_tmp, state =state, forecast_date = forecast_date)
  df_pred <- out$df_pred
  pred_cum_death <- out$pred_cum_death
  
  p <- plot_trends(df_tmp, state=state, df_pred=df_pred, pred_cum_death = pred_cum_death, df_rt=df_rt) 
  cat(glue('\n\n## {state}\n\n\n'))
  print(p)
  cat('\n ')
}

  
```