---
title: "Course Introduction"
subtitle: "Chapter 1: Lesson 1"
format: html
editor: source
sidebar: false
---
```{r}
#| include: false
source("common_functions.R")
```
## Learning Outcomes
{{< include outcomes/_chapter_1_lesson_1_outcomes.qmd >}}
## Introduction to the course structure and Canvas (30 min)
- Introduction of teacher(s)
- Introduction of students
- Syllabus
- Software: R and RStudio
- Textbook
    - Cowpertwait, P. S. P., & Metcalfe, A. V. (2009). *Introductory Time Series with R*. Springer. ISBN 978-0-387-88697-8; e-ISBN 978-0-387-88698-5; DOI 10.1007/978-0-387-88698-5.
    - [Supplement to the Textbook](https://byuistats.github.io/timeseries_supplement)
        - Modern R code
    - Time Series (TS) Notebook for in-class activities
- Lesson cadence
    - Read assigned section(s) from the textbook
        - Assigned sections listed in the TS notebook
    - Reading Journals
        - Record your learning
        - Include all of the following from the assigned reading: vocabulary terms, nomenclature, models, important concepts, and your questions
        - Review another student's learning journal at the beginning of class
    - In-class Activities
    - Homework
- Assessment Structure
    - Daily Homework, Multi-week Projects, Three Exams
    - Grading Categories
        - Reading Journal (10%)
        - Homework (40%)
        - Projects (25%)
        - Exams (25%)
    - Grades: 93% = A
- Calendar
- Team structure for class activities
    - Random assignment, frequent changes, partner with each student in the class
- We are all in this together
## Class Activity: Google Trends (Searches for "Chocolate") (10 min)
Google Trends allows you to download a time series showing the proportional number of searches for a given term. The month with the highest number of searches has a value of 100. The values for the other months are given as a percentage of the peak month's value. The following table illustrates the data, as given by Google Trends.
```{r}
#| echo: false
if (!require("pacman")) install.packages("pacman")
pacman::p_load("tidyverse", "rio","mosaic")
chocolate_raw <- rio::import("https://byuistats.github.io/timeseries/data/chocolate_raw.csv")
# Prints a few rows of the chocolate_raw data file.
temp <- chocolate_raw
temp[1,1] <- "Category:"
temp[1,2] <- paste0("All categories", strrep(" ",12))
temp2 <- temp |>
  head(16) |>
  bind_rows(data.frame(V1 = "⋮ ", V2 = "⋮")) |>
  bind_rows(temp |> tail(3)) |>
  mutate(
    V1 = case_when(
      row_number() == 3 ~ paste0(V1, strrep(" ", 4)),
      row_number() > 3 ~ paste0(V1, strrep(" ", 2)),
      TRUE ~ V1
    )
  )
# Blank column headers: one space and two spaces, because column names must be unique
temp2 |>
  rename(" " = "V1", "  " = "V2") |>
  print(row.names = FALSE)
```
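As a rough illustration of the scaling described above, the sketch below converts raw monthly counts into a 0 to 100 index. The counts and the names `raw_counts` and `trends_index` are made up for this example, and Google's actual computation may differ in detail.

```{r}
# Hypothetical monthly search counts (made-up numbers, for illustration only)
raw_counts <- c(Jan = 400, Feb = 600, Mar = 360, Dec = 800)

# Express each month as a percentage of the peak month
trends_index <- 100 * raw_counts / max(raw_counts)
trends_index
```

In this made-up example, December is the peak month, so it receives a value of 100, and February's 600 searches become 75.
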
The cleaned version of the data used for this demonstration is available in the file <a href="data/chocolate.csv" download>chocolate.csv</a>. We can read it directly into a data frame using the command

`chocolate_month <- rio::import("https://byuistats.github.io/timeseries/data/chocolate.csv")`

In Lesson 3, we will practice converting data like this into a time series (tsibble) object.
```{r}
#| code-fold: true
#| code-summary: "Show the code"
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  "tsibble", "fable",
  "feasts", "tsibbledata",
  "fable.prophet", "tidyverse",
  "patchwork", "rio"
)

# Read in the data from a csv file
# (change the line below to include your file path)
chocolate_month <- rio::import("https://byuistats.github.io/timeseries/data/chocolate.csv")

# Build a sequence of monthly dates, one per row, starting in January 2004
start_date <- lubridate::ymd("2004-01-01")
date_seq <- seq(
  start_date,
  start_date + months(nrow(chocolate_month) - 1),
  by = "1 months"
)

# Combine the dates with the search values
chocolate_tibble <- tibble(
  dates = date_seq,
  year = lubridate::year(date_seq),
  month = lubridate::month(date_seq),
  value = dplyr::pull(chocolate_month, chocolate)
)

# Make the tsibble, indexed by year and month
chocolate_month_ts <- chocolate_tibble |>
  mutate(index = tsibble::yearmonth(dates)) |>
  as_tsibble(index = index)
chocolate_month_ts |> head()
```
For now, we will use the tsibble object (which in this case is called *chocolate_month_ts*) to explore the time series. Here is a plot of the time series representing the proportional frequency of searches for the term "chocolate."
```{r}
#| code-fold: true
#| code-summary: "Show the code"
autoplot(chocolate_month_ts, .vars = value) +
  labs(
    x = "Month",
    y = "Searches",
    title = "Relative Number of Google Searches for 'Chocolate'"
  ) +
  theme(plot.title = element_text(hjust = 0.5))
```
::: {.callout-tip icon=false title="Check Your Understanding"}
- What do you notice about this plot?
:::
::: panel-tabset
#### Characteristics
#### Trend
The red line represents the mean for each year. Each annual mean is plotted at July of its year.
```{r}
#| echo: false
chocolate_annual_ts <- summarise(index_by(chocolate_month_ts, year), value = mean(value))
temp <- chocolate_month_ts |>
  filter(month == 7) |>
  dplyr::select(-value) |>
  right_join(chocolate_annual_ts, by = join_by(year))
autoplot(chocolate_month_ts, .vars = value) +
  labs(
    x = "Month",
    y = "Searches",
    title = "Relative Number of Google Searches for 'Chocolate'"
  ) +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_line(
    data = temp,
    aes(x = index, y = value),
    color = "#D55E00"
  )
```
:::: {.callout-tip icon=false title="Check Your Understanding"}
- What do you observe about the number of searches for "chocolate" each month?
- What might be causing this trend?
::::
#### Seasonality / Cycles
Consider the data for the last few years:
```{r}
#| echo: false
####################### FUTURE MAINTENANCE ###################
# Increase this in future semesters, so the figure shows each year clearly.
# There should be a major vertical line for January of each year, and the
# shape of the monthly pattern should be clearly evident.
first_year_selected <- 2020
lastfew_plot <- chocolate_month_ts |>
  filter(dates >= lubridate::mdy(paste0("07/01/", first_year_selected))) |>
  autoplot(.vars = value) +
  labs(
    x = "Month",
    y = "Searches",
    title = "Relative Number of Google Searches for Chocolate (Select Months)"
  ) +
  theme(plot.title = element_text(hjust = 0.5))
lastfew_plot
```
:::: {.callout-tip icon=false title="Check Your Understanding"}
- Which month tends to have the greatest number of Google searches for "chocolate"?
- Which month has the second greatest number of Google searches for "chocolate"?
- When do the fewest Google searches for "chocolate" occur?
- How can you explain these observations?
::::
#### Autocorrelation
**Autocorrelation** means that the values in a sequence of data are correlated with earlier values in the same sequence.
Consider searches in successive months. Are they independent?
```{r test}
#| echo: false
####################### FUTURE MAINTENANCE ###################
example_year <- 2023 # Change this in future semesters to be the last full year with data.
lastfew_plot
```
:::: {.callout-tip icon=false title="Check Your Understanding"}
- Compare the reported number of searches in December with the value for the following February. The reported number of searches for "chocolate" in December `r example_year - 1` is `r chocolate_month_ts |> filter(year == example_year - 1, month == 12) |> data.frame() |> dplyr::select(value) |> pull()`. Does it make sense that the reported number of searches in February `r example_year` is `r chocolate_month_ts |> filter(year == example_year, month == 2) |> data.frame() |> dplyr::select(value) |> pull()`? Given the value from December, is the value in the following February independent and completely random?
- The value reported by Google for June `r example_year` is `r chocolate_month_ts |> filter(year == example_year, month == 6) |> data.frame() |> dplyr::select(value) |> pull()`. Based on what you have observed in the data, do you think the value for July `r example_year` will be close to or far from this value? Justify your answer.
::::
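
One simple way to quantify this kind of dependence is the correlation between each value and the value one step earlier. The sketch below uses a simulated monthly series, not the Google Trends data; `sim_series`, its seasonal shape, and the noise level are assumptions chosen only for illustration.

```{r}
# Simulated monthly series (not the Google Trends data):
# a 12-month seasonal pattern plus random noise
set.seed(1)
n_months <- 120
sim_series <- 50 + 20 * sin(2 * pi * (1:n_months) / 12) + rnorm(n_months, sd = 5)

# Correlation between each value and the value one month earlier
lag1_cor <- cor(sim_series[-1], sim_series[-n_months])
lag1_cor
```

A value near 0 would suggest that successive months are unrelated. A value near 1 indicates that neighboring months tend to be similar.
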
:::
Discuss these vocabulary terms in the context of the Google Trends ("Chocolate") example:

- Time series
- Sampling interval
- Autocorrelation (or serial dependence)
- Trend
- Seasonal variation
- Cycle
```{r}
#| include: false
# TS Plot (Monthly and Annual)
mp <- autoplot(chocolate_month_ts, .vars = value) +
  labs(
    x = "Month",
    y = "Searches",
    title = "Relative Number of Google Searches for 'Chocolate'"
  ) +
  theme(plot.title = element_text(hjust = 0.5))
yp <- autoplot(chocolate_annual_ts, .vars = value) +
  labs(
    x = "Year",
    y = "Searches",
    title = "Mean Annual Google Searches for 'Chocolate'"
  ) +
  scale_x_continuous(breaks = seq(2004, max(chocolate_month_ts$year), by = 2)) +
  theme(plot.title = element_text(hjust = 0.5))
mp
# mp / yp
```
## Class Activity: S&P 500 (10 min)
The time series plot below illustrates the daily closing prices of the Standard and Poor's 500 stock index (S&P 500).
```{r}
#| echo: false
##################### S&P 500
# Remove thousands separators and convert to numeric
replaceCommas <- function(x) {
  as.numeric(gsub(",", "", x))
}

sp500_dat <- rio::import("https://byuistats.github.io/timeseries/data/sp500.csv") |>
  mutate(dates = mdy(Date))

sp500_day <- sp500_dat |>
  mutate(
    year = lubridate::year(dates),
    month = lubridate::month(dates),
    value = replaceCommas(Close)
  ) |>
  tibble()

sp500_ts <- sp500_day |>
  mutate(index = dates) |>
  as_tsibble(index = index)

# Annual means, joined to the rows dated July 1 of each year
sp500_annual_ts <- summarise(index_by(sp500_ts, year), value = mean(value))
temp <- sp500_ts |>
  filter(month == 7 & day(dates) == 1) |>
  dplyr::select(Date, year)
temp2 <- sp500_annual_ts |>
  right_join(temp, by = join_by(year))

autoplot(sp500_ts, .vars = value) +
  labs(
    x = "Date",
    y = "Closing Price",
    title = "Daily Closing Price of the S&P 500 Stock Index"
  ) +
  theme(plot.title = element_text(hjust = 0.5))
```
::: panel-tabset
#### Characteristics
#### Trend
The red line is a smoothed (loess) curve fitted to the daily closing prices.
```{r}
#| echo: false
##################### S&P 500
# sp500_ts and temp2 were created in the previous chunk and are reused here.
autoplot(sp500_ts, .vars = value) +
  labs(
    x = "Date",
    y = "Closing Price",
    title = "Daily Closing Price of the S&P 500 Stock Index"
  ) +
  theme(plot.title = element_text(hjust = 0.5)) +
  # geom_line(data = temp2,
  #           aes(x = index, y = value),
  #           color = "red") +
  geom_smooth(formula = y ~ x, method = "loess", color = "#D55E00")
```
:::: {.callout-tip icon=false title="Check Your Understanding"}
- What do you observe about the value of the S&P 500 over time?
- What might be causing this trend?
::::
#### Seasonality / Cycles
:::: {.callout-tip icon=false title="Check Your Understanding"}
- Are there regularly occurring seasonal patterns in the data?
- Are there some random (stochastic) business cycles observable in the data?
- How can you explain these observations?
::::
#### Autocorrelation
:::: {.callout-tip icon=false title="Check Your Understanding"}
- Consider closing prices on successive days. Are they independent?
- Why would there be autocorrelation in the data?
::::
:::
Discuss these vocabulary terms in the context of the S&P 500 example:
- Time series
- Sampling interval
- Autocorrelation (or serial dependence)
- Trend
- Seasonal variation
- Cycle
- Deterministic vs. Stochastic
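
To make the last pair of terms concrete, the sketch below contrasts a deterministic trend with a stochastic one. Both series are simulated and unrelated to the S&P 500 data; the object names, starting value, drift, and noise level are assumptions for illustration only.

```{r}
n_days <- 250

# Deterministic trend: fully determined by the time index
deterministic_trend <- 100 + 0.5 * (1:n_days)

# Stochastic trend: a random walk with drift, built by accumulating random steps
simulate_walk <- function(seed) {
  set.seed(seed)
  100 + cumsum(rnorm(n_days, mean = 0.5, sd = 2))
}
walk_a <- simulate_walk(1)
walk_b <- simulate_walk(2)

# The two walks share the same average drift but follow different paths
range(walk_a - walk_b)
```

The deterministic series is identical every time it is computed, while each new seed produces a different random walk.
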
## Recap (5 min)
- What is time series data?
    - Define "time series" (e.g. observations collected sequentially over time)
    - Examples of time series data
    - Why ordinary regression fails: correlated error terms
- Examples of time series from different domains:
    - Daily credit card balance
    - Daily closing stock prices
    - Monthly sales figures
    - Yearly global temperature measurements
    - Second-by-second wave heights at an ocean buoy
    - Weekly unemployment rates
    - Quarterly GDP estimates
- Importance of context and subject matter knowledge
- Role of models (explanation, prediction, simulation)
- Are there any questions on the course or time series data?
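
The recap point about ordinary regression can be illustrated with simulated data. In the sketch below, a straight line is fitted by least squares to a series whose errors follow an AR(1) process; the coefficient 0.8, the trend, and all object names are assumptions for illustration.

```{r}
set.seed(123)
n_obs <- 200
time_index <- 1:n_obs

# Simulated errors in which each value depends on the previous one
ar_errors <- as.numeric(arima.sim(model = list(ar = 0.8), n = n_obs))

# Linear trend plus correlated errors
y_sim <- 10 + 0.05 * time_index + ar_errors

# Ordinary least squares fit, ignoring the time ordering of the errors
ols_fit <- lm(y_sim ~ time_index)
ols_resid <- resid(ols_fit)

# Lag-1 correlation of the residuals
resid_lag1 <- cor(ols_resid[-1], ols_resid[-n_obs])
resid_lag1
```

Ordinary regression assumes independent errors, which would give a lag-1 residual correlation near zero. A clearly positive value signals that the assumption is violated.
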
## Homework Preview (5 min)
- Review upcoming homework assignment
- Clarify questions
## Homework
::: {.callout-note icon=false}
## Download Assignment
<a href="https://byuistats.github.io/timeseries/homework/homework_1_1.qmd" download="homework_1_1.qmd"> homework_1_1.qmd </a>
## Preparation for the next class meeting
- Update R and RStudio
- Access
    - Canvas course
    - Time Series Notebook (Quarto file)
- Purchase the textbook
- Read sections 1.1-1.4 in the textbook
- Obtain a Learning Journal
- Prepare to share your Learning Journal with another student in the next class meeting
:::