-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path03 - preparing-exploring.qmd
161 lines (88 loc) · 2.27 KB
/
03 - preparing-exploring.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
---
title: "Preparing and exploring Data"
---
## Catch up
```{r}
library(dplyr)
library(dbplyr)
library(DBI)
con <- dbConnect(
odbc::databricks(),
HTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e"
)
```
## 01 - Selecting variables
*Simple strategies to order, and reduce, data to work with*
1. Load the `customer` table to a variable called `customer`
```{r}
```
2. Select all columns that end with "key"
```{r}
```
3. Move all columns that end with "key" to the front
```{r}
```
4. Select all columns that **do not** end with "key"
```{r}
```
## 02 - Join to tables
**Using left_join() to relate two tables**
1. Load the `nation` table into a variable called the same
```{r}
```
2. Use `left_join` to relate `customer` with `nation` using the nation key
```{r}
```
3. What are the 5 countries with the most customers?
```{r}
```
## 03 - Prepare base
*Building the base variable/query*
1. Load the `orders` table in a variable called `orders`
```{r}
```
2. Join `orders` to the `customer` variable (table). Relate them on the
`o_custkey` and `c_custkey` fields.
```{r}
```
3. Join the `nation` variable/table to the `orders` and `customer` variables.
Use the `c_nationkey` and the `n_nationkey` to relate them.
```{r}
```
4. Load the resulting code into a variable called `rel_orders`. **We do this
so to get autocomplete working**
```{r}
```
5. Create new columns for the year of the order date, and another for the
month of the order date. Name them `order_year` and `order_month` respectively.
```{r}
```
6. Remove any columns that end in "comment", and end in "key"
```{r}
```
7. Rename `o_custkey` to `customer`, insert code before the selection
```{r}
```
8. Load resulting code to a variable called `prep_orders`
```{r}
```
9. Preview `prep_orders` using `glimpse()`
```{r}
```
## 04 - Answering questions
*Using the base query to answer more complex questions*
1. What are the top 5 countries for total amount ordered?
```{r}
```
2. What are the top 5 countries for total amount ordered for 1998?
```{r}
```
3. What has been the top (1) country, in orders, by year?
```{r}
```
4. Who are the top 5 customers by amount ordered?
```{r}
```
5. What is the country, and market segment, of the top 5 customers by amount ordered?
```{r}
```