-
-
Notifications
You must be signed in to change notification settings - Fork 10
/
countryPick4.Rmd
189 lines (140 loc) · 6.43 KB
/
countryPick4.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
---
title: "Pick four - comparing trends in population over time"
output: pdf_document
---
## Purpose
The purpose of this report is to compare the population trends for four countries of your choosing. In addition, this serves as an example of literate programming. Literate programming is a way to document how you performed your analysis. It serves as a guide to other to others (and your future self) how to reproduce your work.
## Required Libraries
```{r}
library(ggplot2)
```
## Data
Always add as many details as possible about your data including where it came from, how it was processed, licensing, and where it can be accessed.
- Gapminder data [available here](http://www.gapminder.org/data/). [Gapminder data is licensed CC-BY 3.0](https://docs.google.com/document/pub?id=1POd-pBMc5vDXAmxrpGjPLaCSDSWuxX6FLQgq5DhlUhM#h.ul2gu2-uwathz).
- Processed data via [@jennybc](https://github.com/jennybc), [R package available here](https://github.com/jennybc/gapminder). The [data-raw](https://github.com/jennybc/gapminder/tree/master/data-raw) sub-directory reveals the journey from Gapminder.org's Excel workbooks to increasingly clean and tidy data.
**Read in data**: To read in the data, make sure this file is in the same directory/folder as the `gapminderDataFiveYear.txv` file. To set the proper working directory go to Session > Set Working Directory > To Source File Location.
```{r}
gapMinder <- read.delim("gapminderDataFiveYear.tsv")
### Check data
head(gapMinder) #First 10 lines of dataset
dim(gapMinder) #number of rows and columns in data set
```
You can see what countries are available by looking at the how many unique categories are in the country column of the gapMinder dataset.
```{r, results='hide'}
levels(gapMinder$country)
```
### Pick Four Countries
Now pick four countries that you are intrested in. Just replace with the countries name below.
```{r}
countryName1 <- "India"
countryName2 <- "United States"
countryName3 <- "Nigeria"
countryName4 <- "Germany"
```
## Individual countries
### Country One
We want to look at how population changes over time for the first country.
```{r}
country1 <- subset(gapMinder, country == countryName1)
ggplot(country1, aes(year, pop)) +
geom_path() +
ggtitle(countryName1) +
theme(plot.title = element_text(size = 15, face = "bold"))
```
This second graph is looking at the correlation between life expectancy (lifeExp) and GDP per person (gdpPercap). The size of the circles on the plot represents total population.
```{r}
ggplot(country1, aes(gdpPercap, lifeExp, size = pop)) +
geom_point() +
ggtitle(countryName1) +
theme(plot.title = element_text(size = 15, face = "bold"))
```
### Country Two
We will do this for each country. Since the code is very similar, we
will omit viewing it below by adding the named parameter `echo=FALSE`
(`TRUE` is the default):
```{r, echo=FALSE}
country2 <- subset(gapMinder, country == countryName2)
ggplot(country2, aes(year, pop)) +
geom_path() +
ggtitle(countryName2) +
theme(plot.title = element_text(size = 15, face = "bold"))
```
**Notes**: In a real report you can add information about the results of the analysis you are performing. That way your code, analysis, questions, and results are all in one place.
```{r, echo = FALSE}
ggplot(country2, aes(gdpPercap, lifeExp, size = pop)) +
geom_point() +
ggtitle(countryName2) +
theme(plot.title = element_text(size = 15, face = "bold"))
```
### Country Three
```{r, echo=FALSE}
country3 <- subset(gapMinder, country == countryName3)
ggplot(country3, aes(year, pop)) +
geom_path() +
ggtitle(countryName3) +
theme(plot.title = element_text(size = 15, face = "bold"))
```
**Notes** Maybe a country has an unusual distribution and we want to label the graph with the year. We added `label = year` to the first line of the code below. To display the text we also added the `geom_text(hjust = 1, vjust = 0, size = 5)` option.
```{r}
ggplot(country3, aes(gdpPercap, lifeExp, size = pop, label = year)) +
geom_point() +
geom_text(hjust = 1.3, vjust = 0, size = 3) +
ggtitle(countryName3) +
theme(plot.title = element_text(size = 15, face = "bold"))
```
### Country Four
```{r, echo=FALSE}
country4 <- subset(gapMinder, country == countryName4)
ggplot(country4, aes(year, pop)) +
geom_path() +
ggtitle(countryName4) +
theme(plot.title = element_text(size=15, face = "bold"))
```
**Notes**: Or we can try out labeling the year by adding color.
```{r}
ggplot(country4, aes(gdpPercap, lifeExp, size = pop, color = year)) +
geom_point() +
ggtitle(countryName4) +
theme(plot.title = element_text(size=15, face = "bold"))
```
## All four countries
Let's add all four countries together and to see how they compare.
```{r}
#Add subsetted data together
allCountries <- rbind(country1, country2, country3, country4)
#Notice the code for this is similar to when we are just looking at one country
#just with the added color option
ggplot(allCountries, aes(year, pop, color=country)) +
geom_path() +
xlab("Year") + ylab("Population Size") +
ggtitle("All four countries") +
theme(plot.title = element_text(lineheight=.8, face = "bold"))
```
What about what is occuring in a particular year? You can change the year by changing the code in the `year == 2007` section. To look at what years are possible use `allCountries$year`.
```{r}
yr <- 2007
ggplot(subset(allCountries, year == yr),
aes(x = gdpPercap, y = lifeExp, color = country, size = pop)) +
scale_x_log10(limits = c(500, 90000)) +
geom_point(alpha = 0.8) +
scale_size_area(max_size = 14) +
theme_bw() + # black grid on white background
xlab("GDP per capita") + ylab("Life Expectancy") +
ggtitle(paste("All 4 countries in", yr)) +
theme(plot.title = element_text(size = 15, face = "bold"))
```
You can plot all the years at once also!
```{r}
ggplot(allCountries,
aes(x = gdpPercap, y = lifeExp, color = country, size = pop)) +
scale_x_log10(limits = c(500, 90000)) +
ylim(c(30, 90)) +
geom_point(alpha = 0.8) +
scale_size_area(max_size = 14) +
theme_bw() + # black grid on white background
xlab("GDP per capita") + ylab("Life Expectancy") +
ggtitle("All 4 countries") +
theme(plot.title = element_text(size = 15, face = "bold"))
```
## Conclusions
In a real report you can add conclusions about your analysis or future plans for the project. The best part is that if you want to change something in your report you don't have to redo every step. You can just make the change and re-print the report.