layout | title | subtitle | minutes |
---|---|---|---|
page |
R for reproducible scientific analysis |
Reading data |
15 |
- To be able to read in regular data into R
Now that we've obtained the gapminder dataset, we want to load it into R. Before reading in data, it's a good idea to have a look at its structure. Let's take a look using our newly-learned shell skills:
cd gapminder/data/ # navigate to the data directory of the project folder
head gapminder-FiveYearData.csv
country,year,pop,continent,lifeExp,gdpPercap
Afghanistan,1952,8425333,Asia,28.801,779.4453145
Afghanistan,1957,9240934,Asia,30.332,820.8530296
Afghanistan,1962,10267083,Asia,31.997,853.10071
Afghanistan,1967,11537966,Asia,34.02,836.1971382
Afghanistan,1972,13079460,Asia,36.088,739.9811058
Afghanistan,1977,14880372,Asia,38.438,786.11336
Afghanistan,1982,12881816,Asia,39.854,978.0114388
Afghanistan,1987,13867957,Asia,40.822,852.3959448
Afghanistan,1992,16317921,Asia,41.674,649.3413952
As its file extension would suggest, the file contains comma-separated values, and seems to contain a header row.
We can use read.table
to read this into R
gapminder <- read.table(
file="data/gapminder-FiveYearData.csv",
header=TRUE, sep=","
)
head(gapminder)
country year pop continent lifeExp gdpPercap
1 Afghanistan 1952 8425333 Asia 28.801 779.4453
2 Afghanistan 1957 9240934 Asia 30.332 820.8530
3 Afghanistan 1962 10267083 Asia 31.997 853.1007
4 Afghanistan 1967 11537966 Asia 34.020 836.1971
5 Afghanistan 1972 13079460 Asia 36.088 739.9811
6 Afghanistan 1977 14880372 Asia 38.438 786.1134
Because we know the structure of the data, we're able
to specify the appropriate arguments to read.table
.
Without these arguments, read.table
will do its
best to do something sensible, but it is always more
reliable to explicitly tell read.table
the structure
of the data. read.csv
function provides a convenient shortcut
for loading in CSV files.
Another type of file you might encounter are tab-separated format. To specify a tab as a separator, use
"\t"
.You can also read in files from the internet by replacing the file paths with a web address.
You can read directly from excel spreadsheets without converting them to plain text by using the
xlsx
package.
To make sure our analysis is reproducible, we should put the code into a script file so we can come back to it later.
Go to file -> new file -> R script, and write an R script to load in the gapminder dataset. Put it in the
scripts/
directory and add it to version control.Run the script using the
source
function, using the file path as its argument (or by pressing the "source" button in RStudio).