diff --git a/03-data-wrangling1.Rmd b/03-data-wrangling1.Rmd
index 55f79cc..fb23048 100644
--- a/03-data-wrangling1.Rmd
+++ b/03-data-wrangling1.Rmd
@@ -1,7 +1,7 @@
 # Data Wrangling with Tidy Data, Part 1
 
 ```{r, echo=F, message=F, warning=F, error=F}
-install.packages("palmerpenguins")
+install.packages("palmerpenguins", repos = "http://cran.us.r-project.org")
 library(palmerpenguins)
 library(tidyverse)
 load(url("https://github.com/fhdsl/Intro_to_R/raw/main/classroom_data/CCLE.RData"))
diff --git a/04-data-wrangling2.Rmd b/04-data-wrangling2.Rmd
index 8617068..4b53ad0 100644
--- a/04-data-wrangling2.Rmd
+++ b/04-data-wrangling2.Rmd
@@ -1,7 +1,7 @@
 # Data Wrangling with Tidy Data, Part 2
 
 ```{r, echo=F, message=F, warning=F, error=F}
-install.packages("palmerpenguins")
+install.packages("palmerpenguins", repos = "http://cran.us.r-project.org")
 library(palmerpenguins)
 library(tidyverse)
 load(url("https://github.com/fhdsl/Intro_to_R/raw/main/classroom_data/CCLE.RData"))
diff --git a/05-data-visualization.Rmd b/05-data-visualization.Rmd
index e91f443..ca74b4c 100644
--- a/05-data-visualization.Rmd
+++ b/05-data-visualization.Rmd
@@ -1,7 +1,7 @@
 # Data Visualization
 
 ```{r, echo=F, message=F, warning=F, error=F}
-install.packages("palmerpenguins")
+install.packages("palmerpenguins", repos = "http://cran.us.r-project.org")
 ```
 
 ```{r, echo=F, message=F, warning=F, error=F}
@@ -9,7 +9,17 @@ library(tidyverse)
 library(palmerpenguins)
 ```
 
-In our final to last week together, we learn about how to do visualize our data. There are several different data visualization tools in R, and we focus on one of the most popular, "Grammar of Graphics", or known as "ggplot". The syntax for "ggplot" will look a bit different than the code we have been writing, with syntax such as `ggplot(penguins) + aes(x = bill_length_mm) + geom_histogram()`. The output of all of these functions, such as from `ggplot()` or `aes()` are not data types or data structures that we are familiar with...rather, they are graphical information. You should be worried less about how this syntax is similar to what we have learned in the course so far, but to view it as a new grammar (of graphics!) that you can "layer" on to create more sophisticated plots.
+Now that we have learned the basic data structures in R, we can learn how to visualize our data. There are several data visualization tools in R, and we focus on one of the most popular, the "Grammar of Graphics", also known as "ggplot".
+
+The syntax for `ggplot` will look a bit different from the code we have been writing, for example:
+
+```{r}
+ggplot(penguins) + aes(x = bill_length_mm) + geom_histogram()
+```
+
+The outputs of these functions, such as `ggplot()` or `aes()`, are not data types or data structures that we are familiar with; rather, they are graphical information.
+
+Rather than worrying about how this syntax relates to what we have learned in the course so far, view it as a new grammar (of graphics!) whose pieces you can "layer" together to create more sophisticated plots.
 
 To get started, we will consider these most simple and common plots:
 
@@ -17,13 +27,11 @@ To get started, we will consider these most simple and common plots:
 **Univariate**
 
 - Numeric: histogram
-
 - Character: bar plots
 
 **Bivariate**
 
 - Numeric vs. Numeric: Scatterplot, line plot
-
 - Numeric vs. Character: Box plot
 
 Why do we focus on these common plots? Our eyes are better at distinguishing certain visual features more than others. All of these plots are focused on their position to depict data, which gives us the most effective visual scale.
diff --git a/_bookdown.yml b/_bookdown.yml
index fa2ed09..c57995b 100644
--- a/_bookdown.yml
+++ b/_bookdown.yml
@@ -4,9 +4,9 @@ repo: https://github.com/jhudsl/OTTR_Template/
 rmd_files: ["index.Rmd",
             "01-intro-to-computing.Rmd",
             "02-data-structures.Rmd",
+            "05-data-visualization.Rmd",
             "03-data-wrangling1.Rmd",
             "04-data-wrangling2.Rmd",
-            "05-data-visualization.Rmd",
             "06-cheatsheet.Rmd",
             "About.Rmd",
             "References.Rmd"]
diff --git a/images/gator_error.jpg b/images/gator_error.jpg
new file mode 100644
index 0000000..19e6f8d
Binary files /dev/null and b/images/gator_error.jpg differ
diff --git a/images/hosrt_error_tweet.png b/images/hosrt_error_tweet.png
new file mode 100644
index 0000000..1799d2e
Binary files /dev/null and b/images/hosrt_error_tweet.png differ
diff --git a/index.Rmd b/index.Rmd
index 3bb087d..5244522 100644
--- a/index.Rmd
+++ b/index.Rmd
@@ -13,15 +13,148 @@ output:
     toc: true
 ---
 
-# About this Course
+# Course Logistics and Expectations
 
-## Curriculum
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
 
-The course covers fundamentals of R, a high-level programming language, and use it to wrangle data for analysis and visualization.
+## Course Description
 
-## Target Audience
+In this course, you will learn the fundamentals of R, a statistical programming language, and use it to wrangle data for analysis and visualization. The programming skills you will learn will help you keep learning R independently and transfer to other high-level languages such as Python. By the end of the class, you will reproduce an analysis from a scientific publication!
 
-The course is intended for researchers who want to learn coding for the first time with a data science application, or have explored programming and want to focus on fundamentals.
+## Learning Objectives
+
+After taking this course, you will be able to:
+
+- **Analyze** Tidy datasets in the R programming language via data wrangling, summary statistics, and visualization.
+- **Describe** how the R programming environment interprets complex expressions made out of functions, operations, and data structures, in a step-by-step way.
+- **Apply** problem-solving strategies to debug broken code.
+
+## Course Website
+
+All course information will be available here:
+
+https://hutchdatascience.org/intro_to_r
+
+Course discussions will take place in the class Slack workspace. Invites will be sent before class.
+
+Lab assignments will be done in the class Posit Cloud workspace. Students should register at https://posit.cloud before the first lab. A link to join the workspace will be sent out before the first lab.
+
+## DaSL Courses are a Psychologically Safe Space
+
+We want everyone to feel comfortable asking questions. That's why we adhere to the [Participation Guidelines](https://hutchdatascience.org/communitystudios/guidelines/) for each course.
+
+Be respectful of each other and of how we learn differently. It is never ok to disparage people for their questions.
+
+## Office Hours
+
+Office hours will be held via Teams on Fridays. Feel free to drop in to work and ask questions as needed.
+
+## Clinical Network Issues
+
+We know that learners on the Clinical network are having issues accessing material, including websites. We are working on good workarounds.
+
+If you are connected via VPN, we recommend that you disconnect it while working.
+
+## Slack
+
+If you haven't yet joined the FH Data Slack, you can join here: https://hutchdatascience.org/slack/
+
+Look for the #dasl-s4-intro-to-r channel; that's where we'll have conversations and field questions.
+
+## Instructor
+
+[Ted Laderas, PhD](https://laderast.github.io)
+tladera2@fredhutch.org
+**Preferred Method of Contact**: Email/Slack
+**Expected Response Time**: 24 hours
+
+I've been teaching R for over 10 years, and have been an active user and data scientist for over 20. I write a lot, including about data science, mental health, and bioinformatics.
+
+I'm always excited to see my learners surpass me, and if you are curious enough, I guarantee you will.
+
+## Words of Encouragement
+
+> This was adapted from Andrew Heiss. Thanks!
+
+I *promise* you can succeed in this class.
+
+Learning R can be difficult at first: it's like learning a new language, just like Spanish, French, or Chinese. Hadley Wickham, the chief data scientist at RStudio and the author of some amazing R packages you'll be using like `ggplot2`, [made this wise observation](https://r-posts.com/advice-to-young-and-old-programmers-a-conversation-with-hadley-wickham/):
+
+> It’s easy when you start out programming to get really frustrated and think, “Oh it’s me, I’m really stupid,” or, “I’m not made out to program.” But, that is absolutely not the case. Everyone gets frustrated. I still get frustrated occasionally when writing R code. It’s just a natural part of programming. So, it happens to everyone and gets less and less over time. Don’t blame yourself. Just take a break, do something fun, and then come back and try again later.
+
+Even experienced programmers find themselves bashing their heads against seemingly intractable errors. If you find yourself hitting your head against a wall for way too long without making progress, take a break, talk to classmates, e-mail me, etc.
+
+```{r echo=FALSE, out.width="60%"}
+# https://twitter.com/allison_horst/status/1213275783675822080
+knitr::include_graphics("images/hosrt_error_tweet.png", error = FALSE)
+```
+
+[![Allison Horst: Gator error](images/gator_error.jpg)](https://twitter.com/allison_horst/status/1213275783675822080)
+
+## LeaRning is Social
+
+- Be curious, not afraid.
+- Know that if you have a question, other people will have it.
+- Asking questions is our way of taking care of others.
+
+The students who have a bad time in my workshops and courses are the ones who don't work with each other to learn. We are a learning community, and we should help each other learn.
+
+> Find a buddy to work with, and check in with them during and outside of class.
+
+If you understand something and someone is struggling with it, try to help them. If you are struggling, take a breath, and try to pinpoint what you are struggling with.
+
+Our goal is to be better programmers each day, not to be the perfect programmer. There's no such thing as a perfect programmer. I still learn new things almost every day.
+
+## Course Times
+
+I know that everyone is busy, and we'll do our best to accommodate everyone's schedule.
+
+Classes will be recorded, but please do not use this as an excuse to miss class. Again, those who are curious and ask questions will learn quite a bit.
+
+## Class Schedule
+
+There are two sections of Intro to R.
+
+- A hybrid (in-person and online) section on Wednesdays (12-1:30 PM PST)
+- A completely remote section on Thursdays (2-3:30 PM PST)
+
+When you are enrolled, we will send you Teams invites for your section.
+
+The hybrid section will be held in the Data Science Lab Lounge (Arnold M1) and online. Please note that I will be in town and teaching in person on the starred (*) dates below.
+
+Dates when I am not on campus, you are free to attend in the
+
+|Week|Subject|Hybrid Section Dates|Remote Section Dates|
+|----|--------------------|--------------------|-----|
+|1*|Introduction to R/RStudio|September 25|September 26|
+|2|Data Structures|October 2|October 3|
+|3*|Data Visualization|October 9|October 10|
+|4 (optional)|Community Session|October 16|October 16|
+|5*|Data Wrangling 1|October 23|October 24|
+|6|Data Wrangling 2|October 30|October 31|
+|7* (optional)|Community Session|November 6|November 6|
+|8|Wrap-up/Discuss Code-a-thon|November 13|November 14|
+
+Note that the Community Sessions are shared between the two sections.
+
+More details about the Code-a-thon to come.
+
+## Community Sessions
+
+Twice this quarter, we will have learning community sessions to talk about applications of what we're learning. These sessions are optional but will help you solidify your learning during the course.
+
+These dates are:
+
+- October 16 at 12-1:30 PM
+- November 6 at 12-1:30 PM
+
+These dates will be sent to you when you register for the course.
+
+## Patient / Clinical Data is a No on Posit Cloud
+
+The Posit Cloud workspace is for your learning. Please do not put any patient or clinical information on it.
 
 ## Offerings
 