diff --git a/01-intro.Rmd b/01-intro.Rmd index 886eb528..deec016a 100644 --- a/01-intro.Rmd +++ b/01-intro.Rmd @@ -5,17 +5,13 @@ ottrpal::set_knitr_image_path() # Introduction - -## Motivation - - ## Target Audience -The course is intended for ... +The course is intended for researchers who want to learn coding for the first time with a data science application, or who have explored programming and want to focus on fundamentals. ## Curriculum -The course covers... +The course covers the fundamentals of R, a high-level programming language, and how to use it to wrangle data for analysis and visualization. The programming skills you will learn are transferable: they will help you keep learning R independently and pick up other high-level languages such as Python. By the end of the class, you will reproduce an analysis from a scientific publication! ```{r} diff --git a/02-chapter_of_course.Rmd b/02-chapter_of_course.Rmd deleted file mode 100644 index 68835cfd..00000000 --- a/02-chapter_of_course.Rmd +++ /dev/null @@ -1,260 +0,0 @@ - -# A new chapter - -If you haven't yet read the getting started Wiki pages; [start there](https://www.ottrproject.org/getting_started.html). - -To see the rendered version of this chapter and the rest of the template, see here: https://jhudatascience.org/OTTR_Template/. - -Every chapter needs to start out with this chunk of code: - - -```{r, include = FALSE} -ottrpal::set_knitr_image_path() -``` - -## Learning Objectives - -Every chapter also needs Learning objectives that will look like this: - -This chapter will cover: - -- {You can use https://tips.uark.edu/using-blooms-taxonomy/ to define some learning objectives here} -- {Another learning objective} - -## Libraries - -For this chapter, we'll need the following packages attached: - -*Remember to add [any additional packages you need to your course's own docker image](https://github.com/jhudsl/OTTR_Template/wiki/Using-Docker#starting-a-new-docker-image).
- -```{r} -library(magrittr) -``` - -## Topic of Section - -You can write all your text in sections like this, using `##` to indicate a new header. you can use additional pound symbols to create lower levels of headers. - -See [here](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf) for additional general information about how you can format text within R Markdown files. In addition, see [here](https://pandoc.org/MANUAL.html#pandocs-markdown) for more in depth and advanced options. - -### Subtopic - -Here's a subheading (using three pound symbols) and some text in this subsection! - -## Code examples - -You can demonstrate code like this: - -```{r} -output_dir <- file.path("resources", "code_output") -if (!dir.exists(output_dir)) { - dir.create(output_dir) -} -``` - -And make plots too: - -```{r} -hist_plot <- hist(iris$Sepal.Length) -``` - -You can also save these plots to file: - -```{r} -png(file.path(output_dir, "test_plot.png")) -hist_plot -dev.off() -``` - -## Image example - -How to include a Google slide. It's simplest to use the `ottrpal` package: - - -```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "Major point!! example image"} -ottrpal::include_slide("https://docs.google.com/presentation/d/1YmwKdIy9BeQ3EShgZhvtb3MgR8P6iDX4DfFD65W_gdQ/edit#slide=id.gcc4fbee202_0_141") -``` - -But if you have the slide or some other image locally downloaded you can also use HTML like this: - -Major point!! example image - -## Video examples -You may also want to embed videos in your course. If alternatively, you just want to include a link you can do so like this: - -Check out this [link to a video](https://www.youtube.com/embed/VOCYL-FNbr0) using markdown syntax. - -### Using `knitr` - -To embed videos in your course, you can use `knitr::include_url()` like this: -Note that you should use `echo=FALSE` in the code chunk because we don't want the code part of this to show up. 
If you are unfamiliar with [how R Markdown code chunks work, read this](https://rmarkdown.rstudio.com/lesson-3.html). - - -```{r, echo=FALSE} -knitr::include_url("https://www.youtube.com/embed/VOCYL-FNbr0") -``` - -### Using HTML - - - -## File examples - -You can again use simple markdown syntax to just include a link to a file like so: - -[A file](https://www.bgsu.edu/content/dam/BGSU/center-for-faculty-excellence/docs/TLGuides/TLGuide-Learning-Objectives.pdf). - -Alternatively you can embed files like PDFs. - -### Using `knitr` - -```{r, fig.align="center", echo=FALSE, out.width="100%"} -knitr::include_url("https://drive.google.com/file/d/1mm72K4V7fqpgAfWkr6b7HTZrc3f-T6AV/preview") -``` - -### Using HTML - - - -## Website Examples - -Yet again you can use a link to a website like so: - -[A Website](https://yihui.org) - -You might want to have users open a website in a new tab by default, especially if they need to reference both the course and a resource at once. - -[A Website](https://yihui.org){target="_blank"} - -Or, you can embed some websites. - -### Using `knitr` - -This works: - -```{r, fig.align="center", echo=FALSE} -knitr::include_url("https://yihui.org") -``` - - -### Using HTML - - - - -If you'd like the URL to show up in a new tab you can do this: - -``` -LinkedIn -``` - -## Citation examples - -We can put citations at the end of a sentence like this [@rmarkdown2021]. -Or multiple citations [@rmarkdown2021, @Xie2018]. - -but they need a ; separator [@rmarkdown2021; @Xie2018]. - -In text, we can put citations like this @rmarkdown2021. - -## Stylized boxes - -Occasionally, you might find it useful to emphasize a particular piece of information. To help you do so, we have provided css code and images (no need for you to worry about that!) to create the following stylized boxes. - -You can use these boxes in your course with either of two options: using HTML code or Pandoc syntax. 
- -### Using `rmarkdown` container syntax - -The `rmarkdown` package allows for a different syntax to be converted to the HTML that you just saw and also allows for conversion to LaTeX. See the [Bookdown](https://bookdown.org/yihui/rmarkdown-cookbook/custom-blocks.html) documentation for more information [@Xie2020]. Note that Bookdown uses Pandoc. - - -``` -::: {.notice} -Note using rmarkdown syntax. - -::: -``` - -::: {.notice} -Note using rmarkdown syntax. - -::: - -As an example you might do something like this: - -::: {.notice} -Please click on the subsection headers in the left hand -navigation bar (e.g., 2.1, 4.3) a second time to expand the -table of contents and enable the `scroll_highlight` feature -([see more](introduction.html#scroll-highlight)) -::: - - -### Using HTML - -To add a warning box like the following use: - -``` -
-Followed by the text you want inside -
-``` - -This will create the following: - -
- -Followed by the text you want inside - -
- -Here is a `
` box: - -
- -Note text - -
- -Here is a `
` box: - -
- -GitHub text - -
- - -Here is a `
` box: - -
- -dictionary text - -
- - -Here is a `
` box: - -
- -reflection text - -
- - -## Dropdown summaries - -
You can hide additional information in a dropdown menu -Here's more words that are hidden. -
- -## Print out session info - -You should print out session info when you have code for [reproducibility purposes](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/managing-package-versions.html). - -```{r} -devtools::session_info() -``` - -[many links]: https://github.com/jhudsl/OTTR_Template diff --git a/Course_Name.rds b/Course_Name.rds new file mode 100644 index 00000000..ac98e3f6 Binary files /dev/null and b/Course_Name.rds differ diff --git a/_bookdown.yml b/_bookdown.yml index b163ca5a..040a22ba 100644 --- a/_bookdown.yml +++ b/_bookdown.yml @@ -1,9 +1,9 @@ -book_filename: "Course_Name" +book_filename: "Season 1 Introduction to R" chapter_name: "Chapter " repo: https://github.com/jhudsl/OTTR_Template/ rmd_files: ["index.Rmd", "01-intro.Rmd", - "02-chapter_of_course.Rmd", + "lesson1.Rmd", "About.Rmd", "References.Rmd"] new_session: yes diff --git a/lesson1.Rmd b/lesson1.Rmd new file mode 100644 index 00000000..a5e48d64 --- /dev/null +++ b/lesson1.Rmd @@ -0,0 +1,226 @@ + +# A new chapter + +If you haven't yet read the getting started Wiki pages; [start there](https://www.ottrproject.org/getting_started.html). + +To see the rendered version of this chapter and the rest of the template, see here: https://jhudatascience.org/OTTR_Template/. + +Every chapter needs to start out with this chunk of code: + + +```{r, include = FALSE} +ottrpal::set_knitr_image_path() +``` +# W1: Intro to Computing + +## Goals of the course + +- Fundamental concepts in high-level programming languages (R, Python, Julia, WDL, etc.) 
that are transferable: *How do programs run, and how do we solve problems using functions and data structures?* + + - Beginning of data science fundamentals: *How do you translate your scientific question into a data wrangling problem and answer it?* + + ![Data science workflow](https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png){width="450"} + + - Find a balance between the two throughout the course: we will try to reproduce a figure from a scientific publication using new data. + + ## What is a computer program? + + - A sequence of instructions for the computer to execute in order to manipulate data. + + - A series of translations: English \<-\> Programming Code for Interpreter \<-\> Machine Code for Central Processing Unit (CPU) + + We will focus on English \<-\> Programming Code for R Interpreter in this class. + + More importantly: **How we organize ideas \<-\> Instructing a computer to do something**. + + ## A programming language has the following elements {#a-programming-language-has-following-elements} + + - Grammar structure (simple building blocks) + + - Means of combination to analyze and create content (examples around genomics provided, and your scientific creativity is strongly encouraged!) + + - Means of abstraction for modular and reusable content (data structures, functions) + + - Culture (emphasis on open-source, collaborative, reproducible code) + + It requires a lot of practice to become fluent! + + ## What is R and why should I use it? + + It is: + + - An interpreted, dynamically typed programming language + + - Widely used for data science, visualization, statistics, and bioinformatics + + - Open-source and free; easy to create and distribute your content; quirky culture + + ## R vs. Python as a first language + + In terms of our goals, recall: + + - Fundamental concepts in high-level programming languages + + - Beginning of data science fundamentals + + There are a lot of nuances and debates, but I argue that Python is a better learning environment for the former and R is better for the latter. + + Ultimately, either should be okay! Perhaps more importantly, *consider what your research group and collaborators are more comfortable with*. + + ## Posit Cloud Setup + + Posit Cloud/RStudio is an Integrated Development Environment (IDE). Think of it as being to R what Microsoft Word is to a plain text editor: it adds extra bells and whistles that make R easier to use. + + Today, we will pay close attention to: + + - Script editor: where a sequence of instructions is typed and saved as a text document, which is an R program. To run the program, the console executes every line of code in the document. + + - Console (interpreter): instead of giving it an entire program in a text file, you can interact with the R Console line by line. You give it one line of instruction, and the console executes that single line. It is what R looks like without RStudio. + + - Environment: code often stores information *in memory*, and that information is shown in the Environment. More on this later. + + ## Using Quarto for your work + + Why should we use Quarto for data science work? + + - Encourages reproducible workflows + + - Combines code, output from code, and prose in one document + + - Extensibility to Python, Julia, and more + + More options and guides can be found in [Introduction to Quarto](https://quarto.org/docs/get-started/hello/rstudio.html). + + ## Grammar Structure 1: Evaluation of Expressions + + - **Expressions** are built out of **operations** or **functions**. + + - Operations and functions combine **data types** to return another data type.
+ - We can combine multiple expressions to form more complex expressions: an expression can have other expressions nested inside it. + + For instance, consider the following expressions entered into the R Console: + + ```{r} + 18 + 21 + max(18, 21) + max(18 + 21, 65) + 18 + (21 + 65) + nchar("ATCG") + ``` + + Here, the input **data types** are **numeric** in lines 1-4 and **character** in line 5. + + Operations are just functions in hiding: `18 + 21` is the same as calling the function `` `+`(18, 21) ``. We could also have written these sums with the function `sum()`: + + ```{r} + sum(18, 21) + sum(18, sum(21, 65)) + ``` + + Remember the function machine from algebra class? We will use this schema to think about expressions. + + ![Function machine from algebra class.](https://cs.wellesley.edu/~cs110/lectures/L16/images/function.png) + + If an expression is made out of multiple, nested operations, in what order does the R Console evaluate it? It follows the usual order of operations: parentheses first, then multiplication and division, then addition and subtraction. Being able to read nested operations and nested functions is a very important skill for a programmer. + + ```{r} + 3 * 4 + 2 + 3 * (4 + 2) + ``` + + Lastly, a note on the use of functions: a programmer should not need to know how a function is implemented in order to use it. This emphasizes [abstraction and modular thinking](#a-programming-language-has-following-elements), a foundation of any programming language. + + ## Grammar Structure 2: Storing data types in the global environment + + To build up a computer program, we need to store the data type returned by an expression somewhere for downstream use. We can assign it to a variable as follows: + + ```{r} + x = 18 + 21 + ``` + + If you enter this in the Console, you will see in the Environment that the variable `x` has a value of `39`. + + ::: {.callout-tip} + ## Execution rule for variable assignment + + Evaluate the expression to the right of `=`. + + Bind the variable to the left of `=` to the resulting value. + + The variable is stored in the environment. + + The assignment operator `<-` works too, as in `x <- 18 + 21`.
+::: + + The environment is where all the variables are stored; once a variable is defined, it can be used in any expression. Each variable name is unique: assigning a new value to an existing name overwrites the old value. + + The variable is stored in the working memory of your computer, Random Access Memory (RAM). This is temporary memory storage on the computer that can be accessed quickly. A personal computer typically has 8, 16, or 32 gigabytes of RAM. When we work with large datasets, assigning a variable to data larger than the available RAM will not work. More on this later. + + Now `x` can be reused downstream: + + ```{r} + x - 2 + y = x * 2 + ``` + + ## Grammar Structure 3: Evaluation of Functions + + A function has a **function name**, takes **arguments**, and **returns** a data type. + + ::: {.callout-tip} + ## Execution rule for functions + + Evaluate the function's arguments first: if an argument is itself a function call or contains operations, evaluate those functions or operations before calling the outer function. + + The output of a function is called its **returned value**. + ::: + + ```{r} + sqrt(nchar("hello")) + (nchar("hello") + 4) * 2 + ``` + + ## Functions to read in data + + We are going to read in a Comma Separated Values (CSV) spreadsheet that contains information about cancer cell lines. + + The first line calls the function `read.csv()` with a string argument representing the file path to the CSV file (we are using a URL here, but this is typically a local file), and the returned data type is stored in the `metadata` variable. The resulting `metadata` variable is a new data type you have not seen before: a **data structure** called a **data frame**, which we will explore next week. It holds a table of several data types that we can explore. + + We then run a few functions on `metadata`.
+ +```{r} +metadata = read.csv("https://github.com/caalo/Intro_to_R/raw/main/classroom_data/CCLE_metadata.csv") +head(metadata) +nrow(metadata) +ncol(metadata) +``` + +If you don't know what a function does, ask for help: + +```{r} +?nrow +``` + +## Tips on Exercises / Debugging + +Common errors: + +- Syntax error. + +- Changing a variable without realizing you did so. + +- The function or operation does not accept the input data type. + +- It did something else than I expected! + + +Solutions: + +- Where is the problem? + +- What kind of problem is it? + +- Explain your problem to someone! + +