Programming skills and software tools for building automated bioinformatics pipelines and computational biology analyses. Emphasis on UNIX tools and R libraries for distilling raw sequencing data into interpretable results. This course is aimed at students familiar with UNIX and with some programming experience in Python, R, or C/C++.
Please click on the links above for email addresses.
Monday and Wednesday, 9:00-10:20 am, Foege S110 (http://www.washington.edu/home/maps/southcentral.html?gnom).
We will use Slack during and outside of class to communicate, share code snippets, and ask and answer questions. The class Slack is here:
You will receive an invitation to join prior to the first class.
- No official office hours. Post questions on Slack as needed.
- Substantial background in molecular and cellular biology, genetics, biochemistry, or related disciplines.
- Familiarity with UNIX.
- Some programming experience in Python, R, or C/C++.
- Students are encouraged to have taken GENOME559 and/or GENOME560.
- The course involves hands-on programming during class time. We will use the GS compute cluster, so make sure you can log into it from your computer remotely.
- All programming projects are due by the start of class on the date listed.
- You are welcome to talk to classmates about principles for solving problems, but please do not share code or program together. In many ways, writing your own code is where you will learn the most for this class.
There will be no examinations.
Grades will come 50% from the programming projects and 50% from class participation.
We will read from several online resources and tutorials. I strongly encourage you to read all of the material in the following:
- Comprehensive single-cell transcriptional profiling of a multicellular organism (Packer et al.)
- Git Basics
- Pro Git
- BASH basics
- Essential UNIX
- Sed and Awk
- Sed and Awk, pocket ref
- STAR Manual
- SAM format
- samtools
- BED format
- bedtools
- R Markdown: the definitive guide
- R for Data Science
- ggplot2: elegant graphics for data analysis
- Monocle: an analysis toolkit for single-cell RNA-seq
- Garnett: Automated cell type classification
- R packages
Specific, selected readings for the course will be listed in the course schedule below.
- Visual Studio Code - An outstanding code editor and integrated development environment
- RStudio - An integrated development environment for R
Date | Topic | Reading | Assignments |
---|---|---|---|
3/25 | Course overview, student setup, and version control | [Git Basics](https://www.freecodecamp.org/news/learn-the-basics-of-git-in-under-10-minutes-da548267cc91/); Packer et al. | |
3/27 | Intro to bioinformatics pipelines, automation | Essential UNIX; BASH basics (sections 1-7) | |
4/1 | Tools for working with tables | Sed and Awk | |
4/3 | NGS read alignment | SAM format; bedtools | |
4/8 | No class (Cole at NHGRI Training Meeting) | | Project 1 due |
4/10 | Bespoke tools for exploratory analysis | Monocle documentation; Garnett documentation | |
4/15 | Electronic lab notebooks with Markdown | R for Data Science (Chapter 27); R Markdown (Chapter 3) | |
4/17 | Making figures | R for Data Science (Chapter 13) | |
4/22 | Tools for working with tables, part II; Relational databases | R for Data Science (Chapters 10, 12, and 5); R for Data Science (Chapter 13) | |
4/24 | R packages | R packages (Wickham) | Project 2 due |