OVERVIEW
This is a 3-credit course for graduate students that introduces the open-source computing tools used in biological research to create, organize, manipulate, process, analyze, and archive both small data sets and “big data”. The course is designed to prepare students to use computational tools for biological applications in advanced courses and independent research projects. The primary topics covered are: data formats and repositories, command-line Linux computing and scripting, regular expressions, supercomputing, data wrangling and visualization with R (tidyverse), computer programming with Python, version control and dissemination of scripts and programs with git and GitHub, and typesetting with Markdown.
RESOURCES
Reference Book (not required): Computing Skills for Biologists
CSB Guide to Downloading Software
CSB Text Book Resource Downloads
STUDENT LEARNING OUTCOMES
Upon the successful completion of this course, students should be able to:
- Recognize, describe, and organize data into standard biological data structures
- Locate scientific data repositories and extract data
- Operate UNIX/Linux computers from the command line
- Construct and modify computer programming/scripting logic structures for processing biological data (bash, R, Python)
- Use version control software (git)
- Describe and use regular expressions to query data
- Typeset with LaTeX or Markdown variants
- Use the most popular open-source tools for biological data manipulation
INSTRUCTIONAL METHODS AND ACTIVITIES
Computation for 21st Century Biologists will convene on Fridays for 2.5 hours. Class periods will involve interactive lectures that require each student to have a computer designed for content creation (Linux, macOS, or Windows; not ChromeOS, iOS, or Android). Homework exercises will expand upon concepts addressed in lecture. Participation is assessed through lecture attendance and performance on unannounced quizzes. Weekly assignments will be given to reinforce concepts covered in lectures and to encourage students to start using computational tools. Exams will be used to evaluate comprehension of the material covered in lectures and assignments. For undergraduates only, a comprehensive Final Exam will be used to assess the learning outcomes detailed above.
Rather than taking a final exam, graduate students are expected to complete a Final Project that automates the manipulation and/or analysis of data. The code should be archived on GitHub. A report written in LaTeX or Markdown will be due during the final exam period. The report should concisely state the problem, describe the strategy used to solve it, and explain how the code works (be sure to include a flow chart or outline describing what the code does). Each student will give a 10-minute presentation on their project during the final exam period.
Project examples: automatically process data from experimental apparatus; image analysis; automated reporting of experimental results; downloading and organizing data from online repositories; etc…
CLASSROOM & OFFICE LOCATIONS
Lectures are Fridays 2-4:30 in CCH 206
Office hours are Wednesdays and Thursdays 2:30-5 on Zoom or in TH234
Grades will be maintained on Blackboard.
SECTION 1. WELCOME TO THE MATRIX
SECTION 2. DATA WRANGLING AND VISUALIZATION WITH R
SECTION 3. PROGRAMMING WITH PYTHON
FINAL EXAM: "Welcome to the Desert of the Real"