tags
ggg, ggg2020, ggg298

Syllabus for GGG 298: Tools to support data-intensive research (Winter 2019/2020)

UC Davis

[toc]

Code of Conduct

Please abide by my lab's Code of Conduct in this course.

In particular, this is not an intellectual contest, and please realize that we all have plenty of things to learn.

Course sessions

GitHub site for class - redundant with links below, but more permanent!

Lab

What: hands-on computational work
When: Wed 9:15am-noon
Where: Bennett Conference Room 203 (2nd floor of the Center for Companion Animal Health)

Sticky notes!!

Discussion

What: read & discuss a paper
When: Fri 12-1pm
Where: Shields Library, room 360 (Datalab/Data Science Initiative main classroom)

Homework

There will be 8 paragraph-length homework assignments, one per week, each on that week's reading; each will be assigned a week in advance and due on Friday at 11am.

Grading

The course is pass/fail and graded only on homework; you need to hand in 6 of the 8 homework assignments to pass.

Office hours

Office hours to meet with Titus will be from 3-5pm on Wednesdays in CCAH 251 (just down the hallway from Bennett); please use this online signup sheet. Don't worry too much about the specific time; signing up just signals to me that you want to talk to me (and puts it on my calendar!). (If no one has signed up by 1pm that Wednesday, I will feel free not to show up!)

Note that Titus is busy on 1/15 between 3-5pm, and out of town on 2/5 and 2/26.

Shannon's office hour will be from 10 to 11am on Fridays. The location is still to be determined but will be somewhere in Meyer Hall. Please email me (Shannon) to let me know if you plan to come.

If neither of the listed office hours works for you and you have questions, please email Shannon (sejoslin@ucdavis.edu) to set up a time to meet.

Instructors

C. Titus Brown (IOR) (ctbrown@ucdavis.edu), Shannon Joslin (sejoslin@ucdavis.edu).

Course description

This course will provide a practical introduction to common tools used in data-intensive research, including the UNIX shell, version control with git, RMarkdown, JupyterLab, and workflows with snakemake. The associated discussion section will connect the lab practicals to foundational concepts in data science, including repeatability/reproducibility, statistics, and publication ethics.

This course is open to all graduate students. No prior computational experience is required or assumed. There will be some minimal overlap with GGG 201(b) topics. All materials will be open to the community and freely available online.

Schedule of lab topics

Wednesdays, 9-noon: Bennett Conference Room (2nd floor Center for Companion Animal Health).

These will be lab practicals where we take a solid look at a given piece of technology.

  1. 1/08 : Basic UNIX + R/RMarkdown
  2. 1/15 : UNIX bash shell for file manipulation
  3. 1/22 : conda for software installation
  4. 1/29 : snakemake for data-intensive workflows
  5. 2/05 : Project layout and setup
  6. 2/12 : git and GitHub for change tracking in scripts
  7. 2/19 : Slurm and the Farm cluster for doing analysis
  8. 2/26 : R/RMarkdown revisited (Titus out of town)
  9. 3/04 : TBD
  10. 3/11 : TBD

Paper discussions

Fridays, noon-1pm: 360 Shields Library (Data Science Initiative classroom).

These will be discussion periods where we explore some of the literature on techniques and processes for (biological) data science.

  1. 1/10 - Read and discuss: A preliminary review of influential works in data-driven discovery, Stalzer & Mentzel, 2016.
  2. 1/17 - More on week 1 paper; first homework due.
  3. 1/24 - Second homework due.
  4. 1/31 - Third homework due.
  5. 2/07
  6. 2/14
  7. 2/21
  8. 2/28
  9. 3/06
  10. 3/13