Stats Bootcamp
-Class 11
-Prepare
-Watch the following videos from StatQuest (it will take ~15 mins to watch them all):
diff --git a/.nojekyll b/.nojekyll
index 36e0c7d2..1fd3d8a5 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-83236ade
\ No newline at end of file
+25e9c6fa
\ No newline at end of file
diff --git a/index.html b/index.html
index 9715cc3d..6be82519 100644
--- a/index.html
+++ b/index.html
@@ -243,24 +243,24 @@
Class 11
-Watch the following videos from StatQuest (it will take ~15 mins to watch them all):
-Chromatin accessibility
-You will need to review this material before class 17.
-We’ll use data from the following studies in the chromatin accessibility section.
-Schep AN, Buenrostro JD, Denny SK, Schwartz K, Sherlock G, Greenleaf WJ. Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res. 2015 PMID: 26314830; PMCID: PMC4617971 [Link]
-Zentner GE, Henikoff S. Mot1 redistributes TBP from TATA-containing to TATA-less promoters. Mol Cell Biol. 2013 PMID: 24144978; PMCID: PMC3889552. [Link]
-Gviz enables visualization of genomic signals in a “track” format. Review the Gviz vignette, especially the “Basic Features” section, which provides an overview.
-valr is a tool set for genome interval manipulation in R. Read over “Getting Started” to get a sense of the tools and the types of analysis they enable.
-ComplexHeatmap provides a flexible framework for generating heatmaps. Look over the “A Single Heatmap” section (section 2).
-You will need to review this material before class 20.
-Skene PJ, Henikoff S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife. 2017 PMID: 28079019; PMCID: PMC5310842. [Link]
-Kaya-Okur HS, Wu SJ, Codomo CA, Pledger ES, Bryson TD, Henikoff JG, Ahmad K, Henikoff S. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun. 2019 PMID: 31036827; PMCID: PMC6488672. [Link]
-MACS is the gold standard in peak calling. It models read coverage as a Poisson process, so regions of higher-than-expected coverage (i.e., peaks) can be identified using a single parameter (lambda) that captures both the mean and the variance of read coverage. Read over the paper to get a sense of how it works.
-We’ll use the motifRG R library, which implements a discriminative (i.e., foreground / background) approach to motif discovery, to answer the question, “Which sequences drive factor association with DNA?”
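The Poisson idea behind peak calling described above can be illustrated with a minimal sketch. It is not MACS's implementation: it assumes a single, fixed background rate `lam` for every window (MACS estimates the rate from the data), and the names `poisson_sf`, `call_peaks`, the toy counts, and the `alpha` cutoff are all hypothetical choices made for illustration.

```python
import math


def poisson_sf(k, lam):
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    # Clamp at 0 to guard against floating-point round-off.
    return max(0.0, 1.0 - cdf)


def call_peaks(counts, lam, alpha=1e-5):
    """Indices of windows whose read count is unexpectedly high under Poisson(lam)."""
    return [i for i, c in enumerate(counts) if poisson_sf(c, lam) < alpha]


# Toy read counts per genomic window (assumed data, not from any study).
counts = [3, 5, 4, 30, 6, 2, 4, 5]
background_rate = 4.0  # assumed expected reads per window (lambda)
peaks = call_peaks(counts, background_rate)
print(peaks)
```

Because a Poisson distribution has mean and variance both equal to lambda, one parameter is enough to decide how surprising a window's count is. Here only the window with 30 reads falls below the cutoff.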
-You will need to review this material before class 23.
-We’ll use data from the following studies in the RNA-seq section.
-Hubbard KS, Gut IM, Lyman ME, McNutt PM. Longitudinal RNA sequencing of the deep transcriptome during neurogenesis of cortical glutamatergic neurons from murine ESCs. F1000Res. 2013 PMID: 24358889; PMCID: PMC3829120. [Link]
-These recent papers provide insights that could only be made with the information gleaned by long-read sequencing.
-Alfonso-Gonzalez C, Legnini I, Holec S, Arrigoni L, Ozbulut HC, Mateos F, Koppstein D, Rybak-Wolf A, Bönisch U, Rajewsky N, Hilgers V. Sites of transcription initiation drive mRNA isoform selection. Cell. 2023 PMID: 37178687; PMCID: PMC10228280. [Link]
-Choquet K, Baxter-Koenigs AR, Dülk SL, Smalec BM, Rouskin S, Churchman LS. Pre-mRNA splicing order is predetermined and maintains splicing fidelity across multi-intronic transcripts. Nat Struct Mol Biol. 2023 Aug;30(8):1064-1076. doi: 10.1038/s41594-023-01035-2. Epub 2023 Jul 13. PMID: 37443198.
-Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PGS, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science. 2022 PMID: 35357919; PMCID: PMC9186530. [Link]
-Stergachis AB, Debo BM, Haugen E, Churchman LS, Stamatoyannopoulos JA. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science. 2020 Jun 26;368(6498):1449-1454. doi: 10.1126/science.aaz1646. PMID: 32587015.
-You will need to review this material before class 31.
-In this section we will analyze data generated by the 10x Genomics Chromium scRNA-seq platform. The following paper introduces the technology:
-Zheng GXY, Terry JM, Belgrader P, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:ncomms14049. https://doi.org/10.1038/ncomms14049 [Link]