BCB 731: Critical readings in biomedical statistics and machine learning

BCB 731 (a.k.a. Defense Against the Dark Arts) is a survey of recurring statistical errors and pitfalls that are sometimes used to exaggerate the weight of evidence for novel biological claims or to inflate the estimated accuracy of proposed predictive biomedical models. This course focuses on misapplied analyses of data sources in which a small number of biological samples are quantified into very high-dimensional feature spaces, such as in genomics, proteomics, and biomedical imaging.

Crucially, this is not a course about data falsification or intentional research misconduct. Our focus is the hazy space in which good intentions meet flawed incentives, motivated reasoning, and high-dimensional data.

Fall 2023 Schedule

| Date | Topic | Papers |
|------|-------|--------|
| 10/2 | Reproducibility, and the lack thereof, in scientific research | |
| 10/4 | Empiricism, scientific models, statistics, machine learning, and data analysis | |
| 10/9 | Machine learning, model evaluation, overfitting, and generalization (whiteboard) | |
| 10/16 | The frequentist hypothesis testing version of overfitting: p-hacking, HARKing, & related phenomena (whiteboard) | |
| 10/18 | Into the Garden of Forking Paths (studies with same data and many analysts) | |
| 10/23 | Optimist: Genetic basis for clinical response to CTLA-4 blockade in melanoma | Snyder 2014 NEJM |
| 10/25 | Critic: Genetic basis for clinical response to CTLA-4 blockade in melanoma | Snyder 2014 NEJM |
| 10/30 | Optimist: A neoantigen fitness model predicts tumor response to checkpoint blockade immunotherapy | Łuksza 2017 Nature |
| 11/1 | Critic: A neoantigen fitness model predicts tumor response to checkpoint blockade immunotherapy | Łuksza 2017 Nature |
| 11/6 | Optimist: Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction | Wells 2020 Cell |
| 11/8 | Critic: Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction | Wells 2020 Cell |
| 11/13 | Beginner p-hacking bootcamp: leaking labels through feature construction and selection (notebook) | |
| 11/15 | Intermediate p-hacking bootcamp: overfitting a classifier from metadata (notebook) | |
| 11/20 | Advanced p-hacking bootcamp: classical p-hacking vs. label memorization vs. unkosher feature selection | |
| 11/27 | Optimist: Microbiome analyses of blood and tissues suggest cancer diagnostic approach | Poore 2020 Nature |
| 11/29 | Critic: Microbiome analyses of blood and tissues suggest cancer diagnostic approach | Poore 2020 Nature; Gihawi 2023 mBio |
| 12/4 | Just how well can we predict TCGA cancer type from metadata features? (notebook) | |
| 12/6 | Odds and Ends | Rojas 2023 Nature |
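
As a rough illustration of the 11/13 bootcamp topic (leaking labels through feature selection), the sketch below is not taken from the course notebooks. It assumes pure-noise data with random labels, a simple mean-difference feature filter, and a nearest-centroid classifier; the helper names (`top_k_features`, `nearest_centroid_predict`, `cv_accuracy`) and all sizes are hypothetical choices made for the example. It compares cross-validated accuracy when features are selected once on the full dataset versus inside each training fold.

```python
import numpy as np

def top_k_features(X, y, k):
    """Indices of the k features with the largest absolute difference in class means."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(np.abs(diff))[-k:]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_accuracy(X, y, k, n_folds, select_inside_fold, rng):
    """K-fold accuracy; features are chosen per fold or once on all samples."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    leaked = top_k_features(X, y, k)  # uses test labels too: the leak
    correct = 0
    for test in folds:
        train = np.setdiff1d(idx, test)
        if select_inside_fold:
            feats = top_k_features(X[train], y[train], k)
        else:
            feats = leaked
        pred = nearest_centroid_predict(
            X[train][:, feats], y[train], X[test][:, feats]
        )
        correct += int((pred == y[test]).sum())
    return correct / len(y)

# Assumed toy setting: few samples, many features, labels unrelated to the data.
rng = np.random.default_rng(0)
n, p, k = 40, 5000, 20
X = rng.standard_normal((n, p))
y = np.array([0, 1] * (n // 2))

leaky = cv_accuracy(X, y, k, n_folds=5, select_inside_fold=False, rng=rng)
honest = cv_accuracy(X, y, k, n_folds=5, select_inside_fold=True, rng=rng)
print(f"selection on all data:     {leaky:.2f}")
print(f"selection inside each fold: {honest:.2f}")
```

Because the labels carry no signal, any accuracy well above chance in the first number comes from the selection step having already seen the held-out labels. The second number should hover around chance, though with only 40 samples it will be noisy.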

Links

REPRODUCIBILITY CRISIS

P-HACKING (AND RELATED COMMON DISASTERS IN STATISTICAL HYPOTHESIS TESTING)

RESEARCH SCANDALS

OTHER CLASSES

EARLY 20TH CENTURY STATISTICS

EXPLORATORY DATA ANALYSIS

STATS/ML

STATS/ML BOOKS

MODEL OVERFITTING / INTERPOLATIVE MEMORIZATION (AKA DOUBLE DESCENT)

CAUSAL INFERENCE

PRE-16TH CENTURY SCIENCE & PROTO-SCIENCE

PRE-MODERN STATISTICS

POST-16TH CENTURY EMPIRICAL SCIENCE (WITHOUT MUCH STATISTICS)