I am the Hoben and Patricia Thomas and Thomas and Ann Hettmansperger Early Career Professor of Statistics and a faculty affiliate in political science at Penn State University. My research focuses on methodological and applied problems in the social sciences, including elections, legislative redistricting, racial disparities, and missing data. Read more on my website.
The sections below organize my public projects into a few areas, with some cross-listing.
bases is a lightweight R package to create basis expansions of covariates for use in modeling functions. These expansions allow kernel methods, approximate GPs (via random Fourier features), and other nonparametric methods to be used inside other R modeling functions.
seine is an R package for estimating conditional means from aggregate data (i.e., ecological inference).
adjustr is an R package for efficient Stan model sensitivity analysis using leave-one-out importance sampling.
conformalbayes is an R package for finite-sample calibration of predictive intervals for Stan models.
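For readers unfamiliar with the random Fourier features mentioned under bases, here is a minimal, generic sketch of the idea in plain Python. It is not the bases API: the function names `make_rff` and `rbf_kernel`, the one-dimensional input, and the parameter values are all assumptions made for illustration. The sketch draws random frequencies and phases so that the inner product of two expanded inputs approximates a squared-exponential (RBF) kernel.

```python
import math
import random


def make_rff(n_features, lengthscale, seed=0):
    """Return a function mapping a scalar x to random Fourier features.

    Hypothetical helper for illustration; not part of any package.
    """
    rng = random.Random(seed)
    # For an RBF kernel, frequencies are drawn from N(0, 1 / lengthscale^2).
    freqs = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(n_features)]
    # Phases are uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def expand(x):
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return expand


def rbf_kernel(x, y, lengthscale):
    """Exact squared-exponential kernel, for comparison."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


expand = make_rff(n_features=20000, lengthscale=1.0, seed=1)
zx, zy = expand(0.3), expand(1.1)

# The inner product of the features approximates the kernel value.
approx = sum(a * b for a, b in zip(zx, zy))
exact = rbf_kernel(0.3, 1.1, 1.0)
print(f"approximate: {approx:.3f}, exact: {exact:.3f}")
```

Because the expanded features are ordinary numeric columns, they can be passed to any linear modeling routine, which is what lets a kernel-style fit run inside a standard regression function.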
redist is software written in R and C++ to analyze districting plans with sampling algorithms. The accompanying redistmetrics is specialized for quantifying aspects of plans such as compactness and partisan performance.
fifty-states is a comprehensive project to generate public redistricting simulations for congressional districts in all 50 states, for 2010 and 2020.
census-2020 contains Census demographics joined to VEST precinct-level vote data for easy use.
PL94171 is an R package for downloading and processing the first redistricting data released decennially by the U.S. Census Bureau under P.L. 94-171.
ggredist is an R package for making areal maps, especially for districting plans.
harm-redistricting is replication code for "Individual and Differential Harm in Redistricting," which introduces a new set of tools for analyzing the effects of districting plans on individuals and groups.
midterms-22 is a dynamic Bayesian model to forecast the 2022 U.S. midterm elections, in collaboration with Data for Progress.
president, senate, and us-house-20 are dynamic Bayesian models for the 2020 federal elections.
dem-primary-20 replicates the analysis I published in the Washington Post on the 2020 presidential primary.
birdie is an R package that performs Bayesian Improved Surname Geocoding (BISG), which probabilistically imputes individual race from surnames and addresses. More importantly, birdie allows for statistically valid estimates of racial disparities (generally, of a conditional mean of some outcome variable by race) using the BISG probabilities. Paper replication code is available in birdie-replication.
seine is an R package for ecological inference, where individual relationships (commonly, vote by race) are estimated from aggregate data.
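The BISG step described under birdie can be illustrated with a short, generic sketch. This is not the birdie API. The function name `bisg_posterior`, the group labels, and every probability below are hypothetical placeholders rather than real Census figures. The sketch applies Bayes' rule under the usual BISG assumption that surname and location are independent given race.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography information with Bayes' rule.

    Hypothetical helper for illustration. Both arguments are dicts keyed
    by group label. Assumes surname and location are conditionally
    independent given race.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all groups have zero probability")
    return {race: value / total for race, value in unnormalized.items()}


# Made-up inputs for two placeholder groups, "A" and "B".
surname_probs = {"A": 0.5, "B": 0.5}  # P(race | surname), hypothetical
geo_probs = {"A": 0.2, "B": 0.6}      # P(location | race), hypothetical

posterior = bisg_posterior(surname_probs, geo_probs)
print(posterior)
```

The output is a probability for each group rather than a single label. Disparity estimates are then built from these probabilities instead of from a hard classification.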
easycensus is an R package for finding Census variables to download, downloading Census data in a clean format, and labeling and tidying the results.
PL94171 is an R package for downloading and processing the first redistricting data released decennially by the U.S. Census Bureau under P.L. 94-171.
blockpop is an R package for working with FCC block-level population estimates, which are based on new roads and map data, along with decennial Census and ACS data.
tinytiger is a lightweight alternative to the tigris package for downloading TIGER/Line shapefiles from the Census.
wacolors is a set of colorblind-friendly palettes, both discrete and continuous.
legal is a Quarto template (via LaTeX) for legal filings and expert reports.
science is a Quarto template for Science journals.
cmc-article is a lightweight Quarto template for scientific papers.