This repository contains Python code for scalable parameter estimation and inference in massive linear mixed models with crossed random effects. The algorithms are taken from our papers:
- Katelyn Gao and Art Owen. Efficient moment calculations for variance components in large unbalanced crossed random effects models. Electronic Journal of Statistics, 11(1): 1235-1296, 2017. http://projecteuclid.org/euclid.ejs/1492135234.
- Katelyn Gao and Art Owen. Estimation and inference for very large linear mixed effects models. arXiv e-prints, 2016. http://arxiv.org/abs/1610.08088v2.
We assume that there are two crossed factors; in the e-commerce examples from the papers, the factors are users and products. The data are assumed to reside on a single machine.
The code requires numpy, and the input data must already be cleaned (no NAs). For the crossed random effects model, download cre.py. For the linear mixed model, download mixed.py.
To fit the crossed random effects model, run from the command line: python cre.py fileName.txt
- fileName.txt contains the data in log-file format: (i,j,Yij), where i is the level of the first factor, j is the level of the second factor, and Yij is the response of interest. Each line is an observation and the quantities are separated by spaces.
- Prints the estimated variance components and conservative estimates of their variances.
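To try cre.py without real data, a small input file can be generated first. The following is only a sketch under stated assumptions: the function name write_cre_example and its parameters are hypothetical and not part of this repository, factor levels are assumed to be integer ids, each line is assumed to be written as "i j Yij" without parentheses, and responses are simulated from the usual two-factor form Yij = mu + a_i + b_j + e_ij with arbitrarily chosen variances.

```python
import numpy as np


def write_cre_example(path, n_levels_i=3, n_levels_j=4, seed=0):
    """Write synthetic observations as space-separated 'i j Yij' lines.

    Hypothetical helper for illustration only; returns the number of
    lines written.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, size=n_levels_i)  # effects of the first factor
    b = rng.normal(0.0, 0.5, size=n_levels_j)  # effects of the second factor
    n_written = 0
    with open(path, "w") as f:
        for i in range(n_levels_i):
            for j in range(n_levels_j):
                # Skip some (i, j) cells so the design is unbalanced.
                if rng.random() < 0.3:
                    continue
                y = 2.0 + a[i] + b[j] + rng.normal(0.0, 0.1)
                f.write(f"{i} {j} {y:.6f}\n")
                n_written += 1
    return n_written
```

Calling write_cre_example("fileName.txt") would then produce a file that can be passed to cre.py, assuming the script accepts integer level ids.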
To fit the linear mixed model, run from the command line: python mixed.py fileName.txt
- fileName.txt contains the data in log-file format: (i,j,Yij,xij), where i is the level of the first factor, j is the level of the second factor, Yij is the response of interest, and xij are the predictors (be sure to include a '1' if an intercept is desired). Each line is an observation and the quantities are separated by spaces.
- Prints the estimated regression coefficients and variance components, along with their asymptotic variances.
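Since mixed.py expects cleaned data and an explicit '1' for the intercept, one possible way to prepare its input is sketched below. This is an illustration under assumptions, not code from this repository: the helper prepare_mixed_lines is hypothetical, records are assumed to arrive as (i, j, y, predictors) tuples, a missing value is assumed to be either None or a non-finite float, and each output line is assumed to be "i j Yij xij..." without parentheses.

```python
import math


def prepare_mixed_lines(records, add_intercept=True):
    """Format (i, j, y, predictors) records as 'i j Yij xij...' lines.

    Hypothetical helper for illustration only. Records containing a
    missing (None) or non-finite value are dropped.
    """
    lines = []
    for i, j, y, x in records:
        values = [y, *x]
        if any(v is None or not math.isfinite(v) for v in values):
            continue  # drop observations with NAs
        # Prepend a literal '1' so the model includes an intercept.
        predictors = (["1"] if add_intercept else []) + [repr(float(v)) for v in x]
        fields = [str(i), str(j), repr(float(y))] + predictors
        lines.append(" ".join(fields))
    return lines
```

The returned lines can be written to fileName.txt, one per line, before running mixed.py.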