We updated a demo for the case of unknown sigma and p less than n/2-3:
- The demo is located in the folder
demo_glm_MoM
- For usage guidelines, please refer to the
example.R
file in this folder
This demo showcases our method's application in scenarios where the covariance matrix (sigma) is unknown and the number of predictors (p) is less than half the sample size minus 3 (p < n/2-3).
The code will be improved later!
Xingyu Chen, Lin Liu, Rajarshi Mukherjee
We introduce moments-based identification strategies for statistical functionals within high-dimensional generalized linear models (GLMs), where the dimension ( p ) is proportional to the sample size ( n ) and the covariance matrix of covariates is known. Key advantages of our methods include:
-
Computational Efficiency: Relying on only a few low-dimensional moments of the data, our methods are computationally efficient.
-
Generalization to Observational Studies: Our strategies extend to inferential techniques for average treatment effects (ATE) and mean estimands under missing data without requiring sample-splitting or cross-fitting.
-
Root-n-consistent and Normal Estimators: The estimators are Root-n-consistent and asymptotically normal (CAN) under Gaussian covariates and demonstrate universal applicability beyond Gaussian cases.
We implemented two simulation settings.
The first involves estimating GLM (logistic model) coefficients and quadratic forms, where we compare our methods with that of Bellec (2022)1.
- Our method's code is located in the folder
code_simulation/code_glm_mom
. The code is consistent across all files, with variations only in the initial settings section. - Bellec's method's code can be found in the folder
utils/function_of_bellec.R
andcode_simulation/code_glm_bellec
. The main function resides inutils/function_of_bellec.R
, complete with detailed notes. Incode_simulation/code_glm_bellec
, the files are identical except for the initial settings section.
To generate Figures 1, 3, 7, 9, and 11, use glm_clean_cluster_bellec.R
, glm_clean_cluster_mom.R
, and glm_plot_lambda.R
in the folder code_clean_plot
after obtaining the data.
To generate Figures 2, 4, 6, 8, and 12, use glm_clean_cluster_bellec.R
, glm_clean_cluster_mom.R
, and glm_plot_mom_hist_qqnorm.R
in the folder code_clean_plot
after obtaining the data.
The second simulation setting involves estimating the mean of a response under random missingness, where the outcome model is a linear model and the missingness mechanism is a logistic model. We compare our methods with those of Celentano and Wainwright (2023)2.
- Our method's code is in the folder
code_simulation/code_mar_mom
. The code is the same in all files except for the initial setting section. - Celentano and Wainwright's method's code is in the folder
code_simulation/code_mar_celentano
, with minor adjustments made to their original code available at this GitHub repository. The code is the same in all files except for the initial setting section.
To generate Figures 5, 10, 13, and 15, use mar_clean_cluster_mom.R
and mar_plot_lambda.R
in the folder code_clean_plot
after obtaining the data.
To generate Figures 6, 12, 14, and 16, use mar_clean_cluster_mom.R
and mar_plot_mom_hist_qqnorm.R
in the folder code_clean_plot
after obtaining the data.
Any questions about the code, feel free to contact me at xingyuchen0714@sjtu.edu.cn. By the way, you should really encapsulate functions that are used multiple times! I'm already a mess anyway. I will fight this mountain of crappy code later.