Explain how summary stats works for TWAS #6
For each gene there are two vectors: the expression weight of each SNP ($W_{twas}$) and the standardized GWAS effect size (Z-score) of each SNP ($Z_{gwas}$). Their inner product, $W_{twas} Z_{gwas}$, is a weighted sum of GWAS Z-scores that measures the association between predicted expression and the trait. Under the null (no association) and a multivariate normal assumption,

$$Z_{gwas} \sim N(0, \Sigma_{s,s}),$$

where $\Sigma_{s,s}$ is the correlation matrix among all SNPs (the LD matrix). The variance of $W_{twas} Z_{gwas}$ is then the quadratic form of the weights with the LD matrix:

$$\mathrm{Var}(W_{twas} Z_{gwas}) = W_{twas} \, \Sigma_{s,s} \, W_{twas}^{T}$$

The imputed Z-score of the cis-genetic effect on the trait is therefore

$$Z_{twas} = \frac{W_{twas} Z_{gwas}}{\left(W_{twas} \, \Sigma_{s,s} \, W_{twas}^{T}\right)^{1/2}}$$

It appears that the $Z_{twas}$ score is in fact still based on the effect of the SNPs on the trait, which may act largely through their impact on protein structure, while taking into account how those same SNPs may also affect the expression level of the protein, which will no doubt affect the trait as well.

This understanding may raise issues for translating the method to other molecular phenotypes. The effect of a cis-SNP on expression directly changes the abundance of mRNA and hence of the protein, but SNP effects on methylation and polyA tails manifest first through mRNA abundance. In other words, the information from methylation and polyA tails is already encoded in the expression weights, which would render them redundant in this case.
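The summary-statistics computation for a single gene can be sketched in a few lines. This is only a minimal illustration in plain Python, not the FUSION implementation: the function name `twas_z` and the toy weights, Z-scores, and LD matrix are made up for the example, and it assumes the weights, GWAS Z-scores, and LD matrix are already aligned to the same SNPs and alleles.

```python
import math


def twas_z(weights, z_gwas, ld):
    """Return Z_twas = (w . z) / sqrt(w' LD w) for one gene.

    weights: expression weight per SNP (hypothetical values below)
    z_gwas:  GWAS Z-score per SNP, same SNP order as weights
    ld:      SNP-by-SNP correlation (LD) matrix as a list of rows
    """
    n = len(weights)
    if len(z_gwas) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, z_gwas and ld must cover the same SNPs")

    # Numerator: weighted sum of GWAS Z-scores.
    numerator = sum(w * z for w, z in zip(weights, z_gwas))

    # Denominator: null variance of the weighted sum, w' LD w.
    ld_w = [sum(ld[i][j] * weights[j] for j in range(n)) for i in range(n)]
    variance = sum(weights[i] * ld_w[i] for i in range(n))
    if variance <= 0:
        raise ValueError("w' LD w must be positive")

    return numerator / math.sqrt(variance)


# Toy example with three SNPs (all numbers are invented for illustration).
weights = [0.5, -0.2, 0.1]
z_gwas = [2.0, -1.0, 0.5]
ld = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
z = twas_z(weights, z_gwas, ld)
print(round(z, 4))
```

Note that rescaling all weights by a positive constant leaves the result unchanged, because the constant cancels between numerator and denominator; only the relative weights matter.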
@hsun3163 good work on the summary statistics version explanation. Two suggestions:
I agree -- it seems you already have some causality model / graph in your head, which is very good. But the idea is that if our model is good enough, the model itself should be able to tell this and decide whether to "use" the additional information or not. This is what the multivariate-TWAS model is meant for.
For each gene, the Z-score after TWAS is denoted

$$Z_{TWAS} = \frac{W Z}{\left(W \, \Sigma_{s,s} \, W^{T}\right)^{1/2}}$$

where $W = \Sigma_{e,s} \Sigma_{s,s}^{-1}$, $\Sigma_{e,s}$ is the covariance matrix between the SNPs and gene expression, which is populated by the various algorithms in Fusion_Weight_Compute.R, and $\Sigma_{s,s}$ is the LD matrix for all SNPs. $Z$ is a vector of N elements containing the GWAS association statistics for each SNP of the gene (Gusev 2016).

The exact components of $Z$ vary based on how β was estimated. For a GWAS between one SNP and the phenotype (Maier 2017): [...], where y is the phenotype of interest, N is the total number of SNPs, and [...] is a vector of population mean-centered genotypes for the SNP corresponding to y.

With the assumption that [...], and under the further assumptions that [...], the model can be simplified as follows (Maier 2017): [...]

References:

- Gusev et al. "Integrative approaches for large-scale transcriptome-wide association studies." Nature Genetics, 2016.
- Maier. "A practical introduction to some theoretical concepts in quantitative genetics." 2017. https://rawgit.com/uqrmaie1/statgen_equations/master/statgen_equations.html#summary-of-equations
@hsun3163 looks great! Let's discuss it later today.
@hsun3163 food for thought -- given we have multivariate TWAS predictions (e.g. joint coefficients predicted for gene expression in 3 tissues, plus other phenotypes such as splicing), how do you then perform a TWAS-type test for all the molecular features combined? It is fairly obvious if you work with the full data, but not exactly obvious in the summary statistics space. I think you can try to derive the method, though.
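One possible starting point, offered only as a sketch and not as a method settled in this thread: if each molecular feature k has its own weight vector $w_k$ over the same SNPs, then under the null the per-feature Z-scores have correlation $w_j \Sigma_{s,s} w_k^{T} / \sqrt{(w_j \Sigma_{s,s} w_j^{T})(w_k \Sigma_{s,s} w_k^{T})}$, and the quadratic form $Z^{T} R^{-1} Z$ would be chi-square with K degrees of freedom when R is non-singular. The function names and toy numbers below are hypothetical, and the sketch assumes the weights are not collinear across features.

```python
import math


def _quad(a, m, b):
    """Bilinear form a' m b for list-based vectors and matrices."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("feature correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def joint_twas_chi2(weight_rows, z_gwas, ld):
    """Return (per-feature Z-scores, omnibus statistic, degrees of freedom)."""
    k = len(weight_rows)
    var = [_quad(w, ld, w) for w in weight_rows]
    z = [
        sum(wi * zi for wi, zi in zip(w, z_gwas)) / math.sqrt(v)
        for w, v in zip(weight_rows, var)
    ]
    # Null correlation between the Z-scores of features j and l.
    corr = [
        [
            _quad(weight_rows[j], ld, weight_rows[l]) / math.sqrt(var[j] * var[l])
            for l in range(k)
        ]
        for j in range(k)
    ]
    stat = sum(zj * xj for zj, xj in zip(z, _solve(corr, z)))
    return z, stat, k


# Toy example: two features (e.g. expression and splicing), three SNPs.
ld = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]]
weight_rows = [[0.5, -0.2, 0.1], [0.0, 0.4, 0.3]]
z_scores, chi2_stat, dof = joint_twas_chi2(weight_rows, [2.0, -1.0, 0.5], ld)
print(z_scores, chi2_stat, dof)
```

With a single feature the statistic reduces to the square of the usual $Z_{twas}$, which is a quick sanity check on any derivation along these lines.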
@hsun3163 how the model looks like, and what approximation has been made.