
Tutorial 2: Special Options

huilisabrina edited this page Jul 16, 2018 · 8 revisions

The mtag command line tool allows users to take advantage of several "special cases" of MTAG described in the Online Methods of the paper. Here, we walk through these options and describe their underlying assumptions. All of the options described below tend to reduce the runtime of mtag, but they will lead to non-optimal, and possibly even misleading, results if the underlying assumptions do not hold. Please check that your data approximately satisfy the relevant assumptions (constant sample size, no sample overlap, etc.) before using any of the flags!

All applications of MTAG discussed below use two sets of summary statistics that have been specifically formatted for the command line tool (see the Sample GWAS Results and Data Format section in the first tutorial). We use the main results of a GWAS on educational attainment by Okbay et al. (2016) (EA2) along with a GWAS on educational attainment of individuals in the UK Biobank (UKB). These EA2 and UKB summary statistics can be found here and here.

--no_overlap: MTAG in the absence of sample overlap

Assumes: no overlap between any of the cohorts in any pair of GWAS studies fed into mtag.

When there is no sample overlap between any pair of GWAS summary statistics used in mtag, we can use the --no_overlap flag to automatically set the residual covariance terms (i.e., the off-diagonal terms of Sigma) to 0. As a result, for T summary statistics files, LD Score Regression only needs to be run T times (once per file, for the diagonal terms of Sigma) rather than T(T+1)/2 times. This flag is likely to reduce runtime noticeably only when T is large and Omega is not estimated numerically. This specification of MTAG does not account for correlation in estimation error across traits that is due to bias, which means that the resulting MTAG standard errors should be inflated by the square root of the estimated LD score intercept (Sigma's diagonal terms).
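
To make the saving concrete, the two run counts quoted above can be tabulated with a short sketch. The function name `ldsc_run_counts` is hypothetical and is not part of mtag; it only evaluates the formulas T and T(T+1)/2 from the paragraph.

```python
def ldsc_run_counts(n_traits):
    """Return (runs with --no_overlap, runs without it) for n_traits input files.

    Hypothetical helper for illustration only; mtag does not expose this function.
    """
    if n_traits < 1:
        raise ValueError("n_traits must be at least 1")
    with_flag = n_traits                           # diagonal terms of Sigma only
    without_flag = n_traits * (n_traits + 1) // 2  # diagonal plus every unique pair
    return with_flag, without_flag


for t in (2, 5, 10):
    with_flag, without_flag = ldsc_run_counts(t)
    print(f"T={t}: {with_flag} runs with --no_overlap, {without_flag} without")
```

For the two files used in this tutorial the difference is a single run (2 versus 3), so the flag matters mainly when many traits are analyzed together.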

python mtag/mtag.py  \
	--sumstats EducAtt_ea2.txt,EducAtt_ukb.txt \
	--out ./tutorial_results_2.1 \
	--stream_stdout \
	--no_overlap &
    
  [...]
  
  Summary of MTAG results:
  ------------------------
  Trait                 N (max)  N (mean)  # SNPs used  GWAS mean chi^2  MTAG mean chi^2  GWAS equiv. (max) N
  1  .../EducAtt_ea2.txt  250000   134825    790847       2.011            2.501            371134
  2  .../EducAtt_ukb.txt   52863    52863    790847       1.204            2.468            380237
  
  Estimated Omega:
  [[  6.492e-06   5.026e-06]
  [  5.026e-06   3.970e-06]]
  
  Estimated Sigma:
  [[ 0.92   0.   ]
  [ 0.     1.028]]
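
The standard-error adjustment mentioned for --no_overlap can be sketched as follows. This is a literal reading of "inflate by the square root of the LD score intercept", not code taken from mtag: the helper `scale_standard_error` and the example standard error are assumptions made for illustration, while the intercepts are the diagonal terms of the estimated Sigma printed above.

```python
import math

# Diagonal terms of the estimated Sigma printed above (the LD score intercepts).
SIGMA_DIAGONAL = {"EducAtt_ea2": 0.92, "EducAtt_ukb": 1.028}


def scale_standard_error(se, intercept):
    """Multiply a standard error by sqrt(intercept).

    Hypothetical helper: a literal reading of the adjustment described in the text.
    """
    if intercept <= 0:
        raise ValueError("intercept must be positive")
    return se * math.sqrt(intercept)


example_se = 0.004  # made-up standard error, for illustration only
for trait, intercept in SIGMA_DIAGONAL.items():
    adjusted = scale_standard_error(example_se, intercept)
    print(f"{trait}: factor={math.sqrt(intercept):.4f}, adjusted SE={adjusted:.6f}")
```

Note that the factor exceeds 1 only when the corresponding intercept exceeds 1, as it does for the UKB file here.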

--perfect_gencov: Different measures of the same trait

Assumes: the T summary statistics used in MTAG are GWAS estimates for traits that are perfectly correlated with one another, i.e., each GWAS is on a different measure of the same "trait".

When multiple GWAS are assumed to be on different measures of the same trait with (possibly) different degrees of measurement error, the --perfect_gencov flag can be used to pin down the covariance terms of Omega so that genetic correlations across the separate GWAS traits are unity.

Our set of sample summary statistics appears to fulfill this condition, as the individual "traits" we are analyzing both measure years of education. From the estimates of Omega listed above, we can see that the two traits are almost perfectly genetically correlated with one another anyway, so it is no surprise that our results change only marginally when we restrict the genetic correlation to be 1:

python mtag/mtag.py  \
	--sumstats EducAtt_ea2.txt,EducAtt_ukb.txt \
	--out ./tutorial_results_2.2 \
	--stream_stdout \
	--perfect_gencov &
    
  [...]
  
  Summary of MTAG results:
  ------------------------
  Trait                 N (max)  N (mean)  # SNPs used  GWAS mean chi^2  MTAG mean chi^2  GWAS equiv. (max) N
  1  .../EducAtt_ea2.txt  250000   134825    790847       2.011            2.022            252528
  2  .../EducAtt_ukb.txt   52863    52863    790847       1.204            2.022            264574
  
  Estimated Omega:
  [[  6.492e-06   5.077e-06]
  [  5.077e-06   3.970e-06]]
  
  Estimated Sigma:
  [[ 0.92   0.385]
  [ 0.385  1.028]]
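
The genetic correlation implied by an estimated Omega can be checked by hand as the off-diagonal term divided by the square root of the product of the diagonal terms. The sketch below applies that standard formula to the two Omega matrices printed in this tutorial; the function name `genetic_correlation` is hypothetical and not part of mtag.

```python
import math


def genetic_correlation(omega):
    """Correlation implied by a 2x2 covariance matrix given as nested lists."""
    return omega[0][1] / math.sqrt(omega[0][0] * omega[1][1])


# Estimated Omega from the --no_overlap run (off-diagonal term unrestricted).
omega_unrestricted = [[6.492e-06, 5.026e-06], [5.026e-06, 3.970e-06]]
# Estimated Omega from the --perfect_gencov run.
omega_restricted = [[6.492e-06, 5.077e-06], [5.077e-06, 3.970e-06]]

print(f"unrestricted: {genetic_correlation(omega_unrestricted):.3f}")
print(f"restricted:   {genetic_correlation(omega_restricted):.3f}")
```

The unrestricted estimate works out to about 0.99, and the restricted one to 1 up to the rounding of the printed matrix entries.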

--equal_h2: Performing meta-analysis with mtag

Requires: --perfect_gencov

Assumes: variation between "traits" is due only to non-genetic factors. All summary statistics files used in MTAG have the same heritability, as they are considered to be results on the same measure of a single trait.

If we also specify --equal_h2 (equal heritability of traits) in addition to --perfect_gencov, then we are assuming that the multiple input files are GWA studies of the same measure of a single trait. In this case, we can use mtag to implement a type of inverse-variance meta-analysis that can handle sample overlap in the GWAS results. As described in the Online Methods section of the accompanying paper, the MTAG effect size estimator simplifies such that it no longer requires an estimate of Omega.

python mtag/mtag.py  \
	--sumstats EducAtt_ea2.txt,EducAtt_ukb.txt \
	--out ./tutorial_results_2.3 \
	--stream_stdout \
	--perfect_gencov \
	--equal_h2 &
    
  [...]
  
  Summary of MTAG results:
  ------------------------
  Trait                 N (max)  N (mean)  # SNPs used  GWAS mean chi^2  MTAG mean chi^2  GWAS equiv. (max) N
  1  .../EducAtt_ea2.txt  250000   134825    790847       2.011            1.998            246760
  2  .../EducAtt_ukb.txt   52863    52863    790847       1.204            1.998            258531
  
  Omega hat not computed because --equal_h2 was used.
  
  Estimated Sigma:
  [[ 0.92   0.385]
  [ 0.385  1.028]]
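
To illustrate what an inverse-variance meta-analysis that tolerates sample overlap looks like, here is a minimal sketch of the generic generalized-least-squares combination of two estimates of the same effect with correlated estimation errors. It is not mtag's implementation: the function `meta_analyze_pair` and all numbers are assumptions for illustration, and how mtag derives the error covariance from Sigma is not shown.

```python
def meta_analyze_pair(b1, b2, v1, v2, cov=0.0):
    """Combine two estimates of one effect, allowing correlated estimation error.

    v1 and v2 are the sampling variances; cov is the sampling covariance, which
    is nonzero when the two studies share individuals.
    Hypothetical helper for illustration only; this is not mtag code.
    """
    det = v1 * v2 - cov ** 2
    if v1 <= 0 or v2 <= 0 or det <= 0:
        raise ValueError("error covariance matrix must be positive definite")
    denom = v1 + v2 - 2.0 * cov
    w1 = (v2 - cov) / denom  # the two weights sum to one
    w2 = (v1 - cov) / denom
    estimate = w1 * b1 + w2 * b2
    variance = det / denom
    return estimate, variance


# Made-up effect estimates and variances for a single SNP.
no_overlap = meta_analyze_pair(0.020, 0.030, 1e-4, 4e-4)
with_overlap = meta_analyze_pair(0.020, 0.030, 1e-4, 4e-4, cov=5e-5)
print("no overlap:  ", no_overlap)
print("with overlap:", with_overlap)
```

With `cov=0` the weights reduce to the familiar inverse-variance weights. A positive covariance increases the variance of the combined estimate relative to treating the studies as independent.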