Seak, which stands for sequence annotations in kernel-based tests, is an open-source Python package for performing set-based genotype-phenotype association tests. It lets users flexibly incorporate prior knowledge, such as variant effect predictions or other annotations, into variant association tests via kernel functions. The mathematical implementation of these tests is based on FaST-LMM (Listgarten et al., 2013; Lippert et al., 2014). Fast simulation of likelihood-ratio test (LRT) statistics is based on RLRsim (Scheipl et al., 2008).
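To make "incorporating annotations via kernel functions" concrete, here is a minimal numpy sketch. It is not seak's implementation or API. It builds a weighted linear kernel K = G diag(w) G^T, where the per-variant weights w stand in for annotations such as variant effect predictions (the values are made up). It then computes a SKAT-style score statistic Q = r^T K r from the residuals r of the covariate-only null model, up to scaling. The function names are hypothetical, and the null distribution of Q, which seak uses to produce p-values, is not computed here.

```python
import numpy as np


def weighted_linear_kernel(G, w):
    """K = G diag(w) G^T: variants with larger annotation weights contribute more."""
    G = np.asarray(G, dtype=float)
    w = np.asarray(w, dtype=float)
    # Broadcasting scales column j of G by w[j], i.e. G @ diag(w).
    return (G * w) @ G.T


def score_statistic(y, X, K):
    """Score statistic Q = r^T K r, with r the residuals of the null model y ~ X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit covariates only
    r = y - X @ beta
    return float(r @ K @ r)


rng = np.random.default_rng(0)
n, m = 100, 5
G = rng.integers(0, 3, size=(n, m)).astype(float)  # genotypes coded 0/1/2
w = np.array([1.0, 0.5, 2.0, 0.1, 1.0])  # hypothetical annotation weights
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
y = X @ np.array([0.5, 1.0]) + rng.normal(size=n)

K = weighted_linear_kernel(G, w)
Q = score_statistic(y, X, K)
print(Q)
```

Setting a weight to zero removes that variant from the kernel entirely, which is one simple way prior knowledge can down-weight variants that are predicted to be benign.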
Seak defines interfaces for all data-loading functionality (seak.data_loaders) to maximize flexibility, so users can adapt the package to the input data types of their choice by implementing these interfaces.
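The pattern behind such interfaces can be sketched with Python's abc module. The class and method names below (VariantLoader, genotypes, InMemoryVariantLoader, load_set) are hypothetical and do not reproduce the actual classes in seak.data_loaders. The sketch only illustrates the idea: test code is written against an abstract loader, and each input format gets its own concrete subclass.

```python
from abc import ABC, abstractmethod

import numpy as np


class VariantLoader(ABC):
    """Hypothetical interface: return genotypes for a set of variant IDs."""

    @abstractmethod
    def genotypes(self, variant_ids):
        """Return an (n_samples, n_variants) array of genotypes."""


class InMemoryVariantLoader(VariantLoader):
    """Example backend that serves genotypes from a dict of 1-D arrays."""

    def __init__(self, data):
        self._data = {k: np.asarray(v, dtype=float) for k, v in data.items()}

    def genotypes(self, variant_ids):
        # One column per requested variant, in the requested order.
        return np.column_stack([self._data[v] for v in variant_ids])


def load_set(loader: VariantLoader, variant_ids):
    # Downstream code depends only on the interface, not on the file format.
    return loader.genotypes(variant_ids)


loader = InMemoryVariantLoader({"var1": [0, 1, 2], "var2": [2, 0, 1]})
print(load_set(loader, ["var2", "var1"]).shape)  # (3, 2)
```

Supporting another format (for example a file-backed store) would then mean writing one more subclass, without touching the test code.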
- Free software: Apache Software License 2.0
Installing seak requires Python 3.7+ and the packages numpy and cython. All other dependencies are installed automatically along with the package.
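Because numpy and cython must be present before installation, a quick pre-flight check can save a failed build. The script below is a hypothetical helper, not part of seak. It uses only the standard library to report whether the interpreter is at least Python 3.7 and whether numpy and Cython are importable. Note that the cython package is imported as "Cython".

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 7)
# Package name -> import name (the cython package is imported as "Cython").
PREREQUISITES = {"numpy": "numpy", "cython": "Cython"}


def check_prerequisites():
    """Return a dict mapping each requirement to True (met) or False (missing)."""
    status = {"python>=3.7": sys.version_info[:2] >= REQUIRED_PYTHON}
    for package, module in PREREQUISITES.items():
        status[package] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```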
Clone the repository. Then, on the command line:
pip install -e ./seak
Full documentation of seak, including an API reference, is available at https://seak.readthedocs.io/.
Main citation:
Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes by Remo Monti, Pia Rautenstrauch, Mahsa Ghanbari, Alva Rani James, Matthias Kirchler, Uwe Ohler, Stefan Konigorski & Christoph Lippert, Nature Communications, 2022.
Summary statistics for that study are available in the GWAS Catalog, accession number 36088354.
References:
Lippert, Christoph, et al. "Greater power and computational efficiency for kernel-based association testing of sets of genetic variants." Bioinformatics 30.22 (2014): 3206-3214.
Listgarten, Jennifer, et al. "A powerful and efficient set test for genetic markers that handles confounders." Bioinformatics 29.12 (2013): 1526-1533.
Scheipl, Fabian, Sonja Greven, and Helmut Kuechenhoff. "Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models." Computational Statistics & Data Analysis 52.7 (2008): 3283-3299.
A more complete list of references can be found in the documentation on Read the Docs.