The Modern Promethease

Caution

The author no longer recommends using information from SNPedia, or tools that use information from SNPedia (such as this tool and Promethease) for the following reasons:

  1. SNP summaries are often misleading. Given that many SNPs have wide-ranging effects, it is often impossible to summarize the effect of a SNP into one short phrase/sentence. See rs6311, rs53576.

  2. Most SNPedia pages are infrequently updated. Despite the ever-increasing understanding of genetic impacts on traits, many SNP summaries have not been updated in the past 5 years.

  3. Giving information on a per-SNP basis ignores the complex genetic architecture underlying many human traits. It is well known that most human traits are polygenic, and in many cases, the impact of a single SNP on a trait is small. Presenting genome interpretation in a SNP-first (rather than genome-wide) context risks misleading users about their relative odds of traits.

While its use is no longer recommended, the tool will remain here as an archive.

Please see the disclaimers.

The Modern Promethease is a free and open source tool that replicates the functionality of Promethease. It creates a summary of a user's data from the popular genotyping service 23andMe. Example output for the user Lilly Mendel is provided in the mendelgenome/ folder.

Installation Instructions

The code can be downloaded as a .zip file by clicking the green "Code" button in the upper right and selecting "Download ZIP".

Alternatively, to install from the command line using git:

cd <path where you want to install The Modern Promethease>
git clone https://github.com/ThomasKAtkins/TheModernPromethease.git
cd TheModernPromethease

Running Instructions (using pre-built datasets)

Assumes Python 3, pandas, and NumPy are installed.

Download your raw data from 23andMe by following these instructions. Then, edit the downloaded file by deleting every line that starts with # except for the line

# rsid	chromosome	position	genotype

which should be changed to:

rsid	chromosome	position	genotype
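The manual edit above can also be scripted. The following is a minimal sketch, not part of this repository: the function names `clean_23andme_lines` and `clean_23andme_file` are hypothetical, and it assumes the raw file is plain text whose header line contains exactly the four column names shown above.

```python
import sys

# Column names taken from the header line described above.
HEADER_FIELDS = ["rsid", "chromosome", "position", "genotype"]


def clean_23andme_lines(lines):
    """Drop every '#' line, except the column header, which is kept without its '#'."""
    cleaned = []
    for line in lines:
        if line.startswith("#"):
            # Split on any whitespace so tab- or space-separated headers both match.
            fields = line.lstrip("#").split()
            if fields == HEADER_FIELDS:
                cleaned.append("\t".join(HEADER_FIELDS) + "\n")
            continue  # all other comment lines are discarded
        cleaned.append(line)
    return cleaned


def clean_23andme_file(src, dst):
    """Read the raw 23andMe file at src and write the cleaned copy to dst."""
    with open(src, encoding="utf-8") as f:
        lines = f.readlines()
    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(clean_23andme_lines(lines))


if __name__ == "__main__":
    # Hypothetical usage: python3 clean_23andme.py <raw file> <cleaned file>
    if len(sys.argv) == 3:
        clean_23andme_file(sys.argv[1], sys.argv[2])
```

Writing to a separate output file keeps the original download untouched, so the step can be repeated if something goes wrong.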

Now, to generate a variant report (similar to the one generated by Promethease), run the command

python3 generate_variant_report.py <path to 23andMe file>

to generate two files, variant_report.html and snpedia_data.csv. Opening variant_report.html displays a graphical report of the 23andMe data, while snpedia_data.csv contains the same data in tabular format.

To generate a report containing risk of different conditions, run

python3 generate_variant_report.py <path to 23andMe file> <population>

where population is one of Global, European, African, AfricanOthers, AfricanAmerican, Asian, EastAsian, OtherAsian, LatinAmerican1, LatinAmerican2, SouthAsian, Other. See the ALFA Allele Frequencies page for more information on these categories. The output can be found in trait_report.html.
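Because the population name must match one of the values listed above exactly, it can help to check it before launching the script. The sketch below is an illustration only: the helper `build_trait_report_command` is hypothetical, and it assumes the command line is exactly the one shown above.

```python
# Population names copied from the list above.
POPULATIONS = (
    "Global", "European", "African", "AfricanOthers", "AfricanAmerican",
    "Asian", "EastAsian", "OtherAsian", "LatinAmerican1", "LatinAmerican2",
    "SouthAsian", "Other",
)


def build_trait_report_command(genome_path, population):
    """Return the argument list for the trait report command, validating the population."""
    if population not in POPULATIONS:
        raise ValueError(
            f"Unknown population {population!r}; expected one of: {', '.join(POPULATIONS)}"
        )
    # Mirrors the command documented above; no extra flags are assumed.
    return ["python3", "generate_variant_report.py", genome_path, population]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root to run the report.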

Re-Generating the Datasets (advanced users only)

One advantage of this tool over Promethease is the ability to update the datasets used to create the tool. We provide somewhat up-to-date versions of these datasets in the data/ folder, but also provide the code to generate these files with the script data/getSnpediaPages.R (requires the devtools, dplyr, and readr libraries). The datasets can be generated using

cd data
rm snp_df.csv && rm geno_df.csv
Rscript getSnpediaPages.R <path to 23andMe file>

The 23andMe file is necessary so the R script knows which SNPedia pages to scrape. This will produce two new data files, snp_df.csv (contains data on the SNP pages) and geno_df.csv (contains data on the individual genotype pages).

To generate the data for the trait report, run gettraits.py and getlinkage.py. Note: getlinkage.py requires an LDlink API token.
