untap

To make the data easier to access, some of the information available through the Harvard Personal Genome Project page and the GET-Evidence site has been consolidated into a small SQLite database (~120MB uncompressed). This project is a collection of scripts that download the data, consolidate it into a SQLite database, upload it to an Arvados project, and create an HTML visualization front end for exploring the data.

You can explore the most recent snapshot of the Harvard Personal Genome Project database through a Curoverse-hosted collection.

Quick start

To grab the repository:

$ git clone https://github.com/abeconnelly/untap
$ cd untap

The application needs to be served by an HTTP server, such as nginx:

$ cd $HOME
$ sudo apt-get install nginx
$ sudo /etc/init.d/nginx start
$ sudo mkdir -p /var/www
$ sudo tee /etc/nginx/sites-enabled/untap > /dev/null <<EOF
	server {
	  root /var/www;

	  location / {
	  }
	}
EOF
$ sudo ln -s $HOME/untap /var/www/untap
$ sudo chmod -R 777 /var/www/untap
$ sudo nginx -s reload

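or with Python's built-in HTTP server module (the command below is for Python 2; on Python 3 the module is named http.server):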

$ cd html
$ python -m SimpleHTTPServer

Now we need to obtain a dataset. Either 1) download the snapshot from the Untap collection hosted on Curoverse or 2) follow the instructions in the next section to scrape Tapestry and build your own snapshot. In both cases, the database should be put in the root directory, i.e. /untap/hu-pgp.sqlite3.gz.
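Before opening the application, it can help to confirm that the archive is in place and was not truncated during download. The following is a minimal sketch, not part of the repository: `check_snapshot` is a hypothetical helper, and it assumes the directory you pass is the one that should contain hu-pgp.sqlite3.gz.

```shell
# Sketch only: check_snapshot is a hypothetical helper, not an untap script.
# It takes the directory expected to hold the snapshot (default: current dir).
check_snapshot() {
  db="${1:-.}/hu-pgp.sqlite3.gz"
  if [ ! -f "$db" ]; then
    echo "missing: $db" >&2
    return 1
  fi
  # gzip -t tests the integrity of the archive without extracting it.
  if ! gzip -t "$db" 2>/dev/null; then
    echo "corrupt or incomplete archive: $db" >&2
    return 1
  fi
  echo "ok: $db"
}
```

For example, `check_snapshot "$HOME/untap"` reports the path on success, or prints the reason on standard error and returns a non-zero status.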

Now, if you open Untap.html in a browser, you should see the application running, and tabs such as "Summary" should show graphs when you select a dropdown option (e.g. "allergies").

Updating the Database

The Quick start uses a static snapshot of the database and may not be up-to-date. To re-scrape all the data yourself for a more up-to-date copy, see the following instructions.

You may need to install several dependencies if they are not already present.

$ sudo apt-get install jq
$ sudo add-apt-repository -y ppa:ethereum/ethereum
$ sudo apt-get install golang
$ mkdir -p ~/go; echo "export GOPATH=$HOME/go" >> ~/.bashrc
$ echo 'export PATH=$PATH:$HOME/go/bin:/usr/local/go/bin' >> ~/.bashrc
$ source ~/.bashrc
$ go get github.com/ericchiang/pup
$ sudo apt-get install parallel
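To see which of these tools still need installing, a small check along the following lines may be useful. This is a sketch under assumptions: `missing_tools` is a hypothetical helper, and the command names in the usage comment (`jq`, `go`, `pup`, `parallel`) are taken from the install steps above rather than from the snapshot script itself.

```shell
# Sketch only: missing_tools is a hypothetical helper, not an untap script.
# It prints the name of every given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  return 0
}

# Usage: missing_tools jq go pup parallel
```

No output means every listed command was found.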

To download the database from my.pgp-hms.org and evidence.pgp-hms.org run:

$ ./public-database-snapshot

If you would like to upload to an Arvados project (requires an account on an Arvados system and appropriate config files):

$ ./upload-to-arvados

Installing the html directory in the appropriate place will allow you to see the visualization. Care needs to be taken to make sure the SQLite database file gets copied over properly.
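One way to check that the database file was copied intact is to compare the copy with the source byte for byte. The sketch below assumes nothing about where your web server expects the file: `copy_verified` is a hypothetical helper, and the destination directory is whatever location you choose for the install.

```shell
# Sketch only: copy_verified is a hypothetical helper, not an untap script.
# It copies a file into a directory, then compares source and copy with cmp.
copy_verified() {
  src=$1
  dest_dir=$2
  cp "$src" "$dest_dir/" || return 1
  copy="$dest_dir/$(basename "$src")"
  # cmp -s is silent and returns 0 only if the two files are identical.
  if cmp -s "$src" "$copy"; then
    echo "copied and verified: $copy"
  else
    echo "copy differs from source: $src" >&2
    return 1
  fi
}
```

For example, `copy_verified hu-pgp.sqlite3.gz /path/to/install` returns a non-zero status if the copy fails or the files differ.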

Guided Walkthrough

For a guided walkthrough of how to use this application, see Introduction.

Visualization

Since the SQLite database is so small (~120MB uncompressed), it can be loaded into the browser and explored directly. There are a few canned visualizations, explanations of the SQLite schema, and custom visualizations available. The database can take a while to load, so please be patient if you don't immediately see any graphs in the Summary, Variants or Custom sections.

Summary Information Visualizations

This includes some canned summary statistics for the Harvard Personal Genome Project cohort, such as age distribution, gender, and ethnicity.

Variant

This shows a matrix of participants who have genomic data and variants.

Custom

This allows you to run your own custom queries. Example queries can be selected in the lower right-hand corner.

Schema

This page gives the schema for the SQLite database provided.

Examples

This page gives some simple queries that allow you to explore the underlying tables that exist in the SQLite database.

LICENSE

Source code is provided under AGPLv3. All collected data from the Harvard Personal Genome Project is under CC0.
