BRCAChallenge/mavedb

Overview

MaveDB is a biological database for Multiplex Assays of Variant Effect (MAVE) datasets, such as those generated by Deep Mutational Scanning (DMS) or Massively Parallel Reporter Assays (MPRA). The software in this repo generates content for the Variant Effect Maps track hub, which displays MaveDB assay results mapped to the hg38 human reference genome. It relies on mappings of the raw MaveDB scores from their original reference sequences to the canonical human reference sequences, building on the mappings generated by Arbesfeld et al.

Prerequisites

This software relies on the cool-seq-tool Python package to map protein coordinates to genomic coordinates. The cool-seq-tool package in turn relies on the Universal Transcript Archive (UTA) database, the SeqRepo sequence mapping resource, and other resources. See the installation instructions for cool-seq-tool for more information.

This software also requires the UCSC Genome Browser executables bedSort and bedToBigBed. Both should be on your PATH before you run the software.
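
A quick way to confirm that the two executables are reachable before starting a run is a small preflight check. The following is a minimal sketch, not part of this repo; the function name `find_missing_tools` is hypothetical, and it only checks that the names resolve on the PATH, not that the tools work.

```python
import shutil

# Executables named in the prerequisites above.
REQUIRED_TOOLS = ("bedSort", "bedToBigBed")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required UCSC executables were found.")
```

If a tool is reported missing, add the directory that contains it to PATH and run the check again.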

Installation

  1. Install the cool-seq-tool package, following the package installation instructions.

  2. Install DynamoDB Local, which provides DynamoDBLocal.jar and the DynamoDBLocal_lib directory.

Execution

  1. Launch PostgreSQL

  2. In the directory that contains DynamoDBLocal.jar and the DynamoDBLocal_lib subdirectory, execute the command

    java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
    
  3. In a separate shell, launch the cool-seq-tool API service as follows:

    1. Navigate to the cool-seq-tool directory, which contains the file Pipfile. Execute the command

    pipenv shell
    
    2. Navigate to the cool_seq_tool source directory (which contains the file api.py)

    3. Execute the command

    uvicorn cool_seq_tool.api:app --reload
    
  4. In a third shell:

    1. Activate the cool-seq-tool virtual environment

    2. Navigate to the cool_seq_tool source code directory.

    3. Execute the command pipenv shell

    4. Set the following environment variables

       export UTA_VERSION=uta_20210129.pgd.gz
       export UTA_PASSWORD=uta
       export SEQREPO_ROOT_DIR=/usr/local/share/seqrepo/latest
       export TRANSCRIPT_MAPPINGS_PATH=${PWD}/cool_seq_tool/data/transcript_mapping.tsv
       export AWS_ACCESS_KEY_ID=fakeMyKeyId
       export AWS_SECRET_ACCESS_KEY=fakeSecretAccessKey
      
  5. Navigate to the src directory under mavedb.

  6. Run the script with a command such as the following:

    export PATH=../:${PATH}
    src/mavedb_to_trackhub.py \
           -i input/00000098-a-1.json \
           -n "Variant Effect Maps" \
           -t ../trackDb.txt \
           --bed_dir output/bed \
           -c output/coordinates \
           -l output/lm \
           -b output/bigBed \
           -s ../hg38.chrom.sizes \
           -d 1
    

    where

    --input (-i) specifies the name of one or more input JSON files (wildcards accepted)

    --track_name (-n) specifies the name of the track (default: "Variant Effect Maps")

    --trackDb (-t) specifies the pathname of the output trackDb file

    --bed_dir specifies a subdirectory for the output BED files

    --coordinates (-c) specifies an optional output file that maps the coordinates from the mavedb mappings to the reference genome

    --location_matrix_dir (-l) specifies a subdirectory for the output location matrix files

    --bigBed_dir (-b) specifies a subdirectory for the output bigBed files

    --chrom_sizes (-s) specifies the chrom sizes file for the reference genome

    --debug (-d) turns on debugging information
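
The options above map naturally onto a Python argparse definition. The following is only an illustrative sketch of how such a command line could be declared, not the actual parser in mavedb_to_trackhub.py. The function name `build_parser`, the choice of which options are required, the use of `nargs="+"` for multiple inputs, and the integer type for `--debug` (inferred from `-d 1` in the example command) are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options documented above (assumed layout)."""
    parser = argparse.ArgumentParser(
        description="Generate track hub content from MaveDB mappings (sketch)"
    )
    # One or more JSON files; the shell expands any wildcards before Python sees them.
    parser.add_argument("-i", "--input", nargs="+", required=True,
                        help="one or more input JSON files")
    parser.add_argument("-n", "--track_name", default="Variant Effect Maps",
                        help="name of the track")
    parser.add_argument("-t", "--trackDb", help="pathname of the output trackDb file")
    parser.add_argument("--bed_dir", help="subdirectory for the output BED files")
    parser.add_argument("-c", "--coordinates", default=None,
                        help="optional output file of mapped coordinates")
    parser.add_argument("-l", "--location_matrix_dir",
                        help="subdirectory for the output location matrix files")
    parser.add_argument("-b", "--bigBed_dir",
                        help="subdirectory for the output bigBed files")
    parser.add_argument("-s", "--chrom_sizes",
                        help="chrom sizes file for the reference genome")
    # Assumed to be an integer level because the example passes "-d 1".
    parser.add_argument("-d", "--debug", type=int, default=0,
                        help="turn on debugging information")
    return parser


if __name__ == "__main__":
    example = ["-i", "input/00000098-a-1.json", "-t", "../trackDb.txt", "-d", "1"]
    print(build_parser().parse_args(example))
```

Run `src/mavedb_to_trackhub.py --help` to see the options the script really accepts; treat that output as authoritative over this sketch.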

This code runs in two steps. First, it generates a BED file with the coordinates of the assay scores, plus a location matrix file with the assay values. Second, it calls bigHeat to generate a heatmap, represented as a set of bigBed files, one per alternate residue (amino acid or nucleotide). These bigBed files can then be viewed in the UCSC Genome Browser as a track hub.
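
For orientation, the generic BED-to-bigBed conversion performed by the two prerequisite executables can be sketched as follows. This is not how bigHeat or this repo invokes them; it only shows the standard `bedSort in.bed out.bed` and `bedToBigBed in.bed chrom.sizes out.bb` usage. The function name `bigbed_commands`, the `.sorted.bed` intermediate name, and the `.bb` extension are assumptions made for the example.

```python
from pathlib import Path


def bigbed_commands(bed_path, chrom_sizes, bigbed_dir):
    """Return the two command lines that sort a BED file and convert it to bigBed.

    The commands are returned as argument lists rather than executed, so they
    can be inspected first or passed to subprocess.run(cmd, check=True).
    """
    bed = Path(bed_path)
    # Hypothetical naming scheme for the intermediate and final files.
    sorted_bed = bed.with_name(bed.stem + ".sorted.bed")
    big_bed = Path(bigbed_dir) / (bed.stem + ".bb")
    return [
        # bedToBigBed expects input sorted by chromosome and start position.
        ["bedSort", str(bed), str(sorted_bed)],
        ["bedToBigBed", str(sorted_bed), str(chrom_sizes), str(big_bed)],
    ]


if __name__ == "__main__":
    for cmd in bigbed_commands("output/bed/example.bed", "../hg38.chrom.sizes", "output/bigBed"):
        print(" ".join(cmd))
```

BED files that carry extra fields typically also need the `-type` and `-as` options of bedToBigBed. This sketch leaves them out because the field layout used here is not described above.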

