baitset_design

A general script to determine the region(s) of interest for designing target capture baits.

This script takes a list of gene names, and optionally transcript IDs, as input and identifies the coordinates of the exons or introns of the genes (and transcripts) of interest. It uses a GTF file with transcript annotations to determine which regions to extract. When no transcript ID is specified for a particular gene, the script outputs the union of overlapping exons across all of that gene's transcripts.
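The "union of overlapping exons" step can be pictured as a standard interval merge. The following is a minimal sketch of that idea, not the script's actual code: the function name `merge_overlapping` and the example coordinates are hypothetical, and it assumes all intervals lie on the same chromosome and that only truly overlapping (not merely adjacent) intervals are merged.

```python
def merge_overlapping(intervals):
    """Merge overlapping (start, end) intervals from one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend that region if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new region.
            merged.append((start, end))
    return merged


# Hypothetical exons of one gene, collected from two transcripts.
exons = [(100, 200), (150, 250), (400, 500), (450, 480)]
print(merge_overlapping(exons))  # [(100, 250), (400, 500)]
```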

Installation

There is no installation process for this program. It is a Python script that can be downloaded and run on any system with Python 2.7 installed.

Usage

List the available command line parameters.

python <PATH_TO_DIRECTORY_SCRIPT_IS_LOCATED>/design_target_regions.py -h

Example to extract exon regions for the genes in gene_list.txt using all Ensembl 75 transcripts for those genes.

python <PATH_TO_DIRECTORY_SCRIPT_IS_LOCATED>/design_target_regions.py -g gene_list.txt -o $PWD -f exon

Requirements

Gene input file

This file lists the genes of interest whose exon/intron regions will be used to design baits. It can have 1 or 2 columns.

Gene - HUGO gene symbol of the gene of interest.
Transcript ID - A comma-delimited list of transcript IDs corresponding to the gene. Exon/intron regions will be extracted only from these transcripts.
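As an illustration of the 1- or 2-column layout, here is a small sketch of how one line of the gene input file could be parsed. It is not taken from the script: `parse_gene_line` is a hypothetical helper, the column separator is assumed to be a tab (the delimiter is not stated above), and `GENE1`, `GENE2`, `TX1` and `TX2` are placeholder names.

```python
def parse_gene_line(line, sep="\t"):
    """Return (gene, [transcript IDs]) for one line of the gene input file.

    The tab separator is an assumption. An empty list means no transcript
    was specified for the gene.
    """
    fields = line.rstrip("\n").split(sep)
    gene = fields[0].strip()
    transcripts = []
    if len(fields) > 1:
        # Second column: comma-delimited transcript IDs.
        transcripts = [t.strip() for t in fields[1].split(",") if t.strip()]
    return gene, transcripts


print(parse_gene_line("GENE1\tTX1,TX2"))  # ('GENE1', ['TX1', 'TX2'])
print(parse_gene_line("GENE2"))           # ('GENE2', [])
```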

GTF file

This is an annotation file containing all the genes and their associated transcripts, along with the corresponding coordinates for the genes, transcripts and their features. There are default files, but these assume the user is on the Eris compute cluster, where the default files are located. There is an option (-a, --annotation) to input a different GTF file. Note that the annotation file is assumed to be gzipped (.gz).
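For readers unfamiliar with GTF, the sketch below shows one way to pull exon records for a set of genes out of a gzipped GTF file. It relies only on the standard GTF layout (9 tab-separated columns, with `key "value";` pairs in the last column); the function names `parse_gtf_line` and `iter_exons` are hypothetical and this is not the script's own parser. It assumes the attributes `gene_name`, `transcript_id` and `exon_number` are present and quoted, as in Ensembl GTF files.

```python
import gzip
import re

# Matches attribute pairs such as: gene_name "GENE1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line):
    """Parse one GTF line into a dict, or return None for comments/bad lines."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return None
    attrs = dict(ATTR_RE.findall(cols[8]))
    return {
        "chrom": cols[0],
        "feature": cols[2],
        "start": int(cols[3]),  # GTF coordinates are 1-based, inclusive
        "end": int(cols[4]),
        "strand": cols[6],
        "gene_name": attrs.get("gene_name"),
        "transcript_id": attrs.get("transcript_id"),
        "exon_number": attrs.get("exon_number"),
    }


def iter_exons(gtf_gz_path, genes):
    """Yield exon records for the given gene symbols from a gzipped GTF."""
    with gzip.open(gtf_gz_path, "rt") as handle:
        for line in handle:
            rec = parse_gtf_line(line)
            if rec and rec["feature"] == "exon" and rec["gene_name"] in genes:
                yield rec
```

A call such as `list(iter_exons("annotation.gtf.gz", {"GENE1"}))` would then collect the exon records for one gene; the file name here is a placeholder.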

Parameters

Parameter	Description	Default
-h, --help	Show the help message and exit	NA
-g, --genes	A file containing HUGO gene symbols of interest. The gene names need to match the gene names in the GTF annotation file.	NA (Required parameter)
-o, --output_dir	Output directory to store output files.	'.'
-v, --ensembl_ver	Ensembl annotation version to use. This assumes the user is on the Eris compute cluster, since the file paths are hardcoded. Use the -a option for a user-specified GTF file.	75
-f, --features	Features to target in the gene (exon, intron).	exon,intron
-u, --upstream_buffer	Number of base pairs to add upstream of the regions of interest.	0
-d, --downstream_buffer	Number of base pairs to add downstream of the regions of interest.	0
-a, --annotation	A gzipped GTF file containing transcript annotations for determining the regions to extract.	None
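To make the effect of -u and -d concrete, here is a hedged sketch of padding a region with upstream and downstream buffers. Whether the script interprets "upstream" relative to the gene's strand or simply as the lower reference coordinate is not stated above; this sketch assumes a strand-aware interpretation and 1-based inclusive coordinates, and `add_buffers` is a hypothetical name.

```python
def add_buffers(start, end, strand="+", upstream=0, downstream=0):
    """Pad a region with upstream/downstream bases.

    Assumption: on the minus strand, "upstream" lies at higher coordinates.
    The start is clamped to 1 (1-based coordinates assumed).
    """
    if strand == "-":
        new_start, new_end = start - downstream, end + upstream
    else:
        new_start, new_end = start - upstream, end + downstream
    return max(1, new_start), new_end


print(add_buffers(100, 200, "+", upstream=10, downstream=5))  # (90, 205)
print(add_buffers(100, 200, "-", upstream=10, downstream=5))  # (95, 210)
```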

Output files and formats

Output file

When the program completes, there should be output files in gene-specific directories within the specified output_dir. There will be files for each feature type specified (exon and/or intron). The 7 columns contain:

gene name - The HUGO gene symbol that was input.
region index - The index of the region to be baited.
chrom - The chromosome of the region.
start - The region start position.
end - The region end position.
transcript IDs - A comma-delimited list of the transcript IDs whose features are covered by the region.
feature index - A comma-delimited list of the feature indices. These are relative to the other features in the transcript. If extracting exons, this number would be the exon number.
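For downstream processing, one row of an output file could be read into a named record as sketched below. This is illustrative only: the field names follow the 7 columns listed above, but the tab delimiter and the absence of a header line are assumptions, `parse_region_line` is a hypothetical helper, and the example row uses placeholder values.

```python
from collections import namedtuple

Region = namedtuple(
    "Region",
    ["gene", "region_index", "chrom", "start", "end",
     "transcript_ids", "feature_indices"],
)


def parse_region_line(line, sep="\t"):
    """Parse one 7-column output row (tab-delimited is an assumption)."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 7:
        raise ValueError("expected 7 columns, got %d" % len(fields))
    return Region(
        gene=fields[0],
        region_index=fields[1],
        chrom=fields[2],
        start=int(fields[3]),
        end=int(fields[4]),
        transcript_ids=fields[5].split(","),
        feature_indices=fields[6].split(","),
    )


# Placeholder row: one region shared by two transcripts.
row = "GENE1\t1\tchr1\t100\t250\tTX1,TX2\t1,2"
region = parse_region_line(row)
print(region.chrom, region.start, region.end)  # chr1 100 250
```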
