Python script for calculating expression values from RNA-seq data
GenomonExpression is a tool for calculating transcriptome expression values from RNA sequencing data. The procedure is as follows:
- Filter out improperly paired reads (those without SAM flag 0x2, "properly paired", set) and reads with low mapping quality (default threshold: 20).
- For each specified exon, count the aligned bases.
- For each RefSeq gene, count the aligned bases.
- For each gene symbol, select the associated RefSeq gene with the maximum number of mapped bases divided by region size.
- Derive the FPKM value for each gene symbol.
Note that this software only produces gene-symbol-based expression values. If you need expression values for each splicing variant, consider tools such as Cufflinks, kallisto, or Salmon.
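The procedure above can be illustrated with a minimal sketch. This is not the tool's actual code: the function names, the record layout, the inclusive quality threshold, and the normalization by total aligned bases are all assumptions made for illustration.

```python
# Illustrative sketch only; names, record layout, and the exact
# normalization are assumptions, not GenomonExpression's implementation.
PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned


def passes_filter(flag, mapq, min_mapq=20, keep_improper_pair=False):
    """Return True if a read survives the pair and mapping-quality filter.

    The threshold is treated as inclusive here, which is an assumption.
    """
    if not keep_improper_pair and not (flag & PROPER_PAIR):
        return False
    return mapq >= min_mapq


def best_refseq_per_symbol(records):
    """Pick, per gene symbol, the RefSeq gene with the highest
    aligned bases / region size.

    records: iterable of (symbol, refseq_id, aligned_bases, region_size).
    Returns {symbol: (refseq_id, density)}.
    """
    best = {}
    for symbol, refseq_id, bases, size in records:
        density = bases / float(size)
        if symbol not in best or density > best[symbol][1]:
            best[symbol] = (refseq_id, density)
    return best


def fpkm_like(best, total_aligned_bases):
    """Scale each per-base density to 'per kilobase per million'.

    Assumes library size is measured in aligned bases, so the factor
    is 1e3 (kilobase) * 1e6 (million) / total_aligned_bases.
    """
    return {
        symbol: density * 1e9 / total_aligned_bases
        for symbol, (_, density) in best.items()
    }


# Hypothetical input values, for illustration only.
records = [
    ("TP53", "NM_A", 3000, 1500),
    ("TP53", "NM_B", 1000, 1000),
    ("GAPDH", "NM_C", 8000, 1000),
]
best = best_refseq_per_symbol(records)
expression = fpkm_like(best, total_aligned_bases=1e6)
```

Working from aligned bases rather than fragment counts means a gene's value reflects coverage depth over its region, which is why the selection step compares bases divided by region size.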
Python (>= 2.7, 3.6, 3.7), pysam, annot_utils
pip install genomon_expression
You may need to add --user to the command above if you are using a shared computing cluster.
pip install genomon_expression --user
- Install bedtools and add it to your PATH.
- Install annot_utils
genomon_expression [-h] [--version] [--grc]
[--genome_id {hg19,hg38,mm10}]
[-q mapping_qual_thres] [--keep_improper_pair]
[--debug]
sequence.bam output_prefix
You can check the manual by typing
genomon_expression -h
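If you want to drive the command from a Python pipeline, the following sketch assembles the argument list from the options shown in the usage above. The helper name `build_command` and the file names are hypothetical; only flags listed in the usage are used.

```python
# Hypothetical helper: builds an argument list using only the options
# listed in the usage text. Not part of genomon_expression itself.
VALID_GENOME_IDS = ("hg19", "hg38", "mm10")


def build_command(bam_path, output_prefix, genome_id=None,
                  mapq_thres=None, grc=False, keep_improper_pair=False):
    """Return the genomon_expression command as a list of strings."""
    cmd = ["genomon_expression"]
    if grc:
        cmd.append("--grc")
    if genome_id is not None:
        if genome_id not in VALID_GENOME_IDS:
            raise ValueError("genome_id must be one of %s" % (VALID_GENOME_IDS,))
        cmd += ["--genome_id", genome_id]
    if mapq_thres is not None:
        cmd += ["-q", str(mapq_thres)]
    if keep_improper_pair:
        cmd.append("--keep_improper_pair")
    # Positional arguments come last: input BAM, then output prefix.
    cmd += [bam_path, output_prefix]
    return cmd


# Example with made-up file names.
cmd = build_command("sample.bam", "out/sample", genome_id="hg38", mapq_thres=30)
# To actually run it (requires genomon_expression to be installed):
#   import subprocess
#   subprocess.check_call(cmd)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.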
The primary result is ${output_prefix}.sym2fkpm.txt, in which the first column is the gene symbol and the second column is the FPKM value.
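A minimal sketch for loading the result into a dictionary follows. It assumes the two columns are tab-separated, which the description above does not state, and it skips any line whose second column is not numeric in case a header is present. The function name `read_sym2fpkm` is hypothetical.

```python
def read_sym2fpkm(lines):
    """Parse 'gene symbol <TAB> FPKM' lines into {symbol: fpkm}.

    Assumes tab-separated columns; lines that are blank, too short,
    or have a non-numeric second column are skipped.
    """
    values = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        try:
            values[fields[0]] = float(fields[1])
        except ValueError:
            continue
    return values


# Made-up example lines standing in for the result file.
example = read_sym2fpkm(["TP53\t12.5\n", "GAPDH\t830.0\n", "\n"])

# With a real result file (result_path is a placeholder):
#   with open(result_path) as handle:
#       fpkm = read_sym2fpkm(handle)
```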