PolSter

Estimating Pol II density by statistical inference of transcription elongation rates from total RNA-seq

Overview

An RNA-seq library prepared from rRNA-depleted, non-poly(A)-selected RNA (total RNA-seq) provides a transcriptomic profile of nascent RNAs undergoing transcription and co-transcriptional splicing. In general, RNA-seq reads across a gene exhibit a sawtooth pattern: read levels decrease monotonically across each intron in the 5’ to 3’ direction, and are substantially higher in exonic regions. This pattern results from the underlying process of transcription elongation by RNA polymerase II (Pol II). The objective is to reconstruct the spatial distribution of transcription elongation rates along a gene from such a noisy, sawtooth-like profile.

Demonstration

This project includes an example run on a total RNA-seq dataset [1] (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36799). The following commands were used to estimate Pol II density in this mouse dataset.

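Download and extract the input data "input_data.zip", then compile and run the program, passing the extracted directory as the argument (the math library is linked after the source file so that linkers using --as-needed resolve it):
gcc Estimate_Pol2.c -lm
./a.out /DIRECTORY_input_data/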

To interpret the estimated elongation rates, you can create figures such as "Estimated_result_example.jpg". The probability of Pol II being present at a position is inversely proportional to the local elongation rate: a polymerase dwells longer where it moves more slowly.

Usage

Input formats

The software accepts plain-text (.txt) input files; each file and the entries expected in its rows are listed below. Genomic positions are binned as follows: each intron is divided into 400 bp bins, and each exonic region is treated as a single position.

*_read.txt: read counts at each genomic bin position, starting from the TSS; * is the gene number.

*_EI.txt: exon or intron label for each genomic bin position; * is the gene number.

Particle_num.txt: the number of particles used by the particle filter (more than 100,000 is recommended).

SRR960177_pickup_gene_num_joint_ei_av_mu_tau_sigma.txt: hyperparameters for each gene, where mu is the initial value of the state variable in the state model, tau is the noise of the system model, and sigma is the noise of the measurement model (on the log scale). Details are described in our paper (see Credit).

SRR960177_pickup_gene_num_joint_ei_av_chrPosSE_sig_len5cor05_cov01raw.txt: gene number, chromosome number, gene name, strand, start, and end. All isoforms of a gene are merged into a single entry.

Reference

[1] Sigova AA, Mullen AC, Molinie B, Gupta S, et al. Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. Proc Natl Acad Sci U S A. 2013 Feb 19;110(8):2876-81. PMID: 23382218.

Credit

If you use this program in your work, please cite:

Yumi Kawamura, Shinsuke Koyama, and Ryo Yoshida. Statistical inference of the rate of RNA polymerase II elongation by total RNA sequencing. Bioinformatics, Volume 35, Issue 11, 1 June 2019, Pages 1877–1884.
