-
Notifications
You must be signed in to change notification settings - Fork 0
JMF47/recountNNLS
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
## Overview This package provides an interface to estimate transcript abundances of any samples quantified by the aligner *Rail-RNA*. This method is a non-negative least squares (NNLS) estimation that infers the number of reads that originated from each transcript of the coding portion of the GencodeV25 transcriptome. The model does not require raw aligned `BAM` files, but is content with compressed coverage statistics (coverage of the genome at a basepair level and across annotated junctions) primarily stored in `bigwig` formats. The more than 70,000 samples compiled by *recount2* already have transcript expression pre-computed by this method and is directly accessible. To replicate the abundance estimation of any SRA project in *recount2*, the user only needs to supply the SRA project id. Otherwise, users can supply the necessary information as outlined further below to utilize this package to carry out transcript abundance estimation. ## Accessing quantified estimates for samples in recount2 For the projects currently curated in recount2, to access quantified transcript abundances use: ```{r, eval=F} project = 'DRP000366' path = getRseTx(project, download_path=paste0(getwd(), '/rse_tx.RData')) load(path) ``` ## Quantifying samples on recount2 To re-run the NNLS model on the samples in recount2 follow: ```{r, eval=F} library(recountNNLS) ## Specify a SRA project and download the relevant path data from recount2 project = 'SRP063581' pheno = processPheno(project) ## Main NNLS workhorse function to create a RSE of transcript abundance rse_tx = recountNNLS(pheno) ``` ## Data not yet part of recount2 Please see vignette `vignette('recountNNLS', package='recountNNLS')`. ## Deliverable The output of `recountNNLS()` is a `RangedSummarizedExperiment` where: * Each row corresponds to a transcript * Each column corresponds to a run (sample) * `rowRanges()` can be used to access a `GRangesList` annotation of the transcripts * `assays()` returns a list of length 2 that can be access with `$` + `$fragments` returns a `matrix` of estimated transcript abundance on a number of reads scale + `$se` returns a `matrix` of estimated standard error of the abundance + `$score` returns a `matrix` of estimated scores + `$df` returns a `matrix` of degrees of freedom for statistical inference * `colData()` returns a `DataFrame` that contains the metadata for this project obtained from SRA and processed by `processPheno()`. * `rowData()` returns a `DataFrame` containing information on transcript names, gene, names, transcript lengths, and number of exons in transcript. * `colnames()` returns the sample ids (runs) in the project, and corresponds to each column of the `assays()` matrices. * `rownames()` returns the transcript ids, and corresponds to each row of the `assays()` matrices.
About
Tx Abundance Using recount2 Summary Measures Visa NNLS
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published