-
Notifications
You must be signed in to change notification settings - Fork 2
/
02_stranded_or_not.Rmd
91 lines (63 loc) · 2.53 KB
/
02_stranded_or_not.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
---
title: "Is sample stranded?"
author: "Florent Chuffart, Céline Mandier"
date: "`r Sys.Date()`"
output:
rmarkdown::html_document:
toc: true
toc_float: true
toc_depth: 3
number_sections: true
---
```{r, echo=FALSE, eval=TRUE}
knitr::opts_chunk$set(collapse=TRUE, comment = "#>", fig.width=9, fig.height=6, eval=TRUE, echo=FALSE, results="hide")
source("config")
```
# Data
Data matrix is compute from fastq files:
* from study `r gse`
* aligned on `r species ` `r version ` `r annotation ` genome using STAR
* counts are computed using `htseq-count`
* normalization is done by DESeq2.
Raw counts are computed using htseq-count
We generate three tables corresponding to the count of reads for genes with htseq strand option set to:
* `yes`
* `no`
* `reverse`
We plot raw counts obtained using `htseq-count stranded` option `yes` according to raw counts obtained using `htseq-count stranded` option `reverse`.
We plot characterization of reads `reverse` and reads `yes` with boxplot.
# Results
```{r}
print("# stranded or not?")
re_filenames = list.files(paste0("~/projects/datashare/", gse, "/"), paste0("*_notrim_star_", species, "_", annotation, "_", version, "_", gtf_prefix, "_strandedreverse_classiccounts.txt"))
ye_filenames = list.files(paste0("~/projects/datashare/", gse, "/"), paste0("*_notrim_star_", species, "_", annotation, "_", version, "_", gtf_prefix, "_strandedyes_classiccounts.txt"))
gsm = intersect(do.call(rbind, strsplit(re_filenames, "_notrim_star_"))[,1], do.call(rbind, strsplit(ye_filenames, "_notrim_star_"))[,1])
gsm
re_filenames = paste0("~/projects/datashare/", gse, "/", gsm, "_notrim_star_", species, "_", annotation, "_", version, "_", gtf_prefix, "_strandedreverse_classiccounts.txt")
ye_filenames = paste0("~/projects/datashare/", gse, "/", gsm, "_notrim_star_", species, "_", annotation, "_", version, "_", gtf_prefix, "_strandedyes_classiccounts.txt" )
foo_re = read.table(re_filenames)[,2]
foo_ye = read.table(ye_filenames)[,2]
layout(matrix(1:2, 1), respect=TRUE)
plot(log2(foo_re+1), log2(foo_ye+1), pch=".")
abline(0,1, col=2)
df = data.frame(cnt=c(log2(foo_re+1),log2(foo_ye+1)), str=c(rep("re", length(foo_re)), rep("ye", length(foo_ye))) )
boxplot(cnt~str, df)
```
```{r}
diff = median(log2(foo_re+1)) - median(log2(foo_ye+1))
if (diff > 1.5) {
strand = "reverse"
} else if (diff < -2) {
strand = "yes"
} else {
strand = "no"
}
```
# conclusion
```{r echo=TRUE, results="verbatim"}
strand
```
# Session Information
```{r, results="verbatim"}
sessionInfo()
```