Volker Brendel edited this page Sep 13, 2019 · 18 revisions

I keep records of my alignment data nicely organized in a spreadsheet. Can I load that information directly into my TSS object?

Yes. Loading from a sample sheet is in fact a good way to stay organized when you have multiple experiments or samples to keep track of. Consider the following example:

  1. Create the file bamf.tsv in your working directory (assuming you installed TSRchitect in your home directory; note that the columns in the file must be separated by tabs, not spaces):
SAMPLE	ReplicateID	FILE
sample1-rep1	1	~/TSRchitect/inst/extdata/bamFiles/sample1-rep1.bam
sample1-rep2	1	~/TSRchitect/inst/extdata/bamFiles/sample1-rep2.bam
sample2-rep1	2	~/TSRchitect/inst/extdata/bamFiles/sample2-rep1.bam
sample2-rep2	2	~/TSRchitect/inst/extdata/bamFiles/sample2-rep2.bam
  2. Now run the following R code:
library("TSRchitect")
test.Obj <- loadTSSobj(experimentTitle="Code example", inputDir=".", isPairedBAM=TRUE, isPairedBED=TRUE, sampleSheet="bamf.tsv", sampleNames="", replicateIDs=0)

show(test.Obj)

This should load all four samples listed in bamf.tsv.
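If your records live in a spreadsheet, it can be easier to write the sample sheet from R than to type it by hand, because write.table() with sep = "\t" guarantees the tab separators that the sample sheet needs. The following is a minimal sketch using only base R; it reproduces bamf.tsv from above (same column names and paths) and reads it back as a quick check.

```r
# Sample names, replicate IDs and paths copied from bamf.tsv above
bam.dir <- "~/TSRchitect/inst/extdata/bamFiles"
samples <- c("sample1-rep1", "sample1-rep2", "sample2-rep1", "sample2-rep2")
sheet <- data.frame(SAMPLE = samples,
                    ReplicateID = c(1, 1, 2, 2),
                    FILE = file.path(bam.dir, paste0(samples, ".bam")),
                    stringsAsFactors = FALSE)

# sep = "\t" writes tab-separated fields; quote = FALSE and row.names = FALSE
# keep the file free of quote characters and of an extra row-index column
write.table(sheet, file = "bamf.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back (read.delim expects tabs) to confirm its structure
check <- read.delim("bamf.tsv", stringsAsFactors = FALSE)
str(check)
```

The same pattern works for any of the sample sheets on this page; only the sample names, replicate IDs and file paths change.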


Can I add data to an existing TSS object or combine different TSS objects?

Yes, and it is quite easy. Consider the following example: we have two .bam files to load, and then our colleague Bob comes up with new data, which he supplies in .bed format. We want to look at the similarities and differences between the samples and replicates, so we set up two sample sheets (bamf1.tsv and bedf2.tsv):

SAMPLE	ReplicateID	FILE
sample1-rep1	1	~/TSRchitect/inst/extdata/bamFiles/sample1-rep1.bam
sample1-rep2	1	~/TSRchitect/inst/extdata/bamFiles/sample1-rep2.bam

and

SAMPLE	ReplicateID	FILE
sample2-rep1	2	~/TSRchitect/inst/extdata/bedFiles/sample2-rep1.bed
sample2-rep2	2	~/TSRchitect/inst/extdata/bedFiles/sample2-rep2.bed

The following R code then loads both sample sheets, processes each object, and combines the results into test.ObjCombined:

library("TSRchitect")
  
test.Obj1 <- loadTSSobj(experimentTitle="Code example", inputDir=".", isPairedBAM=TRUE, isPairedBED=TRUE, sampleSheet="bamf1.tsv", sampleNames="", replicateIDs=0)

test.Obj2 <- loadTSSobj(experimentTitle="Code example", inputDir=".", isPairedBAM=TRUE, isPairedBED=TRUE, sampleSheet="bedf2.tsv", sampleNames="", replicateIDs=0)


# ... converting BAM/BED data into TSS information and attaching it to its tssObject object:
test.Obj1 <- inputToTSS(test.Obj1)
test.Obj2 <- inputToTSS(test.Obj2)

# ... constructing the tag count per TSS data matrices:
test.Obj1 <- processTSS(experimentName=test.Obj1, n.cores=2, tssSet="all", writeTable=FALSE)
test.Obj2 <- processTSS(experimentName=test.Obj2, n.cores=2, tssSet="all", writeTable=FALSE)

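# ... assembling a new tssObject from the two processed objects:
test.ObjCombined <- new("tssObject")
test.ObjCombined@tssCountData <- c(test.Obj1@tssCountData,test.Obj2@tssCountData)
test.ObjCombined@sampleNames  <- c(test.Obj1@sampleNames,test.Obj2@sampleNames)
# replicate IDs in the same order as the concatenated samples (bamf1.tsv first, then bedf2.tsv):
test.ObjCombined@replicateIDs <- c(1,1,2,2)
test.ObjCombined@title <- "Combined object"
str(test.ObjCombined)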
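Hard-coding c(1,1,2,2) works for this small example, but with more samples it is easy to get the replicate IDs out of step with the concatenated count data. A minimal sketch of an alternative is to read the IDs from the sample sheets themselves, in the same order in which the objects are concatenated. The helper sheetAnnotation() is hypothetical (not part of TSRchitect), and the sketch assumes that loadTSSobj() keeps the samples in sample-sheet order.

```r
# Hypothetical helper (not part of TSRchitect): collect sample names and
# replicate IDs from several sample sheets, in the order the sheets are given
sheetAnnotation <- function(sheetFiles) {
  sheets <- lapply(sheetFiles, read.delim, stringsAsFactors = FALSE)
  list(sampleNames  = unlist(lapply(sheets, `[[`, "SAMPLE")),
       replicateIDs = as.numeric(unlist(lapply(sheets, `[[`, "ReplicateID"))))
}

# Usage, with bamf1.tsv and bedf2.tsv from above in the working directory
# (same order as in c(test.Obj1@tssCountData, test.Obj2@tssCountData)):
# ann <- sheetAnnotation(c("bamf1.tsv", "bedf2.tsv"))
# test.ObjCombined@replicateIDs <- ann$replicateIDs
```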

I have loaded paired-end read data into my TSS object. How can I access the 3'-read information?

Although TSRchitect primarily focuses on 5'-read processing to derive transcription start site information, the mapping data of the paired 3'-reads is also kept in the TSS object. Where you will find these data depends on the format of your original input. If you imported TSS profiling data from .bam files, then the 5'-read information is stored in a list of GAlignments objects in the slot bamDataFirstRead, whereas the 3'-read information is stored in bamDataLastRead. For the example discussed before, you can see what has been loaded as follows:

library("TSRchitect")
test.Obj <- loadTSSobj(experimentTitle="Code example", inputDir=".", isPairedBAM=TRUE, isPairedBED=TRUE, sampleSheet="bamf.tsv", sampleNames="", replicateIDs=0)

show(test.Obj)
str(test.Obj@bamDataLastRead)

library("GenomicAlignments")
grf <- granges(test.Obj@bamDataFirstRead[[1]], use.names=TRUE, use.mcols=TRUE)
grl <- granges(test.Obj@bamDataLastRead[[1]], use.names=TRUE, use.mcols=TRUE)
head(grf)
head(grl)

Note that we have converted the GAlignments objects to GRanges using the granges() function of GenomicAlignments. The two reads of each pair can then be matched by read name (qname) as follows:

dff <- as.data.frame(grf)
dfl <- as.data.frame(grl)
mdf <- merge(dff,dfl,by="qname",sort=TRUE)

If you loaded the original data in (paired) .bed format, then the mapping information is stored in the bedData slot as a list of Pairs objects (see the S4Vectors package). Analogous to the .bam example above, and assuming a sample sheet bedf.tsv that lists your .bed files in the same format as bamf.tsv, the following would do:

library("TSRchitect")
test.Obj <- loadTSSobj(experimentTitle="Code example", inputDir=".", isPairedBAM=TRUE, isPairedBED=TRUE, sampleSheet="bedf.tsv", sampleNames="", replicateIDs=0)

show(test.Obj)
library("S4Vectors")
first(test.Obj@bedData[[1]])
second(test.Obj@bedData[[1]])
mdf <- as.data.frame(test.Obj@bedData[[1]])