Swati Parekh edited this page Jul 24, 2017 · 12 revisions

zUMIs' output is structured in three subdirectories:

zUMIs_output/filtered_fastq
zUMIs_output/expression
zUMIs_output/stats
  • "filtered_fastq" contains the filtered, gzipped fastq files for cDNA and barcode reads.
  • "expression" contains .rds files of the generated reference annotation and expression tables.
  • "stats" contains plots and data files with descriptive statistics.
  • A log file, including any error messages, can be found in zUMIs_output/.
  • STAR output is stored in the parent directory defined by the user (-o).

Structure of the output dgecounts object in .dgecounts.rds

zUMIs writes its digital gene expression (DGE) output in .rds format, which can be read into R as follows:

AllCounts <- readRDS("zUMIs_output/expression/example.dgecounts.rds")
names(AllCounts)
[1] "introns"     "exons"       "intron.exon"

names(AllCounts$exons)
[1] "readcounts"  "umicounts"   "downsampled"

names(AllCounts$exons$downsampled)
[1] "downsampled_7358"

AllCounts is a list of lists that holds all the count tables. The top level has one entry per feature type (introns, exons, and intron+exon). Each feature type contains three entries: "readcounts" (counts without removing duplicates), "umicounts" (counts with duplicates removed), and "downsampled", a list with one entry per requested downsampling size. Each downsampling entry in turn contains its own "readcounts" and "umicounts" tables.
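To make the nesting concrete, the sketch below builds a small mock object with the same layout and pulls out individual tables. The gene and cell names, the counts, and the downsampling size of 100 are invented for illustration; real tables come from readRDS() and their class may differ from the plain matrices used here.

```r
# Mock object mimicking the layout described above; all values are invented.
make_table <- function(values) {
  matrix(values, nrow = 2,
         dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))
}

make_feature <- function() {
  list(
    readcounts  = make_table(c(5, 3, 2, 8)),  # duplicates kept
    umicounts   = make_table(c(4, 2, 2, 5)),  # duplicates removed
    downsampled = list(
      downsampled_100 = list(
        readcounts = make_table(c(3, 2, 1, 4)),
        umicounts  = make_table(c(3, 1, 1, 3))
      )
    )
  )
}

MockCounts <- list(introns     = make_feature(),
                   exons       = make_feature(),
                   intron.exon = make_feature())

# UMI counts for exons at full depth
exon_umis <- MockCounts$exons$umicounts

# UMI counts for exons at the first (here: only) downsampling size
depth <- names(MockCounts$exons$downsampled)[1]
exon_umis_ds <- MockCounts$exons$downsampled[[depth]]$umicounts

# Overview of the nesting, two levels deep
str(MockCounts, max.level = 2)
```

Indexing with [[depth]] rather than a hard-coded name keeps the access working when the downsampling size differs between runs.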

Any table from any feature type can be saved as a tab-separated count matrix. For example, to write all downsampled exon tables:

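# Flatten one level so that each element is a single count table
downsamp <- unlist(x = AllCounts$exons$downsampled, recursive = FALSE, use.names = TRUE)
# Write each table to its own tab-separated file, named after the list element
lapply(names(downsamp), function(x) write.table(downsamp[[x]], file = paste("zUMIs_output/expression/", x, ".txt", sep = ""), sep = "\t", row.names = TRUE, col.names = TRUE))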
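The same idea extends to every feature type. The following is a minimal sketch, not part of zUMIs: the helper name write_count_tables and the <feature>.<table>.txt naming scheme are hypothetical, and it assumes each table can be converted with as.matrix() (a sparse matrix would additionally need the Matrix package loaded). The demo at the end runs on a tiny invented object rather than real output.

```r
# Hypothetical helper: write every count table of a dgecounts-style list to
# tab-separated files named <feature>.<table>.txt inside out_dir.
write_count_tables <- function(counts, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  written <- character(0)
  for (feature in names(counts)) {
    # Flatten the downsampled list one level so that each element is a table,
    # named e.g. "downsampled_7358.umicounts".
    ds <- unlist(counts[[feature]]$downsampled, recursive = FALSE)
    tables <- c(counts[[feature]][c("readcounts", "umicounts")], ds)
    for (tab in names(tables)) {
      out_file <- file.path(out_dir, paste0(feature, ".", tab, ".txt"))
      # col.names = NA writes an empty header cell above the row names
      write.table(as.matrix(tables[[tab]]), file = out_file, sep = "\t",
                  quote = FALSE, row.names = TRUE, col.names = NA)
      written <- c(written, out_file)
    }
  }
  invisible(written)
}

# Demo on a tiny invented object, written to a temporary directory
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))
demo <- list(exons = list(
  readcounts  = m,
  umicounts   = m,
  downsampled = list(downsampled_100 = list(readcounts = m, umicounts = m))
))
files <- write_count_tables(demo, file.path(tempdir(), "zUMIs_demo"))
basename(files)
```

A file written this way can be read back with read.table(file, header = TRUE, sep = "\t", row.names = 1).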
