Starting from demultiplexed fastq files

cziegenhain edited this page May 18, 2020 · 3 revisions

In some cases, sequencing providers deliver only demultiplexed fastq files. Because zUMIs requires the cell identity to be encoded in one of the fastq files, already demultiplexed files can be incompatible, e.g. for Smart-seq data.

In zUMIs we provide a way to recombine fastq files and generate an arbitrary index sequence. All you need to provide is the path to the folder containing the individual fastq files that should be combined. Fastq file names are expected in the format of bcl2fastq (XYZ_R1_001.fastq.gz) or SRA's fastq-dump (XYZ_1.fastq.gz). Fastq files are assumed to be gzipped.
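
Before merging, it can help to confirm that every file in the folder follows one of the two naming conventions above. The following is a minimal sketch and not part of zUMIs: the function names `classify_fastq_name` and `check_fastq_dir` are hypothetical, and the patterns assume that read 1 and read 2 files end in `_R1_001.fastq.gz`/`_R2_001.fastq.gz` (bcl2fastq) or `_1.fastq.gz`/`_2.fastq.gz` (fastq-dump).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not shipped with zUMIs.

# Print which naming convention a single file name follows.
classify_fastq_name() {
  case "$1" in
    *_R1_001.fastq.gz|*_R2_001.fastq.gz) echo "bcl2fastq" ;;
    *_1.fastq.gz|*_2.fastq.gz)           echo "fastq-dump" ;;
    *)                                   echo "unrecognized" ;;
  esac
}

# Print "<convention><TAB><file name>" for every entry in a folder.
check_fastq_dir() {
  local dir="$1" f base
  for f in "$dir"/*; do
    [ -e "$f" ] || continue          # skip the literal glob if the folder is empty
    base="$(basename "$f")"
    printf '%s\t%s\n' "$(classify_fastq_name "$base")" "$base"
  done
}
```

Calling `check_fastq_dir /path/to/individual_fastqs` lists each file with its detected convention; anything reported as `unrecognized` (for example an uncompressed `.fastq` file) is worth a second look before running the merge script.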

Rscript zUMIs/misc/merge_demultiplexed_fastq.R --dir /path/to/individual_fastqs

Optionally, you can also set a custom path to the pigz dependency and the number of threads (defaults are pigz and 24, respectively):

Rscript zUMIs/misc/merge_demultiplexed_fastq.R --dir /path/to/individual_fastqs --pigz /path/to/pigz --threads 8

The output files will be generated in the same folder:

reads_for_zUMIs.R1.fastq.gz --- concatenated read 1 file

reads_for_zUMIs.R2.fastq.gz --- concatenated read 2 file (if paired-end was detected)

reads_for_zUMIs.index.fastq.gz --- generated barcode reads to be used in zUMIs

reads_for_zUMIs.samples.txt --- text file containing the sample to barcode mapping

reads_for_zUMIs.expected_barcodes.txt --- barcode text list for use in zUMIs YAML
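
After the script finishes, a quick check that the files listed above exist and are non-empty can catch a failed run early. This is a minimal sketch under the assumption that the output names are exactly those listed; `check_merge_outputs` is a hypothetical helper and not part of zUMIs. The read 2 file is treated as optional because it is only written for paired-end data.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check; not shipped with zUMIs.
check_merge_outputs() {
  local dir="$1" missing=0 name
  # Files that should always be present after a successful merge.
  for name in \
      reads_for_zUMIs.R1.fastq.gz \
      reads_for_zUMIs.index.fastq.gz \
      reads_for_zUMIs.samples.txt \
      reads_for_zUMIs.expected_barcodes.txt; do
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty: $name" >&2
      missing=1
    fi
  done
  # Read 2 only exists when paired-end input was detected.
  if [ -e "$dir/reads_for_zUMIs.R2.fastq.gz" ]; then
    echo "paired-end: R2 present"
  else
    echo "single-end: no R2"
  fi
  return "$missing"
}
```

`check_merge_outputs /path/to/individual_fastqs` returns a non-zero exit status if any required file is missing or empty.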

All "barcodes" will be randomly generated strings of length 8, so in your YAML set up the index fastq file with BC(1-8). Regardless of whether the original data was single-indexed or dual-indexed, this script creates only a single index fastq file.
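
Since the base definition BC(1-8) depends on every barcode being 8 characters long, the generated barcode list can be sanity-checked before it is referenced in the YAML. The sketch below assumes that `reads_for_zUMIs.expected_barcodes.txt` holds one barcode per line without a header, and that barcodes should be unique; both are assumptions, and `check_barcodes` is a hypothetical helper, not part of zUMIs.

```shell
#!/usr/bin/env bash
# Hypothetical check; assumes one barcode per line and no header line.
# Succeeds only if every line is exactly 8 characters long and no barcode repeats.
check_barcodes() {
  awk 'BEGIN { bad = 0 }
       length($0) != 8 || seen[$0]++ { bad = 1 }
       END { exit bad }' "$1"
}
```

For example, `check_barcodes reads_for_zUMIs.expected_barcodes.txt && echo "barcodes look consistent"` prints the message only when both conditions hold.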