
Subsampling

Mahesh Binzer-Panchal edited this page Sep 5, 2019 · 4 revisions

Subsampling Reads

Notes:

  • Works with Illumina paired-end data only (files named *_R1.fastq.gz and *_R2.fastq.gz).

Command:

#!/usr/bin/env bash

module load bioinfo-tools seqtk

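CPUS="${SLURM_NPROCS:-2}"
# Index of this array task. Bash arrays are zero-based, so task IDs should start at 0.
JOB="$SLURM_ARRAY_TASK_ID"

DATA_DIR=/path/to/reads
# Quote the directory so that paths containing spaces still expand correctly.
FILES=( "$DATA_DIR"/*_R1.fastq.gz )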

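FASTQ="${FILES[$JOB]}"
READ1=$( basename "$FASTQ" )
READ2="${READ1/_R1./_R2.}"
FRACTION=0.1
SEED=100
# Read the input from DATA_DIR and write the output to the working directory.
# Both files use the same seed so that read pairs stay in sync.
seqtk sample -s"$SEED" "$DATA_DIR/$READ1" "$FRACTION" | gzip -c > "${READ1/_R1./_R1.subsampled.}" &
seqtk sample -s"$SEED" "$DATA_DIR/$READ2" "$FRACTION" | gzip -c > "${READ2/_R2./_R2.subsampled.}"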
wait
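
Submitting:

The script takes its file index from SLURM_ARRAY_TASK_ID, so it is presumably meant to be submitted as a SLURM array job with one task per R1 file. The helper below is a minimal sketch that prints a matching sbatch command instead of running it. The function name array_cmd and the script name subsample.sh are hypothetical and not part of the original page.

```shell
# Print an sbatch command that runs one array task per R1 file.
# Usage: array_cmd <data_dir> <script>   (both names are illustrative)
array_cmd() {
    data_dir=$1
    script=$2
    # Expand the glob into the positional parameters (portable, no arrays needed).
    set -- "$data_dir"/*_R1.fastq.gz
    # An unmatched glob stays literal, so check that the first match exists.
    if [ ! -e "$1" ]; then
        echo "no *_R1.fastq.gz files in $data_dir" >&2
        return 1
    fi
    # The script indexes files from 0, so the last task index is count - 1.
    echo "sbatch --array=0-$(( $# - 1 )) $script"
}
```

For example, array_cmd /path/to/reads subsample.sh prints the command to submit, which can be inspected before it is run. The glob expands in the same sorted order as in the script, so each task index should map to the same file.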