Subsampling
Mahesh Binzer-Panchal edited this page Sep 5, 2019 · 4 revisions
Notes:
- Illumina paired-end data only.
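
Because only paired data is supported, it can help to confirm that every forward-read file has a mate before subsampling. The following is a minimal sketch, not part of the original workflow: `missing_mates` is a hypothetical helper name, and it assumes the `*_R1.fastq.gz` / `*_R2.fastq.gz` naming used by the command on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every R1 file in a directory
# that has no matching R2 file next to it.
# Assumes files are named *_R1.fastq.gz and *_R2.fastq.gz.
missing_mates() {
    local dir=$1 r1 name
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        name=$( basename "$r1" )
        [ -e "$dir/${name/_R1./_R2.}" ] || echo "$name"
    done
}

# Example: missing_mates /path/to/reads
```

An empty result means every R1 file found has an R2 partner.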
Command:
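#!/usr/bin/env bash
# Subsample one read pair per SLURM array task.
# FILES is indexed from 0, so submit with 0-based array indices (e.g. sbatch --array=0-9).
module load bioinfo-tools seqtk
CPUS="${SLURM_NPROCS:-2}"   # the two seqtk pipelines below run in parallel
JOB=$SLURM_ARRAY_TASK_ID
DATA_DIR=/path/to/reads
FILES=( "$DATA_DIR"/*_R1.fastq.gz )
FASTQ=${FILES[$JOB]}
READ1=$( basename "$FASTQ" )   # forward reads, file name only
READ2="${READ1/_R1./_R2.}"     # matching reverse reads
FRACTION=0.1                   # fraction of reads to keep
SEED=100                       # use the same seed for both files to keep the pairs in sync
# Read the input from DATA_DIR; write the subsampled files to the current directory.
seqtk sample -s"$SEED" "$DATA_DIR/$READ1" "$FRACTION" | gzip -c > "${READ1/_R1./_R1.subsampled.}" &
seqtk sample -s"$SEED" "$DATA_DIR/$READ2" "$FRACTION" | gzip -c > "${READ2/_R2./_R2.subsampled.}"
wait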
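
After subsampling, a quick sanity check is to compare the number of reads in the two output files. The following is a rough sketch rather than part of the original command: `count_reads` and `pairs_in_sync` are hypothetical helper names, the file names in the usage comment are placeholders, and the count assumes standard four-line FASTQ records. Equal counts are necessary but not sufficient for the pairs to match, so treat a pass as a first check only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a gzipped FASTQ file,
# assuming four lines per record.
count_reads() {
    local lines
    lines=$( gzip -dc "$1" | wc -l )
    echo $(( lines / 4 ))
}

# Hypothetical helper: succeed only if both files hold the same number of reads.
pairs_in_sync() {
    [ "$( count_reads "$1" )" -eq "$( count_reads "$2" )" ]
}

# Example (placeholder file names):
# pairs_in_sync sample_R1.subsampled.fastq.gz sample_R2.subsampled.fastq.gz \
#     && echo "read counts match" || echo "read counts differ"
```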