Workflow for RNA-seq using the HISAT2 aligner
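Download the HISAT2 GRCh38 index (grch38.tar.gz) and the Ensembl GTF annotation (Homo_sapiens.GRCh38.99.gtf.gz), then unpack them:
tar -xvzf grch38.tar.gz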
gunzip Homo_sapiens.GRCh38.99.gtf.gz
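To confirm that the index unpacked correctly, you can check for the index files. The sketch below relies on the standard HISAT2 naming: hisat2-build writes <prefix>.1.ht2 through <prefix>.8.ht2. The prefix you pass, for example grch38/genome, is an assumption about the archive layout, so compare it against the unpacked files. The helper name missing_index_files is hypothetical.

```python
from pathlib import Path


def missing_index_files(prefix: str) -> list[str]:
    """Return the HISAT2 index files that are missing for an index prefix.

    hisat2-build writes eight files, <prefix>.1.ht2 ... <prefix>.8.ht2.
    Large indexes use the .ht2l extension, which this sketch does not handle.
    """
    expected = [Path(f"{prefix}.{i}.ht2") for i in range(1, 9)]
    return [str(p) for p in expected if not p.is_file()]
```

For example, missing_index_files("grch38/genome") returns an empty list when all eight files are present.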
To run the workflow locally, the tools it uses (Trimmomatic, HISAT2, Subread and samtools) must be installed and callable from the command line.
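A minimal sketch that checks which tools can be called. The executable names are assumptions. hisat2 and samtools are the usual command names. featureCounts is Subread's read counter, although the workflow may call a different Subread program. trimmomatic is the wrapper script that installers such as Bioconda provide; without it, Trimmomatic runs as java -jar <jar>.

```python
import shutil

# Assumed executable names; adjust them to the commands the Snakefile calls.
TOOLS = ["trimmomatic", "hisat2", "featureCounts", "samtools"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    print("missing:", ", ".join(missing) if missing else "none")
```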
Set the paths to the genome index, GTF annotation and adapter file in the Python constants in the Snakefile. If needed, also set the paths to the software commands and the Trimmomatic jar there. The recommended alternative is to make them reachable through the executable or Java paths, e.g. by setting the corresponding environment variable.
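The constants might look roughly like the sketch below. The names GENOME_INDEX, GTF, ADAPTERS and TRIMMOMATIC_JAR and the /path/to placeholders are hypothetical, so use the names defined in your Snakefile. The hypothetical helper trimmomatic_cmd covers the two ways of calling Trimmomatic described above: java -jar when a jar path is set, and otherwise a trimmomatic command on PATH.

```python
from typing import Optional

# Hypothetical constant names with placeholder paths.
GENOME_INDEX = "/path/to/grch38/genome"   # HISAT2 index prefix, without .N.ht2
GTF = "/path/to/Homo_sapiens.GRCh38.99.gtf"
ADAPTERS = "adapters.fa"
TRIMMOMATIC_JAR: Optional[str] = None     # set only if 'trimmomatic' is not on PATH


def trimmomatic_cmd(jar: Optional[str] = TRIMMOMATIC_JAR) -> list[str]:
    """Command prefix for Trimmomatic: the jar via java if given, else the PATH wrapper."""
    if jar:
        return ["java", "-jar", jar]
    return ["trimmomatic"]
```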
Create a run directory containing the Snakefile and adapters.fa, and put the fastq.gz files in its "data" subdirectory. Update the Snakefile as described above (locations of the genome index and GTF annotation), then run:
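You can check the run directory with a short script before the dry run. This sketch assumes the layout described above. check_run_dir is a hypothetical helper and is not part of the workflow.

```python
from pathlib import Path


def check_run_dir(run_dir: str) -> list[str]:
    """Return the sorted *.fastq.gz file names in <run_dir>/data.

    Raises FileNotFoundError if the Snakefile or adapters.fa is missing,
    or if data/ holds no *.fastq.gz files.
    """
    root = Path(run_dir)
    for name in ("Snakefile", "adapters.fa"):
        if not (root / name).is_file():
            raise FileNotFoundError(root / name)
    # glob on a missing data/ directory simply yields nothing
    fastqs = sorted(p.name for p in (root / "data").glob("*.fastq.gz"))
    if not fastqs:
        raise FileNotFoundError(f"no *.fastq.gz files in {root / 'data'}")
    return fastqs
```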
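# dry run: list the planned jobs and print their commands without executing them
snakemake -np
# actual run; recent Snakemake versions require --cores for local execution
snakemake -p --cores 1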
To run on a cluster, first make Snakemake available in the environment, e.g.:
module load gcc/8.2.0 python/3.10.4
# LSF (-j 999 allows up to 999 jobs to be submitted at once):
snakemake -p -j 999 --cluster-config cluster.json --cluster "bsub -W {cluster.time} -n {cluster.n}"
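cluster.json provides the {cluster.time}, {cluster.n} and {cluster.mem} values. In Snakemake's cluster-config format, the "__default__" entry applies to every rule, and values listed under a rule's name override it for that rule. The sketch below writes such a file. The rule name hisat2 and all resource values are assumptions, so replace them with the rules and requirements of the actual Snakefile. Times use HH:MM, which bsub -W accepts.

```python
import json

# Hypothetical rule name and resource values.
CLUSTER_CONFIG = {
    # used by every rule; mem is in megabytes, as read by sbatch --mem-per-cpu
    "__default__": {"time": "01:00", "n": 1, "mem": 4000},
    # overrides time and n for the hisat2 rule; mem comes from __default__
    "hisat2": {"time": "04:00", "n": 8},
}


def write_cluster_config(path: str = "cluster.json", config: dict = CLUSTER_CONFIG) -> None:
    """Write the cluster configuration as JSON for --cluster-config."""
    with open(path, "w") as fh:
        json.dump(config, fh, indent=2)
```

Run write_cluster_config() in the run directory to create cluster.json next to the Snakefile.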
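# SLURM: change the times in cluster.json to HH:MM:SS (sbatch reads a two-field value such as 04:00 as minutes:seconds)
# n tasks per job:
snakemake -p -j 999 --cluster-config cluster.json --cluster "sbatch --time {cluster.time} -n {cluster.n}"
# one task with n CPUs (suits multithreaded tools):
snakemake -p -j 999 --cluster-config cluster.json --cluster "sbatch --time {cluster.time} -n 1 --cpus-per-task={cluster.n}"
# as above, also requesting memory per CPU (needs a "mem" entry in cluster.json):
snakemake -p -j 999 --cluster-config cluster.json --cluster "sbatch --time {cluster.time} -n 1 --cpus-per-task={cluster.n} --mem-per-cpu={cluster.mem}"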
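Instead of editing the times by hand, you can convert an LSF-style cluster.json with a short script. In this sketch, lsf_to_slurm_times is a hypothetical helper. It appends ":00" to every HH:MM time and leaves all other values untouched. Without the conversion, SLURM would read 04:00 as 4 minutes.

```python
import re


def lsf_to_slurm_times(config: dict) -> dict:
    """Return a copy of a cluster config with HH:MM times rewritten as HH:MM:SS."""
    out = {}
    for rule, resources in config.items():
        resources = dict(resources)  # copy so the input is not modified
        t = resources.get("time")
        if isinstance(t, str) and re.fullmatch(r"\d+:\d{2}", t):
            resources["time"] = t + ":00"
        out[rule] = resources
    return out
```

To use it, load cluster.json with json.load, pass the result through lsf_to_slurm_times, and write it back with json.dump.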
Running the workflow with containers from the Galaxy software stack requires binding the external folders (run folder, genome index and annotation) into the containers, passed to Snakemake through --singularity-args. The container images are downloaded into the .snakemake folder.
snakemake -p -j 999 --use-singularity --cluster-config cluster.json \
--cluster "sbatch --time {cluster.time} -n 1 --cpus-per-task={cluster.n}" \
--singularity-args "--bind /cluster/scratch/username/runfolder/:/mnt2 --bind /cluster/home/michalo/project_michalo/hisat/grch38/:/genomes --bind /cluster/home/michalo/project_michalo/hg38/:/annots"
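Both the --cluster and --singularity-args values contain spaces. Building the command as an argument list in Python therefore avoids one layer of shell quoting. This is a sketch with hypothetical helper names. The sbatch template and flags are copied from the command above.

```python
def singularity_bind_args(binds: dict[str, str]) -> str:
    """Join host:container pairs into a single --singularity-args value."""
    return " ".join(f"--bind {host}:{container}" for host, container in binds.items())


def snakemake_singularity_cmd(binds: dict[str, str]) -> list[str]:
    """Argument list equivalent to the shell command above; each item is one argument."""
    # Plain string, not an f-string: Snakemake fills in {cluster.*} itself.
    sbatch = "sbatch --time {cluster.time} -n 1 --cpus-per-task={cluster.n}"
    return [
        "snakemake", "-p", "-j", "999", "--use-singularity",
        "--cluster-config", "cluster.json",
        "--cluster", sbatch,
        "--singularity-args", singularity_bind_args(binds),
    ]
```

To launch it, import subprocess and call subprocess.run(snakemake_singularity_cmd(binds), check=True). Here binds maps each host folder from the command above to /mnt2, /genomes or /annots.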