8 - singularity compliance (#9)
* Start integration Singularity

* Singularity implementation

* Singularity test

* Singularity artifacts

* Singularity tests

* Singularity TOKEN

* Singularity TOKEN

* Singularity TOKEN

* Singularity TOKEN

* Singularity fix paths

* Singularity PATH

* Singularity workdir

* Fix workdir singularity

* Singularity fix chdir

* Fastqwiper folder

* Pipeline folder copy from host to guest

* Singularity data folder

* Snakemake test

* Snakemake elif

* Snakemake help

* Singularity bash

* Singularity test

* Singularity token

* Singularity push test

* Singularity login test

* Singularity push test

* Singularity token test

* Singularity ENV

* Singularity ENV

* Singularity TOKENFILE

* Singularity TOKEN file

* Update Singularity.def

* Fixing Singularity build

* Try Singularity remote login

* Improved save of fastqwiper wipe in pipeline files

* SIF signature

* Adding fingerprint

* Adding fingerprint

* Deployment keys

* Fixed pipeline wildcards references

* Fixed pipeline syntax

* Fixed snakemake pipeline files

* Editing README.md

* Singularity integrated
mazzalab authored Oct 19, 2023
1 parent 449c376 commit f258b61
Showing 10 changed files with 150 additions and 30 deletions.
13 changes: 11 additions & 2 deletions .github/workflows/buildall_and_publish.yml
@@ -1,6 +1,6 @@
name: Build and Deploy all

on: [release]
on: [release] # , push

jobs:
Pypi:
@@ -28,4 +28,13 @@ jobs:
dockerhub_username: ${{ secrets.DOCKERHUB_USERNAME }}
dockerhub_token: ${{ secrets.DOCKERHUB_TOKEN }}
needs: Conda


Singularity:
name: Singularity CI
uses: ./.github/workflows/singularity_reusable.yml
with:
package_version: ${{vars.FASTQWIPER_VER}}${{github.run_number}}
secrets:
sylabs_token: ${{ secrets.SYLABS_TOKEN }}


2 changes: 1 addition & 1 deletion .github/workflows/code_coverage.yml
@@ -1,6 +1,6 @@
name: Test code

on: [push, pull_request]
on: [pull_request] # , push

jobs:
test:
38 changes: 38 additions & 0 deletions .github/workflows/singularity_reusable.yml
@@ -0,0 +1,38 @@
name: Singularity

on:
workflow_call:
inputs:
package_version:
required: true
type: string
secrets:
sylabs_token:
required: true

jobs:
docker_build_publish:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: eWaterCycle/setup-singularity@v7
with:
singularity-version: 3.8.3
- name: Build a singularity container
run: singularity build --fakeroot fastqwiper.sif Singularity.def
- name: Test the singularity container
run: singularity run fastqwiper.sif help
- uses: "finnp/create-file-action@master"
env:
FILE_NAME: "token.txt"
FILE_DATA: "${{ secrets.sylabs_token }}"
# - uses: actions/upload-artifact@v3
# with:
# name: token.txt
# path: token.txt
- name: Push artifacts to Library
run: |
singularity remote login --tokenfile token.txt
singularity key newpair --password= --name="Tommaso Mazza" --comment="Deployment keys" --email=t.mazza@css-mendel.it --push=false
singularity sign fastqwiper.sif
singularity push fastqwiper.sif library://mazzalab/fastqwiper/fastqwiper.sif:${{ inputs.package_version }}
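Outside CI, the sign-and-push step above can be sketched locally. The version tag below is illustrative (CI composes it from `FASTQWIPER_VER` and the run number), a valid Sylabs access token saved in `token.txt` is assumed, and the commands that need it are left commented:

```shell
# Local sketch of the workflow's sign-and-push step (assumes singularity
# is installed and a Sylabs token is stored in token.txt).
set -eu

VERSION="2023.2.70"  # illustrative; CI builds this from FASTQWIPER_VER + run number

# Compose the library reference the workflow pushes to.
library_ref() {
  printf 'library://mazzalab/fastqwiper/fastqwiper.sif:%s' "$1"
}

# Commands requiring a real token, shown for reference only:
# singularity remote login --tokenfile token.txt
# singularity sign fastqwiper.sif
# singularity push fastqwiper.sif "$(library_ref "$VERSION")"

library_ref "$VERSION"   # -> library://mazzalab/fastqwiper/fastqwiper.sif:2023.2.70
```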
23 changes: 18 additions & 5 deletions README.md
@@ -13,13 +13,14 @@
* OS: Windows, Linux, Mac OS (Snakemake workflows through Docker for Windows)
* Contributions: [bioinformatics@css-mendel.it](mailto:bioinformatics@css-mendel.it)
* Docker: https://hub.docker.com/r/mazzalab/fastqwiper
* Singularity: https://cloud.sylabs.io/library/mazzalab/fastqwiper/fastqwiper.sif
* Bug report: [https://github.com/mazzalab/fastqwiper/issues](https://github.com/mazzalab/fastqwiper/issues)


## USAGE
- **Case 1**. You have one or a couple (R1&R2) of **computer readable** FASTQ files which contain pesky, malformed, non-compliant lines: use *FastqWiper* to clean them;
- **Case 2**. You have one or a couple (R1&R2) of **computer readable** FASTQ files from which you want to drop unpaired reads or whose read interleaving you want to fix: use FastqWiper's *Snakemake workflows*;
- **Case 3**. You have one `fastq.gz` file or a couple (R1&R2) of `fastq.gz` files which are corrupted and you want to recover healthy reads and reformat them: Use the FastqWiper's *Snakemake workflows*;
- **Case 3**. You have one `fastq.gz` file or a couple (R1&R2) of `fastq.gz` files which are corrupted (**unreadable**) and from which you want to recover healthy reads and reformat them: use FastqWiper's *Snakemake workflows*;


## Installation
@@ -59,10 +60,10 @@ It accepts and outputs **readable** `*.fastq` or `*.fastq.gz` files.


### Cases 2 & 3
There is a <b>QUICK</b> and a <b>SLOW</b> method to configure `FastqWiper`'s workflows.
There are a <b>QUICK</b> and a <b>SLOW</b> method to configure `FastqWiper`'s workflows.


#### The quick way (Docker, all OS)
#### One quick way (Docker)
1. Pull the Docker image from DockerHub:

`docker pull mazzalab/fastqwiper`
@@ -71,15 +72,27 @@ There is a <b>QUICK</b> and a <b>SLOW</b> method to configure `FastqWiper`'s workflows.

CMD: `docker run --rm -ti --name fastqwiper -v "YOUR_LOCAL_PATH_TO_DATA_FOLDER:/fastqwiper/data" mazzalab/fastqwiper paired 8 sample 50000000`

where:
#### Another quick way (Singularity)
1. Pull the Singularity image from the Cloud Library:

`singularity pull library://mazzalab/fastqwiper/fastqwiper.sif`

2. Once the image is downloaded (e.g., `fastqwiper.sif_2023.2.70.sif`), type:

CMD: `singularity run --bind /scratch/tom/fastqwiper_singularity/data:/fastqwiper/data --writable-tmpfs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000`

If you want to bind the `.singularity` cache folder and the `logs` folder, you can omit `--writable-tmpfs`, create the folders `.singularity` and `logs` (`mkdir .singularity logs`) on the host system, and use this command instead:

CMD: `singularity run --bind YOUR_LOCAL_PATH_TO_DATA_FOLDER/:/fastqwiper/data --bind YOUR_LOCAL_PATH_TO_.singularity_FOLDER/:/fastqwiper/.snakemake --bind YOUR_LOCAL_PATH_TO_LOGS_FOLDER/:/fastqwiper/logs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000`

For both **Docker** and **Singularity**:

- `YOUR_LOCAL_PATH_TO_DATA_FOLDER` is the path of the folder containing the fastq.gz files to be wiped;
- `paired` triggers the cleaning of R1 and R2. Alternatively, `single` triggers the wiping of individual FASTQ files;
- `8` is the number of computing cores to be spawned;
- `sample` is the part of the FASTQ file names that precedes the suffix. <b>Be aware</b> that for <b>paired-end</b> files (e.g., "sample_R1.fastq.gz" and "sample_R2.fastq.gz"), your files must end with `_R1.fastq.gz` and `_R2.fastq.gz`; the argument to pass is everything before these suffixes (`sample` in this case). For <b>single-end</b>/individual files (e.g., "excerpt_R1_001.fastq.gz"), your file must end with `.fastq.gz`; the preceding text ("excerpt_R1_001" in this case) is the argument to pass;
- `50000000` is the number of rows per chunk (used when cores > 1; it must be a multiple of 4).
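The naming rules above can be sketched as small shell helpers; `derive_sample` and `valid_chunk` are illustrative names, not part of FastqWiper:

```shell
# derive_sample: print the <sample> argument for a given FASTQ file name.
# Paired mode strips the _R1/_R2.fastq.gz suffix; single mode strips .fastq.gz only.
derive_sample() {
  fname=$1
  mode=$2
  if [ "$mode" = "paired" ]; then
    printf '%s\n' "${fname%_R[12].fastq.gz}"
  else
    printf '%s\n' "${fname%.fastq.gz}"
  fi
}

# valid_chunk: succeed only if the rows-per-chunk value is a multiple of 4.
valid_chunk() {
  [ $(( $1 % 4 )) -eq 0 ]
}

derive_sample "sample_R1.fastq.gz" paired       # -> sample
derive_sample "excerpt_R1_001.fastq.gz" single  # -> excerpt_R1_001
```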

CMD: `docker run --rm -ti --name fastqwiper -v "YOUR_LOCAL_PATH_TO_DATA_FOLDER:/fastqwiper/data" mazzalab/fastqwiper single 8 excerpt_R1_001 50000000`

#### The slow way (Linux & Mac OS)
To enable the use of preconfigured [pipelines](https://github.com/mazzalab/fastqwiper/tree/main/pipeline), you need to install **Snakemake**. The recommended way to install Snakemake is via Conda, because it enables **Snakemake** to [handle software dependencies of your workflow](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management).
47 changes: 47 additions & 0 deletions Singularity.def
@@ -0,0 +1,47 @@
## Install Singularity: https://github.com/apptainer/singularity/blob/master/INSTALL.md
## singularity run --bind /scratch/tom/fastqwiper_singularity/data/:/fastqwiper/data --bind /scratch/tom/fastqwiper_singularity/.snakemake:/fastqwiper/.snakemake --bind /scratch/tom/fastqwiper_singularity/logs:/fastqwiper/logs --writable-tmpfs fqw.sif paired 8 sample 50000000

Bootstrap: docker
From: condaforge/mambaforge

%files
pipeline/* /fastqwiper/pipeline/
data/* /fastqwiper/data/
run_wiping.sh /fastqwiper/run_wiping.sh

%environment
export PATH=$PATH:/tmp/jre1.8.0_161/bin/

%post
mamba config --set channel_priority strict
mamba install python=3.10
mamba install -c conda-forge -c bioconda snakemake=7.32.3 -y
mamba install -c conda-forge colorama click -y
mamba install -c bioconda trimmomatic -y

mamba install -y -c bfxcss -c conda-forge fastqwiper

apt-get update -y
apt-get install gzrt -y

# Software versions
BBMAP_VER="39.01"

wget -c https://sourceforge.net/projects/bbmap/files/BBMap_$BBMAP_VER.tar.gz/download -O /fastqwiper/BBMap_$BBMAP_VER.tar.gz
cd /fastqwiper
tar -xvzf BBMap_${BBMAP_VER}.tar.gz
rm BBMap_${BBMAP_VER}.tar.gz

wget -c http://javadl.oracle.com/webapps/download/AutoDL?BundleId=230532_2f38c3b165be4555a1fa6e98c45e0808 -O /tmp/java.tar.gz
cd /tmp/
tar xvzf java.tar.gz

chmod 777 /fastqwiper/run_wiping.sh

%runscript
if [ $# -eq 4 ] || [ $# -eq 1 ]; then
exec /fastqwiper/run_wiping.sh "$@"
else
echo "You must provide either four arguments [mode (paired, single), # of cores (int), sample name (string), chunk size (int)] or the single argument 'help'"
exit 1
fi
14 changes: 8 additions & 6 deletions pipeline/fix_wipe_pairs_reads_parallel.smk
@@ -1,4 +1,4 @@
#cmd: snakemake --config sample_name=sample chunk_size=50000000 -s pipeline/fix_wipe_pairs_reads_parallel.smk --use-conda --cores 4
#cmd: snakemake --config sample_name=sample chunk_size=50000000 -s ./pipeline/fix_wipe_pairs_reads_parallel.smk --use-conda --cores 4

import os
import shutil
@@ -46,8 +46,9 @@ rule wipe_fastq_parallel:
"logs/wipe_fastq/wipe_fastq.{sample}.chunk{i}.fastq.log"
message:
"Running FastqWiper on {input}."
shell:
"fastqwiper --fastq_in {input} --fastq_out {output} --log_out final_summary.txt --log_frequency 300 2> {log}"
shell:'''
fastqwiper --fastq_in {input} --fastq_out {output} --log_out data/{wildcards.sample}_chunks/{wildcards.sample}_final_summary.txt --log_frequency 300 2> {log}
'''

def aggregate_input(wildcards):
checkpoint_output = checkpoints.split_fastq.get(**wildcards).output[0]
@@ -92,16 +93,17 @@ rule fix_interleaving:
in2 = "data/{sample}_R2_fixed_wiped_paired.fastq.gz"
output:
out1 = "data/{sample}_R1_fixed_wiped_paired_interleaving.fastq.gz",
out2 = "data/{sample}_R2_fixed_wiped_paired_interleaving.fastq.gz"
out2 = "data/{sample}_R2_fixed_wiped_paired_interleaving.fastq.gz",
out3 = temp("data/{sample}_singletons.fastq.gz")
log:
"logs/pairing/pairing.{sample}.log"
"logs/interleaving/interleaving.{sample}.log"
message:
"Repair reads interleaving from {input}."
threads:
1
cache: False
shell:
"bbmap/repair.sh in={input.in1} in2={input.in2} out={output.out1} out2={output.out2} outsingle=singletons.fastq.gz 2> {log}"
"bbmap/repair.sh in={input.in1} in2={input.in2} out={output.out1} out2={output.out2} outsingle={output.out3} 2> {log}"

onsuccess:
print("Workflow finished, no error. Clean-up and shutdown")
12 changes: 7 additions & 5 deletions pipeline/fix_wipe_pairs_reads_sequential.smk
@@ -29,8 +29,9 @@ rule wipe_fastq:
"logs/wipe_fastq/wipe_fastq.{sample}.log"
message:
"Running FastqWiper on {input}."
shell:
"fastqwiper --fastq_in {input} --fastq_out {output} 2> {log}"
shell:'''
fastqwiper --fastq_in {input} --fastq_out {output} --log_out data/{wildcards.sample}_final_summary.txt 2> {log}
'''

rule drop_unpaired:
input:
@@ -56,13 +57,14 @@ rule fix_interleaving:
in2 = "data/{sample}_R2_fixed_wiped_paired.fastq.gz"
output:
out1 = "data/{sample}_R1_fixed_wiped_paired_interleaving.fastq.gz",
out2 = "data/{sample}_R2_fixed_wiped_paired_interleaving.fastq.gz"
out2 = "data/{sample}_R2_fixed_wiped_paired_interleaving.fastq.gz",
out3 = temp("data/{sample}_singletons.fastq.gz")
log:
"logs/pairing/pairing.{sample}.log"
"logs/interleaving/interleaving.{sample}.log"
message:
"Repair reads interleaving from {input}."
threads:
1
cache: False
shell:
"bbmap/repair.sh in={input.in1} in2={input.in2} out={output.out1} out2={output.out2} outsingle=singletons.fastq.gz 2> {log}"
"bbmap/repair.sh in={input.in1} in2={input.in2} out={output.out1} out2={output.out2} outsingle={output.out3} 2> {log}"
5 changes: 3 additions & 2 deletions pipeline/fix_wipe_single_reads_parallel.smk
@@ -48,8 +48,9 @@ rule wipe_fastq_parallel:
"logs/wipe_fastq/wipe_fastq.{sample}.chunk{i}.fastq.log"
message:
"Running FastqWiper on {input}."
shell:
"fastqwiper --fastq_in {input} --fastq_out {output} 2> {log}"
shell:'''
fastqwiper --fastq_in {input} --fastq_out {output} --log_out data/{wildcards.sample}_chunks/{wildcards.sample}_final_summary.txt 2> {log}
'''


def aggregate_input(wildcards):
5 changes: 3 additions & 2 deletions pipeline/fix_wipe_single_reads_sequential.smk
@@ -28,7 +28,8 @@ rule wipe_fastq:
"logs/wipe_fastq/wipe_fastq.{sample}.log"
message:
"Running FastqWiper on {input}."
shell:
"fastqwiper --fastq_in {input} --fastq_out {output} 2> {log}"
shell:'''
fastqwiper --fastq_in {input} --fastq_out {output} --log_out data/{wildcards.sample}_final_summary.txt 2> {log}
'''


21 changes: 14 additions & 7 deletions run_wiping.sh
@@ -5,23 +5,30 @@ cores=$(($2))
sample_name=$3
chunk_size=$(($4))

# Enter the FastqWiper folder
cd /fastqwiper

if [ "$mode" == "paired" ]
then
if [ $cores > 1 ]
if [ $cores -gt 1 ]
then
echo "Processing paired-end files in parallel"
snakemake --config sample_name=$sample_name chunk_size=$chunk_size -s pipeline/fix_wipe_pairs_reads_parallel.smk --use-conda --cores $cores
snakemake --config sample_name=$sample_name chunk_size=$chunk_size -s ./pipeline/fix_wipe_pairs_reads_parallel.smk --use-conda --cores $cores
else
echo "Processing paired-end files sequentially"
snakemake --config sample_name=$sample_name -s pipeline/fix_wipe_pairs_reads_sequential.smk --use-conda --cores $cores
snakemake --config sample_name=$sample_name -s ./pipeline/fix_wipe_pairs_reads_sequential.smk --use-conda --cores $cores
fi
else
if [ $cores > 1 ]
elif [ "$mode" == "single" ]
then
if [ $cores -gt 1 ]
then
echo "Processing single-end file in parallel"
snakemake --config sample_name=$sample_name chunk_size=$chunk_size -s pipeline/fix_wipe_single_reads_parallel.smk --use-conda --cores $cores
snakemake --config sample_name=$sample_name chunk_size=$chunk_size -s ./pipeline/fix_wipe_single_reads_parallel.smk --use-conda --cores $cores
else
echo "Processing single-end file sequentially"
snakemake --config sample_name=$sample_name -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores $cores
snakemake --config sample_name=$sample_name -s ./pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores $cores
fi
else
echo "Snakemake help"
snakemake --help
fi
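The branching in `run_wiping.sh` above can be condensed into a pure function for illustration; `choose_pipeline` is a hypothetical name that simply mirrors the mode/cores routing:

```shell
# choose_pipeline: mirror of run_wiping.sh's branching, printing the
# Snakemake file (or "help") that a given mode/cores pair selects.
choose_pipeline() {
  mode=$1
  cores=$2
  if [ "$mode" = "paired" ]; then
    if [ "$cores" -gt 1 ]; then
      echo "pipeline/fix_wipe_pairs_reads_parallel.smk"
    else
      echo "pipeline/fix_wipe_pairs_reads_sequential.smk"
    fi
  elif [ "$mode" = "single" ]; then
    if [ "$cores" -gt 1 ]; then
      echo "pipeline/fix_wipe_single_reads_parallel.smk"
    else
      echo "pipeline/fix_wipe_single_reads_sequential.smk"
    fi
  else
    echo "help"   # any other mode falls through to snakemake --help
  fi
}

choose_pipeline paired 8   # -> pipeline/fix_wipe_pairs_reads_parallel.smk
```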
