bbmap repair.sh guesses the wrong ASCII quality encoding offset and fails #17

Merged
36 changes: 10 additions & 26 deletions .devcontainer/devcontainer.json
@@ -8,31 +8,15 @@
// Update the 'dockerFile' property if you aren't using the standard 'Dockerfile' filename.
"dockerfile": "../Dockerfile"
},
// "mounts": [
// "source=${localWorkspaceFolder},target=/fastqwiper,type=bind,consistency=cached"
// ],
"customizations": {
"vscode": {
"extensions": [
"ms-python.vscode-pylance",
"ms-python.black-formatter",
"snakemake.snakemake-lang"
]
}
}

// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],

// Uncomment the next line to run commands after the container is created.
// "postCreateCommand": "cat /etc/os-release",

// Configure tool-specific properties.
// "customizations": {},

// Uncomment to connect as an existing user other than the container default. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "devcontainer"
"features": {
"ghcr.io/devcontainers/features/git:1": {}
},
"customizations" : {
"jetbrains" : {
"backend" : "PyCharm"
}
},
	"workspaceFolder": "/fastqwiper",
	"workspaceMount": "source=${localWorkspaceFolder},target=/fastqwiper,type=bind"
}
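For reference, a minimal valid `devcontainer.json` assembled from the fragments above might look as follows (JSONC comments are permitted in this file; all values are taken from the diff, and keys must be quoted for the file to parse):

```json
{
	"build": {
		// Update the 'dockerFile' property if you aren't using the standard 'Dockerfile' filename.
		"dockerfile": "../Dockerfile"
	},
	"features": {
		"ghcr.io/devcontainers/features/git:1": {}
	},
	"customizations": {
		"jetbrains": {
			"backend": "PyCharm"
		}
	},
	"workspaceFolder": "/fastqwiper",
	"workspaceMount": "source=${localWorkspaceFolder},target=/fastqwiper,type=bind"
}
```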
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,7 +1,7 @@
#my custom
.vscode
bbmap/
logs/
bbmap

# Byte-compiled / optimized / DLL files
__pycache__/
Empty file removed .vscode/settings.json
9 changes: 5 additions & 4 deletions Dockerfile
@@ -4,7 +4,7 @@ LABEL maintainer="mazza.tommaso@gmail.com"
ENV bbmap_version 39.01
ENV PATH "$PATH:/tmp/jre1.8.0_161/bin/"

RUN mamba config --set channel_priority strict
# RUN mamba config --set channel_priority strict
RUN mamba install python=3.10
RUN mamba install -c conda-forge -c bioconda snakemake=7.32.3 -y
RUN mamba install -c conda-forge colorama click -y
@@ -28,13 +28,14 @@ WORKDIR /fastqwiper

COPY pipeline pipeline
COPY run_wiping.sh run_wiping.sh
COPY data data
RUN chmod +x run_wiping.sh


ENTRYPOINT ["/fastqwiper/run_wiping.sh"]
# paired mode, 4 cores, sample name, #rows-per-chunk
CMD ["paired", "4", "sample", "50000000"]
# paired mode, 4 cores, sample name, #rows-per-chunk, ASCII offset (33=Sanger, 64=old Solexa)
CMD ["paired", "4", "sample", "50000000", "33"]

# docker build -t test .
# docker run --rm -ti --name test -v "D:\desktop_links\CSS-Bioinformatics\FastqWiper\FastqWiper\data:/fastqwiper/data" test paired 8 sample 50000000
# docker run --rm -ti --name test -v "D:\Projects\fastqwiper\data:/fastqwiper/data" test paired 4 sample 50000000 33
# docker exec -ti test /bin/bash
16 changes: 8 additions & 8 deletions README.md
@@ -70,7 +70,7 @@ There are a <b>QUICK</b> and a <b>SLOW</b> method to configure `FastqWiper`'s wor

2. Once the image is downloaded, type:

CMD: `docker run --rm -ti --name fastqwiper -v "YOUR_LOCAL_PATH_TO_DATA_FOLDER:/fastqwiper/data" mazzalab/fastqwiper paired 8 sample 50000000`
CMD: `docker run --rm -ti --name fastqwiper -v "YOUR_LOCAL_PATH_TO_DATA_FOLDER:/fastqwiper/data" mazzalab/fastqwiper paired 8 sample 50000000 33`

#### Another quick way (Singularity)
1. Pull the Singularity image from the Cloud Library:
@@ -79,20 +79,20 @@ CMD: `docker run --rm -ti --name fastqwiper -v "YOUR_LOCAL_PATH_TO_DATA_FOLDER:/

2. Once the image is downloaded (e.g., fastqwiper.sif_2023.2.70.sif), type:

CMD `singularity run --bind /scratch/tom/fastqwiper_singularity/data:/fastqwiper/data --writable-tmpfs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000`
CMD `singularity run --bind /scratch/tom/fastqwiper_singularity/data:/fastqwiper/data --writable-tmpfs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000 33`

If you want to bind the `.singularity` cache folder and the `logs` folder, you can omit `--writable-tmpfs`, create the folders `.singularity` and `logs` (`mkdir .singularity logs`) on the host system, and use this command instead:

CMD: `singularity run --bind YOUR_LOCAL_PATH_TO_DATA_FOLDER/:/fastqwiper/data --bind YOUR_LOCAL_PATH_TO_.singularity_FOLDER/:/fastqwiper/.snakemake --bind YOUR_LOCAL_PATH_TO_LOGS_FOLDER/:/fastqwiper/logs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000`
CMD: `singularity run --bind YOUR_LOCAL_PATH_TO_DATA_FOLDER/:/fastqwiper/data --bind YOUR_LOCAL_PATH_TO_.singularity_FOLDER/:/fastqwiper/.snakemake --bind YOUR_LOCAL_PATH_TO_LOGS_FOLDER/:/fastqwiper/logs fastqwiper.sif_2023.2.70.sif paired 8 sample 50000000 33`

For both **Docker** and **Singularity**:

- `YOUR_LOCAL_PATH_TO_DATA_FOLDER` is the path of the folder where the fastq.gz files to be wiped are located;
- `paired` triggers the cleaning of R1 and R2. Alternatively, `single` triggers the wiping of individual FASTQ files;
- `8` is the number of computing cores to spawn (your choice);
- `sample` is part of the names of the FASTQ files to be wiped. <b>Be aware</b>: for <b>paired-end</b> files (e.g., "sample_R1.fastq.gz" and "sample_R2.fastq.gz"), your files must end with `_R1.fastq.gz` and `_R2.fastq.gz`; the argument to pass is everything before that suffix (`sample` in this case). For <b>single-end</b>/individual files (e.g., "excerpt_R1_001.fastq.gz"), your file must end with `.fastq.gz`; the preceding text ("excerpt_R1_001" in this case) is the argument to pass.
- `50000000` is the number of rows per chunk (used when cores > 1; it must be a multiple of 4). Setting it too high reduces the benefit of parallelism; setting it too low creates more chunks than available CPUs, making parallelism inefficient. Choose it according to the total number of reads in your starting file.

- `50000000` (optional) is the number of rows per chunk (used when cores > 1; it must be a multiple of 4). Setting it too high reduces the benefit of parallelism; setting it too low creates more chunks than available CPUs, making parallelism inefficient. Choose it according to the total number of reads in your starting file.
- `33` (optional) is the ASCII offset of the quality encoding (33 = Sanger/Phred+33, 64 = old Solexa/Phred+64).
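When in doubt about which offset a file uses, it can usually be sniffed from the quality lines themselves. A minimal sketch (a toy read is written inline for illustration; in practice you would feed real data with something like `zcat sample_R1.fastq.gz | awk 'NR % 4 == 0'`):

```shell
# Toy FASTQ record; its quality string "II5I" contains '5' (ASCII 53).
printf '@r1\nACGT\n+\nII5I\n' > toy.fastq

# Minimum byte value seen on quality lines (every 4th line),
# excluding the trailing newline (byte 10).
min=$(awk 'NR % 4 == 0' toy.fastq | od -An -tu1 | tr -s ' ' '\n' \
      | awk 'NF && $1 != 10' | sort -n | head -n 1)

# Bytes below 59 cannot occur in Phred+64/old-Solexa data,
# so their presence implies the Sanger/Phred+33 encoding.
if [ "$min" -lt 59 ]; then echo "offset 33"; else echo "offset 64"; fi
```

This is only a heuristic: high-quality Phred+33 data whose every byte happens to be 59 or above is indistinguishable from Phred+64 data by this test alone.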

#### The slow way (Linux & Mac OS)
To enable the use of preconfigured [pipelines](https://github.com/mazzalab/fastqwiper/tree/main/pipeline), you need to install **Snakemake**. The recommended way to install Snakemake is via Conda, because it enables **Snakemake** to [handle software dependencies of your workflow](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management).
@@ -146,14 +146,14 @@ Copy the fastq files you want to fix in the `data` folder.
#### Paired-end files

- **Get a dry run** of a pipeline (e.g., `fix_wipe_pairs_reads_sequential.smk`):<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_pairs_reads_sequential.smk --use-conda --cores 4`
`snakemake --config sample_name=my_sample qin=33 -s pipeline/fix_wipe_pairs_reads_sequential.smk --use-conda --cores 4`

- **Generate the planned DAG**:<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_pairs_reads_sequential.smk --dag | dot -Tpdf > dag.pdf`<br /> <br />
`snakemake --config sample_name=my_sample qin=33 -s pipeline/fix_wipe_pairs_reads_sequential.smk --dag | dot -Tpdf > dag.pdf`<br /> <br />
<img src="https://github.com/mazzalab/fastqwiper/blob/main/pipeline/fix_wipe_pairs_reads.png?raw=true" width="400">

- **Run the pipeline** (n.b., during the first execution, Snakemake will download and install some required remote packages and may take longer). The number of computing cores can be tuned accordingly:<br />
`snakemake --config sample_name=my_sample -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores 2`
`snakemake --config sample_name=my_sample qin=33 -s pipeline/fix_wipe_single_reads_sequential.smk --use-conda --cores 2`

Fixed files will be copied in the `data` folder and will be suffixed with the string `_fixed_wiped_paired_interleaving`.
We remind that the `fix_wipe_pairs_reads_sequential.smk` and `fix_wipe_pairs_reads_parallel.smk` pipelines perform the following actions:
4 changes: 2 additions & 2 deletions Singularity.def
@@ -39,9 +39,9 @@ From: condaforge/mambaforge
chmod 777 /fastqwiper/run_wiping.sh

%runscript
if [ $# -eq 4 ] || [ $# -eq 1 ]; then
if [ $# -eq 5 ] || [ $# -eq 3 ] || [ $# -eq 1 ]; then
exec /fastqwiper/run_wiping.sh $@
else
echo "You must provide four arguments [mode (paired, single), # of cores (int), sample name (string), chunk size (int))"
    echo "You must provide three mandatory + two optional arguments [computing mode ('paired' or 'single'), # of cores (int), sample name (string), chunk size (optional, int), ASCII offset (optional, 33 or 64)]"
exit 1
fi
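The accepted argument counts (5, 3, or 1) correspond to the three mandatory arguments plus the two optional ones. A self-contained sketch of that dispatch logic, with `check_args` standing in for the real `run_wiping.sh` (which is not reproduced here):

```shell
# Stand-in for run_wiping.sh: accept 1, 3, or 5 positional arguments,
# matching [mode], [mode cores sample], and [mode cores sample chunk offset].
check_args() {
  if [ $# -eq 5 ] || [ $# -eq 3 ] || [ $# -eq 1 ]; then
    echo "ok: $# args"
  else
    echo "usage error: got $# args" >&2
    return 1
  fi
}

check_args paired 4 sample 50000000 33   # full form, explicit offset 33
check_args single 2 excerpt_R1_001       # defaults for chunk size and offset
```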
Binary file removed data/excerpt_S1_R1_001.fastq.gz
Binary file removed data/excerpt_S1_R2_001.fastq.gz