pulling main to dev for fixing dorado #22

Merged
merged 13 commits into from
Oct 16, 2023
65 changes: 35 additions & 30 deletions README.md
@@ -34,27 +34,30 @@ Download: https://community.nanoporetech.com/downloads

The `guppy` and `ont-pyguppy-client-lib` versions need to match
```
git clone https://github.com/Psy-Fer/buttery-eel.git
cd buttery-eel
python3 -m venv venv3
source ./venv3/bin/activate
pip install --upgrade pip
pip install --upgrade setuptools wheel

# if your slow5 file uses zstd compression and you have zstd installed
# see slow5lib for more info
# set this first to ensure pyslow5 installs with zstd:
# export PYSLOW5_ZSTD=1

# if GUPPY_VERSION=6.3.8
# modify requirements.txt to have:
# ont-pyguppy-client-lib==6.3.8
# if using DORADO_SERVER_VERSION=7.1.4
# ont-pyguppy-client-lib==7.1.4

python setup.py install

buttery-eel --help
git clone https://github.com/Psy-Fer/buttery-eel.git
cd buttery-eel
python3 -m venv venv3
source ./venv3/bin/activate
pip install --upgrade pip
pip install --upgrade setuptools wheel

# if your slow5 file uses zstd compression and you have zstd installed
# see slow5lib for more info
# set this first to ensure pyslow5 installs with zstd:
# export PYSLOW5_ZSTD=1

# if GUPPY_VERSION=6.3.8
# modify requirements.txt to have:
# ont-pyguppy-client-lib==6.3.8
# if using DORADO_SERVER_VERSION=7.1.4
# ont-pyguppy-client-lib==7.1.4

python setup.py install

# Alternatively, the newer way to build and install is:
# pip install .

buttery-eel --help

```

@@ -160,7 +163,7 @@ the `--config` file can be found using this command with guppy `guppy_basecaller

samtools fastq -TMM,ML test.mod.sam | minimap2 -ax map-ont -y ref.fa - | samtools view -Sb - | samtools sort - > test.aln.mod.bam

If you also wish to keep the quality scores in the unofficial qs tags or if mapping a regular unmapped sam the -T argument can be used in conjunction with minimap2 -y for example: `-TMM,ML,qs` or `-Tqs`
To also keep the quality scores in the unofficial `qs` tag, or when mapping a regular unmapped SAM, use the `-T` argument together with `minimap2 -y`, for example `-TMM,ML,qs` or `-Tqs`. To carry over all SAM tags, use `-T'*'`, which requires samtools v1.16 or higher.


# Shutting down server
@@ -171,19 +174,21 @@ However, sometimes things go wrong, and the wrapper will terminate before it ter

I have mostly fixed this but sometimes it still happens. Here is how you check for the server and then kill it.

# check for guppy instances
ps -ef | grep guppy
```
# check for guppy instances
ps -ef | grep guppy

# That might give you a result like this
# That might give you a result like this

# hasindu 27946 27905 99 19:31 pts/22 01:25:29 /install/ont-guppy-6.3.8/bin/guppy_basecall_server --log_path buttery_guppy_logs --config dna_r9.4.1_450bps_hac_prom.cfg --port 5558 --use_tcp -x cuda:all --max_queued_reads 2000 --chunk_size 2000
# hasindu 27946 27905 99 19:31 pts/22 01:25:29 /install/ont-guppy-6.3.8/bin/guppy_basecall_server --log_path buttery_guppy_logs --config dna_r9.4.1_450bps_hac_prom.cfg --port 5558 --use_tcp -x cuda:all --max_queued_reads 2000 --chunk_size 2000

# use the --port value to confirm that it is indeed the server you started.
# you can then kill the process; in this case, `PID=27946`
# use the --port value to confirm that it is indeed the server you started.
# you can then kill the process; in this case, `PID=27946`

kill <PID>
kill <PID>

# then you can try again
# then you can try again
```
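
The manual check above can also be scripted. The following is a minimal Python sketch, not part of buttery-eel: the helper name `find_basecall_servers` and the `SAMPLE_PS` text are hypothetical, and it assumes the eight-column `ps -ef` layout shown in the example output (UID, PID, PPID, C, STIME, TTY, TIME, CMD). It only reports candidate PIDs; killing the process is still left to you.

```python
# Sample `ps -ef` text. The server line is the README example; the header
# and the grep line are illustrative additions.
SAMPLE_PS = """
UID PID PPID C STIME TTY TIME CMD
hasindu 27946 27905 99 19:31 pts/22 01:25:29 /install/ont-guppy-6.3.8/bin/guppy_basecall_server --log_path buttery_guppy_logs --config dna_r9.4.1_450bps_hac_prom.cfg --port 5558 --use_tcp -x cuda:all --max_queued_reads 2000 --chunk_size 2000
hasindu 28001 27905 0 19:40 pts/22 00:00:00 grep guppy
"""


def find_basecall_servers(ps_output, port=None, name="guppy_basecall_server"):
    """Return (pid, command) pairs for basecall server lines in `ps -ef` output."""
    matches = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command line stays in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # blank line, header, or unexpected layout
        cmd = fields[7]
        args = cmd.split()
        # Match on the executable only, so `grep guppy` itself is ignored.
        if name not in args[0]:
            continue
        if port is not None:
            if "--port" not in args:
                continue
            idx = args.index("--port")
            if idx + 1 >= len(args) or args[idx + 1] != str(port):
                continue
        matches.append((int(fields[1]), cmd))
    return matches


for pid, cmd in find_basecall_servers(SAMPLE_PS, port=5558):
    print(pid, cmd.split()[0])
```

On a live system the input could come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` instead of the sample text. Matching on both the executable name and the `--port` value mirrors the advice above to confirm the server is the one you started before killing it.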


# Info
2 changes: 1 addition & 1 deletion docs/formats.md
@@ -23,7 +23,7 @@ The columns are described below. Some of the descriptions are derived from the [
| 7 | int | mux | 2 | The MUX setting for the channel when the read began. See Table 5 of [slow5 specification](https://hasindu2008.github.io/slow5specs/slow5-v1.0.0.pdf) for details |
| 8 | int | minknow_events | 0 | The number of events detected by MinKNOW. Zero if unknown, or if the value cannot be determined due to read-splitting |
| 9 | int | start_time | 3034.378 | Start time of the read, in seconds since the beginning of the run. This is typically equal to the start_time in S/BLOW5 (which is in terms of number of samples), divided by the sampling_rate in S/BLOW5. |
| 10 | int | duration | 39878 | Time it took from start time to sequence read in seconds |
| 10 | float | duration | 9.9695 | Time taken to sequence the read, in seconds, measured from its start time |
| 11 | string | passes_filtering | TRUE | TRUE/FALSE for passing the minimum qscore |
| 12 | ? | template_start | . | Legacy value. |
| 13 | int | num_events_template | . | Legacy value. Number of events present in read. |
2 changes: 1 addition & 1 deletion setup.py
@@ -25,7 +25,7 @@
url="https://github.com/Psy-Fer/buttery-eel",
author="James Ferguson",
author_email="j.ferguson@garvan.org.au",
description="Slow5 guppy basecall wrapper",
description="Slow5 guppy/dorado basecall wrapper",
long_description=long_description,
long_description_content_type="text/markdown",
packages=setuptools.find_packages(),
2 changes: 1 addition & 1 deletion src/_version.py
@@ -1 +1 @@
__version__="0.3.3"
__version__="0.4.0"
2 changes: 1 addition & 1 deletion src/buttery_eel.py
@@ -576,7 +576,7 @@ def submit_read(args, iq, rq, address, config, params, N):
if args.seq_sum:
minknow_events = call['metadata']['num_minknow_events']
sample_rate = float(read_store[read_id]["sampling_rate"])
duration = float(call['metadata']['duration'] / sample_rate, 6)
duration = round(float(call['metadata']['duration'] / sample_rate), 6)
num_events = call['metadata']['num_events']
median = round(call['metadata']['median'], 6)
med_abs_dev = round(call['metadata']['med_abs_dev'], 6)
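
Why this one-line change matters: `float()` accepts at most one argument, so the old call raised a `TypeError` whenever this branch ran, while the new line divides first and then rounds. Below is a minimal sketch of the difference. `duration_seconds` is a hypothetical helper, and 39878 samples at 4000 Hz are assumed, illustrative numbers: the source does not state the sampling rate, but these values happen to reproduce the 9.9695 example in `docs/formats.md`.

```python
def duration_seconds(duration_samples, sampling_rate, ndigits=6):
    """Convert a duration in samples to seconds, rounded to `ndigits` places."""
    return round(float(duration_samples) / float(sampling_rate), ndigits)


# Illustrative values only (assumed, not taken from a real read).
samples = 39878
rate = 4000.0

old_call_error = None
try:
    # The old form: float() takes at most one argument, so this raises.
    float(samples / rate, 6)
except TypeError as err:
    old_call_error = err
    print("old call raises TypeError:", err)

# The corrected form: divide, convert, then round to 6 decimal places.
new_value = duration_seconds(samples, rate)
print("new call returns:", new_value)
```

Rounding to six places matches what the neighbouring `median` and `med_abs_dev` lines already do.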
2 changes: 1 addition & 1 deletion test/dorado/test_demux.sh
@@ -49,7 +49,7 @@ PORT=$(netstat -aln | awk '
}')
echo $PORT
}
LIST="barcode02 barcode25 barcode95 unclassified"
LIST="barcode02 barcode95 unclassified"

CURRENT_GUPPY=$(grep "ont-pyguppy-client-lib" requirements.txt | cut -d "=" -f 3)
test -z ${CURRENT_GUPPY} && die "ont-pyguppy-client-lib not found in requirements.txt"
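
The `grep ... | cut -d "=" -f 3` line relies on the `==` pin. Splitting `ont-pyguppy-client-lib==7.1.4` on `=` yields the package name, an empty field, and the version, so field 3 is the version. A rough Python equivalent is sketched below; `pinned_version` is a hypothetical helper, not something the test scripts use, and the sample requirements text is made up for illustration.

```python
import re


def pinned_version(requirements_text, package="ont-pyguppy-client-lib"):
    """Return the version pinned with `==` for `package`, or None if absent."""
    pattern = re.compile(r"^\s*" + re.escape(package) + r"\s*==\s*([^\s#;]+)")
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# Illustrative requirements.txt content (assumed, not the real file).
sample_requirements = "numpy\nont-pyguppy-client-lib==7.1.4\n"
print(pinned_version(sample_requirements))
```

Like the `test -z` check in the script, a `None` result signals that the package is not pinned and the caller should stop.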
2 changes: 1 addition & 1 deletion test/dorado/test_demux_qscore_split.sh
@@ -50,7 +50,7 @@ PORT=$(netstat -aln | awk '
echo $PORT
}

LIST="barcode02 barcode25 barcode95 unclassified"
LIST="barcode02 barcode95 unclassified"

CURRENT_GUPPY=$(grep "ont-pyguppy-client-lib" requirements.txt | cut -d "=" -f 3)
test -z ${CURRENT_GUPPY} && die "ont-pyguppy-client-lib not found in requirements.txt"
13 changes: 7 additions & 6 deletions test/dorado/test_extensive.sh
@@ -28,6 +28,13 @@ die() {
#exit 1
}

rm -f *.log

echo "Installation"
test/dorado/test_install.sh &> install.log || die "test failed. see install.log for details"
echo ""
echo "********************************************************************"

CURRENT_GUPPY=$(grep "ont-pyguppy-client-lib" requirements.txt | cut -d "=" -f 3)
test -z ${CURRENT_GUPPY} && die "ont-pyguppy-client-lib not found in requirements.txt"

@@ -40,12 +47,6 @@ export PATH_TO_EEL_VENV=./venv3/bin/activate
export PATH_TO_IDENTITY=/install/biorand/bin/identitydna.sh
export REFIDX=/genome/hg38noAlt.idx

rm -f *.log

echo "Installation"
test/dorado/test_install.sh &> install.log || die "test failed. see install.log for details"
echo ""
echo "********************************************************************"

echo "R9.4.1 DNA - FAST model - 20k reads"
export PATH_TO_FAST5=/data/slow5-testdata/NA12878_prom_subsubsample/fast5/
File renamed without changes.
2 changes: 1 addition & 1 deletion test/test_demux.sh → test/guppy/test_demux.sh
@@ -49,7 +49,7 @@ PORT=$(netstat -aln | awk '
echo $PORT
}

LIST="barcode02 barcode25 barcode95 unclassified"
LIST="barcode02 barcode95 unclassified"

CURRENT_GUPPY=$(grep "ont-pyguppy-client-lib" requirements.txt | cut -d "=" -f 3)
test -z ${CURRENT_GUPPY} && die "ont-pyguppy-client-lib not found in requirements.txt"
@@ -49,7 +49,7 @@ PORT=$(netstat -aln | awk '
echo $PORT
}

LIST="barcode02 barcode25 barcode95 unclassified"
LIST="barcode02 barcode95 unclassified"

CURRENT_GUPPY=$(grep "ont-pyguppy-client-lib" requirements.txt | cut -d "=" -f 3)
test -z ${CURRENT_GUPPY} && die "ont-pyguppy-client-lib not found in requirements.txt"
198 changes: 198 additions & 0 deletions test/guppy/test_extensive.sh
@@ -0,0 +1,198 @@
#!/bin/bash

# MIT License

# Copyright (c) 2023 Hasindu Gamaarachchi
# Copyright (c) 2023 James Ferguson

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

die() {
echo "Error: $@" >&2
exit 1
}

rm -f *.log

echo "Installation"
test/guppy/test_install.sh &> install.log || die "test failed. see install.log for details"
echo ""
echo "********************************************************************"

CURRENT_GUPPY=$(grep "ont-pyguppy-client-lib" requirements.txt | cut -d "=" -f 3)
test -z ${CURRENT_GUPPY} && die "ont-pyguppy-client-lib not found in requirements.txt"

export PATH_TO_GUPPY=/install/ont-guppy-${CURRENT_GUPPY}/bin/
export GUPPY_OUT_TMP=ont-guppy-tmp
export EEL_OUT_TMP=buttery_eel_tmp

export PATH_TO_EEL_VENV=./venv3/bin/activate

export PATH_TO_IDENTITY=/install/biorand/bin/identitydna.sh
export REFIDX=/genome/hg38noAlt.idx

echo "R9.4.1 DNA - FAST model - 20k reads"
export PATH_TO_FAST5=/data/slow5-testdata/NA12878_prom_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/NA12878_prom_subsubsample/reads.blow5
export MODEL=dna_r9.4.1_450bps_fast_prom.cfg
test/guppy/test.sh &> r9_dna_fast.log || die "test failed. see r9_dna_fast.log for details"
echo ""
echo "********************************************************************"

echo "R10.4.1 DNA - HAC model - 20k reads - split qscore inbuilt"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/reads.blow5
export MODEL=dna_r10.4.1_e8.2_400bps_hac_prom.cfg
test/guppy/test_qscore_split.sh &> r10_split1.log || die "test failed. see r10_split1.log for details"
echo ""
echo "********************************************************************"

echo "R10.4.1 DNA - FAST model - 20k reads - split qscore script"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/reads.blow5
export MODEL=dna_r10.4.1_e8.2_400bps_fast_prom.cfg
test/guppy/test_qscore_split2.sh &> r10_split2.log || die "test failed. See r10_split2.log for details"
echo ""
echo "********************************************************************"

echo "SAM format qscore split script"
echo "Not yet implemented :("
echo ""
echo "********************************************************************"

echo "adapter trimming"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/reads.blow5
export OPTS_GUPPY="--detect_mid_strand_adapter --trim_adapters --detect_adapter --trim_strategy dna --min_score_adapter 60"
export OPTS_EEL=$OPTS_GUPPY
test/guppy/test.sh &> r10_adaptertrim.log || die "test failed. See r10_adaptertrim.log for details"
unset OPTS_GUPPY
unset OPTS_EEL
echo ""
echo "********************************************************************"

echo "read splitting"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/reads.blow5
export OPTS_GUPPY="--do_read_splitting --min_score_read_splitting 50"
export OPTS_EEL=$OPTS_GUPPY
test/guppy/test.sh &> r10_readsplit.log || die "test failed. See r10_readsplit.log for details"
unset OPTS_GUPPY
unset OPTS_EEL
echo ""
echo "********************************************************************"

echo "adapter trimming with read splitting"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/reads.blow5
export OPTS_GUPPY="--detect_mid_strand_adapter --trim_adapters --detect_adapter --trim_strategy dna --min_score_adapter 60 --do_read_splitting --min_score_read_splitting 50"
export OPTS_EEL=$OPTS_GUPPY
test/guppy/test.sh &> r10_readsplittrim.log || die "test failed. See r10_readsplittrim.log for details"
unset OPTS_GUPPY
unset OPTS_EEL
echo ""
echo "********************************************************************"

echo "seqsum"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/reads.blow5
test/guppy/test_seqsum.sh &> seqsum.log || die "test failed. See seqsum.log for details"
echo ""
echo "********************************************************************"

echo "seqsum - multiple BLOW5"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/blow5/
test/guppy/test_seqsum.sh &> seqsum_multiblow.log
echo ""
echo "********************************************************************"

echo "demux - FASTQ and SAM"
export PATH_TO_FAST5=/data/slow5-testdata/barcode_test/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/barcode_test/merged_rand.blow5
export MODEL=dna_r10.4.1_e8.2_400bps_fast_prom.cfg
test/guppy/test_demux.sh &> demux.log || die "test failed. See demux.log for details"
echo ""
echo "********************************************************************"

echo "demux - qscore - FASTQ and SAM"
export PATH_TO_FAST5=/data/slow5-testdata/barcode_test/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/barcode_test/merged_rand.blow5
test/guppy/test_demux_qscore_split.sh &> demux_qscore.log || die "test failed. See demux_qscore.log for details"
echo ""
echo "********************************************************************"

echo "demux - qscore - FASTQ and SAM - BARCODE+adapter trimming"
export PATH_TO_FAST5=/data/slow5-testdata/barcode_test/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/barcode_test/merged_rand.blow5
export OPTS_GUPPY="--trim_adapters "
export OPTS_BARCODER="--enable_trim_barcodes"
export OPTS_EEL=$OPTS_GUPPY" "$OPTS_BARCODER
test/guppy/test_demux_qscore_split.sh &> demux_qscore_trim.log || die "test failed. See demux_qscore_trim.log for details"
unset OPTS_GUPPY
unset OPTS_EEL
echo ""
echo "********************************************************************"

echo "move table"
echo "Not yet implemented :("
echo ""
echo "********************************************************************"

echo "move table when adaptor/barcode trimming"
echo "Not yet implemented :("
echo ""
echo "********************************************************************"

echo "remora"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/reads.blow5
export MODEL=dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_fast_prom.cfg
test/guppy/test_remora.sh &> remora.log || die "test failed. See remora.log for details"
echo ""
echo "********************************************************************"

echo "remora with qscore split and demux"
echo "Not yet implemented :("
echo ""
echo "********************************************************************"

echo "remora with adaptor/barcode trimming"
echo "Not yet implemented :("
echo ""
echo "********************************************************************"

echo "R10.4.1 DNA - FAST model - 500k reads"
export PATH_TO_FAST5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/fast5/
export PATH_TO_BLOW5=/data/slow5-testdata/hg2_prom_lsk114_subsubsample/reads.blow5
export MODEL=dna_r10.4.1_e8.2_400bps_fast_prom.cfg
test/guppy/test.sh &> dna_500k.log || die "test failed. See dna_500k.log for details"
echo ""
echo "********************************************************************"

echo "R9.4.1 RNA - FAST model"
export PATH_TO_IDENTITY=/install/biorand/bin/identityrna.sh
export REFIDX=/genome/gencode.v40.transcripts.fa
export PATH_TO_FAST5=/data/hasindu/hasindu2008.git/f5c/test/rna/
export PATH_TO_BLOW5=/data/hasindu/hasindu2008.git/f5c/test/rna/reads.blow5
export MODEL=rna_r9.4.1_70bps_fast_prom.cfg
test/guppy/test.sh &> rna.log || die "test failed. See rna.log for details"



5 changes: 5 additions & 0 deletions test/test_install.sh → test/guppy/test_install.sh
@@ -28,6 +28,11 @@ die(){
exit 1
}

GUPPY_VERSION=6.5.7
CURRENT_GUPPY=$(grep "ont-pyguppy-client-lib" requirements.txt | cut -d "=" -f 3)
test -z ${CURRENT_GUPPY} && die "ont-pyguppy-client-lib not found in requirements.txt"
sed -i "s/${CURRENT_GUPPY}/${GUPPY_VERSION}/" requirements.txt || die "sed failed"

test -z $EEL_PYTHON3 && EEL_PYTHON3=python3
rm -rf venv3
${EEL_PYTHON3} -m venv venv3 || die "venv failed"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.