Change pp-sketchlib calls to kwargs #31

Merged 8 commits on May 9, 2022
10 changes: 7 additions & 3 deletions .github/workflows/python-package-conda.yml
@@ -6,21 +6,25 @@ jobs:
build-linux:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [ '3.8', '3.9' ]
sketchlib-version: [ '1.7.4', '2.0.0' ]
max-parallel: 5

name: linux python_${{ matrix.python-version }}_pp-sketchlib_${{ matrix.sketchlib-version }}
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.8
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: 3.8
python-version: ${{ matrix.python-version }}
- name: Add conda to system path
run: |
# $CONDA is an environment variable pointing to the root of the miniconda directory
echo $CONDA/bin >> $GITHUB_PATH
- name: Install dependencies
run: |
conda env update --file environment.yml --name base
conda install -y -c conda-forge pp-sketchlib=${{ matrix.sketchlib-version }}
- name: Lint with flake8
run: |
conda install flake8
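
The `strategy.matrix` block in this workflow lists two Python versions and two pp-sketchlib versions, so the job runs once per combination. As a rough illustration only (this is not how GitHub Actions is implemented), the expansion can be sketched as a Cartesian product. The helper names `expand_matrix` and `job_name` are made up for this example; the values and the name template are copied from the workflow above.

```python
from itertools import product

# Values copied from the workflow's strategy.matrix block.
matrix = {
    "python-version": ["3.8", "3.9"],
    "sketchlib-version": ["1.7.4", "2.0.0"],
}


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def job_name(job):
    """Mirror the workflow's `name:` template for a single job."""
    return (
        f"linux python_{job['python-version']}"
        f"_pp-sketchlib_{job['sketchlib-version']}"
    )


jobs = expand_matrix(matrix)
job_names = [job_name(job) for job in jobs]

for name in job_names:
    print(name)
```

Each printed name corresponds to one CI job, which is why the install step pins `pp-sketchlib=${{ matrix.sketchlib-version }}` rather than a fixed version.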
2 changes: 1 addition & 1 deletion LICENSE_kseq
@@ -1,5 +1,5 @@
The MIT License
- Copyright (c) 20082-2012 by Heng Li <lh3@me.com>
+ Copyright (c) 2008-2012 by Heng Li <lh3@me.com>
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
2 changes: 1 addition & 1 deletion NOTICE
@@ -1,5 +1,5 @@

- Copyright 2020 Zhirong Yang, John Lees
+ Copyright 2020-2022 Zhirong Yang, John Lees, Gerry Tonkin-Hill

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
5 changes: 4 additions & 1 deletion README.md
@@ -1,7 +1,7 @@
# mandrake <img src='docs/images/mandragora.png' align="right" height="139" />

<!-- badges: start -->
- [![Build and run tests](https://github.com/johnlees/mandrake/actions/workflows/python-package-conda.yml/badge.svg)](https://github.com/johnlees/mandrake/actions/workflows/python-package-conda.yml)
+ [![Build and run tests](https://github.com/bacpop/mandrake/actions/workflows/python-package-conda.yml/badge.svg)](https://github.com/bacpop/mandrake/actions/workflows/python-package-conda.yml)
[![Anaconda package](https://anaconda.org/conda-forge/mandrake/badges/version.svg
)](https://anaconda.org/conda-forge/mandrake)
[![Documentation Status](https://readthedocs.org/projects/mandrake/badge/?version=latest)](https://mandrake.readthedocs.io/)
@@ -19,6 +19,9 @@ See https://mandrake.readthedocs.io/en/latest/installation.html for more details
2. Run `conda create -n mandrake_env mandrake` to install into a clean environment.
3. Run `conda activate mandrake_env` to use the environment.

+ Refer to the [conda-forge](https://conda-forge.org/docs/user/tipsandtricks.html#installing-cuda-enabled-packages-like-tensorflow-and-pytorch) documentation if
+ you want to install a CUDA (GPU) enabled version.

### Semi-manual

You will need some dependencies, which you can install through `conda`:
2 changes: 1 addition & 1 deletion mandrake/__init__.py
@@ -3,4 +3,4 @@

'''Visualisation of pathogen population structure'''

- __version__ = '1.2.1'
+ __version__ = '1.2.2'
2 changes: 1 addition & 1 deletion mandrake/__main__.py
@@ -109,7 +109,7 @@ def main():
if args.kNN < 0:
raise ValueError("Invalid value for kNN")
kNN = args.kNN
- threshold = -1
+ threshold = -1.0
elif args.threshold is not None:
if args.threshold <= 0 or args.threshold > 1:
raise ValueError("Invalid value for threshold")
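
The hunk above shows only part of the argument handling in `main()`. A minimal, self-contained sketch of the same checks follows, assuming the function name `resolve_neighbour_args`, the `-1` sentinel for `kNN` in threshold mode, and the final error when neither option is given; none of these appear in the diff. What the diff does show is the two range checks and that `threshold` is now the float `-1.0` when `kNN` is used.

```python
def resolve_neighbour_args(kNN=None, threshold=None):
    """Validate the two mutually exclusive options and return (kNN, threshold).

    Hypothetical helper: mandrake performs these checks inline in main().
    """
    if kNN is not None:
        if kNN < 0:
            raise ValueError("Invalid value for kNN")
        # Float sentinel, matching the change in this PR (-1 became -1.0).
        return kNN, -1.0
    elif threshold is not None:
        if threshold <= 0 or threshold > 1:
            raise ValueError("Invalid value for threshold")
        # Assumption: kNN is disabled with a negative sentinel in this mode.
        return -1, float(threshold)
    # Assumption: at least one of the two options must be supplied.
    raise ValueError("Specify either kNN or threshold")


knn_mode = resolve_neighbour_args(kNN=50)
threshold_mode = resolve_neighbour_args(threshold=0.1)
print(knn_mode, threshold_mode)
```

Keeping `threshold` a float in both branches means downstream code always sees the same type, whichever option the user chose.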
45 changes: 25 additions & 20 deletions mandrake/dists.py
@@ -61,31 +61,36 @@ def sketchlibDists(sketch_db, dist_col, kNN, threshold, cpus, use_gpu, device_id
jaccard = False
if threshold > 0:
raise ValueError("Use kNN with --sketches")
- I, J, dists = pp_sketchlib.querySelfSparse(sketch_db,
-     names,
-     kmers,
-     True,
-     jaccard,
-     kNN,
-     dist_col,
-     cpus)
+ I, J, dists = pp_sketchlib.querySelfSparse(ref_db_name=sketch_db,
+     rList=names,
+     klist=kmers,
+     random_correct=True,
+     jaccard=jaccard,
+     kNN=kNN,
+     dist_cutoff=0,
+     dist_col=dist_col,
+     num_threads=cpus,
+     use_gpu=use_gpu,
+     device_id=device_id)
else:
# older versions of sketchlib do a dense query then sparsify the
# return. Ok for smaller data, but runs out of memory on big datasets
# sketchlib API needs positive int for kNN
if kNN < 0:
kNN = 0
- I, J, dists = pp_sketchlib.queryDatabaseSparse(sketch_db,
-     sketch_db,
-     names,
-     names,
-     kmers,
-     True,
-     threshold,
-     kNN,
-     dist_col == 0,
-     cpus,
-     use_gpu,
-     device_id)
+ if threshold <= 0:
+     threshold = 0.0
+ I, J, dists = pp_sketchlib.queryDatabaseSparse(ref_db_name=sketch_db,
+     query_db_name=sketch_db,
+     rList=names,
+     qList=names,
+     klist=kmers,
+     random_correct=True,
+     dist_cutoff=threshold,
+     kNN=kNN,
+     core=(dist_col == 0),
+     num_threads=cpus,
+     use_gpu=use_gpu,
+     device_id=device_id)

return I, J, dists, names
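
The switch to keyword arguments matters when a callee's parameter list changes between versions. The sketch below uses a hypothetical stand-in, `query_self_sparse`, not the real pp-sketchlib binding: its parameter names are copied from the keyword call in this diff, but the positional order and the default values are assumptions made for illustration. It shows how an older positional call can bind values to the wrong parameters once `dist_cutoff` sits between `kNN` and `dist_col`.

```python
def query_self_sparse(ref_db_name, rList, klist, random_correct=True,
                      jaccard=False, kNN=0, dist_cutoff=0.0, dist_col=0,
                      num_threads=1, use_gpu=False, device_id=0):
    """Hypothetical stand-in that only reports how its arguments were bound."""
    return {
        "kNN": kNN,
        "dist_cutoff": dist_cutoff,
        "dist_col": dist_col,
        "num_threads": num_threads,
    }


# Illustrative inputs; the values are made up for this example.
sketch_db, names, kmers = "listeria", ["s1", "s2"], [15, 19]
jaccard, kNN, dist_col, cpus = False, 50, 1, 4

# Old style: eight positional arguments, written for a signature
# without dist_cutoff. Here dist_col lands in dist_cutoff and cpus
# lands in dist_col, with no error raised.
positional = query_self_sparse(sketch_db, names, kmers, True,
                               jaccard, kNN, dist_col, cpus)

# New style: every value is bound by name, independent of order.
keyword = query_self_sparse(ref_db_name=sketch_db, rList=names, klist=kmers,
                            random_correct=True, jaccard=jaccard, kNN=kNN,
                            dist_cutoff=0, dist_col=dist_col,
                            num_threads=cpus)

print("positional:", positional)
print("keyword:   ", keyword)
```

With keyword arguments, a renamed or removed parameter fails loudly with a `TypeError` instead of silently shifting values, which fits a CI matrix that tests more than one pp-sketchlib version.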
4 changes: 2 additions & 2 deletions setup.py
@@ -107,9 +107,9 @@ def build_extension(self, ext):
'Intended Audience :: Science/Research',
'Topic :: Scientific/Engineering :: Bio-Informatics',
'License :: OSI Approved :: Apache Software License',
- 'Programming Language :: Python :: 3.7',
+ 'Programming Language :: Python :: 3.8',
],
- python_requires='>=3.7.0',
+ python_requires='>=3.8.0',
keywords='bacteria genomics population-genetics k-mer visualisation',
packages=['mandrake'],
entry_points={
3 changes: 1 addition & 2 deletions test/run_test.py
@@ -28,14 +28,13 @@
sys.stderr.write("Running with each input type\n")
subprocess.run(python_cmd + " ../mandrake-runner.py --alignment sub5k_hiv_refs_prrt_trim.fas --kNN 50 --cpus 2 --maxIter 1000000", shell=True, check=True)
subprocess.run(python_cmd + " ../mandrake-runner.py --sketches listeria.h5 --kNN 50 --cpus 2 --maxIter 1000000", shell=True, check=True)
subprocess.run(python_cmd + " ../mandrake-runner.py --sketches listeria.h5 --use-accessory --kNN 50 --cpus 2 --maxIter 1000000", shell=True, check=True)
subprocess.run(python_cmd + " ../mandrake-runner.py --accessory gene_presence_absence.Rtab --kNN 50 --cpus 2 --maxIter 1000000", shell=True, check=True)

sys.stderr.write("kNN and threshold both work\n")
subprocess.run(python_cmd + " ../mandrake-runner.py --alignment sub5k_hiv_refs_prrt_trim.fas --threshold 0.1 --cpus 2 --maxIter 1000000", shell=True, check=True)
subprocess.run(python_cmd + " ../mandrake-runner.py --sketches listeria.h5 --threshold 0.1 --use-accessory --cpus 2 --maxIter 1000000", shell=True, check=True)
subprocess.run(python_cmd + " ../mandrake-runner.py --accessory gene_presence_absence.Rtab --threshold 0.2 --cpus 2 --maxIter 1000000", shell=True, check=True)

# test updating order is correct
sys.stderr.write("Processing can be turned off\n")
# This won't necessarily work
# subprocess.run(python_cmd + " ../mandrake-runner.py --sketches listeria.h5 --kNN 50 --maxIter 10000000 --no-preprocessing", shell=True, check=True)