NEW: Adds core beta diversity measures #6

Merged: 48 commits, Jun 11, 2020

Commits
06d4d61
adds @_safely_count_cpus, tests
ChrisKeefe Oct 12, 2019
ef17a8b
MAINT: tidies var names of expected results in test_alpha
ChrisKeefe Oct 17, 2019
4da3de4
adds vscode workspace files to gitignore
ChrisKeefe Oct 17, 2019
b294120
guards against empty table passed as fp, fixes CPU count decorator
ChrisKeefe Feb 5, 2020
5ac73ad
MAINT: replaces copy.deepcopy() with biom 'inplace=False'
ChrisKeefe Feb 7, 2020
7205ed2
adds registrations, citations, init for core beta diversity
ChrisKeefe Feb 7, 2020
fb4e59a
NEW: adds core beta diversity measures, basic tests
ChrisKeefe Feb 8, 2020
9d04464
adds test data for faith_pd refactor, fixes broken rel. freq table
ChrisKeefe Feb 10, 2020
d9b9b91
refactors faith_pd to use biocore's Unifrac
ChrisKeefe Feb 10, 2020
a390a5f
BUG: corrects filename mismatch
ChrisKeefe Feb 10, 2020
7db2de7
BUG: corrects non-firing test cases
ChrisKeefe Feb 10, 2020
1b75089
Adds n_jobs edge case handling, temp test for running methods through…
ChrisKeefe May 9, 2020
aef7749
HACK: patches inconsistent error messages from Unifrac
ChrisKeefe May 13, 2020
4e81604
properly adds method and test data for weighted unifrac
ChrisKeefe May 13, 2020
cdcda1c
minor comment cleanup
ChrisKeefe May 13, 2020
ee5eaf2
removes fancy unifracs. see draft_fancy_unifracs branch if needed
ChrisKeefe May 13, 2020
bec18de
hides variance_adjusted parameter in unifrac to prevent adjusting wit…
ChrisKeefe May 14, 2020
2b1ad18
LINT: reorganize imports to PEP8
ChrisKeefe May 14, 2020
1ebe57b
combines empty_tables decorators, moves it to decorator.docorator, vi…
ChrisKeefe May 15, 2020
96e4388
switches n_jobs decorator over to decorator.decorator, first through-…
ChrisKeefe May 15, 2020
e244509
adds decorator to ci meta.yaml, missing-lines report to make test-cov…
ChrisKeefe May 18, 2020
27df8f0
fixes _disallow_empty_tables bad view_type error message
ChrisKeefe May 18, 2020
fd7c695
BUG: test_alpha and test_util now pass proper view types
ChrisKeefe May 19, 2020
2bd531a
MAINT: organizes table/tree data consistently
ChrisKeefe May 19, 2020
9744ccd
simplifies type handling, removes invalid filepath testing
ChrisKeefe May 20, 2020
e8c74f2
adds dependencies to recipe
ChrisKeefe May 21, 2020
d59aab3
LINT: updates copyright headers
ChrisKeefe May 20, 2020
e06eb57
adds test data assets to setup.py
ChrisKeefe May 21, 2020
e85651e
removes travis.yml, removes travis build badge from README
ChrisKeefe May 22, 2020
c4c91e5
REVERT ME: testing a theory
thermokarst May 22, 2020
acc4daa
Revert "REVERT ME: testing a theory"
thermokarst May 22, 2020
c33b977
adds github actions workflow
ChrisKeefe May 20, 2020
0e3ec5e
ENH: Adds lint-build-test badge to readme (#4)
ChrisKeefe May 22, 2020
ab22b7f
SQUASH: naming, comment placement
ChrisKeefe May 27, 2020
a96c04d
BUG: fixes patch-related test issue visible on Darwin (#5)
ChrisKeefe May 27, 2020
229e79f
BUG: actually fixes patch-related test (#6)
ChrisKeefe May 28, 2020
ad5967b
SQUASH: fixes poorly-named variable mock_cpu_affinity
ChrisKeefe May 28, 2020
ae674f3
MAINT: gitignore notes directory and contents
ChrisKeefe May 29, 2020
fd0e24e
Makes quoting consistent: user-facing strings double, keys etc single
ChrisKeefe May 27, 2020
47daa08
SQUASH: minor comment cleanup
ChrisKeefe Jun 5, 2020
750836d
preliminary review changes, not incl n_jobs
ChrisKeefe Jun 8, 2020
6fcd5b7
renames 'n_jobs' to 'threads' as needed, improves semantic type
ChrisKeefe Jun 8, 2020
f3e9056
Handles 'auto' passed to cpu-request params, renames tests from 'n_jobs'
ChrisKeefe Jun 9, 2020
687401d
Wordsmiths cpu-request parameter descriptions
ChrisKeefe Jun 9, 2020
d5fdd8e
MAINT: Test-data file lint, removes two_feature_table.tsv
ChrisKeefe Jun 9, 2020
b9f5168
test data file paths not instance vars, fixes tree view type
ChrisKeefe Jun 10, 2020
b662efd
LINT: removes unused io import
ChrisKeefe Jun 10, 2020
226be5b
Updates requested_cpuss test class name
ChrisKeefe Jun 10, 2020
42 changes: 42 additions & 0 deletions .github/workflows/lint-build-test.yml
@@ -0,0 +1,42 @@
+name: lint-build-test
+# build on every PR and commit to master
+on:
+  pull_request:
+  push:
+    branches:
+      - master
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v2
+    - name: set up python 3.6
+      uses: actions/setup-python@v1
+      with:
+        python-version: 3.6
+    - name: install dependencies
+      run: python -m pip install --upgrade pip
+    - name: lint
+      run: |
+        pip install -q https://github.com/qiime2/q2lint/archive/master.zip
+        q2lint
+        pip install -q flake8
+        flake8
+
+  build-and-test:
+    needs: lint
+    strategy:
+      matrix:
+        os: [ubuntu-latest, macos-latest]
+    runs-on: ${{ matrix.os }}
+    steps:
+    - uses: actions/checkout@v2
+      with:
+        fetch-depth: 0
+    # for versioneer
+    - run: git fetch --depth=1 origin +refs/tags/*:refs/tags/*
+    - uses: qiime2/action-library-packaging@alpha1
+      with:
+        plugin-name: q2-diversity-lib
+        additional-tests: pytest --pyargs q2_diversity_lib
4 changes: 4 additions & 0 deletions .gitignore
@@ -73,3 +73,7 @@ node_modules

 # VSCode dotfiles
 .vscode/*
+*.code-workspace
+
+# project notes
+notes/
25 changes: 0 additions & 25 deletions .travis.yml

This file was deleted.

2 changes: 1 addition & 1 deletion Makefile
@@ -11,7 +11,7 @@ test: all
 	py.test

 test-cov: all
-	py.test --cov=q2_diversity_lib
+	py.test --cov-report=term-missing --cov=q2_diversity_lib

 install:
 	$(PYTHON) setup.py install
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
 # q2-diversity-lib

-[![Build Status](https://travis-ci.org/qiime2/q2-diversity-lib.svg?branch=master)](https://travis-ci.org/qiime2/q2-diversity-lib)
+![](https://github.com/qiime2/q2-diversity-lib/workflows/lint-build-test/badge.svg)
 [![Coverage Status](https://coveralls.io/repos/github/qiime2/q2-diversity-lib/badge.svg?branch=master)](https://coveralls.io/github/qiime2/q2-diversity-lib?branch=master)

 This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.
7 changes: 5 additions & 2 deletions ci/recipe/meta.yaml
@@ -18,11 +18,14 @@ requirements:
     - setuptools

   run:
-    - pandas
-    - scikit-bio
     - biom-format >=2.1.5,<2.2.0
+    - decorator
+    - pandas
+    - psutil
     - qiime2 {{ release }}.*
     - q2-types {{ release }}.*
+    - scikit-bio
+    - unifrac

 test:
   imports:
4 changes: 3 additions & 1 deletion q2_diversity_lib/__init__.py
@@ -8,11 +8,13 @@

 from .alpha import (faith_pd, observed_features, pielou_evenness,
                     shannon_entropy)
+from .beta import (bray_curtis, jaccard, unweighted_unifrac, weighted_unifrac)
 from ._version import get_versions

 __version__ = get_versions()['version']
 del get_versions


 __all__ = ['faith_pd', 'observed_features', 'pielou_evenness',
-           'shannon_entropy']
+           'shannon_entropy', 'bray_curtis', 'jaccard', 'unweighted_unifrac',
+           'weighted_unifrac']
95 changes: 79 additions & 16 deletions q2_diversity_lib/_util.py
@@ -6,10 +6,19 @@
 # The full license is in the file LICENSE, distributed with this software.
 # ----------------------------------------------------------------------------

-import numpy as np
-from functools import wraps
 from inspect import signature

+import numpy as np
+from decorator import decorator
+import psutil
+import biom
+
+from q2_types.feature_table import BIOMV210Format
+
+skbio_methods = ["bray_curtis", "jaccard"]
+unifrac_methods = ["unweighted_unifrac", "weighted_unifrac",
+                   "faith_pd"]
+


 def _drop_undefined_samples(counts: np.ndarray, sample_ids: np.ndarray,
                             minimum_nonzero_elements: int) -> (np.ndarray,
@@ -22,17 +31,71 @@ def _drop_undefined_samples(counts: np.ndarray, sample_ids: np.ndarray,
     return (filtered_counts, filtered_sample_ids)


-def _disallow_empty_tables(some_function):
-    @wraps(some_function)
-    def wrapper(*args, **kwargs):
-        try:
-            bound_signature = signature(wrapper).bind(*args, **kwargs)
-            table = bound_signature.arguments['table']
-        except KeyError as ex:
-            raise TypeError("The wrapped function has no parameter "
-                            + str(ex) + ".")
-        else:
-            if table.is_empty():
-                raise ValueError("The provided table object is empty")
-            return some_function(*args, **kwargs)
-    return wrapper
+@decorator
+def _disallow_empty_tables(wrapped_function, *args, **kwargs):
+    bound_signature = signature(wrapped_function).bind(*args, **kwargs)
+    table = bound_signature.arguments.get('table')
+    if table is None:
+        raise TypeError("The wrapped function has no parameter 'table'")
+
+    if isinstance(table, BIOMV210Format):
+        table = str(table)
+        table_obj = biom.load_table(table)
+    elif isinstance(table, biom.Table):
+        table_obj = table
+    else:
+        raise ValueError("Invalid view type: table passed as "
+                         f"{type(table)}")
Contributor commented:

Personal opinion: in-lining function calls in f-strings decreases readability. No need to change, but just thought I would provide my unsolicited advice. Options:

# 1
table_type = type(table)
raise ValueError(f"Invalid view type: table passed as {table_type}")

# 2
raise ValueError("Invalid view type: table passed as %r" % (type(table),))

Collaborator (Author) replied:

I don't have a strong opinion here, but am curious about why you feel this way. Though I agree that a clearly-named variable (e.g. table_type) is almost always more readable than a function call, I feel like the ability to inline clear/simple calls in f-strings reduces clutter. There is the slippery slope toward illegible inline function calls to think about, but I'm not sure that's a reason to avoid it entirely.

As for example 2, I think f-strings are simply more readable on a fundamental level. Put the variables where you want them. Experienced developers might be used to C-style string formatting, but that syntax is inherently more complex and harder to read regardless of whether you're using variables or function calls.
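
To make the options in this thread concrete, here is a small sketch showing that the in-lined call, the named variable, and the C-style format all produce the same message. The `message_*` function names are hypothetical and exist only for this illustration; they are not part of the PR.

```python
def message_inline(table):
    # Form used in the PR: call type() directly inside the f-string.
    return f"Invalid view type: table passed as {type(table)}"


def message_named(table):
    # Reviewer option 1: bind the type to a clearly named variable first.
    table_type = type(table)
    return f"Invalid view type: table passed as {table_type}"


def message_percent(table):
    # Reviewer option 2: C-style formatting. %r uses repr(), which for a
    # class gives the same text as str().
    return "Invalid view type: table passed as %r" % (type(table),)


if __name__ == "__main__":
    for build in (message_inline, message_named, message_percent):
        print(build(42))
```

Since the output is identical, the choice between them is purely about readability, which is the point both participants are debating.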


+    if table_obj.is_empty():
+        raise ValueError("The provided table is empty")
+
+    return wrapped_function(*args, **kwargs)
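
The guard pattern in `_disallow_empty_tables` can be tried without biom or QIIME 2 installed. The sketch below is a simplified stand-in, not the PR's implementation: it uses `functools.wraps` from the standard library instead of the `decorator` package, and `FakeTable`, `disallow_empty_tables`, and `count_values` are hypothetical names that model only the `is_empty()` check.

```python
from functools import wraps
from inspect import signature


class FakeTable:
    """Hypothetical stand-in for biom.Table; only is_empty() is modelled."""

    def __init__(self, n_values):
        self.n_values = n_values

    def is_empty(self):
        return self.n_values == 0


def disallow_empty_tables(wrapped_function):
    """Reject calls whose 'table' argument is missing or empty."""
    @wraps(wrapped_function)
    def wrapper(*args, **kwargs):
        # Bind against the wrapped function so that positional and keyword
        # calls are both looked up by parameter name.
        bound = signature(wrapped_function).bind(*args, **kwargs)
        table = bound.arguments.get('table')
        if table is None:
            raise TypeError("The wrapped function has no parameter 'table'")
        if table.is_empty():
            raise ValueError("The provided table is empty")
        return wrapped_function(*args, **kwargs)
    return wrapper


@disallow_empty_tables
def count_values(table):
    return table.n_values
```

Calling `count_values(FakeTable(0))` raises `ValueError`, whether the table is passed positionally or as `table=...`, because the lookup goes through the bound signature rather than through `args[0]`.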


+@decorator
+def _validate_requested_cpus(wrapped_function, *args, **kwargs):
+    bound_signature = signature(wrapped_function).bind(*args, **kwargs)
+    bound_signature.apply_defaults()
+
+    # Handle duplicate param names
+    if all(params in bound_signature.arguments
+           for params in ['n_jobs', 'threads']):
+        raise TypeError("Duplicate parameters: The _validate_requested_cpus "
+                        "decorator may not be applied to callables with both "
+                        "'n_jobs' and 'threads' parameters. Do you really need"
+                        " both?")
+
+    # Handle cpu requests coming from different parameter names
+    if 'n_jobs' in bound_signature.arguments:
+        param_name = 'n_jobs'
+        cpus_requested = bound_signature.arguments[param_name]
+    elif 'threads' in bound_signature.arguments:
+        param_name = 'threads'
+        cpus_requested = bound_signature.arguments[param_name]
+    else:
+        raise TypeError("The _validate_requested_cpus decorator may not be"
+                        " applied to callables without an 'n_jobs' or "
+                        "'threads' parameter.")
+
+    # If `Process.cpu_affinity` unavailable on system, fall back
+    # https://psutil.readthedocs.io/en/latest/index.html#psutil.cpu_count
+    try:
+        cpus = len(psutil.Process().cpu_affinity())
+    except AttributeError:
+        cpus = psutil.cpu_count(logical=False)
+
+    if isinstance(cpus_requested, int) and cpus_requested > cpus:
+        raise ValueError(f"The value passed to '{param_name}' cannot exceed "
+                         f"the number of processors ({cpus}) available to "
+                         "the system.")
+
+    if cpus_requested == 'auto':
+        # remove 'auto' from args to prevent 'multiple values' TypeError...
+        argslist = list(args)
+        argslist.remove('auto')
+        return_args = tuple(argslist)
+        # ...then inject number of available cpus
+        return wrapped_function(*return_args, **kwargs, **{param_name: cpus})
+
+    return wrapped_function(*args, **kwargs)
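
As an illustration of the 'auto' handling above, the sketch below resolves the request by overwriting the bound argument by name and then re-expanding the bound signature, which avoids searching `args` for the literal `'auto'`. It is an alternative sketch under stated assumptions, not the PR's code: it uses only the standard library (`functools.wraps` in place of the `decorator` package, `os` in place of psutil), and `available_cpus`, `validate_requested_cpus`, and `report_threads` are hypothetical names. Note that `os.cpu_count()` counts logical CPUs, whereas the PR's fallback asks psutil for physical cores.

```python
import os
from functools import wraps
from inspect import signature


def available_cpus():
    # os.sched_getaffinity is not available on every platform (for example
    # macOS), so fall back to os.cpu_count(), which may return None.
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1


def validate_requested_cpus(wrapped_function):
    @wraps(wrapped_function)
    def wrapper(*args, **kwargs):
        bound = signature(wrapped_function).bind(*args, **kwargs)
        bound.apply_defaults()
        names = [n for n in ('n_jobs', 'threads') if n in bound.arguments]
        if len(names) != 1:
            raise TypeError("Expected exactly one of 'n_jobs' or 'threads'")
        param_name = names[0]
        requested = bound.arguments[param_name]
        cpus = available_cpus()
        if requested == 'auto':
            # Overwrite the bound value by name instead of editing *args.
            bound.arguments[param_name] = cpus
        elif isinstance(requested, int) and requested > cpus:
            raise ValueError(
                f"The value passed to '{param_name}' cannot exceed the "
                f"number of processors ({cpus}) available to the system.")
        return wrapped_function(*bound.args, **bound.kwargs)
    return wrapper


@validate_requested_cpus
def report_threads(table, threads=1):
    return threads
```

With this shape, `report_threads('t', 'auto')` and `report_threads('t', threads='auto')` take the same path, since both end up as the same entry in `bound.arguments`.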
29 changes: 10 additions & 19 deletions q2_diversity_lib/alpha.py
@@ -6,31 +6,22 @@
 # The full license is in the file LICENSE, distributed with this software.
 # ----------------------------------------------------------------------------

-import biom
 import pandas as pd
 import skbio.diversity
+import biom
+from unifrac import faith_pd as f_pd
Contributor commented:

Suggested change
- from unifrac import faith_pd as f_pd
+ import unifrac


-from ._util import _drop_undefined_samples, _disallow_empty_tables
+from q2_types.feature_table import BIOMV210Format
+from q2_types.tree import NewickFormat
+from ._util import (_drop_undefined_samples,
+                    _disallow_empty_tables)


 @_disallow_empty_tables
-def faith_pd(table: biom.Table, phylogeny: skbio.TreeNode) -> pd.Series:
-    presence_absence_table = table.pa()
-    counts = presence_absence_table.matrix_data.toarray().astype(int).T
-    sample_ids = presence_absence_table.ids(axis='sample')
-    feature_ids = presence_absence_table.ids(axis='observation')
-
-    try:
-        result = skbio.diversity.alpha_diversity(metric='faith_pd',
-                                                 counts=counts,
-                                                 ids=sample_ids,
-                                                 otu_ids=feature_ids,
-                                                 tree=phylogeny)
-    except skbio.tree.MissingNodeError as e:
-        message = str(e).replace('otu_ids', 'feature_ids')
-        message = message.replace('tree', 'phylogeny')
-        raise skbio.tree.MissingNodeError(message)
-
+def faith_pd(table: BIOMV210Format, phylogeny: NewickFormat) -> pd.Series:
+    table_str = str(table)
+    tree_str = str(phylogeny)
Contributor commented on lines +22 to +23:

Stylistically, this doesn't match what you have done in the unifrac methods, below. I like what you've done in those unifrac methods by in-lining the str; I suggest you do that here, too.

+    result = f_pd(table_str, tree_str)
Contributor commented:

Suggested change
- result = f_pd(table_str, tree_str)
+ result = unifrac.faith_pd(table_str, tree_str)

     result.name = 'faith_pd'
     return result

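The review does not state a reason for the suggested `import unifrac`. One plausible benefit is that a module-qualified call lets a wrapper share its name with the library function it delegates to, with no alias such as `f_pd`. The sketch below shows that pattern with the standard library's `math` module standing in for `unifrac`; the local `sqrt` wrapper is hypothetical.

```python
import math


def sqrt(value):
    # The module-qualified call makes it clear that this wrapper delegates
    # to the library function of the same name, and no alias is needed to
    # avoid shadowing.
    result = math.sqrt(value)
    return round(result, 3)
```

The same reasoning would apply to a local `faith_pd` that calls `unifrac.faith_pd`.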
71 changes: 71 additions & 0 deletions q2_diversity_lib/beta.py
@@ -0,0 +1,71 @@
+# ----------------------------------------------------------------------------
+# Copyright (c) 2018-2020, QIIME 2 development team.
+#
+# Distributed under the terms of the Modified BSD License.
+#
+# The full license is in the file LICENSE, distributed with this software.
+# ----------------------------------------------------------------------------
+
+import biom
+import skbio.diversity
+import sklearn.metrics
+import unifrac
+
+from q2_types.feature_table import BIOMV210Format
+from q2_types.tree import NewickFormat
+from ._util import (_disallow_empty_tables,
+                    _validate_requested_cpus)
+
+
+# --------------------Non-Phylogenetic-----------------------
+@_disallow_empty_tables
+@_validate_requested_cpus
+def bray_curtis(table: biom.Table, n_jobs: int = 1) -> skbio.DistanceMatrix:
thermokarst marked this conversation as resolved.
+    counts = table.matrix_data.toarray().T
+    sample_ids = table.ids(axis='sample')
+    return skbio.diversity.beta_diversity(
+        metric='braycurtis',
+        counts=counts,
+        ids=sample_ids,
+        validate=True,
+        pairwise_func=sklearn.metrics.pairwise_distances,
+        n_jobs=n_jobs
+    )
+
+
+@_disallow_empty_tables
+@_validate_requested_cpus
+def jaccard(table: biom.Table, n_jobs: int = 1) -> skbio.DistanceMatrix:
+    counts = table.matrix_data.toarray().T
+    sample_ids = table.ids(axis='sample')
+    return skbio.diversity.beta_diversity(
+        metric='jaccard',
+        counts=counts,
+        ids=sample_ids,
+        validate=True,
+        pairwise_func=sklearn.metrics.pairwise_distances,
+        n_jobs=n_jobs
+    )
+
+
+# ------------------------Phylogenetic-----------------------
+@_disallow_empty_tables
+@_validate_requested_cpus
+def unweighted_unifrac(table: BIOMV210Format,
+                       phylogeny: NewickFormat,
+                       threads: int = 1,
+                       bypass_tips: bool = False) -> skbio.DistanceMatrix:
+    return unifrac.unweighted(str(table), str(phylogeny), threads=threads,
+                              variance_adjusted=False, bypass_tips=bypass_tips)
+
+
+@_disallow_empty_tables
+@_validate_requested_cpus
+def weighted_unifrac(table: BIOMV210Format,
+                     phylogeny: NewickFormat,
+                     threads: int = 1,
+                     bypass_tips: bool = False) -> skbio.DistanceMatrix:
+    return unifrac.weighted_unnormalized(str(table), str(phylogeny),
+                                         threads=threads,
+                                         variance_adjusted=False,
+                                         bypass_tips=bypass_tips)
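
For readers unfamiliar with the two non-phylogenetic measures, here is a small pure-Python worked example of the textbook definitions for a single pair of samples. It is a sketch only: the PR delegates the real computation to `skbio.diversity.beta_diversity`, and the presence/absence form of Jaccard used here is an assumption about the intended meaning, so the library's result on raw counts may differ. `bray_curtis_distance`, `jaccard_distance`, and the sample vectors are hypothetical.

```python
def bray_curtis_distance(u, v):
    # Sum of absolute count differences over the sum of all counts.
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def jaccard_distance(u, v):
    # Presence/absence form: 1 - |shared features| / |features in either|.
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    return 1 - len(present_u & present_v) / len(union)


sample_a = [4, 0, 2, 0]
sample_b = [1, 3, 2, 0]
print(bray_curtis_distance(sample_a, sample_b))  # 6 / 12 = 0.5
print(jaccard_distance(sample_a, sample_b))      # 1 - 2/3
```

Neither helper handles two all-zero samples, where the denominator is zero; in the PR, wholly empty tables are rejected earlier by `_disallow_empty_tables`.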