V2.0 #96

Merged: 86 commits, May 3, 2023
Commits
1d566ec
wip
ckmah Jul 26, 2022
971adf5
cleanup
ckmah Aug 16, 2022
091617c
simplify tensor decomposition for signature analysis
ckmah Aug 16, 2022
beaa4d9
modularize counting neighbors
ckmah Aug 16, 2022
1b4bec0
Merge branch 'colocation' of https://github.com/ckmah/bento-tools int…
ckmah Aug 16, 2022
051b31d
modularize neighbor counting
ckmah Aug 16, 2022
d9e9321
simplify decomposition and use rasterio for shapes
ckmah Aug 16, 2022
17820d4
cleanup diff plots
ckmah Aug 29, 2022
7851f08
shape features
ckmah Aug 29, 2022
6bc5a9e
resolve merge conflicts
ckmah Aug 29, 2022
0232082
refactor sample features to be combinatorial, cleanup api
ckmah Aug 31, 2022
ffd79dd
feature refactor
ckmah Sep 15, 2022
f88d3d4
unstable
ckmah Oct 11, 2022
e4576ac
pretty colors
ckmah Oct 18, 2022
2d4329c
wip enrichment, embedding, plotting and more
ckmah Oct 26, 2022
9888d85
wip start splitting up plot functions
ckmah Oct 29, 2022
3045d9d
wip many things
ckmah Nov 12, 2022
0ac4590
wip
ckmah Dec 13, 2022
16a1b5c
reworked some api and updated rna flow method
ckmah Jan 3, 2023
ce15288
add minisom dep
ckmah Jan 3, 2023
673e722
add robust CLR transform to flow; flowmap auto cluster flow embedding
ckmah Jan 11, 2023
dd8ae28
remove clr from flow, add shape compositions, cleanup shape features,…
ckmah Jan 18, 2023
09bcd56
cleanup flow, colocation, syncing utility functions
ckmah Feb 1, 2023
2efe13e
bugfixes
ckmah Feb 9, 2023
823f759
n_neighbors flow increase usability
amonell Feb 10, 2023
5fc20bb
Merge pull request #89 from ckmah/colocation
ckmah Feb 14, 2023
e91d854
auto install poetry and dev bento-tools
ckmah Feb 14, 2023
c540403
bug fix for geopandas df IO of non-shapely objects
noor01 Feb 19, 2023
a7312c7
obs_stats use stripplots
ckmah Feb 20, 2023
ef7bd8d
geo format convert handle None
ckmah Feb 28, 2023
0217658
plotting improvements
ckmah Feb 28, 2023
51ff501
flowmap polygons allow inner rings
ckmah Feb 28, 2023
85925f7
plot flowmap elbow plot fast, cleanup imports
ckmah Feb 28, 2023
09c44c1
mistakenly deleted code
ckmah Feb 28, 2023
6095040
build codespace container with poetry and package dev enabled
ckmah Feb 17, 2023
a23bd8b
dependency cleanup
ckmah Feb 17, 2023
1c611be
cleanup tools api
ckmah Feb 17, 2023
c200e01
geometry cleanup
ckmah Feb 28, 2023
9a7092e
remove lock file from repo
ckmah Feb 28, 2023
265a245
more code cleanup
ckmah Feb 28, 2023
8499a1d
remove preprocessing module
ckmah Mar 1, 2023
80beaad
rename flow to flux
ckmah Mar 1, 2023
da9d900
cleanup savefig and add xia2019 gs
ckmah Mar 13, 2023
9269207
Pending changes exported from your codespace
ckmah Mar 13, 2023
5432d20
Merge branch 'codespace-ckmah-supreme-robot-x97xw9gp56hvv7' into v2.0
ckmah Mar 14, 2023
89597f8
extract pt registration fn
ckmah Mar 21, 2023
2efafbf
modularize enrichment fns
ckmah Mar 21, 2023
10a3b55
obs stats nucleus optional
ckmah Mar 21, 2023
f5a4d5d
fix cbars overlapping ticklabels
ckmah Mar 21, 2023
94a0734
refactor plotting API
ckmah Mar 21, 2023
11aa465
update deps
ckmah Mar 21, 2023
bb02a9d
update api docs
ckmah Mar 21, 2023
4045b9d
inset colorbar
ckmah Mar 31, 2023
1d9534e
list full feature descriptions
ckmah Mar 31, 2023
b99725d
bugfixes, colors, better default params, formatting, comments
ckmah Mar 31, 2023
d589897
update tests
ckmah Mar 31, 2023
aded187
revamp tutorials
ckmah Mar 31, 2023
0c0e326
more doc updates
ckmah Apr 1, 2023
41e28e6
fix fluxmap save fig
ckmah Apr 3, 2023
63b5d2a
typechecking
ckmah Apr 3, 2023
b3bdde2
cleanup and bump version
ckmah Apr 3, 2023
1107d6f
doc update
ckmah Apr 3, 2023
22dbe37
Merge branch 'master' into v2.0
ckmah Apr 4, 2023
a233227
Update python-package.yml
ckmah Apr 4, 2023
814f10c
Update python-package.yml
ckmah Apr 4, 2023
7d67384
Update python-package.yml
ckmah Apr 4, 2023
e172e0a
add install info for non-python deps
ckmah Apr 4, 2023
727f817
Merge branch 'v2.0' of github.com:ckmah/bento-tools into v2.0
ckmah Apr 4, 2023
fe000bd
typo
ckmah Apr 4, 2023
26d4e90
Merge pull request #97 from ckmah/ckmah-unit-tests
ckmah Apr 4, 2023
0a9082f
Update python-package.yml
ckmah Apr 4, 2023
bbe63bf
cleanup test warnings
ckmah Apr 4, 2023
ef839a6
Merge branch 'v2.0' of github.com:ckmah/bento-tools into v2.0
ckmah Apr 4, 2023
f6525db
Update python-package.yml
ckmah Apr 4, 2023
2325a66
plotting tests
ckmah Apr 5, 2023
3ae4dcf
Merge branch 'v2.0' of github.com:ckmah/bento-tools into v2.0
ckmah Apr 5, 2023
d64c875
syntax errors in tests
ckmah Apr 5, 2023
e8b952c
handle slicing metadata by type, ignore empty
ckmah Apr 24, 2023
ee38d89
hide obnoxious progress bars
ckmah Apr 24, 2023
ef90f37
features do not overwrite by default
ckmah Apr 24, 2023
bfb382c
fix typos
ckmah Apr 24, 2023
1299972
upgrade anndata to >=0.8
ckmah Apr 24, 2023
e32336d
quantile plot handles nan values
ckmah Apr 27, 2023
0b754a4
features handle missing and recompute flag, add tests
ckmah Apr 27, 2023
fa70926
fix typos, docs, explicit params
ckmah Apr 27, 2023
90f4ab4
Merge pull request #98 from ckmah/bugfix-avery
ckmah May 1, 2023
Files changed
2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
@@ -41,7 +41,7 @@
// "forwardPorts": [],

// Use 'postCreateCommand' to run commands after the container is created.
// "postCreateCommand": "pip3 install --user -r requirements.txt",
"postCreateCommand": "pip3 install poetry==1.2.0; pip3 install -e .",

// Comment out connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
"remoteUser": "vscode"
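The new `postCreateCommand` pins Poetry and installs the package in editable mode. A quick sanity check once the container is up (a sketch; assumes the distribution name `bento-tools`, as published on PyPI):

import importlib.metadata

# Should print the installed version if `pip3 install -e .` succeeded
print(importlib.metadata.version("bento-tools"))
import bento  # and the package should import cleanly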
17 changes: 7 additions & 10 deletions .github/workflows/python-package.yml
@@ -15,33 +15,30 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os : [ubuntu-18.04, macos-11, macos-12, windows-2019]
os : [ubuntu-22.04, macos-11, macos-12, windows-2019]
python-version: ['3.8', '3.9']
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3.5.0
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v3.1.3
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install .[torch,docs]
- name: Lint with flake8
if: ${{ matrix.os == 'ubuntu-18.04' && matrix.python-version == '3.8' }}
python -m pip install .[docs]
- name: Lint & test coverage
if: ${{ matrix.os == 'ubuntu-22.04' && matrix.python-version == '3.8' }}
run: |
pip install flake8
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Generate Report
if: ${{ matrix.os == 'ubuntu-18.04' && matrix.python-version == '3.8' }}
run: |
pip install coverage
coverage run -m unittest
- name: Upload Coverage to Codecov
if: ${{ matrix.os == 'ubuntu-18.04' && matrix.python-version == '3.8' }}
if: ${{ matrix.os == 'ubuntu-22.04' && matrix.python-version == '3.8' }}
uses: codecov/codecov-action@v1
with:
fail_ci_if_error: true
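The failing lint step is deliberately narrow: `--select=E9,F63,F7,F82` gates the build only on syntax errors and undefined names, while the second `--exit-zero` run reports style issues without failing. A hypothetical snippet of the kind of defect the strict pass catches (not from the repo):

def summarize(counts):
    total = sum(counts)
    # F821 (selected via F82): `undefind_total` is an undefined name,
    # so this file would fail the strict flake8 pass above
    return undefind_total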
7 changes: 7 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# Poetry
poetry.lock

.vscode/

# Byte-compiled / optimized / DLL files
@@ -132,3 +135,7 @@ dmypy.json
# Pyre type checker
.pyre/
docs/build.zip
tests/data/processed/data.pt
tests/data/processed/pre_filter.pt
tests/data/processed/pre_transform.pt
.DS_Store
27 changes: 4 additions & 23 deletions README.md
@@ -1,34 +1,15 @@

[![PyPI version](https://badge.fury.io/py/bento-tools.svg)](https://badge.fury.io/py/bento-tools)
[![codecov](https://codecov.io/gh/ckmah/bento-tools/branch/master/graph/badge.svg?token=XVHDKNDCDT)](https://codecov.io/gh/ckmah/bento-tools)
[![Documentation Status](https://readthedocs.org/projects/bento-tools/badge/?version=latest)](https://bento-tools.readthedocs.io/en/latest/?badge=latest)
![PyPI - Downloads](https://img.shields.io/pypi/dm/bento-tools)
[![GitHub stars](https://badgen.net/github/stars/ckmah/bento-tools)](https://GitHub.com/Naereen/ckmah/bento-tools)

> ### :warning: Significant upgrades coming soon, with additional analysis and data ingestion methods!

<img src="docs/source/_static/bento-name.png" alt="Bento Logo" width=350>

Bento is a Python toolkit for performing subcellular analysis of spatial transcriptomics data.

# Get started
Install with Python >=3.8 and <3.11:
```bash
pip install bento-tools
```

Check out the [documentation](https://bento-tools.readthedocs.io/en/latest/) for the installation guide, tutorials, API and more! Read and cite [our preprint](https://doi.org/10.1101/2022.06.10.495510) if you use Bento in your work.


# Main Features

<img src="docs/source/_static/tutorial_img/bento_workflow.png" alt="Bento Analysis Workflow" width=800>
# Bento

Bento is a Python toolkit for performing subcellular analysis of spatial transcriptomics data. The package is part of the [Scverse ecosystem](https://scverse.org/packages/#ecosystem). Check out the [documentation](https://bento-tools.readthedocs.io/en/latest/) for installation instructions, tutorials, and API. Cite [our preprint](https://doi.org/10.1101/2022.06.10.495510) if you use Bento in your work. Thanks!

- Store molecular coordinates and segmentation masks
- Visualize spatial transcriptomics data at subcellular resolution
- Compute subcellular spatial features
- Predict localization patterns and signatures
- Factor decomposition for high-dimensional spatial feature sets
<img src="docs/source/_static/tutorial_img/bento_tools.png" alt="Bento Workflow" width="800">

---
[![GitHub license](https://img.shields.io/github/license/ckmah/bento-tools.svg)](https://github.com/ckmah/bento-tools/blob/master/LICENSE)
8 changes: 5 additions & 3 deletions bento/__init__.py
@@ -1,6 +1,8 @@
from . import datasets
from . import datasets as ds
from . import io
from . import plotting as pl
from . import preprocessing as pp
from . import tools as tl
from ._utils import PATTERN_NAMES, TENSOR_DIM_NAMES
from . import _utils as ut
from . import geometry as geo
from .plotting import _colors as colors
from ._utils import sync
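With the reorganized `__init__`, submodules hang off short aliases and `sync` is available at the top level. A usage sketch (the `bt` alias and the workflow shown are illustrative, not part of this diff):

import bento as bt

adata = bt.ds.sample_data()  # datasets module, aliased as `ds` above
bt.sync(adata)               # `sync` re-exported from bento._utils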
18 changes: 18 additions & 0 deletions bento/_constants.py
@@ -0,0 +1,18 @@
PATTERN_COLORS = ["#17becf", "#1f77b4", "#7f7f7f", "#ff7f0e", "#d62728"]
PATTERN_NAMES = ["cell_edge", "cytoplasmic", "none", "nuclear", "nuclear_edge"]
PATTERN_PROBS = [f"{p}_p" for p in PATTERN_NAMES]
PATTERN_FEATURES = [
"cell_inner_proximity",
"nucleus_inner_proximity",
"nucleus_outer_proximity",
"cell_inner_asymmetry",
"nucleus_inner_asymmetry",
"nucleus_outer_asymmetry",
"l_max",
"l_max_gradient",
"l_min_gradient",
"l_monotony",
"l_half_radius",
"point_dispersion_norm",
"nucleus_dispersion_norm",
]
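`PATTERN_COLORS` is ordered to match `PATTERN_NAMES` (cyan, blue, gray, orange, red, per the comment removed from `_utils.py` below), so a name-to-color mapping is a direct `zip`. A small sketch:

from bento._constants import PATTERN_COLORS, PATTERN_NAMES, PATTERN_PROBS

pattern_palette = dict(zip(PATTERN_NAMES, PATTERN_COLORS))
print(PATTERN_PROBS)  # ['cell_edge_p', 'cytoplasmic_p', 'none_p', ...]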
203 changes: 182 additions & 21 deletions bento/_utils.py
@@ -1,20 +1,13 @@
import inspect
from functools import wraps

from anndata import AnnData

import warnings
import geopandas as gpd
import pandas as pd
import seaborn as sns
from anndata import AnnData
from functools import wraps
from typing import Iterable
from shapely import wkt

PATTERN_NAMES = ["cell_edge", "cytoplasmic", "none", "nuclear", "nuclear_edge"]
PATTERN_PROBS = [f"{p}_p" for p in PATTERN_NAMES]
TENSOR_DIM_NAMES = ["layers", "cells", "genes"]

# Colors correspond to order of PATTERN_NAMES: cyan, blue, gray, orange, red
PATTERN_COLORS = ['#17becf', '#1f77b4', '#7f7f7f', '#ff7f0e', '#d62728']

# Colors to represent each dimension (features, cells, genes); Set2 palette n_colors=3
DIM_COLORS = ['#66c2a5', '#fc8d62', '#8da0cb']
# ['#AD6A6C', '#f5b841', '#0cf2c9']

def get_default_args(func):
signature = inspect.signature(func)
@@ -28,7 +21,7 @@ def get_default_args(func):
def track(func):
"""
Track changes in AnnData object after applying function.

1. First remembers a shallow list of AnnData attributes by listing keys from obs, var, etc.
2. Perform arbitrary task
3. List attributes again, perform simple diff between list of old and new attributes
@@ -70,7 +63,6 @@ def wrapper(*args, **kwds):

modified = False
for attr in old_attr.keys():

if attr == "n_obs" or attr == "n_vars":
continue

@@ -146,16 +138,185 @@ def pheno_to_color(pheno, palette):
List of converted colors for each sample, formatted as RGBA tuples.

"""
import seaborn as sns

if type(palette) is str:
if isinstance(palette, str):
palette = sns.color_palette(palette)
else:
palette = palette

values = list(set(pheno))
values.sort()
palette = sns.color_palette(palette, n_colors=len(values))
study2color = dict(zip(values, palette))
sample_colors = [study2color[v] for v in pheno]
return study2color, sample_colors
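# Usage sketch (illustrative; any seaborn palette name or color list works):
#   label2color, sample_colors = pheno_to_color(["A", "B", "A"], palette="muted")
#   label2color maps each unique label to a color; sample_colors has one
#   color tuple per entry of `pheno`, e.g. for coloring rows in a plot.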


def sync(data, copy=False):
"""
Sync existing point sets and associated metadata with data.obs_names and data.var_names

Parameters
----------
data : AnnData
Spatial formatted AnnData object
copy : bool, optional
"""
adata = data.copy() if copy else data

if "point_sets" not in adata.uns.keys():
adata.uns["point_sets"] = dict(points=[])

# Iterate over point sets
for point_key in adata.uns["point_sets"]:
points = adata.uns[point_key]

# Subset for cells
cells = adata.obs_names.tolist()
in_cells = points["cell"].isin(cells)

# Subset for genes
in_genes = [True] * points.shape[0]
if "gene" in points.columns:
genes = adata.var_names.tolist()
in_genes = points["gene"].isin(genes)

# Combine boolean masks
valid_mask = (in_cells & in_genes).values

# Sync points using mask
points = points.loc[valid_mask]

# Remove unused categories for categorical columns
for col in points.columns:
if points[col].dtype == "category":
points[col].cat.remove_unused_categories(inplace=True)

adata.uns[point_key] = points

# Sync point metadata using mask
for metadata_key in adata.uns["point_sets"][point_key]:
if metadata_key not in adata.uns:
warnings.warn(
f"Skipping: metadata {metadata_key} not found in adata.uns"
)
continue

metadata = adata.uns[metadata_key]
# Slice DataFrame if not empty
if isinstance(metadata, pd.DataFrame) and not metadata.empty:
adata.uns[metadata_key] = metadata.loc[valid_mask, :]

# Slice Iterable if not empty
elif isinstance(metadata, list) and any(metadata):
adata.uns[metadata_key] = [
m for i, m in enumerate(metadata) if valid_mask[i]
]
elif isinstance(metadata, Iterable) and metadata.shape[0] > 0:
adata.uns[metadata_key] = adata.uns[metadata_key][valid_mask]
else:
warnings.warn(f"Metadata {metadata_key} is not a DataFrame or Iterable")

return adata if copy else None
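# Usage sketch: call after subsetting the AnnData so registered point sets
# and their per-point metadata stay aligned with obs/var names
# ("leiden" is a hypothetical obs column):
#   adata = adata[adata.obs["leiden"] == "0"].copy()
#   sync(adata)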


def _register_points(data, point_key, metadata_keys):
required_cols = ["x", "y", "cell"]

if point_key not in data.uns.keys():
raise ValueError(f"Key {point_key} not found in data.uns")

points = data.uns[point_key]

if not all([col in points.columns for col in required_cols]):
raise ValueError(
f"Point DataFrame must have columns {', '.join(required_cols)}"
)

# Check for valid cells
cells = data.obs_names.tolist()
if not points["cell"].isin(cells).all():
raise ValueError("Invalid cells in point DataFrame")

# Initialize/add to point registry
if "point_sets" not in data.uns.keys():
data.uns["point_sets"] = dict()

if point_key not in data.uns["point_sets"].keys():
data.uns["point_sets"][point_key] = []

if len(metadata_keys) == 0:
return

# Register metadata
for key in metadata_keys:
# Check for valid metadata
if key not in data.uns.keys():
raise ValueError(f"Key {key} not found in data.uns")

n_points = data.uns[point_key].shape[0]
metadata_len = data.uns[key].shape[0]
if metadata_len != n_points:
raise ValueError(
f"Metadata {key} must have same length as points {point_key}"
)

# Add metadata key to registry
if key not in data.uns["point_sets"][point_key]:
data.uns["point_sets"][point_key].append(key)


def register_points(point_key: str, metadata_keys: list):
"""Decorator function to register points to the current `AnnData` object.
This keeps track of point sets and keeps them in sync with `AnnData` object.

Parameters
----------
point_key : str
Key where points are stored in `data.uns`
metadata_keys : list
Keys where point metadata are stored in `data.uns`
"""

def decorator(func):
@wraps(func)
def wrapper(*args, **kwds):
kwargs = get_default_args(func)
kwargs.update(kwds)

func(*args, **kwds)
data = args[0]
# Check for required columns
return _register_points(data, point_key, metadata_keys)

return wrapper

return decorator
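# Usage sketch (hypothetical keys): a tool that writes points to
# data.uns["my_points"] and a same-length table to data.uns["my_scores"]
# registers both so sync() keeps them aligned:
#   @register_points("my_points", ["my_scores"])
#   def my_tool(data):
#       ...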


def sc_format(data, copy=False):
"""
Convert data.obs GeoPandas columns to string for compatibility with scanpy.
"""
adata = data.copy() if copy else data

shape_names = data.obs.columns.str.endswith("_shape")

for col in data.obs.columns[shape_names]:
adata.obs[col] = adata.obs[col].astype(str)

return adata if copy else None


def geo_format(data, copy=False):
"""
Convert data.obs scanpy columns to GeoPandas compatible types.
"""
adata = data.copy() if copy else data

shape_names = adata.obs.columns[adata.obs.columns.str.endswith("_shape")]

adata.obs[shape_names] = adata.obs[shape_names].apply(
lambda col: gpd.GeoSeries(
col.astype(str).apply(lambda val: wkt.loads(val) if val != "None" else None)
)
)

return adata if copy else None
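`sc_format` and `geo_format` are inverses over `obs` columns whose names end in `_shape`: geometries are serialized to their WKT string form (what `str()` of a shapely object yields) for scanpy/h5ad IO, then parsed back with `shapely.wkt.loads`, with `"None"` handled explicitly. A round-trip sketch (assumes a spatially formatted AnnData as elsewhere in this module):

import bento as bt

bt.ut.sc_format(adata)   # shapely -> WKT strings, safe to write as h5ad
bt.ut.geo_format(adata)  # WKT strings -> shapely geometries, None-safe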
6 changes: 5 additions & 1 deletion bento/datasets/__init__.py
@@ -1 +1,5 @@
from ._datasets import get_dataset_info, load_dataset, sample_data
from ._datasets import (
get_dataset_info,
load_dataset,
sample_data,
)