Slurm runner #61

Open
wants to merge 41 commits into
base: master
41 commits
4ea4677
EOD
jkanche Nov 22, 2024
8be9e3e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 22, 2024
298f4d1
Merge branch 'master' into refactor-layers
jkanche Nov 22, 2024
7f72009
Merge branch 'refactor-layers' of https://github.com/BiocPy/cellarr i…
jkanche Nov 22, 2024
acbc55f
EOD
jkanche Nov 23, 2024
88e1146
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 23, 2024
8e22118
there's many changes to support building the cellarr collection and q…
jkanche Nov 25, 2024
0ba6f4f
Merge branch 'refactor-layers' of https://github.com/BiocPy/cellarr i…
jkanche Nov 25, 2024
09b41ca
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 25, 2024
d66a593
reset sample index
jkanche Nov 25, 2024
89b4358
does the pool need to return?
jkanche Nov 26, 2024
fb67352
is fork the problem?
jkanche Nov 26, 2024
b9b46b5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 26, 2024
6548d59
add checks with threads
jkanche Nov 26, 2024
6f84688
run autoencoder tests only on github action
jkanche Nov 26, 2024
03eb747
fix docstring typos
jkanche Nov 26, 2024
7014544
update assets
jkanche Nov 26, 2024
6835ad1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 26, 2024
08d94a3
update README
jkanche Nov 26, 2024
fb0f2f2
update docstrings throughout
jkanche Nov 26, 2024
fa6c3d1
filter dataframes with tiledb query expressions
jkanche Nov 26, 2024
bf71cd5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 26, 2024
62365ff
fix dataloader when filtering query conditions
jkanche Nov 26, 2024
60c93b8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 26, 2024
0db584c
back to using remap
jkanche Nov 26, 2024
b28f037
get all cells for a sample
jkanche Nov 26, 2024
eef1a41
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 26, 2024
0577a8d
minor changes to README
jkanche Nov 27, 2024
f7cd450
separate assay group
jkanche Nov 27, 2024
d21f5a3
add caching and with clause support
jkanche Nov 27, 2024
b3e7240
slurm runner
jkanche Nov 27, 2024
91a7121
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 28, 2024
429bd75
Merge branch 'master' into slurm-runs
jkanche Nov 28, 2024
a7cda17
Merge branch 'slurm-runs' of https://github.com/BiocPy/cellarr into s…
jkanche Nov 28, 2024
5323275
minor edits
jkanche Dec 4, 2024
5d6302e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 4, 2024
908e763
Merge branch 'master' into slurm-runs
jkanche Dec 12, 2024
56488b5
remove dead code
jkanche Dec 12, 2024
c8b95b6
more changes for a dev release
jkanche Dec 12, 2024
c19e888
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2024
08f9f8d
slight modifications
jkanche Dec 13, 2024
48 changes: 47 additions & 1 deletion README.md
@@ -12,7 +12,7 @@ datasets but can be generalized to store any 2-dimensional experimental data.

To get started, install the package from [PyPI](https://pypi.org/project/cellarr/)

-```bash
+```sh
pip install cellarr

## to include optional dependencies
@@ -123,6 +123,52 @@ print(dataset)

Check out the [documentation](https://biocpy.github.io/cellarr/tutorial.html) for more details.

### Building on HPC environments with `slurm`

Building TileDB files on HPC environments that use `slurm` takes two steps.

- Step 1: Construct a manifest file
  A minimal manifest file (JSON) must contain the following fields:
  - `"files"`: A list of file paths to the input `h5ad` objects.
  - `"python_env"`: A set of commands to activate the Python environment containing this package and its dependencies.

Here’s an example of the manifest file:

```py
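import json

manifest = {
    # Paths to the input h5ad files; replace with your own list.
    "files": ["your/path/to/file1.h5ad", "your/path/to/file2.h5ad"],
    # Shell commands that activate the Python environment on the cluster.
    "python_env": """
ml Miniforge3
conda activate cellarr

python --version
which python
""",
    "matrix_options": [
        {
            "matrix_name": "non_zero_cells",
            "dtype": "uint32"
        },
        {
            "matrix_name": "pseudo_bulk_log_normed",
            "dtype": "float32"
        }
    ],
}

# Write the manifest; the context manager closes the file handle.
with open("your/path/to/manifest.json", "w") as handle:
    json.dump(manifest, handle)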
```

For more options, check out the [README](./src/cellarr/slurm/README.md).

- Step 2: Submit the job
Once your manifest file is ready, you can submit the necessary jobs using the `cellarr_build` CLI. Run the following command:

```sh
cellarr_build --input-manifest your/path/to/manifest.json --output-dir your/path/to/output --memory-per-job 8 --cpus-per-task 2
```
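The two steps above can be sketched in Python as a pre-flight check followed by command assembly. This is only an illustration: `check_manifest` and `build_command` are hypothetical helper names, not part of `cellarr`, and the only things taken from the README are the two required fields and the four `cellarr_build` flags.

```python
import shlex

# The two fields the README lists as mandatory for a minimal manifest.
REQUIRED_FIELDS = ("files", "python_env")


def check_manifest(manifest: dict) -> None:
    """Raise ValueError if a required field is missing or 'files' is unusable."""
    missing = [key for key in REQUIRED_FIELDS if key not in manifest]
    if missing:
        raise ValueError(f"manifest is missing required fields: {missing}")
    if not isinstance(manifest["files"], list) or not manifest["files"]:
        raise ValueError("'files' must be a non-empty list of h5ad paths")


def build_command(manifest_path, output_dir, memory_per_job=8, cpus_per_task=2):
    """Return the cellarr_build invocation as an argument list (hypothetical helper)."""
    return [
        "cellarr_build",
        "--input-manifest", str(manifest_path),
        "--output-dir", str(output_dir),
        "--memory-per-job", str(memory_per_job),
        "--cpus-per-task", str(cpus_per_task),
    ]


manifest = {"files": ["a.h5ad", "b.h5ad"], "python_env": "conda activate cellarr\n"}
check_manifest(manifest)
command = build_command("manifest.json", "output")
print(shlex.join(command))
```

Keeping the command as a list rather than a single string means it can be handed to `subprocess.run(command, check=True)` without shell quoting concerns.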

### Query a `CellArrDataset`

Users can either reuse the `dataset` object returned when building the dataset or create a `CellArrDataset` object by pointing it at the path where the files were created.
4 changes: 2 additions & 2 deletions setup.cfg
@@ -80,8 +80,8 @@ testing =

[options.entry_points]
# Add here console scripts like:
-# console_scripts =
-#     script_name = cellarr.module:function
+console_scripts =
+    cellarr_build = cellarr.slurm.build_cellarr_steps:main
# For example:
# console_scripts =
# fibonacci = cellarr.skeleton:run
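The `console_scripts` entry maps a command name to a `module:function` string. A rough sketch of how such a string resolves to a callable is shown below; `resolve_entry_point` is a hypothetical helper written for illustration (installers generate their own wrapper scripts), and `json:dumps` is a stand-in spec so the example runs without `cellarr` installed.

```python
import importlib


def resolve_entry_point(spec: str):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    target = importlib.import_module(module_name)
    # Walk dotted attribute paths such as 'Class.method'.
    for name in attr_path.split("."):
        target = getattr(target, name)
    return target


# Stand-in spec; the PR's entry would be "cellarr.slurm.build_cellarr_steps:main".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```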
2 changes: 1 addition & 1 deletion src/cellarr/buildutils_tiledb_frame.py
@@ -76,7 +76,7 @@ def create_tiledb_frame_from_column_names(
)


-def create_tiledb_frame_from_dataframe(tiledb_uri_path: str, frame: List[str], column_types=dict):
+def create_tiledb_frame_from_dataframe(tiledb_uri_path: str, frame: List[str], column_types: dict = None):
"""Create a TileDB file with the provided attributes to persistent storage.

This will materialize the array directory and all
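The signature change above swaps `column_types=dict` for `column_types: dict = None`. The difference can be seen in a small sketch; `before` and `after` are hypothetical stand-ins, and the `None` handling in `after` is an assumption, since the function body is not visible in this hunk.

```python
def before(column_types=dict):
    # The default here is the built-in `dict` class itself, not an empty mapping.
    return column_types


def after(column_types: dict = None):
    # Assumed handling: substitute a fresh dict when the caller passes nothing.
    if column_types is None:
        column_types = {}
    return column_types


print(before() is dict)
print(after())
```

With the old default, code that later indexes or iterates `column_types` would be operating on a class object, which is why an annotation plus a `None` default is the more conventional form.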
65 changes: 65 additions & 0 deletions src/cellarr/slurm/README.md
@@ -0,0 +1,65 @@

# manifest json

```json
{
  "files": [
    "/path/to/dataset1.h5ad",
    "/path/to/dataset2.h5ad"
  ],
  "matrix_options": [
    {
      "matrix_name": "counts",
      "dtype": "uint32"
    },
    {
      "matrix_name": "normalized",
      "dtype": "float32"
    }
  ],
  "gene_options": {
    "feature_column": "index"
  },
  "sample_options": {
    "metadata": {
      "sample_1": {
        "condition": "control",
        "batch": "1"
      },
      "sample_2": {
        "condition": "treatment",
        "batch": "1"
      }
    }
  },
  "cell_options": {
    "column_types": {
      "cell_type": "ascii",
      "quality_score": "float32"
    }
  },
  "python_env": ". /system/gredit/clientos/etc/profile\n\nml Miniforge3\nconda activate biocpy_miniforge\n\n~/.conda/envs/biocpy_miniforge/bin/python --version\nwhich python\npython --version\n"
}
```
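JSON has no multi-line string literal, so a hand-written manifest has to carry the `python_env` commands as one string with `\n` escapes. Generating the file from Python sidesteps this, as in the following sketch; the file paths and environment name are placeholders.

```python
import json

# Multi-line shell snippet, written naturally as a Python string.
python_env = """
ml Miniforge3
conda activate cellarr
"""

manifest = {
    "files": ["/path/to/dataset1.h5ad", "/path/to/dataset2.h5ad"],
    "python_env": python_env,
}

# json.dumps escapes the embedded newlines, producing valid JSON.
text = json.dumps(manifest, indent=2)
print(text)

# Round-trip to confirm the commands survive unchanged.
restored = json.loads(text)
```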


Run

```sh

python build_cellarr_steps.py \
--input-manifest manifest.json \
--output-dir /path/to/output \
--memory-per-job 64 \
--cpus-per-task 4

```
Empty file added src/cellarr/slurm/__init__.py
Empty file.