feat: Add dataset addition to eligible CLI tools #293

Merged
merged 18 commits into from
Apr 11, 2022

aauker
Contributor

@aauker aauker commented Apr 5, 2022

Adds a dataset manager to the CLI runner tool so that output segments can be added to a "dataset".

  • Added a "dataset_id" param, which adds any "feature_data" (in parquet form) to the dataset as a corresponding segment
  • Option to use either waystation or regular file-based dataset addition, selected via DATASET_URL (an http URL being the REST endpoint)
  • Try to use parquet for all tabular data (to preserve typing), mostly for the tile manifests
  • Some improvements to logging statements and to the clarity of the execution flow
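
The DATASET_URL switch described above could be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `choose_dataset_backend` and the labels it returns are hypothetical names, and treating `https` like `http`, and a bare path like `file://`, are assumptions.

```python
from urllib.parse import urlparse


def choose_dataset_backend(dataset_url: str) -> str:
    """Pick how a segment would be added, based on the URL scheme.

    Hypothetical helper: returns "rest" for http(s) URLs (the REST
    endpoint case) and "file" for file:// URLs or bare paths.
    """
    scheme = urlparse(dataset_url).scheme.lower()

    # Assumption: https is handled the same way as http.
    if scheme in ("http", "https"):
        return "rest"

    # Assumption: no scheme at all means a plain filesystem path.
    if scheme in ("", "file"):
        return "file"

    raise ValueError(f"Unsupported DATASET_URL scheme: {scheme!r}")
```

Under these assumptions, a value such as `http://localhost:8080/datasets` would select the REST route, while `/data/datasets` or `file:///data/datasets` would select file-based addition. Parsing the scheme explicitly, rather than checking a string prefix, keeps unsupported schemes from silently falling through to the file-based path.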

[Figure: Transform Design - Page 1 (2)]

Two example workflows using new functionality:

  1. https://mskcc.box.com/s/zt1shn3fwt5x87r1tys33vxazmhevwk8
  2. https://mskcc.box.com/s/vo7uik4vhuuf41bo0i5bzf65fyasc94q

@aauker aauker changed the title from "Add dataset addition to eligible CLI tools" to "feat: Add dataset addition to eligible CLI tools" Apr 5, 2022
@codecov-commenter

codecov-commenter commented Apr 5, 2022

Codecov Report

Merging #293 (1e67a1e) into dev (2bedb17) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #293      +/-   ##
==========================================
+ Coverage   75.99%   76.02%   +0.02%     
==========================================
  Files          53       53              
  Lines        3566     3570       +4     
==========================================
+ Hits         2710     2714       +4     
  Misses        856      856              
Impacted Files                                         Coverage Δ
...luna/pathology/cli/extract_kfunction_statistics.py  98.30% <100.00%> (ø)
...logy/luna/pathology/cli/extract_tile_statistics.py  96.87% <100.00%> (+0.10%) ⬆️
...thology/luna/pathology/cli/generate_tile_labels.py  95.16% <100.00%> (ø)
...pathology/luna/pathology/cli/generate_tile_mask.py  96.07% <100.00%> (ø)
...una-pathology/luna/pathology/cli/generate_tiles.py  97.87% <100.00%> (ø)
...-pathology/luna/pathology/cli/infer_tile_labels.py  94.73% <100.00%> (+0.09%) ⬆️
.../luna/pathology/cli/run_stardist_cell_detection.py  96.00% <100.00%> (+0.08%) ⬆️
...thology/luna/pathology/cli/run_tissue_detection.py  89.89% <100.00%> (ø)
pyluna-pathology/luna/pathology/cli/save_tiles.py      94.23% <100.00%> (+0.11%) ⬆️
pyluna-pathology/luna/pathology/common/schemas.py      90.90% <100.00%> (ø)

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@aauker aauker requested a review from arfathpasha April 5, 2022 18:42
Collaborator

@arfathpasha arfathpasha left a comment

Looks great, Andy. I'd like to spend a few minutes with you to connect the dots on the larger picture.

@aauker aauker merged commit 368e73c into dev Apr 11, 2022
@aauker aauker deleted the post-cli-tool branch April 11, 2022 15:45
aauker added a commit that referenced this pull request Apr 14, 2022
* chore: merge updated release version to dev 0.1.1 (#264)

Automatically generated by python-semantic-release

Co-authored-by: Andy Aukerman <ataukerman@gmail.com>
Co-authored-by: github-actions <action@github.com>

* feat: New CLIs from lung project (#261)

All new CLI tools for radiology/pathology work from the lung project.

Co-authored-by: Aukerman <aukermaa@pllimsksparky3.mskcc.org>

* fix: Save size as list, not a tuple (safe for yaml)

Co-authored-by: Aukerman <aukermaa@pllimsksparky3.mskcc.org>

* feat: radiology - generate scan table cli (#265)

* 1074 final tutorial qc (#267)

chore: streamline docker-compose build and execution

* chore: 1074 final tutorial qc (#267)

chore: streamline docker-compose build and execution

* perf: Further refinements to pathology CLI calls (#269)

* feat: Stardist and Spatial Statistics (#270)

* feat: pathology - adding model trainer, samplers for tile classifiers (#268)

* 0.1.1

Automatically generated by python-semantic-release

* feat: pathology - adding model trainer, samplers for tile classifiers

* Update setup.cfg

sklearn dep

* fix: added ability to pass modules via yaml, resolved issue with parameterized types

* fix: removed custom StratifiedGroupKfold, small changes to get_group_stratified_sampler

* feat: adding cli for model training with ray

* fix: adding cli, dependencies for model training cli

* fix: adding updated dependencies to docker

* fix: fixing sklearn import

* fix: fixing mistake in Dataset

* Fix: changes to type system/param parsing for model training

Co-authored-by: Andy Aukerman <ataukerman@gmail.com>
Co-authored-by: github-actions <action@github.com>

* test: pathology - extract feature vector example (#273)

* docs: pathology - extract feature vector example

* update docstring

* perf: save tile data in h5 (#272)

* feat: New slide ETL (#271)

* fix: Imports for slide_etl (#274)

* fix: Imports for slide_etl

* Some fixes to generate tile to match conventions

* Fixing coordinates, updating test

* refactor

* row.tile_size

* schema check

* Updating tests

* -o

* ds test

* ds test

* new otsu tiles

* 10x

* Remove address assert

* Better warning with scale factor business

* spacing cleanup

* close hf file

* minor fix

* add subset test

* typo fix

Co-authored-by: Aukerman <aukermaa@pllimsksparky3.mskcc.org>
Co-authored-by: doori <doori.rose@gmail.com>

* fix: Inference Model (#275)

Refactoring and improvement of inference CLI

* feat: adding CLI for generating tile label masks  (#276)

* fix: removing explicit support for jsons, treat them like dict objects instead

* fix: fixing dict param in test cases

* fix: changing radiomic params from json to dict

* feat: adding CLI for generating tile mask from labels

* feat: Update tile schema, add save_tiles (#280)

* refactor: dsa visualization and upload tool (#281)

* docs: contribution guide (#278)

* Update Makefile

* feat: adding CLI for shape feature extraction  (#282)

* test: adding test data for mask generation

* feat: adding cli for hif extraction

* test: adding test for feature extractor

* doc: update docstrings

* doc: update docstring with more details

* fix: Updated DSA annotation (#279)

More complete ETL

* fix: Fixes on slide ingestion (#283)

* Fixes on slide ingestion

* Update adapters.py

Co-authored-by: Aukerman <aukermaa@pllimsksparky3.mskcc.org>

* feat: precommit w/ black and flake8 (#277)

* chore: adding pre-commit w. black/flake8

* sty: testing precommit

* chore: adding pre-commit to requirements

* docs: adding black badge

* fix: fixing flake8 config

* sty: remove comments

* fix: nifti scan orientation (#284)

* feat: pathology - dsa visualize bmps (#285)

* fix: Fixes from end-to-end tests (#286)

* Fixes from end-to-end tests + goodbye to preprocess.py

Co-authored-by: Aukerman <aukermaa@pllimsksparky3.mskcc.org>

* test: pathology - unit tests (#287)

* test: circleci config (#288)

* chore: reorganize and clean up pathology (#289)

* fix: run stardist cli with mind docker image (#290)

* Init post CLI tool

* feat: PET Pre-processing (SUVs) (#292)

* PET Processing

* Fix indent level

* Only propagate keys from non-aliases

* Reference origin

Co-authored-by: Aukerman <aukermaa@pllimsksparky3.mskcc.org>

* feat: Add dataset addition to eligible CLI tools  (#293)

* Make other tools dataset enabled

* add dataset post as part of CLI runner

* Add dataset ID options

* Revert uncomment

* Move to parquet

* More parquet migration

* Further migrate to parquet

* Parquet migration tests pass

* Forgot about run detection

* Combine ifs

* Better loggin

* Set language

* Better detect url scheme

* Docs post_to_dataset

* documentation

* Last few changes

* Fix tests

* PNG parquet fix

Co-authored-by: Aukerman <aukermaa@pllimsksparky3.mskcc.org>

* docs: inference,viz,dsa notebook updates (#294)

* docs: update inference tutorial

* docs: update dsa tools, add examples, remove configs

* docs: dsa annotation nb

Co-authored-by: armaan <kohli@cooper.edu>

Co-authored-by: doori <doori.rose@gmail.com>
Co-authored-by: github-actions <action@github.com>
Co-authored-by: Aukerman <aukermaa@pllimsksparky3.mskcc.org>
Co-authored-by: arfathpasha <arfathpasha@gmail.com>
Co-authored-by: armaan <kohli@cooper.edu>