Documentation #55

Merged
merged 52 commits into from
Jun 11, 2024
Changes from 50 commits
8ddc557
add back-reference to frame as attribute for instane
aaprasad May 15, 2024
7887d56
separate get_boxes_times into two functions and use `Instances` as input
aaprasad May 15, 2024
9bab7bc
use instances as input into model instead of frames
aaprasad May 15, 2024
82293d2
create io module, move config, visualize there. abstract `Frame` and …
aaprasad May 16, 2024
b4049b8
refactor `Frame` and `Instance` initialization to use `attrs` instead…
aaprasad May 16, 2024
2b0bc55
add doc strings, fix small bugs
aaprasad May 16, 2024
ef26012
Implement AssociationMatrix class for handling model output
aaprasad May 16, 2024
94a0e61
create io module, move config, visualize there. abstract `Frame` and …
aaprasad May 16, 2024
c4bc0fb
refactor `Frame` and `Instance` initialization to use `attrs` instead…
aaprasad May 16, 2024
42f8a8c
add doc strings, fix small bugs
aaprasad May 16, 2024
b5f39b4
Implement AssociationMatrix class for handling model output
aaprasad May 16, 2024
ccd523a
Merge remote-tracking branch 'origin/aadi/refactor-data-structures' i…
aaprasad May 16, 2024
0f535af
fix overwrites from merge
aaprasad May 17, 2024
56e038a
store model outputs in association matrix
aaprasad May 17, 2024
8a71d6d
add track object for storing tracklets
aaprasad May 17, 2024
766820b
add reduction function to association matrix
aaprasad May 20, 2024
c0ceac1
add doc_strings
aaprasad May 20, 2024
92095e0
fix tests, docstrings
aaprasad May 20, 2024
a6a6ace
add spatial/temporal embeddings as attribute to `Instance`
aaprasad May 20, 2024
56e0555
fix typo
aaprasad May 20, 2024
80400fc
add `from_slp` converters
aaprasad May 21, 2024
7ff22e7
fix docstrings
aaprasad May 21, 2024
89007a9
store embeddings in Instance object instead of returning
aaprasad May 21, 2024
cbf915a
only keep visualize in io
aaprasad May 21, 2024
b3c5661
remove mutable types from default arguments. Don't use kwargs unless …
aaprasad May 21, 2024
adb3715
handle edge case where ckpt_path is not in config
aaprasad May 21, 2024
557d4e9
expose appropriate modules in respective `__init__.py`
aaprasad May 21, 2024
1013847
separate `files` into `vid_files`, `label_files` for finer grained co…
aaprasad May 27, 2024
e030da7
fix edge case for get trainer when trainer params don't exist
aaprasad May 27, 2024
f86317c
fix `to_slp` bugs stemming from type change
aaprasad May 27, 2024
ed85a40
use tmp dir for tests
aaprasad May 27, 2024
cf88413
refactor inference script
aaprasad May 27, 2024
9f20e48
add logic to handle directory paths instead of only file paths
aaprasad May 27, 2024
7e9eb65
add `from_yaml` classmethod for direct config loading
aaprasad May 27, 2024
d00a9f4
add documentation for cli calls
aaprasad May 27, 2024
5b8ad52
fix small typo + docstrings
aaprasad May 27, 2024
9d4d24a
fix docstring typo
aaprasad May 27, 2024
e1f7c66
fix small edge case when initializing new tracks
aaprasad May 28, 2024
05f10b0
add documentation for configs and usage
aaprasad May 28, 2024
c1293ed
finish initial draft of tutorial
aaprasad May 29, 2024
f914ba9
add links, more examples
aaprasad May 30, 2024
7b014f5
add more examples, descriptions to READMEs. Add small config handling…
aaprasad May 31, 2024
766ec69
fix small typo, add examples to inference config
aaprasad May 31, 2024
8bbc21c
fix formatting issues add links to respective sections in example .yaml
aaprasad Jun 1, 2024
a0ee366
infer dataset mode from config
aaprasad Jun 1, 2024
7320578
Merge branch 'main' into aadi/documentation
aaprasad Jun 6, 2024
4a1608b
update readme to reference `dreem` instead of `biogtr`
aaprasad Jun 10, 2024
5df1452
fix docstring formatting
aaprasad Jun 10, 2024
b9a332d
setup [docs website](https://dreem.sleap.ai)
aaprasad Jun 10, 2024
5939591
reference `dreem` instead of `biogtr`
aaprasad Jun 10, 2024
5ce73d8
fix typehinting error
aaprasad Jun 10, 2024
df26cec
fix small bug
aaprasad Jun 10, 2024
3 changes: 3 additions & 0 deletions .gitignore
@@ -139,3 +139,6 @@ logs
.vscode
dreem/training/.hydra/*
dreem/training/models/*

# docs
site/
277 changes: 270 additions & 7 deletions README.md

Large diffs are not rendered by default.

Binary file added docs/assets/favicon.ico
Binary file not shown.
Binary file added docs/assets/sleap-logo.png
2 changes: 2 additions & 0 deletions docs/configs/config.md
@@ -0,0 +1,2 @@
# `Config` Parser
Ensure that headings are surrounded by blank lines for proper Markdown formatting.

+ 
# `Config` Parser
+ 
Tools
Markdownlint

1-1: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces


1-1: Expected: 1; Actual: 0; Below (MD022, blanks-around-headings)
Headings should be surrounded by blank lines

:::dreem.io.Config
3 changes: 3 additions & 0 deletions docs/configs/index.md
@@ -0,0 +1,3 @@
# DREEM Config API

We utilize `.yaml` based configs with `hydra` and `omegaconf` for config parsing.
Add a newline at the end of the file to comply with Markdown best practices.

We utilize `.yaml` based configs with `hydra` and `omegaconf` for config parsing.
+
Tools
Markdownlint

3-3: null (MD047, single-trailing-newline)
Files should end with a single newline character

170 changes: 170 additions & 0 deletions docs/configs/inference.md
@@ -0,0 +1,170 @@
# Description of inference params

Here we describe the parameters used for inference. See [here](./inference.md#example-config) for an example inference config.

* `ckpt_path`: (`str`) the path to the saved model checkpoint. You can optionally provide a list of checkpoint paths, which triggers batch inference where each pod gets one model to run inference with.
e.g.:
```YAML
...
ckpt_path: "/path/to/model.ckpt"
...
```
* `out_dir`: (`str`) the path to the directory where outputs are stored.
e.g.:
```YAML
...
out_dir: "/path/to/results/dir"
...
```
## `tracker`

This section configures the tracker.

* `window_size`: (`int`) the size of the window used during sliding inference.
* `use_vis_feats`: (`bool`) whether to use the visual feature extractor.
* `overlap_thresh`: (`float`) the trajectory overlap threshold to be used for assignment.
* `mult_thresh`: (`bool`) whether to use the weight threshold.
* `decay_time`: (`float`) the weight for `decay_time` postprocessing.
* `iou`: (`str` | `None`) one of `None`, `""`, `"mult"`, or `"max"`. Whether to use multiplicative or max IoU reweighting.
* `max_center_dist`: (`float`) the distance threshold for filtering the trajectory score matrix.
* `persistent_tracking`: (`bool`) whether to keep a buffer across chunks.
* `max_gap`: (`int`) the max number of frames a trajectory can be missing before termination.
* `max_tracks`: (`int`) the maximum number of tracks that can be created while tracking.
We force the tracker to assign instances to an existing track instead of creating a new track if `max_tracks` has been reached.

### Examples:
```YAML
...
tracker:
  window_size: 8
  overlap_thresh: 0.01
  mult_thresh: false
  decay_time: 0.9
  iou: "mult"
  max_center_dist: 0.1
...
```
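
As a rough illustration of how `max_gap` and `max_tracks` interact, the sketch below prunes tracks that have been missing for too long and checks whether a new track may still be created. The function names and the exact gap convention (`current_frame - last_seen <= max_gap`) are assumptions made for this example, not DREEM's tracker implementation.

```python
def prune_stale_tracks(last_seen, current_frame, max_gap):
    """Keep only tracks seen within the last `max_gap` frames (assumed convention)."""
    return {
        track_id: frame
        for track_id, frame in last_seen.items()
        if current_frame - frame <= max_gap
    }


def can_create_track(active_tracks, max_tracks):
    """Return True while fewer than `max_tracks` tracks exist."""
    return len(active_tracks) < max_tracks


# Hypothetical state: track id -> last frame in which the track appeared.
last_seen = {0: 10, 1: 4, 2: 9}
active = prune_stale_tracks(last_seen, current_frame=10, max_gap=5)
```

Here track `1` has been missing for 6 frames and is terminated. With `max_tracks: 2`, a new detection would have to be assigned to one of the two remaining tracks.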

## `dataset`

This section contains the params for initializing the dataset used for inference. Requires a `test_dataset` key.

### [`BaseDataset`](../reference/dreem/datasets/base_dataset.md) args

* `padding`: (`int`) the amount of padding to add to each side of the bounding box
* `crop_size`: (`int`|`tuple`) the size of the bounding box around which a crop will form.
* `chunk`: whether to chunk videos into smaller clips to feed to the model
Comment on lines +53 to +55
Tip

Codebase Verification

The search results indicate that the chunk parameter is indeed implemented in various dataset modules, such as sleap_dataset.py, microscopy_dataset.py, cell_tracking_dataset.py, and base_dataset.py. The descriptions and usage of chunk across these files suggest that it is used to determine whether to divide the dataset into smaller batches or chunks.

  • Locations in the codebase:
    • dreem/datasets/sleap_dataset.py
    • dreem/datasets/microscopy_dataset.py
    • dreem/datasets/cell_tracking_dataset.py
    • dreem/datasets/base_dataset.py

The description in the documentation should be clear about this functionality, indicating that chunk determines whether the dataset should be divided into smaller batches for processing.

Analysis chain

Ensure that the description of chunk is clear about its functionality.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the implementation of chunking in the dataset.
# Test: Search for the chunk implementation in the dataset module.
rg --type python $'chunk' src/dreem/datasets

Length of output: 74



Script:

#!/bin/bash
# Description: Verify the implementation of chunking in the dataset.
# Test: Search for the chunk implementation in the dataset module.
rg --type py 'chunk' src/dreem/datasets

Length of output: 98



Script:

#!/bin/bash
# Description: Locate the correct directory path for the dataset module and verify the implementation of chunking.
# Step 1: Locate the dataset module directory.
dataset_dir=$(fd datasets | head -n 1)

# Step 2: Verify the implementation of chunking in the located dataset module directory.
if [ -n "$dataset_dir" ]; then
  rg 'chunk' "$dataset_dir"
else
  echo "Dataset module directory not found."
fi

Length of output: 6059

Tools
LanguageTool

[uncategorized] ~53-~53: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...asets/base_dataset.md) args * padding: An int representing the amount of pad...


[uncategorized] ~55-~55: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...around which a crop will form. * chunk: Whether or not to chunk videos into sma...


[style] ~55-~55: Consider shortening this phrase to just ‘whether’, unless you mean ‘regardless of whether’. (WHETHER)
Context: ...ound which a crop will form. * chunk: Whether or not to chunk videos into smaller clips to f...

* `clip_length`: the number of frames in each chunk
* `mode`: `train` or `val`. Determines whether this dataset is used for training or validation.
* `n_chunks`: the number of chunks to subsample. Can be either a fraction of the dataset (i.e. a value in `(0, 1.0]`) or a number of chunks
* `seed`: set a seed for reproducibility
* `gt_list`: An optional path to a `.txt` file containing ground truth for cell tracking challenge datasets.
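
To make the relationship between `chunk`, `clip_length`, `n_chunks`, and `seed` concrete, here is a minimal sketch. It assumes that a `float` value of `n_chunks` is a fraction and an `int` value is a count; the helper names are hypothetical and this is not the `BaseDataset` implementation.

```python
import random


def make_chunks(n_frames, clip_length):
    """Split frame indices 0..n_frames-1 into consecutive clips of `clip_length`."""
    return [
        list(range(start, min(start + clip_length, n_frames)))
        for start in range(0, n_frames, clip_length)
    ]


def subsample_chunks(chunks, n_chunks, seed=None):
    """Pick a subset of chunks: a float is treated as a fraction, an int as a count."""
    if isinstance(n_chunks, float):
        if not 0 < n_chunks <= 1.0:
            raise ValueError("a fractional n_chunks must be in (0, 1.0]")
        k = max(1, int(n_chunks * len(chunks)))
    else:
        k = min(n_chunks, len(chunks))
    rng = random.Random(seed)  # seeding makes the selection reproducible
    return rng.sample(chunks, k)


chunks = make_chunks(n_frames=100, clip_length=32)
subset = subsample_chunks(chunks, n_chunks=0.5, seed=0)
```

A 100-frame video with `clip_length: 32` yields four chunks, the last one shorter than the rest, and `n_chunks: 0.5` keeps two of them.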

#### `dir`:
This section allows you to pass a directory rather than individual paths to labels/videos.

* `path`: The path to the directory where the data is stored (an absolute path is recommended)
* `labels_suffix`: (`str`) the file extension to search for label files, e.g. `.slp`, `.csv`, or `.xml`.
* `vid_suffix`: (`str`) the file extension to search for video files, e.g. `.mp4`, `.avi`, or `.tif`.
##### Examples:
```YAML
...
dataset:
  ...
  {MODE}_dataset:
    dir:
      path: "/path/to/data/dir/mode"
      labels_suffix: ".slp"
      vid_suffix: ".mp4"
    ...
  ...
...
```
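
The `dir` options can be thought of as a file search by extension. The sketch below shows one way such a lookup might work using `pathlib`; the `find_files` helper, the non-recursive search, and the sorted ordering are assumptions for illustration only.

```python
from pathlib import Path
import tempfile


def find_files(path, labels_suffix, vid_suffix):
    """Collect label and video files in `path` by file extension (non-recursive)."""
    root = Path(path)
    label_files = sorted(str(p) for p in root.glob(f"*{labels_suffix}"))
    vid_files = sorted(str(p) for p in root.glob(f"*{vid_suffix}"))
    return label_files, vid_files


# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.slp", "a.mp4", "b.slp", "b.mp4", "notes.txt"):
        (Path(tmp) / name).touch()
    label_files, vid_files = find_files(tmp, ".slp", ".mp4")
    label_names = [Path(f).name for f in label_files]
    vid_names = [Path(f).name for f in vid_files]
```

Sorting both lists keeps labels and videos aligned only when their file names sort in the same order, so matching base names are a sensible convention.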
#### `augmentations`:

This subsection contains params for albumentations. See [`albumentations`](https://albumentations.ai) for available visual augmentations. Other available augmentations include `NodeDropout` and `InstanceDropout`. Each key must match the augmentation class name exactly and contain a subsection with the parameters for that augmentation.
Comment on lines +82 to +84
The description of augmentations could be enhanced by specifying examples of NodeDropout and InstanceDropout.

Would you like me to add examples for NodeDropout and InstanceDropout in the documentation?

Tools
LanguageTool

[uncategorized] ~82-~82: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...... ... ... ``` #### augmentations: This subsection contains params for al...

Markdownlint

82-82: Expected: 1; Actual: 0; Above (MD022, blanks-around-headings)
Headings should be surrounded by blank lines


82-82: Punctuation: ':' (MD026, no-trailing-punctuation)
Trailing punctuation in heading


##### Example
```YAML
augmentations:
  Rotate:
    limit: 45
    p: 0.3
  ...
  MotionBlur:
    blur_limit: [3,7]
    p: 0.3
```
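
The requirement that keys match class names exactly suggests a lookup by name. The following sketch shows that pattern with stand-in classes; `build_augmentations` and the stand-in `Rotate`/`MotionBlur` dataclasses are hypothetical and only mimic the shape of the config above, not the albumentations or DREEM code.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Rotate:  # stand-in class, NOT the albumentations implementation
    limit: int = 90
    p: float = 0.5


@dataclass
class MotionBlur:  # stand-in class, NOT the albumentations implementation
    blur_limit: tuple = (3, 7)
    p: float = 0.5


def build_augmentations(cfg, namespace):
    """Instantiate one transform per config key by looking the key up by name."""
    transforms = []
    for name, params in cfg.items():
        cls = getattr(namespace, name, None)
        if cls is None:
            raise KeyError(f"Unknown augmentation: {name!r}")
        transforms.append(cls(**params))
    return transforms


stand_in = SimpleNamespace(Rotate=Rotate, MotionBlur=MotionBlur)
cfg = {
    "Rotate": {"limit": 45, "p": 0.3},
    "MotionBlur": {"blur_limit": [3, 7], "p": 0.3},
}
transforms = build_augmentations(cfg, stand_in)
```

Because the lookup is by exact name, a key such as `rotate` would fail here, which is why the casing in the config matters.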
### [`SleapDataset`](../reference/dreem/datasets/sleap_dataset.md) Args:
* `slp_files`: (`list[str]`) a list of paths to `.slp` files storing tracking annotations
* `video_files`: (`list[str]`) a list of paths to video files
* `anchors`: (`str` | `list` | `int`) One of:
    * a string indicating a single node to center crops around
    * a list of skeleton node names to be used as the center of crops
    * an int indicating the number of anchors to randomly select
Comment on lines +101 to +103
Fix indentation issues in the unordered list to maintain consistency and readability.

-    * a string indicating a single node to center crops around
-    * a list of skeleton node names to be used as the center of crops
-    * an int indicating the number of anchors to randomly select
+  * a string indicating a single node to center crops around
+  * a list of skeleton node names to be used as the center of crops
+  * an int indicating the number of anchors to randomly select
Tools
Markdownlint

101-101: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation


102-102: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation


103-103: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation

If the anchor is unavailable, crop around the midpoint between all visible anchors.
* `handle_missing`: how to handle missing single nodes. One of [`"drop"`, `"ignore"`, `"centroid"`].
    * if `drop`, we don't include instances that are missing the `anchor`.
    * if `ignore`, we use a mask instead of a crop, with NaN centroids/bboxes.
    * if `centroid`, we default to the pose centroid as the node to crop around.
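
The three `handle_missing` options can be sketched as a small decision function. This is an illustrative sketch only: the `crop_center` helper, the pose representation (a dict mapping node names to `(x, y)` or `None`), and the return conventions are assumptions, not the `SleapDataset` code.

```python
import math


def crop_center(pose, anchor, handle_missing="centroid"):
    """Return the crop center for one instance.

    Returns (x, y), (nan, nan) when the crop is to be masked, or None when the
    instance is dropped. `pose` maps node names to (x, y), or None if missing.
    """
    point = pose.get(anchor)
    if point is not None:
        return point
    if handle_missing == "drop":
        return None  # the instance is excluded entirely
    if handle_missing == "ignore":
        return (math.nan, math.nan)  # caller would mask the crop
    if handle_missing == "centroid":
        visible = [p for p in pose.values() if p is not None]
        if not visible:
            return (math.nan, math.nan)
        xs, ys = zip(*visible)
        return (sum(xs) / len(xs), sum(ys) / len(ys))
    raise ValueError(f"Unknown handle_missing option: {handle_missing!r}")


# Hypothetical pose where the anchor node "head" is not visible.
pose = {"head": None, "thorax": (10.0, 20.0), "tail": (30.0, 40.0)}
center = crop_center(pose, "head", handle_missing="centroid")
```

With `"centroid"`, the crop falls back to the mean of the visible nodes, so the instance is still used even though its anchor is missing.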
### [`MicroscopyDataset`](../reference/dreem/datasets/microscopy_dataset.md)
* `videos`: (`list[str | list[str]]`) paths to raw microscopy videos
* `tracks`: (`list[str]`) paths to trackmate gt labels (either `.xml` or `.csv`)
* `source`: file format of gt labels based on label generator. Either `"trackmate"` or `"isbi"`.
### [`CellTrackingDataset`](../reference/dreem/datasets/cell_tracking_dataset.md)
* `raw_images`: (`list[list[str] | list[list[str]]]`) paths to raw microscopy images
* `gt_images`: (`list[list[str] | list[list[str]]]`) paths to gt label images
* `gt_list`: (`list[str]`) An optional path to a `.txt` file containing gt ids stored in cell tracking challenge format: `"track_id", "start_frame", "end_frame", "parent_id"`
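
For readers unfamiliar with the `gt_list` layout, the sketch below parses a file with the four columns listed above. It assumes whitespace-separated integers with one track per line and a `parent_id` of `0` meaning "no parent"; the `parse_gt_list` helper is hypothetical and is not part of DREEM.

```python
import io


def parse_gt_list(lines):
    """Parse lines of `track_id start_frame end_frame parent_id` into dicts."""
    tracks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        track_id, start_frame, end_frame, parent_id = (int(v) for v in line.split())
        tracks.append(
            {
                "track_id": track_id,
                "start_frame": start_frame,
                "end_frame": end_frame,
                "parent_id": parent_id,
            }
        )
    return tracks


# In-memory stand-in for a ground-truth .txt file: track 1 divides into 2 and 3.
example = io.StringIO("1 0 10 0\n2 11 20 1\n3 11 25 1\n")
tracks = parse_gt_list(example)
```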
### `dataset` Examples
#### [`SleapDataset`](../reference/dreem/datasets/sleap_dataset.md)
```YAML
...
dataset:
  test_dataset:
    slp_files: ["/path/to/test/labels1.slp", "/path/to/test/labels2.slp", ..., "/path/to/test/labelsN.slp"]
    video_files: ["/path/to/test/video1.mp4", "/path/to/test/video2.mp4", ..., "/path/to/test/videoN.mp4"]
    padding: 5
    crop_size: 128
    chunk: True
    clip_length: 32
    anchors: ["node1", "node2", ..., "node_n"]
    handle_missing: "drop"
    ... # we don't include augmentations because you usually shouldn't use augmentations during val/test
...
```
#### [`MicroscopyDataset`](../reference/dreem/datasets/microscopy_dataset.md)
```YAML
dataset:
  test_dataset:
    tracks: ["/path/to/test/labels1.csv", "/path/to/test/labels2.csv", ..., "/path/to/test/labelsN.csv"]
    videos: ["/path/to/test/video1.tiff", "/path/to/test/video2.tiff", ..., "/path/to/test/videoN.tiff"]
    source: "trackmate"
    padding: 5
    crop_size: 128
    chunk: True
    clip_length: 32
    ... # we don't include augmentations because you usually shouldn't use augmentations during val/test
```

## dataloader

This section outlines the params needed for the dataloader. It should have a `train_dataloader` key and, optionally, `val_dataloader`/`test_dataloader` keys.

> Below we list the args we found useful/necessary for the dataloaders. More advanced users can see [`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) for more ways to initialize the dataloaders.

* `shuffle`: (`bool`) Set to `True` to have the data reshuffled at every epoch (during training, this should always be `True` and during val/test usually `False`)
* `num_workers`: (`int`) How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process.

### Example
```YAML
...
dataloader:
  test_dataloader: # we leave out the `shuffle` field as default=`False` which is what we want
    num_workers: 4
...
```

## Example Config

```YAML
--8<-- "dreem/inference/configs/base.yaml"
```