Review/optimize use of get_overlapping_pairs_3D in Cellpose task #779
Comments
Another thing to consider here: a future improvement may be to check for overlaps across ROIs as well since, in theory, they could also overlap. Given that this is just for warnings, I'm not sure how relevant this is, though. Our masking should still work in that case, after all.
For the record, making this change is quite straightforward, since we only need to move the check from within the ROI loop to its end (EDIT: this diff is not working, but the idea remains valid):
diff --git a/fractal_tasks_core/tasks/cellpose_segmentation.py b/fractal_tasks_core/tasks/cellpose_segmentation.py
index ea78661a..403630a1 100644
--- a/fractal_tasks_core/tasks/cellpose_segmentation.py
+++ b/fractal_tasks_core/tasks/cellpose_segmentation.py
@@ -541,15 +541,6 @@ def cellpose_segmentation(
bbox_dataframe_list.append(bbox_df)
- overlap_list = get_overlapping_pairs_3D(
- bbox_df, full_res_pxl_sizes_zyx
- )
- if len(overlap_list) > 0:
- logger.warning(
- f"ROI {indices} has "
- f"{len(overlap_list)} bounding-box pairs overlap"
- )
-
# Compute and store 0-th level to disk
da.array(new_label_img).to_zarr(
url=mask_zarr,
@@ -582,6 +573,15 @@ def cellpose_segmentation(
bbox_dataframe_list = [empty_bounding_box_table()]
# Concatenate all ROI dataframes
df_well = pd.concat(bbox_dataframe_list, axis=0, ignore_index=True)
+
+ overlap_list = get_overlapping_pairs_3D(
+ bbox_dataframe_list, full_res_pxl_sizes_zyx
+ )
+ if len(overlap_list) > 0:
+ logger.warning(
+ f"{len(overlap_list)} bounding-box pairs overlap"
+ )
+
df_well.index = df_well.index.astype(str)
# Extract labels and drop them from df_well
labels = pd.DataFrame(df_well["label"].astype(str))
However, we should not do this without an actual benchmark, or we risk introducing very bad scaling. If there are
E.g. for N=138 organoids (ref #764) and 100 labels (random guess), we'd have:
Note that:
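As a rough illustration of the scaling concern raised in this comment, the number of bounding-box pairs can be counted for the two strategies. This is only a sketch: it assumes that the check compares every pair of labels (n(n-1)/2 comparisons for n labels) and that every ROI holds the same number of labels. The variable names are illustrative and not taken from the task code.

```python
from math import comb

n_rois = 138          # organoids, as quoted in the comment (ref #764)
labels_per_roi = 100  # the comment's "random guess"

# Current behaviour: one all-pairs check inside each ROI.
pairs_per_roi_checks = n_rois * comb(labels_per_roi, 2)

# Proposed behaviour: a single all-pairs check over every label in the well.
pairs_global_check = comb(n_rois * labels_per_roi, 2)

# How many times more comparisons the global check would need.
ratio = pairs_global_check / pairs_per_roi_checks
```

Under these assumptions the per-ROI checks amount to 683,100 pair comparisons in total, while a single well-wide check needs 95,213,100, roughly 139 times more.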
Agreed on not doing this now. It is more something to test if this gets benchmarked further :) I'd say we go with the fix as you implemented it here, and consider most of this benchmarking in the context of a future Cellpose task refactor that uses the new OME-Zarr reader/writer strategy.
I just debugged this with some users who (mistakenly) had output ROI tables set for single-cell segmentation, where they have tens of thousands of segmented objects. In that case, the cost of this check became prohibitive, even in the slightly optimized version. I'd say we should remove it in the next Cellpose refactor. The warning is not very useful, but the check is sometimes highly performance-critical. As an extreme example, for one user the whole segmentation task ran in under a minute, but it then took almost 9 hours (!!) to do these overlap checks for 28 thousand label objects.
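A back-of-the-envelope estimate puts these numbers in perspective. The sketch below assumes that all 28 thousand labels were compared in a single all-pairs check and treats "almost 9 hours" as 9 hours, so the result is only an order-of-magnitude figure.

```python
from math import comb

n_labels = 28_000     # "28 thousand label objects"
runtime_s = 9 * 3600  # "almost 9 hours", used as an upper bound

# Number of unordered label pairs in one all-pairs check (assumption).
n_pairs = comb(n_labels, 2)

# Implied average cost of a single pair comparison, in microseconds.
cost_per_pair_us = runtime_s / n_pairs * 1e6
```

With these assumptions there are about 392 million pairs, which implies a cost in the range of 80 microseconds per pair. That is plausible for a pure-Python comparison called from a Python loop, so the runtime appears to be dominated by the quadratic pair count rather than by anything unusual in the data.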
Branching from #764 (fixed in principle with #778)
After a fix, it'd be useful to have a rough benchmark of get_overlapping_pairs_3D for one of those many-labels cases (say 1000 labels for a given i_ROI). Since this function does nothing other than print a warning, I would be very strict about how long a runtime we can accept.
If it turns out that it is slow, we can easily improve it in a trivial way (simply stop after the first overlap is found) or in more systematic ways (the current approach is close to being the definition of a slow Python function: a quadratic-scaling for loop, which calls a pure-Python function and then even appends to a list).
If the function is not slow, then there's no special need for a refactor.
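The two improvement routes mentioned above (stop at the first overlap, or replace the Python loop with something more systematic) could look roughly like the sketch below. It is not the implementation in fractal_tasks_core: the function names and the input layout (two NumPy arrays holding the lower and upper corner of each box) are hypothetical, chosen only to keep the example self-contained. Boxes that merely touch are treated as non-overlapping.

```python
import numpy as np


def count_overlapping_pairs(mins: np.ndarray, maxs: np.ndarray) -> int:
    """Count overlapping pairs of axis-aligned 3D boxes (vectorized).

    mins, maxs: arrays of shape (n, 3) with the lower and upper corner of
    each box. This layout is an assumption made for this sketch.
    """
    # lower[i, j, a] is True when box i starts before box j ends on axis a.
    lower = mins[:, None, :] < maxs[None, :, :]
    # Boxes i and j overlap when that holds in both directions on all axes.
    overlap = np.all(lower & lower.transpose(1, 0, 2), axis=-1)
    # Keep only i < j: each pair is counted once, self-pairs are excluded.
    return int(np.count_nonzero(np.triu(overlap, k=1)))


def has_any_overlap(mins: np.ndarray, maxs: np.ndarray) -> bool:
    """Return True as soon as one overlapping pair is found (early exit)."""
    for i in range(len(mins) - 1):
        # Compare box i against all later boxes in one vectorized step.
        hit = np.all(
            (mins[i] < maxs[i + 1:]) & (mins[i + 1:] < maxs[i]), axis=1
        )
        if hit.any():
            return True
    return False
```

Both functions are still quadratic in the number of boxes. The first builds an (n, n, 3) boolean array, which is fine for about 1000 labels but would need gigabytes for tens of thousands. The second uses memory linear in n and returns early, which is enough if the only purpose is to decide whether to log a warning. Either could be timed against the existing function on randomly generated boxes to get the rough benchmark requested here.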