Support `CombinedLabels` from `info_label.txt` when computing lesion distributions with `-f` #4584

joshuacwnewton · 2024-07-30T21:52:24Z

Description

This PR amends two methods responsible for computing the distribution of a lesion within a warped atlas (i.e. the -f option of sct_analyze_lesion):

_measure_eachLesion_distribution: Responsible for the lesion#Ndistribution pages in the sheet. The values are "percentage of the lesion that exists in a given ROI". Since the CombinedLabels are just combined ROIs, we can simply sum the percentages.
_measure_totLesion_distribution: Responsible for the ROI_occupied_by_lesion page in the sheet. The values are "percentage of the ROI taken up by any lesion". This is a bit trickier, since the ROI area is in the denominator (rather than the numerator). Thankfully, it turns out that this functionality was already implemented!
- Before: Hardcoded "combined labels" that are identical to the entries in info_label.txt.
- After: Read the "combined labels" directly from info_label.txt.

The end result is that the user can now modify info_label.txt to add their own custom labels, and they will get results for the pages generated by -f.

Testing this PR

Testing this is a bit tricky, because it requires lengthy preprocessing steps: A) lesion segmentation B) registering the image to the template and C) warping the template (including the atlas).

In our sct_testing_data dataset, we do have a "mini" version of the template, but it does not contain the atlas files. So, it's necessary to work with full-sized images (to access the existing atlas of the PAM50 template).

I was able to find a suitable subject in the SCI Zurich dataset for testing purposes. However, sct_label_vertebrae failed initially, so I need to manually identify the C2-C3 disc. In total, the full testing steps were:

sct_deepseg -task seg_sc_lesion_t2w_sci -i sub-zh01_ses-01_acq-sag_T2w.nii.gz 
sct_label_utils -i sub-zh01_ses-01_acq-sag_T2w.nii.gz -create-viewer 3
sct_label_vertebrae -i sub-zh01_ses-01_acq-sag_T2w.nii.gz -s sub-zh01_ses-01_acq-sag_T2w_sc_seg.nii.gz -initlabel labels.nii.gz -c t2
sct_label_utils -i sub-zh01_ses-01_acq-sag_T2w_sc_seg_labeled.nii.gz -vert-body 3,9
sct_register_to_template -i sub-zh01_ses-01_acq-sag_T2w.nii.gz -s sub-zh01_ses-01_acq-sag_T2w_sc_seg.nii.gz -l labels.nii.gz 
sct_warp_template -d sub-zh01_ses-01_acq-sag_T2w.nii.gz -w warp_template2anat.nii.gz
sct_analyze_lesion -m sub-zh01_ses-01_acq-sag_T2w_lesion_seg.nii.gz -s sub-zh01_ses-01_acq-sag_T2w_sc_seg.nii.gz -f label/

I would be happy to share the files privately on slack if you want to test (or you could try yourself on a different subject).

Note about other features

I also took the time to solve a small issue re: column ordering. All it took was a single call to sorted.

I wanted to solve #2677 while I was at it, since this was also requested by the user. However, I ran into an issue that I previously observed in #4394 (comment):

[!NOTE]
As a side observation, I noticed that adding the -f option makes sct_analyze_lesion very slow on my personal laptop, even for a single subject. There is no feedback in the output while the template-related processing is running, too. This isn't a great user experience, and I would love to look further into improving this after I debug the numerical issues.

By switching from per-level aggregation (usually ~10-13 vert levels) to per-slice aggregation (as many as 500 slices), the optimization issues get much worse. (As in, couldn't even finish processing worse.) So, to properly implement this feature, it would require some sort of optimization fix to improve the run time. At that point, I figured it might be better to get this initial PR going, and follow-up with a second fix afterwards.

Linked issues

Fixes #4567
Fixes #4568
Fixes #4587 (When #4588 is merged)
Fixes #2677 (When #4588 is merged)

Currently, we sum the totals at the same time we write the sheets to the file. But why? Doing it this way requires some awkward checks for `#` in the sheet name. If we were to instead sum the totals *in the sheet-specific method*, then we already know that we're working in the correct context. Additionally, this is necessary so that we can add additional columns *after* the total for the spinal cord, white matter, gray matter, and other combined labels.

This is so that we can insert new columns after the total column, but before the total row is computed.

We basically just sum columns corresponding to individual tracts that belong in each of the overall "CombinedLabels".

Luckily for us, this functionality was already implemented! We get to simplify the code here; instead of hardcoding the ranges for GM/WM/etc., we can re-use the existing label ranges as defined by info_label.txt. This also allows the user to specify custom "CombinedLabels", too. Oddly, the old code used contiguous ranges. But, since we have the full lists of combined labels, we can just use `in` rather than `min < val < max`, in case the CombinedLabel does not span a contiguous range.

The order the files are parsed affects the order in which they're inserted into `self.atlas_roi_lst` which in turn affects the order of the columns of the dataframe.

…lyze_lesion

…ible) (#4588) This was a tricky refactor, because `-perslice` goes against the default behavior of `sct_analyze_lesion` (aggregating by vertebral levels), and there are a lot of hardcoded `vert` variable names. It took quite a few changes of (vert -> row) to avoid a complete rewrite. That said, big-picture "aggregating by vertebral levels" basically boils down to this: - Iterate over vertebral levels. - For each vertebral level, set all other voxels equal to 0, then compute percentages. The general structure of this code still works if you substitute "vertebral level" for slice; You can instead iterate over slices and still set all other voxels equal to 0. To do this, I used "rows" to represent either vertebral levels *or* slices, and kept a separate variable "row_name" to track whether we have vertlevels or slices. This mostly gets the job done. Note: While the previous performance improvements help, this is still a slow process for any image with a large number of slices: - `-perslice 0`, no optimization: ~300s (~5m) - `-perslice 0`, with optimization: ~50s (~1m) - `-perslice 1`, no optimization: Did not finish (est. 3600s, or 1hr) - `-perslice 1`, with optimization: ~600s (10m) So, perhaps there is more optimization work to be done. It's a lot of calls to `np.sum`, `np.count_nonzero`, etc. I think this is because we've got naive `for` loops over both slices (~500) and tracts (~36). I'm sure there's some sort of smart numpy array operations we could be doing, but it's hard to implement that without a full rewrite. It may also be worth just... skipping the slices that don't have lesion values? With large images, this results in a large output spreadsheet with many empty rows. But, if we exclude the empty rows, then perhaps we can save on processing time as well. Fixes #4567 Fixes #2677 --------- Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>

…lyze_lesion

mguaypaq

Seems reasonable, thanks for the new feature!

spinalcordtoolbox/scripts/sct_analyze_lesion.py

Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>

joshuacwnewton added 5 commits July 30, 2024 17:30

sct_analyze_lesion: Split total column and total row code

026be61

This is so that we can insert new columns after the total column, but before the total row is computed.

sct_analyze_lesion.py: Add CombinedLabels to lesion dist

8262b4b

We basically just sum columns corresponding to individual tracts that belong in each of the overall "CombinedLabels".

sct_analyze_lesion: Sort atlas files

8dff7f8

The order the files are parsed affects the order in which they're inserted into `self.atlas_roi_lst` which in turn affects the order of the columns of the dataframe.

joshuacwnewton added feature category: new functionality sct_analyze_lesion context: labels Jul 30, 2024

joshuacwnewton added this to the 6.4 milestone Jul 30, 2024

joshuacwnewton requested a review from mguaypaq July 30, 2024 21:52

joshuacwnewton self-assigned this Jul 30, 2024

sct_analyze_lesion: Don't fail if info_label.txt is missing

c7489f9

joshuacwnewton mentioned this pull request Jul 31, 2024

Add -perslice option (plus performance improvements to make it feasible) #4588

Merged

Merge branch 'master' into jn/4567-add-combinedlabels-features-to-ana…

86cd667

…lyze_lesion

joshuacwnewton mentioned this pull request Aug 2, 2024

Update CHANGES.md and version.txt for 6.4 release #4589

Merged

joshuacwnewton and others added 2 commits August 7, 2024 10:08

Merge branch 'master' into jn/4567-add-combinedlabels-features-to-ana…

124911c

…lyze_lesion

mguaypaq approved these changes Aug 7, 2024

View reviewed changes

spinalcordtoolbox/scripts/sct_analyze_lesion.py Outdated Show resolved Hide resolved

spinalcordtoolbox/scripts/sct_analyze_lesion.py Outdated Show resolved Hide resolved

spinalcordtoolbox/scripts/sct_analyze_lesion.py Show resolved Hide resolved

joshuacwnewton and others added 2 commits August 7, 2024 11:59

sct_analyze_lesion: Replace zfill with int->str formatting

27207bd

Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>

sct_analyze_lesion: Exclude CombinedLabels that match total

89881dc

mguaypaq merged commit 8911028 into master Aug 7, 2024
20 checks passed

mguaypaq deleted the jn/4567-add-combinedlabels-features-to-analyze_lesion branch August 7, 2024 20:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support `CombinedLabels` from `info_label.txt` when computing lesion distributions with `-f` #4584

Support `CombinedLabels` from `info_label.txt` when computing lesion distributions with `-f` #4584

joshuacwnewton commented Jul 30, 2024 •

edited

Loading

mguaypaq left a comment

Support CombinedLabels from info_label.txt when computing lesion distributions with -f #4584

Support CombinedLabels from info_label.txt when computing lesion distributions with -f #4584

Conversation

joshuacwnewton commented Jul 30, 2024 • edited Loading

Description

Testing this PR

Note about other features

Linked issues

mguaypaq left a comment

Choose a reason for hiding this comment

Support `CombinedLabels` from `info_label.txt` when computing lesion distributions with `-f` #4584

Support `CombinedLabels` from `info_label.txt` when computing lesion distributions with `-f` #4584

joshuacwnewton commented Jul 30, 2024 •

edited

Loading