Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CombinedLabels from info_label.txt when computing lesion distributions with -f #4584

Merged

Conversation

joshuacwnewton
Copy link
Member

@joshuacwnewton joshuacwnewton commented Jul 30, 2024

Description

This PR amends two methods responsible for computing the distribution of a lesion within a warped atlas (i.e. the -f option of sct_analyze_lesion):

  • _measure_eachLesion_distribution: Responsible for the lesion#Ndistribution pages in the sheet. The values are "percentage of the lesion that exists in a given ROI". Since the CombinedLabels are just combined ROIs, we can simply sum the percentages.
  • _measure_totLesion_distribution: Responsible for the ROI_occupied_by_lesion page in the sheet. The values are "percentage of the ROI taken up by any lesion". This is a bit trickier, since the ROI area is in the denominator (rather than the numerator). Thankfully, it turns out that this functionality was already implemented!
    • Before: Hardcoded "combined labels" that are identical to the entries in info_label.txt.
    • After: Read the "combined labels" directly from info_label.txt.

The end result is that the user can now modify info_label.txt to add their own custom labels, and they will get results for the pages generated by -f.

Testing this PR

Testing this is a bit tricky, because it requires lengthy preprocessing steps: A) lesion segmentation B) registering the image to the template and C) warping the template (including the atlas).

In our sct_testing_data dataset, we do have a "mini" version of the template, but it does not contain the atlas files. So, it's necessary to work with full-sized images (to access the existing atlas of the PAM50 template).

I was able to find a suitable subject in the SCI Zurich dataset for testing purposes. However, sct_label_vertebrae failed initially, so I need to manually identify the C2-C3 disc. In total, the full testing steps were:

sct_deepseg -task seg_sc_lesion_t2w_sci -i sub-zh01_ses-01_acq-sag_T2w.nii.gz 
sct_label_utils -i sub-zh01_ses-01_acq-sag_T2w.nii.gz -create-viewer 3
sct_label_vertebrae -i sub-zh01_ses-01_acq-sag_T2w.nii.gz -s sub-zh01_ses-01_acq-sag_T2w_sc_seg.nii.gz -initlabel labels.nii.gz -c t2
sct_label_utils -i sub-zh01_ses-01_acq-sag_T2w_sc_seg_labeled.nii.gz -vert-body 3,9
sct_register_to_template -i sub-zh01_ses-01_acq-sag_T2w.nii.gz -s sub-zh01_ses-01_acq-sag_T2w_sc_seg.nii.gz -l labels.nii.gz 
sct_warp_template -d sub-zh01_ses-01_acq-sag_T2w.nii.gz -w warp_template2anat.nii.gz
sct_analyze_lesion -m sub-zh01_ses-01_acq-sag_T2w_lesion_seg.nii.gz -s sub-zh01_ses-01_acq-sag_T2w_sc_seg.nii.gz -f label/

I would be happy to share the files privately on slack if you want to test (or you could try yourself on a different subject).

Note about other features

I also took the time to solve a small issue re: column ordering. All it took was a single call to sorted.

I wanted to solve #2677 while I was at it, since this was also requested by the user. However, I ran into an issue that I previously observed in #4394 (comment):

[!NOTE]
As a side observation, I noticed that adding the -f option makes sct_analyze_lesion very slow on my personal laptop, even for a single subject. There is no feedback in the output while the template-related processing is running, too. This isn't a great user experience, and I would love to look further into improving this after I debug the numerical issues.

By switching from per-level aggregation (usually ~10-13 vert levels) to per-slice aggregation (as many as 500 slices), the optimization issues get much worse. (As in, couldn't even finish processing worse.) So, to properly implement this feature, it would require some sort of optimization fix to improve the run time. At that point, I figured it might be better to get this initial PR going, and follow-up with a second fix afterwards.

Linked issues

Fixes #4567
Fixes #4568
Fixes #4587 (When #4588 is merged)
Fixes #2677 (When #4588 is merged)

Currently, we sum the totals at the same time we write the sheets to the file. But why? Doing it
this way requires some awkward checks for `#` in the sheet name.

If we were to instead sum the totals *in the sheet-specific method*, then we already know that we're
working in the correct context.

Additionally, this is necessary so that we can add additional columns *after* the total for the spinal
cord, white matter, gray matter, and other combined labels.
This is so that we can insert new columns after the total column, but before the total row is
computed.
We basically just sum columns corresponding to individual tracts that belong in each of the overall
"CombinedLabels".
Luckily for us, this functionality was already implemented! We get to simplify the code here;
instead of hardcoding the ranges for GM/WM/etc., we can re-use the existing label ranges as defined
by info_label.txt. This also allows the user to specify custom "CombinedLabels", too.

Oddly, the old code used contiguous ranges. But, since we have the full lists of combined labels, we can
just use `in` rather than `min < val < max`, in case the CombinedLabel does not span a contiguous range.
The order the files are parsed affects the order in which they're inserted into `self.atlas_roi_lst`
which in turn affects the order of the columns of the dataframe.
@joshuacwnewton joshuacwnewton added feature category: new functionality sct_analyze_lesion context: labels Jul 30, 2024
@joshuacwnewton joshuacwnewton added this to the 6.4 milestone Jul 30, 2024
@joshuacwnewton joshuacwnewton self-assigned this Jul 30, 2024
joshuacwnewton and others added 2 commits August 7, 2024 10:08
…ible) (#4588)

This was a tricky refactor, because `-perslice` goes against the default
behavior of `sct_analyze_lesion` (aggregating by vertebral levels), and
there are a lot of hardcoded `vert` variable names. It took quite a few
changes of (vert -> row) to avoid a complete rewrite.

That said, big-picture "aggregating by vertebral levels" basically boils
down to this:

- Iterate over vertebral levels.
- For each vertebral level, set all other voxels equal to 0, then
compute percentages.

The general structure of this code still works if you substitute
"vertebral level" for slice; You can instead iterate over slices and
still set all other voxels equal to 0. To do this, I used "rows" to
represent either vertebral levels *or* slices, and kept a separate
variable "row_name" to track whether we have vertlevels or slices. This
mostly gets the job done.

Note: While the previous performance improvements help, this is still a
slow process for any image with a large number of slices:

- `-perslice 0`, no optimization: ~300s (~5m)
- `-perslice 0`, with optimization: ~50s (~1m)
- `-perslice 1`, no optimization: Did not finish (est. 3600s, or 1hr)
- `-perslice 1`, with optimization: ~600s (10m)

So, perhaps there is more optimization work to be done.

It's a lot of calls to `np.sum`, `np.count_nonzero`, etc. I think this
is because we've got naive `for` loops over both slices (~500) and
tracts (~36). I'm sure there's some sort of smart numpy array operations
we could be doing, but it's hard to implement that without a full
rewrite.

It may also be worth just... skipping the slices that don't have lesion
values? With large images, this results in a large output spreadsheet
with many empty rows. But, if we exclude the empty rows, then perhaps we
can save on processing time as well.

Fixes #4567 
Fixes #2677

---------

Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>
Copy link
Member

@mguaypaq mguaypaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable, thanks for the new feature!

spinalcordtoolbox/scripts/sct_analyze_lesion.py Outdated Show resolved Hide resolved
spinalcordtoolbox/scripts/sct_analyze_lesion.py Outdated Show resolved Hide resolved
@mguaypaq mguaypaq merged commit 8911028 into master Aug 7, 2024
20 checks passed
@mguaypaq mguaypaq deleted the jn/4567-add-combinedlabels-features-to-analyze_lesion branch August 7, 2024 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment