Support `CombinedLabels` from `info_label.txt` when computing lesion distributions with `-f` #4584

Merged: mguaypaq merged 11 commits into master from jn/4567-add-combinedlabels-features-to-analyze_lesion on Aug 7, 2024
Conversation
Currently, we sum the totals at the same time we write the sheets to the file. But why? Doing it this way requires some awkward checks for `#` in the sheet name. If we were to instead sum the totals *in the sheet-specific method*, then we already know that we're working in the correct context. Additionally, this is necessary so that we can add additional columns *after* the total for the spinal cord, white matter, gray matter, and other combined labels.
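A minimal sketch of the idea, with made-up data and a hypothetical helper name (not the actual SCT code): compute the total row inside the sheet-specific method, where we already know every column is a percentage column, instead of at write time where the code had to check the sheet name for `#`.

```python
# Hypothetical sheet: ROI name -> list of per-lesion percentage values.
sheet = {"WM": [10.0, 20.0], "GM": [5.0, 15.0]}

def add_total_row(sheet: dict) -> dict:
    """Append a column-wise total to each ROI column.

    Summing here (in the sheet-specific method) means the context is
    already known, so no fragile check for '#' in the sheet name is
    needed, and new columns can still be inserted before this runs.
    """
    return {col: values + [sum(values)] for col, values in sheet.items()}

filled = add_total_row(sheet)
```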
This is so that we can insert new columns after the total column, but before the total row is computed.
We basically just sum columns corresponding to individual tracts that belong in each of the overall "CombinedLabels".
Luckily for us, this functionality was already implemented! We get to simplify the code here: instead of hardcoding the ranges for GM/WM/etc., we can re-use the existing label ranges as defined by info_label.txt. This also allows the user to specify custom "CombinedLabels". Oddly, the old code assumed contiguous ranges; but since we have the full lists of combined labels, we can just use `in` rather than `min < val < max`, in case a CombinedLabel does not span a contiguous range.
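A sketch of the membership check with hypothetical label IDs and percentages (not the actual SCT data structures): testing membership with `in` over the full list works even when a CombinedLabel is non-contiguous, unlike a `min < val < max` range check.

```python
# Hypothetical CombinedLabels, as if parsed from info_label.txt.
# Note that "white matter" is deliberately non-contiguous.
combined_labels = {
    "white matter": [0, 1, 2, 3, 4, 30],
    "gray matter": [5, 6, 7, 8],
}

# Hypothetical per-tract lesion percentages.
tract_pct = {0: 1.5, 2: 0.5, 30: 2.0, 5: 4.0}

# `in` handles non-contiguous label lists; a range check would
# wrongly include tracts 5-29 in "white matter".
combined_pct = {
    name: sum(pct for tract_id, pct in tract_pct.items() if tract_id in ids)
    for name, ids in combined_labels.items()
}
```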
The order the files are parsed affects the order in which they're inserted into `self.atlas_roi_lst` which in turn affects the order of the columns of the dataframe.
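A minimal illustration (hypothetical file names, not the actual SCT code) of how a single `sorted` call makes the column order deterministic regardless of the filesystem's parse order:

```python
import os
import tempfile

def list_atlas_rois(atlas_dir: str) -> list:
    # os.listdir order is filesystem-dependent; sorting the names makes
    # the resulting dataframe column order deterministic.
    return sorted(f for f in os.listdir(atlas_dir) if f.endswith(".nii.gz"))

with tempfile.TemporaryDirectory() as d:
    for name in ["PAM50_atlas_10.nii.gz", "PAM50_atlas_02.nii.gz", "info_label.txt"]:
        open(os.path.join(d, name), "w").close()
    rois = list_atlas_rois(d)
```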
joshuacwnewton added the feature (category: new functionality) and sct_analyze_lesion (context) labels on Jul 30, 2024
…ible) (#4588)

This was a tricky refactor, because `-perslice` goes against the default behavior of `sct_analyze_lesion` (aggregating by vertebral levels), and there are a lot of hardcoded `vert` variable names. It took quite a few changes of (vert -> row) to avoid a complete rewrite. That said, big-picture "aggregating by vertebral levels" basically boils down to this:

- Iterate over vertebral levels.
- For each vertebral level, set all other voxels equal to 0, then compute percentages.

The general structure of this code still works if you substitute slices for vertebral levels: you can instead iterate over slices and still set all other voxels equal to 0. To do this, I used "rows" to represent either vertebral levels *or* slices, and kept a separate variable "row_name" to track whether we have vertebral levels or slices. This mostly gets the job done.

Note: While the previous performance improvements help, this is still a slow process for any image with a large number of slices:

- `-perslice 0`, no optimization: ~300s (~5m)
- `-perslice 0`, with optimization: ~50s (~1m)
- `-perslice 1`, no optimization: Did not finish (est. 3600s, or 1hr)
- `-perslice 1`, with optimization: ~600s (~10m)

So, perhaps there is more optimization work to be done. There are a lot of calls to `np.sum`, `np.count_nonzero`, etc. I think this is because we've got naive `for` loops over both slices (~500) and tracts (~36). I'm sure there are some smart numpy array operations we could be doing, but it's hard to implement that without a full rewrite. It may also be worth just skipping the slices that don't have lesion values: with large images, this results in a large output spreadsheet with many empty rows, but if we exclude the empty rows, then perhaps we can save on processing time as well.

Fixes #4567 Fixes #2677

---------

Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>
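A simplified sketch of the row-masking idea with a tiny made-up array (not the actual implementation): `rows` can hold either vertebral-level IDs or slice indices, and the same loop works for both.

```python
import numpy as np

def lesion_pct_per_row(lesion: np.ndarray, rows: np.ndarray) -> dict:
    """For each 'row' (a vertebral level or an axial slice index),
    zero out all voxels belonging to other rows, then compute the
    percentage of the lesion contained in that row.

    `rows` assigns a row ID to every voxel (same shape as `lesion`).
    """
    total = np.count_nonzero(lesion)
    out = {}
    for row_id in np.unique(rows):
        # Keep only the voxels of this row; zero everything else.
        masked = np.where(rows == row_id, lesion, 0)
        out[row_id] = 100.0 * np.count_nonzero(masked) / total
    return out

lesion = np.array([[1, 1], [0, 1]])   # 3 lesion voxels in total
rows = np.array([[1, 1], [2, 2]])     # row 1 = top, row 2 = bottom
pct = lesion_pct_per_row(lesion, rows)
```

The per-row `np.where` calls inside a Python loop are exactly the kind of repeated work the commit message flags as a candidate for vectorization.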
mguaypaq approved these changes Aug 7, 2024
Seems reasonable, thanks for the new feature!
Co-authored-by: Mathieu Guay-Paquet <mathieu.guay-paquet@polymtl.ca>
mguaypaq deleted the jn/4567-add-combinedlabels-features-to-analyze_lesion branch August 7, 2024 20:08
Description

This PR amends two methods responsible for computing the distribution of a lesion within a warped atlas (i.e. the `-f` option of `sct_analyze_lesion`):

- `_measure_eachLesion_distribution`: responsible for the `lesion#N_distribution` pages in the sheet. The values are "percentage of the lesion that exists in a given ROI". Since the CombinedLabels are just combined ROIs, we can simply sum the percentages.
- `_measure_totLesion_distribution`: responsible for the `ROI_occupied_by_lesion` page in the sheet. The values are "percentage of the ROI taken up by any lesion". This is a bit trickier, since the ROI area is in the denominator (rather than the numerator). Thankfully, it turns out that this functionality was already implemented!

The end result is that the user can now modify `info_label.txt` to add their own custom labels, and they will get results for the pages generated by `-f`.
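A toy example (made-up voxel counts, not the actual implementation) of why the `ROI_occupied_by_lesion` page is trickier: because the ROI size is in the denominator, the CombinedLabel value must be computed from summed counts, not by summing the per-ROI percentages.

```python
# Hypothetical per-ROI voxel counts: lesion voxels inside each ROI,
# and total ROI size. ROIs 0 and 1 form one CombinedLabel.
lesion_in_roi = {0: 10, 1: 0}
roi_size = {0: 100, 1: 100}
combined = [0, 1]

# Wrong: summing per-ROI percentages gives 10% + 0% = 10%.
# Right: sum numerators and denominators, then divide once.
pct = (100.0 * sum(lesion_in_roi[i] for i in combined)
       / sum(roi_size[i] for i in combined))
```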
Testing this PR

Testing this is a bit tricky, because it requires lengthy preprocessing steps: A) lesion segmentation, B) registering the image to the template, and C) warping the template (including the atlas).

In our `sct_testing_data` dataset, we do have a "mini" version of the template, but it does not contain the atlas files. So, it's necessary to work with full-sized images (to access the existing atlas of the PAM50 template). I was able to find a suitable subject in the SCI Zurich dataset for testing purposes. However, `sct_label_vertebrae` failed initially, so I needed to manually identify the C2-C3 disc. In total, the full testing steps were:

I would be happy to share the files privately on Slack if you want to test (or you could try yourself on a different subject).
Note about other features

I also took the time to solve a small issue re: column ordering. All it took was a single call to `sorted`.

I wanted to solve #2677 while I was at it, since this was also requested by the user. However, I ran into an issue that I previously observed in #4394 (comment): by switching from per-level aggregation (usually ~10-13 vertebral levels) to per-slice aggregation (as many as 500 slices), the optimization issues get much worse. (As in, "couldn't even finish processing" worse.) So, properly implementing this feature would require some sort of optimization fix to improve the run time. At that point, I figured it might be better to get this initial PR going, and follow up with a second fix afterwards.
Linked issues
Fixes #4567
Fixes #4568
Fixes #4587 (When #4588 is merged)
Fixes #2677 (When #4588 is merged)