Joint label fusion unexpected results #1292
Yes, you're likely to get better-matched voxel patches. However, the increased number of patches increases the chances that those better-matched patches are incorrect, and in the case of incorrect patches you won't have the true label correspondence with the target patch.
@ntustison Thanks Nick. The unexpected behavior is that in all cases, with all patch sizes, the best Dice metric is always with a search radius of 0. Furthermore, as the search radius increases, the drop in the Dice metric gets worse rather quickly in all cases. In our test cases, we are seeing that any search radius greater than 0 makes the results worse. Given that the default search radius is 2, we were expecting (approximately) a U-shaped result where the best search radius was in the range of 1-3, depending on the particular data set. We will continue to investigate. We appreciate the time you have given to provide some feedback to us. Best regards.
No problem. If the registration is really good, then an optimal search radius of 0 would make sense. However, without knowledge of voxelwise registration quality, I think it would be difficult to make these sorts of inferences (e.g., the optimal search radius). Rather, in trying to tease out behavior, I think I'd start with something more artificial (like the example below). For example, below is a simple joint label fusion in which the target is a basic circle. The atlases aren't perfectly aligned but are noisy (on the boundaries) circles whose radius is two voxels larger than the target's (files are attached), thus necessitating a radius > 0.
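(The original script isn't preserved above. The following is only a minimal sketch of that kind of synthetic test, assuming numpy, nibabel, and an `antsJointFusion` binary on the PATH; file names and parameter values are illustrative, not the ones used in the thread.)

```python
import subprocess
import numpy as np
import nibabel as nib

def circle(shape, radius, noise=0.0, seed=0):
    """Binary circle centered in a 2-D image, with optional jitter on the boundary."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    r = np.hypot(yy - shape[0] / 2, xx - shape[1] / 2)
    return (r <= radius + noise * rng.standard_normal(shape)).astype(np.uint8)

affine = np.eye(4)
nib.save(nib.Nifti1Image(circle((64, 64), 15), affine), "target.nii.gz")

atlas_args = []
for i in range(1, 4):
    # Atlas circles are two voxels larger than the target and noisy on the boundary.
    img = circle((64, 64), 17, noise=1.0, seed=i)
    nib.save(nib.Nifti1Image(img, affine), f"atlas{i}.nii.gz")
    nib.save(nib.Nifti1Image(img, affine), f"labels{i}.nii.gz")  # label duplicates intensity (assumption)
    atlas_args += ["-g", f"atlas{i}.nii.gz", "-l", f"labels{i}.nii.gz"]

# With the atlases systematically 2 voxels too large, a search radius of 0
# cannot recover the target boundary, while a radius >= 2 should.
subprocess.run(["antsJointFusion", "-d", "2", "-t", "target.nii.gz",
                "-p", "2", "-s", "2", "-m", "PC",
                "-o", "fused.nii.gz"] + atlas_args, check=True)
```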
Output: (image attached)

Attached: atlas1.nii.gz
@ntustison thank you for this example and a simplified way to look at this. This is kind of lengthy, but I have modified the images a bit and created 3 atlas/intensity inputs to run JLF against the same target image. These images are meant to represent possible registration error on different sections of the atlases, but overall the edge boundaries of the target are covered by all 3 atlas/intensity inputs. The atlas/intensity images (the label is just a duplicate of the intensity) are attached. Running JLF on these with Pearson's correlation, the scores increase as the search and patch radii increase, up to a Dice overlap of 1.
Result at radius 0: (image attached). But running the same code with MeanSquares as the scoring metric gives a drastic drop in Dice as the search radius increases.
Loading the resulting labels at each radius shows that increasing the search radius is somehow eroding the resulting label, which I was not expecting. I would still expect a result similar to the Pearson score (as the search radius gets larger, the overall score improves). I am still confused about what would cause this behavior with only the change in metric. This is also the same sort of behavior I saw when using the Pearson metric with brain segmentations; an example plot of some of the Dice results I've gotten is further down. The green segmentations are the label results for each search radius in JLF.

Related plots/results

The context for my use of joint fusion is segmenting minipig brains. I register 37 manually segmented atlas/images to each subject and then run joint label fusion with those registrations. The registrations are good quality, but nowhere near voxelwise perfect (just by looking at them), which is why a search radius of 0 being the best performer in all cases is concerning. Below is a plot of the average Dice score, over 10 subjects, of the JLF results against ground-truth manual segmentations for each patch/search radius pair. A search radius of 0 always scores best (red line), as we initially stated, and as I continue to add subjects to these averages, not much changes.

Patch size adds stability, which makes sense given the amount of information being used. With a small search radius and a small patch radius it makes sense that results could be bad, especially in boundary regions where the images are not well defined. I ran another experiment with smaller search radii and a broader range of patch sizes, but still no patch size beat the threshold line in the first plot. A larger patch size gives me more confidence that a patch, and therefore its labels, is being selected 'correctly'. But with a nonzero search radius and these large patch sizes, the correlation scores should be finding a better alternative to the search-radius-0 result if the registrations aren't voxelwise perfect, unless I am still misinterpreting something. Thanks again for your help.
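(A simplified sketch of this kind of parameter sweep, for concreteness: vary the search radius while holding the patch radius fixed, then score each fused result against the manual segmentation with a Dice overlap. This is not the exact pipeline used above; file names, the atlas count, and the radius range are placeholders, and it assumes numpy, nibabel, and the `antsJointFusion` binary.)

```python
import subprocess
import numpy as np
import nibabel as nib

def dice(a, b):
    """Dice overlap of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

truth = nib.load("manual_labels.nii.gz").get_fdata() > 0   # ground-truth segmentation

atlas_args = []
for i in range(1, 4):                                       # illustrative atlas/label file names
    atlas_args += ["-g", f"atlas{i}.nii.gz", "-l", f"labels{i}.nii.gz"]

for metric in ("PC", "MSQ"):                                # Pearson correlation vs. mean squares
    for search in range(5):                                 # search radius 0..4, patch radius fixed
        out = f"fused_{metric}_s{search}.nii.gz"
        subprocess.run(["antsJointFusion", "-d", "3", "-t", "target.nii.gz",
                        "-p", "2", "-s", str(search), "-m", metric,
                        "-o", out] + atlas_args, check=True)
        seg = nib.load(out).get_fdata() > 0
        print(metric, search, dice(seg, truth))
```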
Your simplified example is obviously not as simple as mine, as the optimal search radius will be different over the boundary in the target image. Also, when you write "Running JLF on these with Pearson's correlation, the scores increase as the search and patch radii increase, up to a Dice overlap of 1," I'm confused, as I don't know how both of these variables (search radius and patch radius) are changing over your different runs, so there's not much I can infer. Additionally, there's not much I can reason from your data plots, as I don't have any idea what your data looks like. Perhaps you have long, thin regions that, even though you have imperfectly aligned data, won't get better by expanding your search radius beyond 0. I don't know. I would suggest you do a little more digging, perhaps looking at some of the variables in the relevant classes, or looking at specific patches and verifying that the matching patches are not unexpected. Many of us have used this code for years and haven't noticed anything. It is definitely possible that there is a legitimate bug (or multiple bugs) somewhere and an oversight on our part. However, I would need something more simplified and concrete that I can quickly run on my machine and that definitely illustrates the problem before I can spend additional time exploring this.
Thanks again for your insight and time. I've been writing intermediate results for the patch-selection and weighting variables out to files to try to understand this, and will continue looking.
@KKnoernschild there is a related JLF implementation in the ASHS tool (joint_fusion): https://github.com/pyushkevich/ashs. I've found it provides pretty similar results to antsJointFusion. If I can locate my old experiments (described here: https://github.com/cookpa/antsVsGreedyJLF) I might be able to try varying the radius to see if I can replicate your results. Probably not soon though. I just thought I'd mention it in case it might be helpful for your debugging.
Hi @KKnoernschild, does #1302 resolve the search radius issue you raised here?
@cookpa I was the person who found that bug, and the MSQ JLF bug, and Dr. Johnson was kind enough to do the pull request formatting and submission of the changes for me. Unfortunately it did not solve the search radius issue we were seeing, but going through the code and finding that bug helped me realize other things about my analysis of the joint fusion results.

After the bug fix, the Dice score still decreased with increasing search radius when the full set of atlases was used. Looking at the individual label scores instead of the average over all labels, however, the decrease in Dice was found in only one label. It was just such a large decrease that, when averaged with the rest of the labels (which were stable), it showed up as a drop in the total score as the search radius increased. The remaining labels' Dice results stayed consistent as the search radius increased, as you would expect if optimal matches were found at a small search radius.

This leads me to believe it's more of an image contrast/quality issue after registration in that one poorly performing label region, or that there is slight variability in those small atlas label regions to begin with, causing issues when comparing back to the ground-truth label. I do think we can close this issue for the time being.
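(The per-label breakdown described above can be computed along these lines; file names are illustrative and this assumes numpy and nibabel. The point is that one collapsing label can drag the mean down while the other labels stay flat.)

```python
import numpy as np
import nibabel as nib

def per_label_dice(seg_path, truth_path):
    """Dice score per label value, rather than a single mean over all labels."""
    seg = nib.load(seg_path).get_fdata().astype(int)
    truth = nib.load(truth_path).get_fdata().astype(int)
    scores = {}
    for label in np.unique(truth):
        if label == 0:                                   # skip background
            continue
        a, b = seg == label, truth == label
        scores[int(label)] = 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
    return scores

scores = per_label_dice("fused_s3.nii.gz", "manual_labels.nii.gz")
print(scores, "mean:", np.mean(list(scores.values())))
```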
OK, thanks for the update and for your work on this issue!
This is with the most up-to-date version of ANTs/ITK, built from source yesterday; the same behavior also occurs with a more out-of-date build from last summer.
ANTs Version: v2.3.5.post79-gdb98de3
Compiled from source: Jan 22 2022 02:10:59
I have been testing joint label fusion search radius and patch radius parameters, and have been getting what I think are unexpected results when increasing the search radius parameter.
If I understand the joint fusion publications correctly, the patch radius defines the patch around a specific voxel in the target image. The same patch radius is used to define patches of the same size in the atlas images, but with the candidate patch centers allowed to shift within the search radius around the corresponding point, to help find a better match.
I have been testing many patch and search radius parameters exhaustively for my dataset, and then calculating Dice scores. I have been noticing that in all cases, the Dice score drastically drops as the search radius increases, with both the Pearson correlation and MeanSquares metrics. To my understanding this should not be the case. If the search radius is increased and the patch size stays the same, that just gives a larger number of potentially better matches than a search radius of 0 for all input atlases, so shouldn't that lead to better matches and an overall better Dice score?
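(To make that intuition concrete, here is a toy sketch of single-best patch matching. It is not the actual ANTs implementation, which computes joint weights over all candidate patches rather than picking one best match. Since the radius-0 candidate set is a subset of any larger search window, the best raw similarity can only improve as the radius grows; what a larger radius changes in real JLF is which patches get weighted into the fusion.)

```python
import numpy as np

def best_patch_similarity(target, atlas, center, patch_radius, search_radius):
    """Toy patch matching: best Pearson correlation between the target patch at
    `center` and atlas patches whose centers lie within `search_radius` of it.
    Assumes `center` is far enough from the image border for all patches."""
    def patch(img, c, r):
        return img[tuple(slice(ci - r, ci + r + 1) for ci in c)].ravel()

    t = patch(target, center, patch_radius)
    best = -np.inf
    offsets = range(-search_radius, search_radius + 1)
    for dz in offsets:
        for dy in offsets:
            for dx in offsets:
                c = (center[0] + dz, center[1] + dy, center[2] + dx)
                a = patch(atlas, c, patch_radius)
                best = max(best, np.corrcoef(t, a)[0, 1])   # Pearson correlation
    return best
```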
I have noticed that as I increase the patch size, I get more stable results, which makes sense because it's more data to use overall. I am just confused as to why I get much worse Dice scores as the search radius increases, even at small search radii.
Thank you in advance for your help.