Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Method to fill holes in segmentations #2678

Closed
Spenhouet opened this issue Aug 2, 2021 · 18 comments · Fixed by #2692
Closed

Method to fill holes in segmentations #2678

Spenhouet opened this issue Aug 2, 2021 · 18 comments · Fixed by #2692

Comments

@Spenhouet
Copy link
Contributor

Is your feature request related to a problem? Please describe.

We have 3D segmentation masks. Every segmented shape is not supposed to have holes within its borders.
Any wholes might be considered potential artifacts.
Especially for our training data we would like to close such holes.

For an example, the following matrix:

[
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 0, 1, 2, 0, 0 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 0, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 4, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
]

Should result in

[
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 0, 0 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 4, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
]

The only filled holes are the 1 and 3 in the middle slice.
The 2 shape is open to the side and the 4 is open to the back.
The 0 between the classes should stay untouched.

Describe the solution you'd like

MONAI is missing a "fill_holes" function / transformation.
scipy has an implementation which works for 3D data but only for binary images (scipy.ndimage.morphology.binary_fill_holes). This requires an iteration over all labels of the image which makes this already slow method unacceptably slow.

I wish that MONAI would implement such a function.

Describe alternatives you've considered

I opened this feature request on the scipy project: scipy/scipy#14504
And this stackoverflow question: https://stackoverflow.com/questions/68608749/performant-way-to-fill-holes-for-categorical-data

I implemented 7 versions using the existing scipy.ndimage.morphology.binary_fill_holes function (or its implementation) and numpy. Here the two best versions so far:

import numpy as np
from scipy.ndimage.morphology import binary_fill_holes

def fill_holes6(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = np.zeros_like(img)
    for i in applied_labels:
        output[binary_fill_holes(img == i)] = i

    return output

def fill_holes7(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = np.zeros(img.shape, dtype=int)
    for i in applied_labels:
        tmp = np.zeros(img.shape, dtype=bool)
        binary_dilation(tmp, structure=None, iterations=-1, mask=img != i, origin=0, border_value=1, output=tmp)
        output[np.logical_not(tmp)] = i
        
    return output

In MONAI this could be implemented something like this:

class FillHoles():

    @classmethod
    def _binary_dilation(cls, img: torch.Tensor) -> torch.Tensor:
        img_arr = img.detach().cpu().numpy()
        img_arr = binary_fill_holes(img_arr)
        return torch.as_tensor(img_arr, device=img.device).type(torch.uint8)

    def __init__(self, applied_labels: Union[Sequence[int]]) -> None:
        self.applied_labels = applied_labels

    def __call__(self, img: torch.Tensor,
                 meta_data: Dict[str, Any]) -> Tuple[torch.Tensor, Dict[str, Any]]:
       output = torch.zeros_like(img)
       for i in applied_labels:
          output[self._binary_fill_holes(img == i)] = i

       return output, meta_data

I measured the performance the following way (matching my real world data distribution):

import time
import pandas as pd

def measure(funs, t):
    res = []
    for _ in range(t):
        ra = np.random.randint(10, 40)
        sh = np.random.randint(200, 400, 3)
        img = np.random.randint(0, ra, sh)

        applied_labels = np.unique(img)[1:]

        fun_res = []
        for fun in funs:
            start = time.time()
            fun(img, applied_labels)
            end = time.time()
            fun_res.append(end - start)
        res.append(fun_res)
    return np.min(res, axis=0), np.max(res, axis=0), np.mean(res, axis=0), np.std(res, axis=0)

print(measure([fill_holes6, fill_holes7], t=10))

For my first implementations I got the following execution times (t=100):

fill_holes1 fill_holes2 fill_holes3
min 6.4s 6.9s 6.2s
max 83.7s 96.0s 80.4s
mean 32.9s 37.3s 31.6s
std 17.3s 20.1s 16.5

This is very slow.
The last implementation fill_holes7 is only 1.27 times faster than fill_holes3.
I really hope there is a more performant way of doing this.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 2, 2021

Hi @Spenhouet ,

Thanks for your sharing! I think that's an interesting idea.
A draft idea for the implementation: MONAI has a transform KeepLargestConnectedComponent which is based on skimage.measure.label to get connected areas, maybe it can also be used in your case to detect holes?
https://github.com/Project-MONAI/MONAI/blob/dev/monai/transforms/utils.py#L720

Thanks.

@Spenhouet
Copy link
Contributor Author

Hi @Nic-Ma,

my suggestion for the transformation class FillHoles was based the KeepLargestConnectedComponent class (with respect to structure and naming).
Interesting idea to also use that class for this task but I'm not sure how?

The issue with the morphology implementations in scipy is that they all only work for binary data.
Maybe it is also worth using kornia over scipy since it is much more pytorch native and tries to be differentiable.
One issue is that their dilation implementation does not have a mask parameter as the scipy implementation does (kornia.morphology.dilation).
That is also my biggest issue with the scipy implementation of binary_fill_holes since I don't understand the application of the mask parameter.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 2, 2021

Thanks for your analysis.
@wyli @rijobro @ericspod what do you guys think about this feature request?
If no other ideas or comments, @Spenhouet would you like to contribute a PR for it? If not, I can try to develop a draft PR for it based on the same approach of KeepLargestConnectedComponent.

Thanks.

@Spenhouet
Copy link
Contributor Author

Interesting idea to also use [KeepLargestConnectedComponents] for this task but I'm not sure how?

Okay, I see a way this could work:

  1. Find the largest connected component for every class + background
  2. Get the difference between these masks

But this might be even slower than my implementation suggested above.

@Nic-Ma Did you have another implementation with KeepLargestConnectedComponent in mind?

I can open a pull request with my suggested implementation but as I stated it is unacceptably slow. I did really hope that someone already done something like this or is way more knowledgeable than me and knows how to implemented this with much better performance.

@wyli
Copy link
Contributor

wyli commented Aug 2, 2021

looks like this approach is promising https://stackoverflow.com/a/66496234, would be nice to have a 3D version of it in MONAI with pytorch cuda extensions...

or if you are looking for offline preprocessing, the gpu-based apis might be helpful
https://docs.cupy.dev/en/v8.5.0/reference/ndimage.html#morphology
https://github.com/rapidsai/cucim/blob/e30d2cedd63c92d722c7a08d3a000c23c6795aef/python/cucim/src/cucim/skimage/morphology/misc.py#L146

@Spenhouet
Copy link
Contributor Author

Spenhouet commented Aug 2, 2021

@wyli Good catch. I take back my statement with respect to kornia. I did not work with it before and just now tried their dilation and erosion implementations and it seems they can not handle 3D data.

I did yet only work with the binary_dilation and binary_erosion functions of scipy which have the limitation of not working for multiple classes.
I was about to check out the grey_dilation and grey_erosion methods.

if you are looking for offline preprocessing

Yes, we are. Thanks for the hint. I will take a look.

I'm happy to provide a PR for this feature but I would like to implement something that is performant first.

@Spenhouet
Copy link
Contributor Author

Okay, I'm still stuck on the mask parameter of binary_dilation (scipy.ndimage.binary_dilation). I will need to try to understand how this mask is applied since for the binary_fill_holes implementation from scipy this is essential.

The grey_dilation does not have this parameter and I don't feel it is useable in a way I would like.

I will also take a look at the other implementations @wyli linked.
@Nic-Ma What was your thought process on using KeepLargestConnectedComponent for this task?

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 2, 2021

Hi @Spenhouet ,

In my simple mind, I think maybe we can:

  1. Filter out the 0 in image, refer to:
    https://github.com/Project-MONAI/MONAI/blob/dev/monai/transforms/post/array.py#L280
  2. Use measure.label() to get masks of all the 0 regions in the image:
    https://scikit-image.org/docs/dev/api/skimage.measure.html#label
  3. Then remove the "open holes" near borders.
  4. Fill the masks with the specified value.

Here is how we filter out the largest connected component region:
https://github.com/Project-MONAI/MONAI/blob/dev/monai/transforms/post/array.py#L277
I didn't test the perf of this idea, seems the CUDA methods from @wyli 's links may be faster?

Thanks.

@Spenhouet
Copy link
Contributor Author

I checked out the other linked implementations and they also only seem to work for binary data. One could implement the same as I did in my first post with a GPU accelerated version but I did hope that this could actually be implemented more efficient (not just faster via hardware resources).

I could not completely follow the binary_dilation code since it ends in some hard to follow C code but I believe I have a working theory how it works.

Given the following simple 2D binary image:

[ 1, 1, 1 ]
[ 1, 0, 1 ]
[ 1, 1, 1 ]

The mask is:

[ 0, 0, 0 ]
[ 0, 1, 0 ]
[ 0, 0, 0 ]

The image which will be dilated with a border is:

[ 1, 1, 1, 1, 1 ]
[ 1, 0, 0, 0, 1 ]
[ 1, 0, 0, 0, 1 ]
[ 1, 0, 0, 0, 1 ]
[ 1, 1, 1, 1, 1 ]

This now repeats a dilation step growing the border area. The mask is used to stop the growth. This stops when there is no more growth. Then it takes the dilation mask to define the final shape.

In the binary case I guess that is very similar to the idea of using the LargestConnectedComponent in that this will match the outside of a shape in the case where we only have small holes. With the LargestConnectedComponent I see a lot of edge cases not working.

What do you mean with step 3. and 4.? Maybe I'm not yet fully getting your idea.

The remove_small_holes implementation @wyli linked to might be similar to what you are proposing?

This also works on binary images.

  1. Create connected components for background pixel/voxel
  2. Remove all components smaller than a given parameter

This is still not without issues. For example the following would work with the edge based growing approach but not with the largest connected component approach:

[ 1, 1, 1, 1, 1 ]
[ 1, 0, 0, 0, 1 ]
[ 1, 0, 0, 0, 1 ]
[ 1, 0, 0, 0, 1 ]
[ 1, 1, 1, 1, 1 ]

The larges connected component here is actually the hole to fill.

So except for further understanding the existing / proposed solutions, I still have no clue if this could be implemented in a non binary way.

For now I might just open a PR for my original implementation.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 2, 2021

Hi @Spenhouet ,

Actually, I didn't get a chance to think clearly about how to remove the open holes when we got all the holes.. Maybe you can try the ideas @wyli linked? I will also search how to detect open holes later.

Thanks.

@ericspod
Copy link
Member

ericspod commented Aug 2, 2021

There's ways of implementing the morphology operators but the solution mentioned relies on unfold which is 2D only which is the same problem Kornia has with its operators. We would need 3D specific definitions of dilation, erosion, etc. to implement at a hole filling operation.

@Spenhouet
Copy link
Contributor Author

I did already look into the methods posted by @wyli but sadly they do not solve the problem.
Some of these do not fulfill the problem definition.

We want to fill all holes which are completely outlined by a single class which is not the background.

The connected component solution as provided by skimage.morphology.area_closing does not take open shapes (holes which are not completely enclosed by a single class) into account (@Nic-Ma maybe this is what you refer to as open holes?). This solution also does not work for arbitrary hole sizes.

We could theoretically find all connected components from the background class and then determine all neighbors per component so filter for all connected components which only have a single class as neighbor excluding background.
But this again does not sound performant at all.

@ericspod The binary_dilation and binary_erosion implementation of scipy does work for ndimages. The only issue is that we need to perform this per class (binary) and thereby multiply the execution time with the number of classes. This is currently the only implementation which actually works correct. It is just slow.

@Spenhouet
Copy link
Contributor Author

@Nic-Ma I now implemented a correct version with connected component labeling:

def fill_holes8(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    connectivity = 1
    footprint = generate_binary_structure(img.ndim, connectivity)
    background_mask = img == 0
    components, num_components = label(background_mask, structure=footprint)
    filled_holes = np.zeros_like(img)
    for component_label in range(1, num_components+1):
        component_mask = components == component_label
        # Pad with -1 to detect edge voxels
        component_neighborhood = np.pad(img, 1, constant_values=-1)[binary_dilation(np.pad(component_mask, 1), structure=footprint)]

        neighbor_labels = np.unique(component_neighborhood)
        if len(neighbor_labels) == 2 and -1 not in neighbor_labels:
            neighbor_label = neighbor_labels[1]
            filled_holes[component_mask] = neighbor_label

    return img + filled_holes

At least for my random sampled data this is extremely slower but the performance here depends on the number of background components. In real world data this most likely is way less.

This code can probably be optimized much further but I doubt that it will get more performant than fill_holes7.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 2, 2021

Hi @Spenhouet ,

Sounds great, could you please help contribute a PR? I think it's good to have it as initial version, and may optimize perf later.

Thanks.

@Spenhouet
Copy link
Contributor Author

@Nic-Ma Which solution would you prefer for a pull request?

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Aug 3, 2021

@Spenhouet , I think your solution8 looks good.
@wyli @ericspod , do you have other comments?

Thanks.

Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 3, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@outlook.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 3, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@outlook.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 3, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 3, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 3, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 3, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 3, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
@ericspod
Copy link
Member

ericspod commented Aug 3, 2021

I would choose whichever correct implementation is fastest for a PR, that would be helpful. This would be quite useful for other operations especially post-processing, but if the speed is poor we may need our own implementation of a 3D unfold or max/min filters.

Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 4, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
@Spenhouet
Copy link
Contributor Author

I would choose whichever correct implementation is fastest for a PR

Instead of benchmarking with random data, I did load one of our brain segmentations (256 x 256 x 256) and did run both methods (fill_holes7 and fill_holes8) with it.

fill_holes7 fill_holes8
69s 276s

So even for real world data, growing the background from the edge for every label is the "faster" solution (less slow).
But honestly, waiting 70s for a single image....

There is also the option to go into one more dimension by one-hot encoding but this will blow up memory usage too much. That is why I did not consider that solution.
It might be an option to do this in a "paging" type of way by creating partial one-hot encoded matrices, remapping labels, .... but this highly depends on the real data and then becomes complicated quickly (memory management).

I did now implement the PR with the fill_holes8 solution since @Nic-Ma preferred it. It probably makes sense to use the fill_holes7 solution instead.
But I agree that this needs a faster implementation...

Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 4, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 5, 2021
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 5, 2021
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 5, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 5, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 5, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 5, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 6, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 6, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 6, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 6, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 6, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Spenhouet pushed a commit to Spenhouet/MONAI that referenced this issue Aug 6, 2021
Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
@wyli wyli closed this as completed in #2692 Aug 7, 2021
wyli pushed a commit that referenced this issue Aug 7, 2021
* Add transform to fill holes and to filter (#2678)

Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>

* Change name of label filter class (#2678)

Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>

* Change fill holes to growing logic (#2678)

Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>

* Fix missing entry in min_tests (#2678)

Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>

* Fix review comments (#2678)

Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>

* Remove batch dim and add one-hot handling (#2678)

Signed-off-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>

* [MONAI] python code formatting

Signed-off-by: monai-bot <monai.miccai2019@gmail.com>

Co-authored-by: Sebastian Penhouet <sebastian.penhouet@airamed.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants