Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface loss in keras-tensorflow #14

Closed
AmericaBG opened this issue Jul 17, 2019 · 54 comments
Closed

Surface loss in keras-tensorflow #14

AmericaBG opened this issue Jul 17, 2019 · 54 comments

Comments

@AmericaBG
Copy link

Hi!
Could you help me with the implementation of surface loss function in keras and tensorflow?
Thank you very much in advance!

My best wishes.

@HKervadec
Copy link
Member

Hey,

This should actually be quite easy. The most difficult part is actually generating the distance maps from the ground truth, and fortunately this is done in NumPy. I don't know much about Keras or Tensorflow, but I assume you can either do that asynchronously in some data loader, or you could do that offline once.

Assuming you have a class-encoded ground truth as an array of shape wh, where each pixel is the class number, the first step would be to move to some one-hot encoding. A function like this should do:

def numpy_class2one_hot(seg: np.ndarray, C: int) -> np.ndarray:
    assert set(np.unique(seg)).issubset(list(range(C)))

    w, h = seg.shape  # type: Tuple[int, int, int]

    res = np.stack([seg == c for c in range(C)], axis=0).astype(np.int32)
    assert res.shape == (C, w, h)
    assert np.all(res.sum(axis=0) == 1)

    return res

This will output a one-hot encoded image of shape cwh.

Then, you could plug the results directly into the one_hot2dist function defined in utils.py:

def one_hot2dist(seg: np.ndarray) -> np.ndarray:
    assert one_hot(torch.Tensor(seg), axis=0)
    C: int = len(seg)

    res = np.zeros_like(seg)
    for c in range(C):
        posmask = seg[c].astype(np.bool)

        if posmask.any():
            negmask = ~posmask
            res[c] = distance(negmask) * negmask - (distance(posmask) - 1) * posmask
    return res

This will compute one distance map per class, for an output shape cwh.

Then, you can load that in your favorite framework, and multiply it element-wise to your network softmax probabilities (which should be of shape cwh as well). I personally then average the results, but this is a matter of taste, summing them would be fine also (it merely affects the learning rate).

Hope that helps, let me know if you need any help.

@che85
Copy link

che85 commented Jul 31, 2019

@HKervadec FIrst of all, a very nice paper! Do you have any test data and expected results when running the loss function? I wanted to try your loss on my data (different project template) but I am not sure if it's actually spitting out the correct loss. Do you have any recommendations on how to test this?

Thank you,
Christian

@Eason270
Copy link

Could you provide the implement of surface_loss in Keras?

@HKervadec
Copy link
Member

@HKervadec FIrst of all, a very nice paper! Do you have any test data and expected results when running the loss function? I wanted to try your loss on my data (different project template) but I am not sure if it's actually spitting out the correct loss. Do you have any recommendations on how to test this?

Hey sorry for the slow reply. I do not have much test data/values, but I had this testing script that I made a while back that you could use as a basis

#!/usr/bin/env python3.6

import unittest

import torch
import numpy as np

import utils

class TestDistMap(unittest.TestCase):
    def test_closure(self):
        a = np.zeros((1, 256, 256))
        a[:, 50:60, :] = 1

        o = utils.class2one_hot(torch.Tensor(a).type(torch.float32), C=2).numpy()
        res = utils.one_hot2dist(o[0])
        self.assertEqual(res.shape, (2, 256, 256))

        neg = (res <= 0) * res

        self.assertEqual(neg.sum(), (o * res).sum())

    def test_full_coverage(self):
        a = np.zeros((1, 256, 256))
        a[:, 50:60, :] = 1

        o = utils.class2one_hot(torch.Tensor(a).type(torch.float32), C=2).numpy()
        res = utils.one_hot2dist(o[0])
        self.assertEqual(res.shape, (2, 256, 256))

        self.assertEqual((res[1] <= 0).sum(), a.sum())
        self.assertEqual((res[1] > 0).sum(), (1 - a).sum())

    def test_empty(self):
        a = np.zeros((1, 256, 256))

        o = utils.class2one_hot(torch.Tensor(a).type(torch.float32), C=2).numpy()
        res = utils.one_hot2dist(o[0])
        self.assertEqual(res.shape, (2, 256, 256))

        self.assertEqual(res[1].sum(), 0)
        self.assertEqual((res[0] <= 0).sum(), a.size)

    def test_max_dist(self):
        """
        The max dist for a box should be at the midle of the object, +-1
        """
        a = np.zeros((1, 256, 256))
        a[:, 1:254, 1:254] = 1

        o = utils.class2one_hot(torch.Tensor(a).type(torch.float32), C=2).numpy()
        res = utils.one_hot2dist(o[0])
        self.assertEqual(res.shape, (2, 256, 256))

        self.assertEqual(res[0].max(), 127)
        self.assertEqual(np.unravel_index(res[0].argmax(), (256, 256)), (127, 127))

        self.assertEqual(res[1].min(), -126)
        self.assertEqual(np.unravel_index(res[1].argmin(), (256, 256)), (127, 127))

    def test_border(self):
        """
        Make sure the border inside the object is 0 in the distance map
        """

        for l in range(3, 5):
            a = np.zeros((1, 25, 25))
            a[:, 3:3 + l, 3:3 + l] = 1

            o = utils.class2one_hot(torch.Tensor(a).type(torch.float32), C=2).numpy()
            res = utils.one_hot2dist(o[0])
            self.assertEqual(res.shape, (2, 25, 25))

            border = (res[1] == 0)

            self.assertEqual(border.sum(), 4 * (l - 1))


if __name__ == "__main__":
    unittest.main()

utils refers to utils.py in the main repository (or maybe a more recent version of it, but no big update)

Could you provide the implement of surface_loss in Keras?

I'm sorry, I do not have the time to do that, not the means to test it afterwards. But I think I gave enough informations in #14 (comment) to do it.

I would be willing to publish in this repository a Keras implementation that someone submit as a PR.

@warmestwind
Copy link

@Eason270 @SouthAmericaB
For Tensorflow or Keras, just use tf.py_func to wrap numpy_class2one_hot and one_hot2dist then use them as a TensorFlow OP.

@akamojo
Copy link
Contributor

akamojo commented Oct 25, 2019

Hi,
@HKervadec, thanks to the surface loss we managed to improve our results of i.a. hippocampus segmentation task :)
This is how we implemented surface loss in keras, together with @marcinkaczor:

from keras import backend as K
import numpy as np
import tensorflow as tf
from scipy.ndimage import distance_transform_edt as distance


def calc_dist_map(seg):
    res = np.zeros_like(seg)
    posmask = seg.astype(np.bool)

    if posmask.any():
        negmask = ~posmask
        res = distance(negmask) * negmask - (distance(posmask) - 1) * posmask

    return res

def calc_dist_map_batch(y_true):
    y_true_numpy = y_true.numpy()
    return np.array([calc_dist_map(y)
                     for y in y_true_numpy]).astype(np.float32)

def surface_loss(y_true, y_pred):
    y_true_dist_map = tf.py_function(func=calc_dist_map_batch,
                                     inp=[y_true],
                                     Tout=tf.float32)
    multipled = y_pred * y_true_dist_map
    return K.mean(multipled)

However, this is an implementation only for the problem of binary classification (foreground and background).

@HKervadec
Copy link
Member

HKervadec commented Oct 28, 2019

This looks great.

However (I am not familiar much with Keras and Tensorflow), the distance map isn't precomputed ; meaning it has to be done when computing the loss, which I assume slow things down.

Is there some scheduling / compiling process that optimize the order of the operation ? From what I recall Tensorflow build its graph then runs into a session ; a clever compiler would probably reorder those parts to avoid waiting for it.

@AmericaBG
Copy link
Author

Hi,
@HKervadec, thanks to the surface loss we managed to improve our results of i.a. hippocampus segmentation task :)
This is how we implemented surface loss in keras, together with @marcinkaczor:

from keras import backend as K
import numpy as np
import tensorflow as tf
from scipy.ndimage import distance_transform_edt as distance


def calc_dist_map(seg):
    res = np.zeros_like(seg)
    posmask = seg.astype(np.bool)

    if posmask.any():
        negmask = ~posmask
        res = distance(negmask) * negmask - (distance(posmask) - 1) * posmask

    return res

def calc_dist_map_batch(y_true):
    y_true_numpy = y_true.numpy()
    return np.array([calc_dist_map(y)
                     for y in y_true_numpy]).astype(np.float32)

def surface_loss(y_true, y_pred):
    y_true_dist_map = tf.py_function(func=calc_dist_map_batch,
                                     inp=[y_true],
                                     Tout=tf.float32)
    multipled = y_pred * y_true_dist_map
    return K.mean(multipled)

However, this is an implementation only for the problem of binary classification (foreground and background).

Hi! Thank you very much for your code!
I've tried to use it as loss function in a binary classification task but I haven't got good results (my accuracy index tends to 0 instead of 1.

I use your surface_loss function when model.compile(....loss=[surface_loss])

Have you any idea what could be happening?

Thank you very much in advance!

@marcinkaczor
Copy link

@HKervadec, we considered computing distance maps before network training, but we noticed that it might be hard to associate a label (image mask) with a proper distance map during training (dataset is shuffled). So that we are doing it on the fly.
@SouthAmericaB, using a surface loss alone doesn't give good results also with us. We combined a surface loss with a generalised dice loss as described in the paper. We used a callback to schedule alpha value for each epoch:

class AlphaScheduler(Callback):
  def init(self, alpha, update_fn):
    self.alpha = alpha
    self.update_fn = update_fn
  def on_epoch_end(self, epoch, logs=None):
    updated_alpha = self.update_fn(K.get_value(self.alpha))

alpha = K.variable(1, dtype='float32')

def update_alpha(value):
  return np.clip(value - 0.01, 0.01, 1)

history = model.fit_generator(
  ...,
  callbacks=AlphaScheduler(alpha, update_alpha)
)

@HKervadec
Copy link
Member

we noticed that it might be hard to associate a label (image mask) with a proper distance map during training (dataset is shuffled).

I am not quite sure I understand. How come it is harder to keep track of a tuple (input, dist_transform(label)) instead of (input. label) ? I mean, you already need to maintain the association for standard losses, so I don't get why this is different when you transform the label further. Is there something in the Keras/Tensorflow paradigm that makes it more difficult ?

In any case, thanks for all the contributions ! The scheduler bit is very useful as well ; I will integrate it in the pull request as well.

@AmericaBG AmericaBG reopened this Oct 29, 2019
@marcinkaczor
Copy link

marcinkaczor commented Oct 29, 2019

@HKervadec, thank you for sharing your improvement idea. We would have to take a closer look at whether such an implementation would be possible in our case.

@AmericaBG
Copy link
Author

@HKervadec, we considered computing distance maps before network training, but we noticed that it might be hard to associate a label (image mask) with a proper distance map during training (dataset is shuffled). So that we are doing it on the fly.
@SouthAmericaB, using a surface loss alone doesn't give good results also with us. We combined a surface loss with a generalised dice loss as described in the paper. We used a callback to schedule alpha value for each epoch:

class AlphaScheduler(Callback):
  def init(self, alpha, update_fn):
    self.alpha = alpha
    self.update_fn = update_fn
  def on_epoch_end(self, epoch, logs=None):
    updated_alpha = self.update_fn(K.get_value(self.alpha))

alpha = K.variable(1, dtype='float32')

def update_alpha(value):
  return np.clip(value - 0.01, 0.01, 1)

history = model.fit_generator(
  ...,
  callbacks=AlphaScheduler(alpha, update_alpha)
)

Thank you very much about your explanation!
But now I don't understand what the alpha paramether means

@HKervadec, we considered computing distance maps before network training, but we noticed that it might be hard to associate a label (image mask) with a proper distance map during training (dataset is shuffled). So that we are doing it on the fly.
@SouthAmericaB, using a surface loss alone doesn't give good results also with us. We combined a surface loss with a generalised dice loss as described in the paper. We used a callback to schedule alpha value for each epoch:

class AlphaScheduler(Callback):
  def init(self, alpha, update_fn):
    self.alpha = alpha
    self.update_fn = update_fn
  def on_epoch_end(self, epoch, logs=None):
    updated_alpha = self.update_fn(K.get_value(self.alpha))

alpha = K.variable(1, dtype='float32')

def update_alpha(value):
  return np.clip(value - 0.01, 0.01, 1)

history = model.fit_generator(
  ...,
  callbacks=AlphaScheduler(alpha, update_alpha)
)

Oh! That's true! Thank you very much!
One last thing, could you give me the generalised dice loss function in keras-tensorflow??

You will be doing me a great service!

@HKervadec
Copy link
Member

One last thing, could you give me the generalised dice loss function in keras-tensorflow??

You are not limited to GDL for the regional loss ; any other can work (cross-entropy and its derivative, dice loss and its derivatives). The best one will depend on your specific application, but you can already try with others.

@akamojo
Copy link
Contributor

akamojo commented Oct 29, 2019

we noticed that it might be hard to associate a label (image mask) with a proper distance map during training (dataset is shuffled).

I am not quite sure I understand. How come it is harder to keep track of a tuple (input, dist_transform(label)) instead of (input. label) ? I mean, you already need to maintain the association for standard losses, so I don't get why this is different when you transform the label further. Is there something in the Keras/Tensorflow paradigm that makes it more difficult ?

In any case, thanks for all the contributions ! The scheduler bit is very useful as well ; I will integrate it in the pull request as well.

There is no problem in keeping track of a tuple (input, dist_transform(label)) instead of (input. label). However, in our case we would have to keep track of a triple (input, label, dist_transform(label)), because original labels are necessary to calculate generalized dice loss which we later combine with surface loss. Keeping track of such triple would be a bit more complicated. Anyway, it is doable and we can try to implement it.
Nevertheless, in our particular task there is the other problem that we would have to tackle. We are using random augmentation when generating subsequent batches for training, which includes random shift, scale and rotate operations. Unfortunately, they change the distance map, so probably precomputing distance map while using augmentation will not be useful.

@HKervadec
Copy link
Member

Nevertheless, in our particular task there is the other problem that we would have to tackle. We are using random augmentation when generating subsequent batches for training, which includes random shift, scale and rotate operations.

Ah yes I see, the scaling will be mostly problematic. In pytorch usually these steps are done in the dataloader, before the final transforms, so that this is not really concern. But I understand how it requires to change the structure of the code a lot if this is not the case.

@AmericaBG
Copy link
Author

@HKervadec, we considered computing distance maps before network training, but we noticed that it might be hard to associate a label (image mask) with a proper distance map during training (dataset is shuffled). So that we are doing it on the fly.
@SouthAmericaB, using a surface loss alone doesn't give good results also with us. We combined a surface loss with a generalised dice loss as described in the paper. We used a callback to schedule alpha value for each epoch:

class AlphaScheduler(Callback):
  def init(self, alpha, update_fn):
    self.alpha = alpha
    self.update_fn = update_fn
  def on_epoch_end(self, epoch, logs=None):
    updated_alpha = self.update_fn(K.get_value(self.alpha))

alpha = K.variable(1, dtype='float32')

def update_alpha(value):
  return np.clip(value - 0.01, 0.01, 1)

history = model.fit_generator(
  ...,
  callbacks=AlphaScheduler(alpha, update_alpha)
)

Thank you very much!
Just one thing, this definition:

def update_alpha(value):
return np.clip(value - 0.01, 0.01, 1)

is correct? Or would be
def update_fn(value):

return np.clip(value - 0.01, 0.01, 1)

according to your code??

@akamojo
Copy link
Contributor

akamojo commented Oct 31, 2019

Just one thing, this definition:

def update_alpha(value):
return np.clip(value - 0.01, 0.01, 1)

is correct? Or would be
def update_fn(value):

return np.clip(value - 0.01, 0.01, 1)

according to your code??

@SouthAmericaB
It is correct, the name of function should be update_alpha and then, while creating an instance of AlphaScheduler we pass it as the update_fn parameter.

@rytisss
Copy link

rytisss commented Nov 17, 2019

Hi,
I'm currently using this loss function for road pavements cracks detection. @HKervadec thank you for a nice idea and paper and @akamojo and @marcinkaczor for implementation in Keras and training hints. I ran multiple training and checked results, however, i did not succeed in improving results that i have received with only using DICE loss (it reaches up to 0.7 dice score with only DICE loss). My training routines included:

  1. increasing 'alpha' at the end of every epoch end (by 0.01 / 0.02 / 0.03)
  2. increasing 'alpha' at the end of every epoch end (by 0.01 / 0.02 / 0.03) until certain threshold (alpha <= 0.5)
  3. starting to increase 'alpha' only from 10 or 15 epoch when model converges
  4. from 15 epoch, change to 0.5 surface loss and 0.5 dice loss

Also i tried to reduce learning rate every 10/15 epoch.
Although, from losses, i can see that dice loss is bigger than surface loss (~0.2-0.3) when model converges, and surface loss tends to be minimized to 0.01 or even less. (while after first epoch, surface loss is ~23)
I'm using U-Net with batchnorm and one small mod: 5x5 convolution kernel in first 2 encoder layers.
Moreover, my data is quite lousy.
I attached one example from the dataset.
Image:
original
Label:
label
Prediction:
prediction
Maybe you have an idea or hint for the training?

(Pictures are just an example, i'm not sure if it is best-performing weights)

@rytisss
Copy link

rytisss commented Nov 17, 2019

Also another question: is it possible for this loss to be ~23? Maybe my implementation is not correct?

@HKervadec
Copy link
Member

Hey,

Also another question: is it possible for this loss to be ~23? Maybe my implementation is not correct?

It's definitely possible for big images. This is even truer in your case, where most of the generated distance map will be positive.

You could try lowering the weight of the loss (i.e. dividing it by some factor), or normalize it as some other people reported #18 (comment)

I will close this issue, as the whole keras/tensorflow implementation seems solved. You can open another issue for your problem. Let me know

@tipani86
Copy link

tipani86 commented May 14, 2020

@HKervadec, we considered computing distance maps before network training, but we noticed that it might be hard to associate a label (image mask) with a proper distance map during training (dataset is shuffled). So that we are doing it on the fly.
@SouthAmericaB, using a surface loss alone doesn't give good results also with us. We combined a surface loss with a generalised dice loss as described in the paper. We used a callback to schedule alpha value for each epoch:

class AlphaScheduler(Callback):
  def init(self, alpha, update_fn):
    self.alpha = alpha
    self.update_fn = update_fn
  def on_epoch_end(self, epoch, logs=None):
    updated_alpha = self.update_fn(K.get_value(self.alpha))

alpha = K.variable(1, dtype='float32')

def update_alpha(value):
  return np.clip(value - 0.01, 0.01, 1)

history = model.fit_generator(
  ...,
  callbacks=AlphaScheduler(alpha, update_alpha)
)

Hi @marcinkaczor, I am trying to follow-up on what you guys did and build a dynamic alpha scheduler balancing two separate losses. I now got the alpha value updated alright, but I am not seeing the updated alpha value propagating into the combined loss function itself. I assume the loss function cannot be written as a regular python function as then only the beginning state is compiled into the model, and any subsequent alpha value changes do not get updated? How did you solve this issue? Thanks!

/cc @akamojo

@akamojo
Copy link
Contributor

akamojo commented May 19, 2020

Hi @marcinkaczor, I am trying to follow-up on what you guys did and build a dynamic alpha scheduler balancing two separate losses. I now got the alpha value updated alright, but I am not seeing the updated alpha value propagating into the combined loss function itself. I assume the loss function cannot be written as a regular python function as then only the beginning state is compiled into the model, and any subsequent alpha value changes do not get updated? How did you solve this issue? Thanks!

/cc @akamojo

Hi @tipani86, if you define alpha as keras backend variable and update it in the way we presented (in the code above) you should observe it decreasing (updating) after each epoch. Of course you have to pass the same keras backed variable alpha to your loss function, for example this way:

def gl_sl_wrapper(alpha):
    def gl_sl(y_true, y_pred):
        return alpha * generalized_dice_loss(y_true, y_pred) + (1 - alpha) * surface_loss(y_true, y_pred)
    return gl_sl

model.compile(loss=gl_sl_wrapper(alpha), optimizer="...", metrics=[...])  # alpha is the same variable we pass to AlphaScheduler

(we combine generalized_dice_loss and surface_loss)

@tipani86
Copy link

tipani86 commented May 20, 2020

Thanks @akamojo

Yes I see you used this kind of "loss function factory" structure. However, I am still struggling to see the alpha value updated after each epoch.

I don't know if it's a version issue (python 2.7, tf 1.15, keras 2.2.4) but the alpha update callback doesn't seem to return anything or otherwise change anything. Sure the updated_alpha variable gets changed but it's not assigned to any other variable, i.e. I don't see where the updated alpha gets sent back to update the actual alpha being used for calculating the loss.

If I observe the value of updated_alpha after each epoch, it always starts over from 1 - 0.01, so ends up being constant at 0.99.

EDIT: Nevermind, I think I figured it out. You are supposed to set that value afterwards with something like K.set_value(alpha, updated_alpha), just as I was suspecting...

@akamojo
Copy link
Contributor

akamojo commented Jul 23, 2020

EDIT: Nevermind, I think I figured it out. You are supposed to set that value afterwards with something like K.set_value(alpha, updated_alpha), just as I was suspecting...

Yes, exactly @tipani86! I guess, we did not paste this line of code, sorry about that.

@HKervadec
Copy link
Member

So I finally did the merge request (thanks a lot for that), and also added the scheduler that @marcinkaczor suggested.

Since the journal extension is now published and online, I'll keep updating this repo in the coming days, I've yet to add the new interesting stuff (3D computation of the boundary loss and distance maps).

@akamojo
Copy link
Contributor

akamojo commented Oct 27, 2020

@HKervadec it's great to hear that! Looking forward to seeing updates of your repo.

@ljc1231
Copy link

ljc1231 commented Dec 1, 2020

Thanks @akamojo

Yes I see you used this kind of "loss function factory" structure. However, I am still struggling to see the alpha value updated after each epoch.

I don't know if it's a version issue (python 2.7, tf 1.15, keras 2.2.4) but the alpha update callback doesn't seem to return anything or otherwise change anything. Sure the updated_alpha variable gets changed but it's not assigned to any other variable, i.e. I don't see where the updated alpha gets sent back to update the actual alpha being used for calculating the loss.

If I observe the value of updated_alpha after each epoch, it always starts over from 1 - 0.01, so ends up being constant at 0.99.

EDIT: Nevermind, I think I figured it out. You are supposed to set that value afterwards with something like K.set_value(alpha, updated_alpha), just as I was suspecting...

问下alpha怎么检测每次epoch之后变了没有啊?我把alpha直接放到metrics里面报错了,我也不太理解为啥alpha不写成alpha=1,而写成alpha = K.variable(1, dtype='float32'),有什么区别吗?

@tipani86
Copy link

tipani86 commented Dec 1, 2020

问下alpha怎么检测每次epoch之后变了没有啊?我把alpha直接放到metrics里面报错了,我也不太理解为啥alpha不写成alpha=1,而写成alpha = K.variable(1, dtype='float32'),有什么区别吗?

  1. You can print the updated alpha in the on_epoch_end function in the AlphaScheduler callback, so you can monitor if the value gets updated correctly. See code example below... Moreover, you can also record all losses in the metrics and track them like this: [loss A, loss B, combined_loss] and monitor on TensorBoard how they behave as epoch numbers go on.
  2. You set alpha as K.variable instead of a normal variable so keras and tensorflow have access to it during runtime.
class AlphaScheduler(Callback):
     def __init__(self, epochs):
         self.epochs = epochs
     def on_epoch_end(self, epoch, logs=None):
         updated_alpha = np.clip(1 - 1.0 * (epoch + 1)/self.epochs, 0.01, 1)
         K.set_value(alpha, updated_alpha)
         print("epoch: {}".format(epoch + 1))
         print("alpha: {}".format(K.get_value(alpha)))

For the actual model loss, you would use the result from loss factory gl_sl_wrapper as indicated by @akamojo in above reply.

@ljc1231
Copy link

ljc1231 commented Dec 1, 2020

问下alpha怎么检测每次epoch之后变了没有啊?我把alpha直接放到metrics里面报错了,我也不太理解为啥alpha不写成alpha=1,而写成alpha = K.variable(1, dtype='float32'),有什么区别吗?

  1. You can print the updated alpha in the on_epoch_end function in the AlphaScheduler callback, so you can monitor if the value gets updated correctly. See code example below... Moreover, you can also record all losses in the metrics and track them like this: [loss A, loss B, combined_loss] and monitor on TensorBoard how they behave as epoch numbers go on.
  2. You set alpha as K.variable instead of a normal variable so keras and tensorflow have access to it during runtime.
class AlphaScheduler(Callback):
     def __init__(self, epochs):
         self.epochs = epochs
     def on_epoch_end(self, epoch, logs=None):
         updated_alpha = np.clip(1 - 1.0 * (epoch + 1)/self.epochs, 0.01, 1)
         K.set_value(alpha, updated_alpha)
         print("epoch: {}".format(epoch + 1))
         print("alpha: {}".format(K.get_value(alpha)))

For the actual model loss, you would use the result from loss factory gl_sl_wrapper as indicated by @akamojo in above reply.

thx!

@ljc1231
Copy link

ljc1231 commented Dec 2, 2020

问下alpha怎么检测每次epoch之后变了没有啊?我把alpha直接放到metrics里面报错了,我也不太理解为啥alpha不写成alpha=1,而写成alpha = K.variable(1, dtype='float32'),有什么区别吗?

  1. You can print the updated alpha in the on_epoch_end function in the AlphaScheduler callback, so you can monitor if the value gets updated correctly. See code example below... Moreover, you can also record all losses in the metrics and track them like this: [loss A, loss B, combined_loss] and monitor on TensorBoard how they behave as epoch numbers go on.
  2. You set alpha as K.variable instead of a normal variable so keras and tensorflow have access to it during runtime.
class AlphaScheduler(Callback):
     def __init__(self, epochs):
         self.epochs = epochs
     def on_epoch_end(self, epoch, logs=None):
         updated_alpha = np.clip(1 - 1.0 * (epoch + 1)/self.epochs, 0.01, 1)
         K.set_value(alpha, updated_alpha)
         print("epoch: {}".format(epoch + 1))
         print("alpha: {}".format(K.get_value(alpha)))

For the actual model loss, you would use the result from loss factory gl_sl_wrapper as indicated by @akamojo in above reply.

问下这个调整学习率的检测指标还是val_loss吗
reduce_lr_loss = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, verbose=1, epsilon=1e-4, mode='min')
但是这个loss的定义不是动态变化的吗,这个val_loss会是一直减小的吗?但是我看这个monitor好像不支持直接检测dice

@tipani86
Copy link

tipani86 commented Dec 2, 2020

问下这个调整学习率的检测指标还是val_loss吗
reduce_lr_loss = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, verbose=1, epsilon=1e-4, mode='min')
但是这个loss的定义不是动态变化的吗,这个val_loss会是一直减小的吗?但是我看这个monitor好像不支持直接检测dice

Yeah this may not work well for you. Because alpha is a moving ratio between two different losses which may have very different values, the absolute combined loss value might actually go up between epochs.

How I approached this problem is just use a lr scheduling method which does not depend on how loss develops. For example, a continuously decreasing lr with some minimum value that it should not go below in the end.

@ljc1231
Copy link

ljc1231 commented Dec 2, 2020

问下这个调整学习率的检测指标还是val_loss吗
reduce_lr_loss = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, verbose=1, epsilon=1e-4, mode='min')
但是这个loss的定义不是动态变化的吗,这个val_loss会是一直减小的吗?但是我看这个monitor好像不支持直接检测dice

Yeah this may not work well for you. Because alpha is a moving ratio between two different losses which may have very different values, the absolute combined loss value might actually go up between epochs.

How I approached this problem is just use a lr scheduling method which does not depend on how loss develops. For example, a continuously decreasing lr with some minimum value that it should not go below in the end.

这个可以根据dice是否减小来调整学习率吗?您说的根据连续减小的学习率的调整策略可以上传一下您的代码吗,我不太会写代码,真的万分感谢!
还有个问题想问下您,就是我的train的CT图片是250张增强到2000张左右然后训练的,只用dice loss训练,test了50张CT dice0.45左右,但是换了一个loss ,val_loss 更低了,(metrics里面我也查看了dice的确也训练得更低了),但是test的dice只有0.35左右,这是因为过拟合的原因吗?有什么建议的改进方法吗?
最后,现在我在是这篇论文的loss,但是我出现了训练是,loss突然变大到1的情况,不知道是否属于正常呢?
image

@tipani86
Copy link

tipani86 commented Dec 2, 2020

这个可以根据dice是否减小来调整学习率吗?您说的根据连续减小的学习率的调整策略可以上传一下您的代码吗,我不太会写代码,真的万分感谢!

I am not very familiar with the reduce loss on plateau callback. The continuous lr reduction can be written for example as follows:

def scheduler(epoch):
    new_lr = lr - epoch * lr/epochs
    return new_lr

where epochs is the total number of epochs.

还有个问题想问下您,就是我的train的CT图片是250张增强到2000张左右然后训练的,只用dice loss训练,test了50张CT dice0.45左右,但是换了一个loss ,val_loss 更低了,(metrics里面我也查看了dice的确也训练得更低了),但是test的dice只有0.35左右,这是因为过拟合的原因吗?有什么建议的改进方法吗?

Sorry, I have not worked with medical image segmentation before, so not familiar with your problem setting.

最后,现在我在是这篇论文的loss,但是我出现了训练是,loss突然变大到1的情况,不知道是否属于正常呢?

I'm afraid each case is different and I cannot comment on how your data/code performs. However, you should be aware that in its original form, this surface loss does not reduce towards 0, but towards some arbitrary negative number which is determined by the ground truth mask image. I think this issue was raised already elsewhere.

So it's difficult to say how your combined loss will behave when mixed up from two individual losses that have a very different output range. What you can do is try to normalize the surface loss output so that it's always a positive number that reduces towards 0, for example.

P.S. could we use English for the benefit of everybody else participating in this discussion?

@ljc1231
Copy link

ljc1231 commented Dec 2, 2020

这个可以根据dice是否减小来调整学习率吗?您说的根据连续减小的学习率的调整策略可以上传一下您的代码吗,我不太会写代码,真的万分感谢!

I am not very familiar with the reduce loss on plateau callback. The continuous lr reduction can be written for example as follows:

def scheduler(epoch):
    new_lr = lr - epoch * lr/epochs
    return new_lr

where epochs is the total number of epochs.

还有个问题想问下您,就是我的train的CT图片是250张增强到2000张左右然后训练的,只用dice loss训练,test了50张CT dice0.45左右,但是换了一个loss ,val_loss 更低了,(metrics里面我也查看了dice的确也训练得更低了),但是test的dice只有0.35左右,这是因为过拟合的原因吗?有什么建议的改进方法吗?

Sorry, I have not worked with medical image segmentation before, so not familiar with your problem setting.

最后,现在我在是这篇论文的loss,但是我出现了训练是,loss突然变大到1的情况,不知道是否属于正常呢?

I'm afraid each case is different and I cannot comment on how your data/code performs. However, you should be aware that in its original form, this surface loss does not reduce towards 0, but towards some arbitrary negative number which is determined by the ground truth mask image. I think this issue was raised already elsewhere.

So it's difficult to say how your combined loss will behave when mixed up from two individual losses that have a very different output range. What you can do is try to normalize the surface loss output so that it's always a positive number that reduces towards 0, for example.

P.S. could we use English for the benefit of everybody else participating in this discussion?

thx!

@ljc1231
Copy link

ljc1231 commented Dec 10, 2020

这个可以根据dice是否减小来调整学习率吗?您说的根据连续减小的学习率的调整策略可以上传一下您的代码吗,我不太会写代码,真的万分感谢!

I am not very familiar with the reduce loss on plateau callback. The continuous lr reduction can be written for example as follows:

def scheduler(epoch):
    new_lr = lr - epoch * lr/epochs
    return new_lr

where epochs is the total number of epochs.

还有个问题想问下您,就是我的train的CT图片是250张增强到2000张左右然后训练的,只用dice loss训练,test了50张CT dice0.45左右,但是换了一个loss ,val_loss 更低了,(metrics里面我也查看了dice的确也训练得更低了),但是test的dice只有0.35左右,这是因为过拟合的原因吗?有什么建议的改进方法吗?

Sorry, I have not worked with medical image segmentation before, so not familiar with your problem setting.

最后,现在我在是这篇论文的loss,但是我出现了训练是,loss突然变大到1的情况,不知道是否属于正常呢?

I'm afraid each case is different and I cannot comment on how your data/code performs. However, you should be aware that in its original form, this surface loss does not reduce towards 0, but towards some arbitrary negative number which is determined by the ground truth mask image. I think this issue was raised already elsewhere.

So it's difficult to say how your combined loss will behave when mixed up from two individual losses that have a very different output range. What you can do is try to normalize the surface loss output so that it's always a positive number that reduces towards 0, for example.

P.S. could we use English for the benefit of everybody else participating in this discussion?

sorry to bother you again!
i use the boundary loss(keras version) and unet to train ISLES2018 dataset, but i found the dice didn't raise when epoch is 9, i found writer use unet which didn't use maxpooling and use conv with stride=2. Did you use the writer code to train ISLES data? and i found the original unet didn't use BN, so my loss didn't decline, when i add BN after conv, it works, but it didn't reach the expected effect.
image
And do you know how to use the writer's pyrorch code? i have already make the dataset in .npy format, whether i just need to use the main.py, and change the get_loader ,load my data in .npy , and run the main.py, is that correct?
image

@tipani86
Copy link

tipani86 commented Dec 10, 2020

sorry to bother you again!
i use the boundary loss(keras version) and unet to train ISLES2018 dataset, but i found the dice didn't raise when epoch is 9, i found writer use unet which didn't use maxpooling and use conv with stride=2. Did you use the writer code to train ISLES data? and i found the original unet didn't use BN, so my loss didn't decline, when i add BN after conv, it works, but it didn't reach the expected effect.
And do you know how to use the writer's pyrorch code? i have already make the dataset in .npy format, whether i just need to use the main.py, and change the get_loader ,load my data in .npy , and run the main.py, is that correct?

I am sorry, I don't have any experience from Unets or Pytorch. That's why I only came to this specific issue to checkout the Keras+Tensorflow implementation. I used DeeplabV3+ Keras version for my problem...

Edit: I don't know if it's a problem specific thing but training for 9 epochs in general seems a bit small. In our problem we generally did 50-100 or sometimes even more epochs. For the alpha transition between surface loss and the initial "carrier loss", having more epochs will also lead to a smoother gradient change.

@ljc1231
Copy link

ljc1231 commented Dec 10, 2020

sorry to bother you again!
i use the boundary loss(keras version) and unet to train ISLES2018 dataset, but i found the dice didn't raise when epoch is 9, i found writer use unet which didn't use maxpooling and use conv with stride=2. Did you use the writer code to train ISLES data? and i found the original unet didn't use BN, so my loss didn't decline, when i add BN after conv, it works, but it didn't reach the expected effect.
And do you know how to use the writer's pyrorch code? i have already make the dataset in .npy format, whether i just need to use the main.py, and change the get_loader ,load my data in .npy , and run the main.py, is that correct?

I am sorry, I don't have any experience from Unets or Pytorch. That's why I only came to this specific issue to checkout the Keras+Tensorflow implementation. I used DeeplabV3+ Keras version for my problem...

Edit: I don't know if it's a problem specific thing but training for 9 epochs in general seems a bit small. In our problem we generally did 50-100 or sometimes even more epochs. For the alpha transition between surface loss and the initial "carrier loss", having more epochs will also lead to a smoother gradient change.

thx!
can you use deeplabv3+ model with boundary loss to segmentate small object or lesion? i use unet , and it cann't recognize any lesion pixel in CT. i use the same dataset with the writer, only he use conv stride=2 replace maxpooling, i will try his model.
i reduce some conv layers in unet ,like they have 2 conv layers and maxpooling , i only use 1 conv and maxpooling , i found the train loss is samller, but the test effect is bad. like train and val dice is 0.8, but test dice is only 0.3.
Should i just use more reduce overfitting method or i should use ordinary unet model and add some module to rease their train dice?

@tipani86
Copy link

tipani86 commented Dec 10, 2020

thx!
can you use deeplabv3+ model with boundary loss to segmentate small object or lesion? i use unet , and it cann't recognize any lesion pixel in CT. i use the same dataset with the writer, only he use conv stride=2 replace maxpooling, i will try his model.
i reduce some conv layers in unet ,like they have 2 conv layers and maxpooling , i only use 1 conv and maxpooling , i found the train loss is samller, but the test effect is bad. like train and val dice is 0.8, but test dice is only 0.3.
Should i just use more reduce overfitting method or i should use ordinary unet model and add some module to rease their train dice?

As I said earlier, I am not at all familiar with medical imaging. In my use case the object is usually relatively large. When the size of the object becomes small, deeplabv3+ also has some trouble producing good segmentation results. In your case, it seems the model is overfitting, though. Maybe your training set is too small, have you added some image augmentation to boost your training set size?

@dydxdt
Copy link

dydxdt commented Mar 29, 2021

#14 (comment)
Thanks for your paper and code! I'd like to ask whether there is the implementation of “3D computation of the boundary loss and distance maps”. I'm using 3D unet,and the label shape is (batch_size, y, x, z, num_class), where the last dimension means the numbers of classes. Can I just use "for" loops to calculate boundary loss for each class and each slice(2D)? Or are there any other right implementations? @HKervadec

@dydxdt
Copy link

dydxdt commented Mar 29, 2021

#14 (comment)
Thanks for your paper and code! I'd like to ask whether there is the implementation of “3D computation of the boundary loss and distance maps”. I'm using 3D unet,and the label shape is (batch_size, y, x, z, num_class), where the last dimension means the numbers of classes. Can I just use "for" loops to calculate boundary loss for each class and each slice(2D)? Or are there any other right implementations? @HKervadec

oh I just saw the right issues for my problem. Thank you!

@HKervadec
Copy link
Member

Glad that you've found it.

To explicit things, the implementation of one_hot2dist now support both 2d and 3d tensors. (In my experiments, I pre-computed the 3D distance map when pre-processing the dataset, then sliced it into 2D. You might want to adapt it to your use-case.) Pay attention to the spatial resolution, as it has a big impact in 3D.

Do not hesitate to create another issue if you feel your use-case is a bit different, I'll be happy to help : )

@rose-jinyang
Copy link

rose-jinyang commented Apr 3, 2021

Hello
I've imported the Keras implementation by @akamojo as boundary loss for 2D binary segmentation.
But the loss value has a negative value.

image

How should I understand this?
Thanks

@HKervadec
Copy link
Member

Does the FAQ answer your question ?

Yes. As the distance map is signed (meaning that inside the object, the distance is negative), a perfect prediction will sum only negative distances, leading to a negative value. As we are in a minimization setting, this is not an issue.

@rose-jinyang
Copy link

Could u support full scripts for training in Keras?
That's because I can NOT find the generalized dice loss in Keras.
I need a Keras version for this PyTorch project.
Thanks.

@dydxdt
Copy link

dydxdt commented Apr 7, 2021

Does the FAQ answer your question ?

Yes. As the distance map is signed (meaning that inside the object, the distance is negative), a perfect prediction will sum only negative distances, leading to a negative value. As we are in a minimization setting, this is not an issue.

My result shows that at the beginning the boundary loss is positive(around 0.3) and after training for a long time, it turns out to be 0.0000 or -0.0000. I wonder whether it is right or not. Should it be a negative one? Thanks! (I use it in 3D UNet medical segmentation for very small objects.) @HKervadec

@rose-jinyang
Copy link

rose-jinyang commented Apr 8, 2021

Hi @HKervadec
How about applying this loss on a balanced segmentation such as portrait segmentation?

@HKervadec
Copy link
Member

Could u support full scripts for training in Keras?
That's because I can NOT find the generalized dice loss in Keras.

Unfortunately no, as I do not have the time for that, and do not know Keras either. Have you tried contacting the GDL authors ?
In any case, I assume that modifying a keras dice loss into a GDL wouldn't be too difficult.

Also, bear in mind that you can combine the boundary loss with any other loss, like cross-entropy, focal loss, or anyting really.

How about applying this loss on a balanced segmentation such as portrait segmentation?

It might help, or it might not. The only way to find out is to try.

@HKervadec
Copy link
Member

My result shows that at the beginning the boundary loss is positive(around 0.3) and after training for a long time, it turns out to be 0.0000 or -0.0000. I wonder whether it is right or not. Should it be a negative one? Thanks! (I use it in 3D UNet medical segmentation for very small objects.)

It sounds good, though you might want to increase your decimal point when monitoring the results.

Things that you can modify and try:

  • increase the weight of the boundary loss;
  • normalize the distance maps, some people have reported better results with it.

I guess the normalization of the distance map would go like this:

def norm_distmap(distmap: Tensor) -> Tensor:
    _m: float = torch.abs(distmap).max()
    return distmap / _m

@andre-ye
Copy link

I'm using DeepLab with the Keras implementation here of boundary loss. The loss steadily decreases, but when I stop and have the model predict, the predictions are all NaN. I was wondering if anyone else had this problem or things I might be doing wrong.

@ansariyusuf
Copy link

ansariyusuf commented Jun 16, 2021

Hi,
@HKervadec, thanks to the surface loss we managed to improve our results of i.a. hippocampus segmentation task :)
This is how we implemented surface loss in keras, together with @marcinkaczor:

from keras import backend as K
import numpy as np
import tensorflow as tf
from scipy.ndimage import distance_transform_edt as distance


def calc_dist_map(seg):
    res = np.zeros_like(seg)
    posmask = seg.astype(np.bool)

    if posmask.any():
        negmask = ~posmask
        res = distance(negmask) * negmask - (distance(posmask) - 1) * posmask

    return res

def calc_dist_map_batch(y_true):
    y_true_numpy = y_true.numpy()
    return np.array([calc_dist_map(y)
                     for y in y_true_numpy]).astype(np.float32)

def surface_loss(y_true, y_pred):
    y_true_dist_map = tf.py_function(func=calc_dist_map_batch,
                                     inp=[y_true],
                                     Tout=tf.float32)
    multipled = y_pred * y_true_dist_map
    return K.mean(multipled)

However, this is an implementation only for the problem of binary classification (foreground and background).

Hi, Thanks you for this. Do I have to compute the dist_map of my predictions before passing to surface_loss function?

@HKervadec
Copy link
Member

I'm using DeepLab with the Keras implementation here of boundary loss. The loss steadily decreases, but when I stop and have the model predict, the predictions are all NaN. I was wondering if anyone else had this problem or things I might be doing wrong.

As the boundary loss is merely an element-wise multiplication between two tensors, it cannot produce NaN, unless its inputs contains NaN themselves (so either in the softmax or the distance map).

My guess would be that some other part of your code is unsafe (in the sense that it can produce NaN, with log(x) or something like that), and that using the boundary loss only reveals it. Perhaps, with the boundary loss, it pushes some predicted probabilities to a perfect 0, which causes the issue in the rest of the code. But it is difficult to tell without more information.

@HKervadec
Copy link
Member

Hi, Thanks you for this. Do I have to compute the dist_map of my predictions before passing to surface_loss function?,

No, I think in this version the distance map is computed on the fly, so you need to send only the predicted probabilities to the loss.

@andre-ye
Copy link

I'm using DeepLab with the Keras implementation here of boundary loss. The loss steadily decreases, but when I stop and have the model predict, the predictions are all NaN. I was wondering if anyone else had this problem or things I might be doing wrong.

As the boundary loss is merely an element-wise multiplication between two tensors, it cannot produce NaN, unless its inputs contains NaN themselves (so either in the softmax or the distance map).

My guess would be that some other part of your code is unsafe (in the sense that it can produce NaN, with log(x) or something like that), and that using the boundary loss only reveals it. Perhaps, with the boundary loss, it pushes some predicted probabilities to a perfect 0, which causes the issue in the rest of the code. But it is difficult to tell without more information.

Thanks for the response! I later figured out it was something wrong with the model architecture.

@ansariyusuf
Copy link

gl_sl_wrapper(alpha)

Hi can you share the keras implementation of the generalized dice loss?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests