Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrepancy in Stirling Approximation Handling in Poisson NLL Loss Compared to TensorFlow and PyTorch #56733

Closed
akshatvishu opened this issue Aug 28, 2023 · 9 comments
Assignees

Comments

@akshatvishu
Copy link
Contributor

akshatvishu commented Aug 28, 2023

bug描述 Describe the Bug

Hello,

While using the Poisson NLL loss function in PaddlePaddle, I noticed a difference in the way the Stirling approximation is added to the loss, as compared to both TensorFlow and PyTorch.

Current PaddlePaddle Behavior:

For values of label that are less than or equal to 1, the Stirling approximation is being added to the loss. CODE

loss_out = loss_out + paddle.where(
            stirling_approx <= 1,
            paddle.zeros_like(stirling_approx),
            stirling_approx,
        )

Expected Behavior (Based on TensorFlow and PyTorch):

The Stirling approximation should be added to the loss only for label/target values greater than 1. It should not be added for values equal to 1.

Here's how TensorFlow and PyTorch handle it:

  • TensorFlow: Uses a condition to check targets > 1 before adding the Stirling approximation. CODE
  • PyTorch: Uses .masked_fill(target <= 1, 0) to ensure the Stirling term is only added for targets greater than 1. CODE

Here is a working example:

import tensorflow as tf
import torch
import paddle

# paddle:
input = paddle.to_tensor([2.], dtype=paddle.float32)
label = paddle.to_tensor([2.], dtype=paddle.float32)
epsilon = 1e-8
loss_out = input - label * paddle.log(input + epsilon)
print(f"loss_out:{loss_out}")
stirling_approx = (
            label * paddle.log(label)
            - label
            + 0.5 * paddle.log(2 * math.pi * label)
        )
print(f"string_approx:{stirling_approx}")
loss_out = loss_out + paddle.where(
            stirling_approx <= 1,
            paddle.zeros_like(stirling_approx),
            stirling_approx,
        )



print(f"loss after stirling:{loss_out}")


print("_______________TORCH_______________")
input = torch.tensor([2.], dtype=torch.float32)
target = torch.tensor([2.], dtype=torch.float32)
eps = epsilon
loss = input - target * torch.log(input + eps)
print(f"loss_out torch:{loss}")
stirling_term = (
            target * torch.log(target) - target + 0.5 * torch.log(2 * torch.pi * target)
        )
print(f" stirling calculation torch:{stirling_term}")

loss = loss + stirling_term.masked_fill(target <= 1, 0)
print(f"loss after striling torch:{loss}")

print("__________________TF___________")




# I have portend TF code to use TensorFlow API to showcase the internal working

def log_poisson_loss(log_input, targets, compute_full_loss=False, name=None):
    with tf.name_scope(name or "log_poisson_loss"):
        log_input = tf.convert_to_tensor(log_input, name="log_input")
        targets = tf.convert_to_tensor(targets, name="targets")

        # Ensure log_input and targets have the same shape
        if log_input.shape != targets.shape:
            raise ValueError(
                "`log_input` and `targets` must have the same shape, received "
                f"({log_input.shape} vs {targets.shape}).")
        # input - target * torch.log(input + eps)
        result = log_input - target * tf.math.log(log_input + epsilon)
        print(f"loss before stirling:{result}")
        if compute_full_loss:
            # Need to create constant tensors here so that their dtypes can be matched to that of the targets
            point_five = tf.constant(0.5, dtype=targets.dtype)
            two_pi = tf.constant(2 * math.pi, dtype=targets.dtype)

            stirling_approx = (targets * tf.math.log(targets)) - targets + (point_five * tf.math.log(two_pi * targets))
            print(f"stirling approx:{stirling_approx}")
            zeros = tf.zeros_like(targets, dtype=targets.dtype)
            ones = tf.ones_like(targets, dtype=targets.dtype)
            cond = tf.math.logical_and(tf.math.greater_equal(targets, zeros), tf.math.less_equal(targets, ones))
            result += tf.where(cond, zeros, stirling_approx)
            print(f"final result:{result}")
        return result

# Usage example
a = tf.constant([2.], dtype=tf.float32)
b = tf.constant([2.], dtype=tf.float32)
output = log_poisson_loss(a, b, compute_full_loss=True)
print(output)

"""
loss_out:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.61370564])
string_approx:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.65180647])
loss after stirling:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.61370564])
_______________TORCH_______________
loss_out torch:tensor([0.6137])
 stirling calculation torch:tensor([0.6518])
loss after striling torch:tensor([1.2655])
__________________TF___________
loss before stirling:[0.61370564]
stirling approx:[0.6518065]
final result:[1.2655121]
tf.Tensor([1.2655121], shape=(1,), dtype=float32)
"""

As you can see that at PaddlePaddle we check if the values in stirling_approx are less than or equal to 1, which is different from checking the values of label or target as in PyTorch and TensorFlow.

I wanted to know that if this was done knowingly or do we want to make it behave similar to TensorFlow and PyTorch ; I can happily raise PR in-case we went with the latter option.

其他补充信息 Additional Supplementary Information

A thing to be take an note of is that the working example I provide is when the input is not log_input .i.e. log_input=False and since Tensorflow does not support the non log_input values i modified the code ; this line to be precise: result = log_input - target * tf.math.log(log_input + epsilon)

Thus, if we want a more simpler example to reproduce this error :

import torch
import paddle.nn.functional as F

x = torch.tensor([2.], dtype=torch.float32)
y = torch.tensor([2.], dtype=torch.float32)
z = torch.nn.functional.poisson_nll_loss(x, y,full=True, log_input=False, reduction="none")
print(z)
print(z.dtype)
print("_______________PADDLE________________")
a = paddle.to_tensor([2.], dtype=paddle.float64)
b = paddle.to_tensor([2.], dtype=paddle.float64)
c = F.poisson_nll_loss(a,b,log_input=False, full=True,reduction="none")
print(c)
print(c.dtype)

"""
tensor([1.2655])
torch.float32
_______________PADDLE________________
Tensor(shape=[1], dtype=float64, place=Place(cpu), stop_gradient=True,
       [0.61370563])
paddle.float64
"""

And this was the PR which added Poisson NLL loss function

@ZzSean
Copy link
Contributor

ZzSean commented Aug 29, 2023

I'm sure this is a bug of implementation, we will fix this as soon as possible.
Thanks for the issue!

@luotao1 luotao1 assigned luotao1 and zhwesky2010 and unassigned ZzSean Aug 29, 2023
@akshatvishu
Copy link
Contributor Author

Hey @zhwesky2010 @luotao1 Can I raise a PR to correct this bug?

@akshatvishu
Copy link
Contributor Author

akshatvishu commented Aug 29, 2023

Anyways, I have tried to setup developer environment for paddle( for linux ) but the instructions were not very clear on how to use docker container or run a test etc.. hence, I am posting the correction here itself:

# this is make the loss cond. on `label/taget` like torch and tensorflow
# the same will needed to be modified at test location for poisson nll loss 
loss_out = loss_out + paddle.where(
            label > 1,
            stirling_approx,
            paddle.zeros_like(stirling_approx),
        )

@Ligoml Ligoml added status/developing 开发中 and removed status/new-issue 新建 labels Aug 30, 2023
@luotao1
Copy link
Contributor

luotao1 commented Aug 30, 2023

@LyndonKong Could you help see this issue? @ZzSean already reproduces the problem.

@LyndonKong
Copy link
Contributor

Thanks for your issue @akshatvishu. It is a bug of implementation and I believe your fix should work. It would be nice if you could open a PR to correct this bug : )

@akshatvishu
Copy link
Contributor Author

akshatvishu commented Aug 30, 2023

Thanks for your issue @akshatvishu. It is a bug of implementation and I believe your fix should work. It would be nice if you could open a PR to correct this bug : )

Can you please clarify on how to build and test paddle by compiling it locally via using docker as shown here specifically for linux Debian using docker.

If I already cloned my fork of main repo (as shown under contributing guideline ) , do I need to clone the main repo again as mentioned in the first of building from source for linux via docker?
Specifically , I skipped these two steps as I already clone the fork at my local machine:

First select the path where you want to store PaddlePaddle, then use the following command to clone PaddlePaddles source code from github to a folder named Paddle in the local current directory:

git clone https://github.com/PaddlePaddle/Paddle.git

2. Go to the Paddle directory:

cd Paddle

If not then when I run docker container via executing the command -> docker exec -it paddle-test /bin/bash then it shows me this, I am confused on whether my setup is correctly installed or not. :

(base) vishu@aja:~/Paddle$ docker exec -it paddle-test /bin/bash
λ aja /home python3.9
Python 3.9.7 (default, Sep 10 2021, 00:03:59) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
>>> paddle.__version__
'0.0.0'
>>> exit()
λ aja /home ls
ccache-3.7.9/  ccache-3.7.9.tar.gz  cmake-3.16.0-Linux-x86_64/  patchelf_0.10-2_amd64.deb

@LyndonKong
Copy link
Contributor

The guideline actually requires you to mount your local developing directory to /paddle path in the docker, so I don't think you need to clone the main repo again.You can also find the details of how to build and test this operator from this document. Hope it can be helpful.

@luotao1
Copy link
Contributor

luotao1 commented Aug 31, 2023

While using the Poisson NLL loss function in PaddlePaddle, I noticed a difference in the way the Stirling approximation is added to the loss, as compared to both TensorFlow and PyTorch.

@akshatvishu How do you notice the difference, do you have a tool like https://github.com/PaddlePaddle/PaConvert ?

@akshatvishu
Copy link
Contributor Author

@akshatvishu How do you notice the difference, do you have a tool like https://github.com/PaddlePaddle/PaConvert ?

I was just testing the various frameworks for the Poisson nll Loss and for some reason while paddle and torch shared the same parameters , they were giving different results when I was passing Full=True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants