Discrepancy in Stirling Approximation Handling in Poisson NLL Loss Compared to TensorFlow and PyTorch #56733

akshatvishu · 2023-08-28T18:04:47Z

bug描述 Describe the Bug

Hello,

While using the Poisson NLL loss function in PaddlePaddle, I noticed a difference in the way the Stirling approximation is added to the loss, as compared to both TensorFlow and PyTorch.

Current PaddlePaddle Behavior:

For values of label that are less than or equal to 1, the Stirling approximation is being added to the loss. CODE

loss_out = loss_out + paddle.where(
            stirling_approx <= 1,
            paddle.zeros_like(stirling_approx),
            stirling_approx,
        )

Expected Behavior (Based on TensorFlow and PyTorch):

The Stirling approximation should be added to the loss only for label/target values greater than 1. It should not be added for values equal to 1.

Here's how TensorFlow and PyTorch handle it:

TensorFlow: Uses a condition to check targets > 1 before adding the Stirling approximation. CODE
PyTorch: Uses .masked_fill(target <= 1, 0) to ensure the Stirling term is only added for targets greater than 1. CODE

Here is a working example:

import tensorflow as tf
import torch
import paddle

# paddle:
input = paddle.to_tensor([2.], dtype=paddle.float32)
label = paddle.to_tensor([2.], dtype=paddle.float32)
epsilon = 1e-8
loss_out = input - label * paddle.log(input + epsilon)
print(f"loss_out:{loss_out}")
stirling_approx = (
            label * paddle.log(label)
            - label
            + 0.5 * paddle.log(2 * math.pi * label)
        )
print(f"string_approx:{stirling_approx}")
loss_out = loss_out + paddle.where(
            stirling_approx <= 1,
            paddle.zeros_like(stirling_approx),
            stirling_approx,
        )



print(f"loss after stirling:{loss_out}")


print("_______________TORCH_______________")
input = torch.tensor([2.], dtype=torch.float32)
target = torch.tensor([2.], dtype=torch.float32)
eps = epsilon
loss = input - target * torch.log(input + eps)
print(f"loss_out torch:{loss}")
stirling_term = (
            target * torch.log(target) - target + 0.5 * torch.log(2 * torch.pi * target)
        )
print(f" stirling calculation torch:{stirling_term}")

loss = loss + stirling_term.masked_fill(target <= 1, 0)
print(f"loss after striling torch:{loss}")

print("__________________TF___________")




# I have portend TF code to use TensorFlow API to showcase the internal working

def log_poisson_loss(log_input, targets, compute_full_loss=False, name=None):
    with tf.name_scope(name or "log_poisson_loss"):
        log_input = tf.convert_to_tensor(log_input, name="log_input")
        targets = tf.convert_to_tensor(targets, name="targets")

        # Ensure log_input and targets have the same shape
        if log_input.shape != targets.shape:
            raise ValueError(
                "`log_input` and `targets` must have the same shape, received "
                f"({log_input.shape} vs {targets.shape}).")
        # input - target * torch.log(input + eps)
        result = log_input - target * tf.math.log(log_input + epsilon)
        print(f"loss before stirling:{result}")
        if compute_full_loss:
            # Need to create constant tensors here so that their dtypes can be matched to that of the targets
            point_five = tf.constant(0.5, dtype=targets.dtype)
            two_pi = tf.constant(2 * math.pi, dtype=targets.dtype)

            stirling_approx = (targets * tf.math.log(targets)) - targets + (point_five * tf.math.log(two_pi * targets))
            print(f"stirling approx:{stirling_approx}")
            zeros = tf.zeros_like(targets, dtype=targets.dtype)
            ones = tf.ones_like(targets, dtype=targets.dtype)
            cond = tf.math.logical_and(tf.math.greater_equal(targets, zeros), tf.math.less_equal(targets, ones))
            result += tf.where(cond, zeros, stirling_approx)
            print(f"final result:{result}")
        return result

# Usage example
a = tf.constant([2.], dtype=tf.float32)
b = tf.constant([2.], dtype=tf.float32)
output = log_poisson_loss(a, b, compute_full_loss=True)
print(output)

"""
loss_out:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.61370564])
string_approx:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.65180647])
loss after stirling:Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
       [0.61370564])
_______________TORCH_______________
loss_out torch:tensor([0.6137])
 stirling calculation torch:tensor([0.6518])
loss after striling torch:tensor([1.2655])
__________________TF___________
loss before stirling:[0.61370564]
stirling approx:[0.6518065]
final result:[1.2655121]
tf.Tensor([1.2655121], shape=(1,), dtype=float32)
"""

As you can see that at PaddlePaddle we check if the values in stirling_approx are less than or equal to 1, which is different from checking the values of label or target as in PyTorch and TensorFlow.

I wanted to know that if this was done knowingly or do we want to make it behave similar to TensorFlow and PyTorch ; I can happily raise PR in-case we went with the latter option.

其他补充信息 Additional Supplementary Information

A thing to be take an note of is that the working example I provide is when the input is not log_input .i.e. log_input=False and since Tensorflow does not support the non log_input values i modified the code ; this line to be precise: result = log_input - target * tf.math.log(log_input + epsilon)

Thus, if we want a more simpler example to reproduce this error :

import torch
import paddle.nn.functional as F

x = torch.tensor([2.], dtype=torch.float32)
y = torch.tensor([2.], dtype=torch.float32)
z = torch.nn.functional.poisson_nll_loss(x, y,full=True, log_input=False, reduction="none")
print(z)
print(z.dtype)
print("_______________PADDLE________________")
a = paddle.to_tensor([2.], dtype=paddle.float64)
b = paddle.to_tensor([2.], dtype=paddle.float64)
c = F.poisson_nll_loss(a,b,log_input=False, full=True,reduction="none")
print(c)
print(c.dtype)

"""
tensor([1.2655])
torch.float32
_______________PADDLE________________
Tensor(shape=[1], dtype=float64, place=Place(cpu), stop_gradient=True,
       [0.61370563])
paddle.float64
"""

And this was the PR which added Poisson NLL loss function

The text was updated successfully, but these errors were encountered:

ZzSean · 2023-08-29T03:24:23Z

I'm sure this is a bug of implementation, we will fix this as soon as possible.
Thanks for the issue!

akshatvishu · 2023-08-29T07:42:08Z

Hey @zhwesky2010 @luotao1 Can I raise a PR to correct this bug?

akshatvishu · 2023-08-29T18:02:14Z

Anyways, I have tried to setup developer environment for paddle( for linux ) but the instructions were not very clear on how to use docker container or run a test etc.. hence, I am posting the correction here itself:

# this is make the loss cond. on `label/taget` like torch and tensorflow
# the same will needed to be modified at test location for poisson nll loss 
loss_out = loss_out + paddle.where(
            label > 1,
            stirling_approx,
            paddle.zeros_like(stirling_approx),
        )

luotao1 · 2023-08-30T06:29:04Z

@LyndonKong Could you help see this issue? @ZzSean already reproduces the problem.

LyndonKong · 2023-08-30T06:49:32Z

Thanks for your issue @akshatvishu. It is a bug of implementation and I believe your fix should work. It would be nice if you could open a PR to correct this bug : )

akshatvishu · 2023-08-30T13:05:11Z

Thanks for your issue @akshatvishu. It is a bug of implementation and I believe your fix should work. It would be nice if you could open a PR to correct this bug : )

Can you please clarify on how to build and test paddle by compiling it locally via using docker as shown here specifically for linux Debian using docker.

If I already cloned my fork of main repo (as shown under contributing guideline ) , do I need to clone the main repo again as mentioned in the first of building from source for linux via docker?
Specifically , I skipped these two steps as I already clone the fork at my local machine:

First select the path where you want to store PaddlePaddle, then use the following command to clone PaddlePaddle’s source code from github to a folder named Paddle in the local current directory:

git clone https://github.com/PaddlePaddle/Paddle.git

2. Go to the Paddle directory:

cd Paddle

If not then when I run docker container via executing the command -> docker exec -it paddle-test /bin/bash then it shows me this, I am confused on whether my setup is correctly installed or not. :

(base) vishu@aja:~/Paddle$ docker exec -it paddle-test /bin/bash
λ aja /home python3.9
Python 3.9.7 (default, Sep 10 2021, 00:03:59) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
>>> paddle.__version__
'0.0.0'
>>> exit()
λ aja /home ls
ccache-3.7.9/  ccache-3.7.9.tar.gz  cmake-3.16.0-Linux-x86_64/  patchelf_0.10-2_amd64.deb

LyndonKong · 2023-08-30T13:19:39Z

The guideline actually requires you to mount your local developing directory to /paddle path in the docker, so I don't think you need to clone the main repo again.You can also find the details of how to build and test this operator from this document. Hope it can be helpful.

luotao1 · 2023-08-31T03:47:54Z

While using the Poisson NLL loss function in PaddlePaddle, I noticed a difference in the way the Stirling approximation is added to the loss, as compared to both TensorFlow and PyTorch.

@akshatvishu How do you notice the difference, do you have a tool like https://github.com/PaddlePaddle/PaConvert ?

akshatvishu · 2023-08-31T10:05:28Z

@akshatvishu How do you notice the difference, do you have a tool like https://github.com/PaddlePaddle/PaConvert ?

I was just testing the various frameworks for the Poisson nll Loss and for some reason while paddle and torch shared the same parameters , they were giving different results when I was passing Full=True

akshatvishu added status/new-issue 新建 type/bug-report 报bug labels Aug 28, 2023

paddle-bot bot assigned ZzSean Aug 28, 2023

luotao1 assigned luotao1 and zhwesky2010 and unassigned ZzSean Aug 29, 2023

Ligoml added status/developing 开发中 and removed status/new-issue 新建 labels Aug 30, 2023

akshatvishu mentioned this issue Aug 30, 2023

[Fixes] Poisson NLL Loss to align with PyTorch and TensorFlow #56824

Closed

luotao1 added the API label Sep 5, 2023

akshatvishu mentioned this issue Sep 5, 2023

make loss cond. on label while using Stirling approximation #56992

Merged

luotao1 closed this as completed Sep 6, 2023

paddle-bot bot added status/close 已关闭 and removed status/developing 开发中 labels Sep 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discrepancy in Stirling Approximation Handling in Poisson NLL Loss Compared to TensorFlow and PyTorch #56733

Discrepancy in Stirling Approximation Handling in Poisson NLL Loss Compared to TensorFlow and PyTorch #56733

akshatvishu commented Aug 28, 2023 •

edited

Loading

ZzSean commented Aug 29, 2023

akshatvishu commented Aug 29, 2023

akshatvishu commented Aug 29, 2023 •

edited

Loading

luotao1 commented Aug 30, 2023

LyndonKong commented Aug 30, 2023

akshatvishu commented Aug 30, 2023 •

edited

Loading

LyndonKong commented Aug 30, 2023

luotao1 commented Aug 31, 2023

akshatvishu commented Aug 31, 2023

Discrepancy in Stirling Approximation Handling in Poisson NLL Loss Compared to TensorFlow and PyTorch #56733

Discrepancy in Stirling Approximation Handling in Poisson NLL Loss Compared to TensorFlow and PyTorch #56733

Comments

akshatvishu commented Aug 28, 2023 • edited Loading

bug描述 Describe the Bug

Current PaddlePaddle Behavior:

Expected Behavior (Based on TensorFlow and PyTorch):

其他补充信息 Additional Supplementary Information

ZzSean commented Aug 29, 2023

akshatvishu commented Aug 29, 2023

akshatvishu commented Aug 29, 2023 • edited Loading

luotao1 commented Aug 30, 2023

LyndonKong commented Aug 30, 2023

akshatvishu commented Aug 30, 2023 • edited Loading

LyndonKong commented Aug 30, 2023

luotao1 commented Aug 31, 2023

akshatvishu commented Aug 31, 2023

akshatvishu commented Aug 28, 2023 •

edited

Loading

akshatvishu commented Aug 29, 2023 •

edited

Loading

akshatvishu commented Aug 30, 2023 •

edited

Loading