
Why function sum can change the dtype of ndl.Tensor #11

Open

xnuohz opened this issue Mar 14, 2023 · 3 comments

@xnuohz commented Mar 14, 2023

I encountered the following error when testing SGD:

@data.setter
    def data(self, value):
        assert isinstance(value, Tensor)
>       assert value.dtype == self.dtype, "%s %s" % (
            value.dtype,
            self.dtype,
        )
E       AssertionError: float64 float32

Then I found that one line in the function compute_gradient_of_variables causes this error:

node.grad = sum(node_to_output_grads_list[node])

After I changed it as follows, things work correctly:

node_grads = node_to_output_grads_list[node]
node.grad = node_grads[0] if len(node_grads) == 1 else sum(node_grads)

The following dtype output in pdb is weird. Maybe I was wrong.

(Pdb) node_grads
[needle.Tensor(1.0)]
(Pdb) node_grads[0].dtype
dtype('float32')
(Pdb) sum(node_grads).dtype
dtype('float64')
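
For reference, a variant of the fix that avoids Python's built-in sum (and its implicit integer start value) in every case, not only the single-gradient one, could look like the sketch below. This is only an illustration against the same compute_gradient_of_variables logic, not the official solution:

from functools import reduce
from operator import add

# Reduce over the gradient contributions directly, so the first addition
# is Tensor + Tensor and no Python int 0 is involved, which avoids any
# dtype promotion on that step.
node_grads = node_to_output_grads_list[node]
node.grad = reduce(add, node_grads)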
@MartinLwx

I also came across this problem, and I may have a clue as to why it happens.

First, NumPy applies type promotion to decide the result type. The rules can be found in the NumPy documentation.

Second, Python's built-in sum function implicitly uses start=0, so we are actually computing node_grads[0] + 0 here. Using the np.result_type function, we can reveal the result type:

(Pdb) node_grads[0].dtype
dtype('float32')
(Pdb) np.result_type(node_grads[0] + 0)
dtype('float64')

I also found that NumPy's promotion rules sometimes make my scalar ops (e.g. AddScalar, DivScalar) produce np.float64 types.
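
One way to keep those scalar ops in float32 (just a sketch, assuming a compute method that receives a NumPy array a and a stored Python scalar self.scalar, which is not necessarily the exact homework structure) is to cast the result back to the input's dtype:

import numpy as np

class AddScalar:
    """Hypothetical sketch of a scalar op, not the actual needle code."""
    def __init__(self, scalar):
        self.scalar = scalar

    def compute(self, a: np.ndarray) -> np.ndarray:
        # Cast back to the input dtype so NumPy's promotion rules
        # cannot silently upgrade a float32 array to float64.
        return (a + self.scalar).astype(a.dtype)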

@yofufufufu

I also found that NumPy's promotion rules sometimes make my scalar ops (e.g. AddScalar, DivScalar) produce np.float64 types.

That's true. In the softmax loss computation, I use a code snippet like:

# DivScalar
return batch_res.sum() / batch_num

and it can produce the np.float64 type.
So how can I produce the np.float32 type in the above cases? I cannot fully understand NumPy's promotion rules...

@LittleHeroZZZX


The DivScalar op can be implemented by explicitly calling np.true_divide, which supports the dtype keyword to specify the return type.
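
For example, a sketch of a DivScalar compute along those lines (the class and method names here are assumptions for illustration, not the exact homework code):

import numpy as np

class DivScalar:
    """Hypothetical sketch of DivScalar, only to show the dtype keyword."""
    def __init__(self, scalar):
        self.scalar = scalar

    def compute(self, a: np.ndarray) -> np.ndarray:
        # np.true_divide is a ufunc, so it accepts dtype= and keeps the
        # result in the input's dtype instead of promoting to float64.
        return np.true_divide(a, self.scalar, dtype=a.dtype)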
