add utility functions to NX #616
Conversation
Thank you @tiagodavi! @polvalente / @seanmor5, do you think this belongs in Nx? Or some statistics-oriented package?
@josevalim I think std/var/avg are common and basic enough to belong in Nx. For instance, random variables are often normalized through the standard deviation, and those are Nx vectors. However, perhaps we could think about moving them to another module. Speaking of avg (or mean), do we have it yet? Could be a good companion function here.
We do have `Nx.mean`.
nx/lib/nx.ex
Outdated
```elixir
      1.25
    >
  """
  @doc type: :element
```
I think the type is `:aggregation` or similar. Please check what `Nx.mean` does :)
I'll take a look, thank you!
nx/lib/nx.ex
Outdated
```elixir
|> subtract(mean)
|> power(2)
|> sum()
|> divide(total)
```
We should document the exact formula, in particular that we divide by `N`. In statistical use cases it's often desired to divide by `N - 1` when calculating sample variance/stdev, since that gives an unbiased estimate. For the record, `numpy.var` has a `ddof` option and divides by `N - ddof`, `torch.var` has a more explicit `unbiased` option, while `tfp.stats.variance` always uses `N`.
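As a minimal sketch of what a `ddof`-aware variance could look like (the module name and option handling below are illustrative, not this PR's final code; `Nx.power/2` follows this PR's naming, later renamed `Nx.pow/2`):

```elixir
defmodule VarianceSketch do
  # Divisor is N - ddof, mirroring numpy.var: ddof: 0 gives the population
  # variance, ddof: 1 the unbiased sample variance.
  def variance(tensor, opts \\ []) do
    ddof = Keyword.get(opts, :ddof, 0)
    tensor = Nx.to_tensor(tensor)
    n = Nx.size(tensor)
    mean = Nx.mean(tensor)

    tensor
    |> Nx.subtract(mean)
    |> Nx.power(2)
    |> Nx.sum()
    |> Nx.divide(n - ddof)
  end
end
```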
Added. Good call.
nx/lib/nx.ex
Outdated
```elixir
tensor
|> Nx.subtract(mean)
|> Nx.divide(std(tensor))
```
What about a zero `std` value here?
What do you mean by zero?
I believe `standard_deviation` can return 0 (for example, if all values are equal). In this case, this function will error. Are there other formulas to implement `standard_scale`?
Note that std is 0 if and only if all the values are equal. In that case there is no way to achieve unit variance, so the data should be left as-is. I've also checked how it is calculated in scikit-learn; they base it on this article: http://www.cs.yale.edu/publications/techreports/tr222.pdf.
But this problematic case is still handled specially there. What's more, they also catch cases where the std is non-zero but the values are almost constant, for numerical stability: https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/preprocessing/_data.py#L84
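For illustration, one common way to sidestep the zero-std case, in the spirit of scikit-learn's handling rather than anything decided in this PR, is to replace a zero scale with 1 before dividing, so constant data passes through unchanged (a sketch assuming an `Nx.standard_deviation/1` like the one discussed here):

```elixir
tensor = Nx.tensor([3.0, 3.0, 3.0])

# Where std is 0 (all values equal), divide by 1 instead.
std = Nx.standard_deviation(tensor)
safe_std = Nx.select(Nx.equal(std, 0), 1, std)

tensor
|> Nx.subtract(Nx.mean(tensor))
|> Nx.divide(safe_std)
```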
Yeah, let's give up on the standard_scale/normalize for now then and revisit later.
nx/lib/nx.ex
Outdated
```elixir
  """
  @doc type: :element
  @spec var(tensor :: Nx.Tensor.t()) :: Nx.Tensor.t()
  def var(%Nx.Tensor{shape: shape} = tensor) do
```
What is the definition of the variance of a 0-dimensional tensor? Perhaps we should have a check against that.
```elixir
iex(3)> Nx.variance(Nx.tensor(5))
#Nx.Tensor<
  f32
  0.0
>
```
nx/lib/nx.ex
Outdated
```elixir
  """
  @doc type: :aggregation
  @spec standard_deviation(tensor :: Nx.Tensor.t(), ddof :: number()) :: Nx.Tensor.t()
  def standard_deviation(tensor, ddof \\ 0)
```
Since we may want to add an `axes` option to those functions, perhaps `ddof` must be an option too? See how `mean` is done; we want to accept the same options. :)
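Under the suggested options-based API, the call shapes would look roughly like this (hypothetical usage; no option name beyond `:ddof` is settled here):

```elixir
t = Nx.tensor([4, 9, 15, 25, 35])

Nx.standard_deviation(t)           # population stdev, divides by N
Nx.standard_deviation(t, ddof: 1)  # sample stdev, divides by N - 1
```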
I am in favor of standard deviation and variance, but I'm not sure standard scale belongs in Nx. It feels more like the responsibility of another library related to data preprocessing.
I am trying to do something like this to respect the guidelines, but I'm not sure how to access the tensor over its axes to compute the variance at the end:
Oh, I see. Let's not worry about the axes version for now then!
@josevalim Done. Fixed the spaces, removed the standard_scale function, and added the ddof option to variance and standard_deviation, without axes for the time being.
nx/lib/nx.ex
Outdated
%T{shape: shape} = tensor = to_tensor(tensor) | ||
|
||
total = Tuple.product(shape) | ||
ddof = Keyword.get(opts, :ddof, 0) |
Suggested change:
```diff
- ddof = Keyword.get(opts, :ddof, 0)
+ ddof = Keyword.fetch!(opts, :ddof)
```
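For context on the suggestion: `Keyword.get/3` falls back to a default, while `Keyword.fetch!/2` raises when the key is absent, which is the stricter assertion once defaults are filled in upstream:

```elixir
Keyword.get([ddof: 1], :ddof, 0)  #=> 1
Keyword.get([], :ddof, 0)         #=> 0
Keyword.fetch!([ddof: 1], :ddof)  #=> 1
Keyword.fetch!([], :ddof)         #=> ** (KeyError) key :ddof not found in: []
```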
Done.
nx/lib/nx.ex
Outdated
tensor = to_tensor(tensor) | ||
|
Suggested change:
```diff
- tensor = to_tensor(tensor)
```
In this specific case we can just delegate to `variance`.
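A sketch of that delegation (assuming `variance/2` accepts the same options, as in the hunks above):

```elixir
def standard_deviation(tensor, opts \\ []) do
  # variance/2 already normalizes its input via to_tensor/1, so standard
  # deviation reduces to the square root of its result.
  tensor
  |> variance(opts)
  |> Nx.sqrt()
end
```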
Done.
💚 💙 💜 💛 ❤️
This PR adds some utility functions to Nx.