
Better initialization support #670

Closed
domluna opened this issue Mar 8, 2019 · 5 comments

Comments

@domluna
Contributor

domluna commented Mar 8, 2019

This is very nice, thanks!

It would be useful to open an issue to discuss the need for the Linear layer here. Hopefully we can make the builtins more flexible so this kind of thing is less necessary.

Originally posted by @MikeInnes in FluxML/model-zoo#115 (comment)

The primary reason for creating a new Linear type was that the bias initializer only receives the output dimension. That is intuitive, but it is a problem for bias initializations that depend on more than the output dimension. For example, the default nn.Linear layer in PyTorch scales the initialization of the bias by the input dimension. Relevant code:

def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
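
Since Flux's initb hook is called with only the output dimension, a bias like the above has to close over the input dimension by hand. A minimal sketch of that workaround (the pytorch_bias helper is hypothetical, introduced here just for illustration):

# Hypothetical helper: a PyTorch-style bias, uniform on (-1/√fan_in, 1/√fan_in)
pytorch_bias(fan_in, out) = (rand(Float32, out) .- 0.5f0) .* (2f0 / sqrt(Float32(fan_in)))

in_dim, out_dim = 10, 5
initb = out -> pytorch_bias(in_dim, out)   # capture the input dimension manually
b = initb(out_dim)                         # initb itself only ever sees out_dim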
@domluna changed the title from "This is very nice, thanks!" to "Better initialization support" on Mar 8, 2019
@MikeInnes
Member

I see, so I think there are two issues we can solve here. Firstly, we could use better initialisation for the Dense layers anyway; that would be an easy patch.

Secondly, in terms of making initialisation more flexible, would the syntax Dense(W, b) be sufficient here? If so, I think this is largely a documentation issue, which I'm recording in #671.
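
For reference, a minimal sketch of that syntax in use, assuming the Dense(W, b [, σ]) method and the usual shapes (W is out×in, b has length out):

using Flux

in_dim, out_dim = 10, 5
W = 0.01f0 .* randn(Float32, out_dim, in_dim)   # any custom weight init
b = zeros(Float32, out_dim)                     # any custom bias init
layer = Dense(W, b, relu)                       # wrap the precomputed parameters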

@domluna
Contributor Author

domluna commented Mar 10, 2019

That should definitely be documented.

It just occurred to me how multiple dispatch allows for a much more flexible API, whereas Python frameworks like TensorFlow or PyTorch are limited to a single constructor. For people coming from those ecosystems it's nice to have a familiar API, but we should investigate further how we can make initialization more Julian.
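
A minimal sketch of what that dispatch-based flexibility can look like; MyDense and its methods are hypothetical, for illustration only:

struct MyDense{M,V,F}
    W::M
    b::V
    σ::F
end

# One method takes explicit parameters...
MyDense(W::AbstractMatrix, b::AbstractVector) = MyDense(W, b, identity)

# ...another takes dimensions plus initializer functions, so no single
# constructor has to anticipate every use case:
MyDense(in::Integer, out::Integer; initW = randn, initb = zeros) =
    MyDense(initW(out, in), initb(out), identity)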

@dnabanita7
Contributor

It looks interesting! Is anybody working on this? I would like to take this up.

@darsnack
Member

The constructor issues here are already addressed. #1243 fixes the rescaling issue, #1423 fixes the missing constructor from the docstring, and #1440 will simplify any issues with initW/initb and the docstrings. I think this can be safely closed.

@mcabbott
Member

mcabbott commented Feb 13, 2021

Would it be worth including in the docs something like this?

"For more complicated initialisation, it's recommended to just write a function. For example, this should match Pytorch's nn.Linear layer defaults:

function pydense(in, out, σ=identity; bias=true)
  # weights: Flux's kaiming_uniform, cf. PyTorch's kaiming_uniform_(a=√5) above
  W = Flux.kaiming_uniform(out, in, gain=sqrt(2/5))
  # bias: uniform on (-1/√fan_in, 1/√fan_in), as in reset_parameters above
  fan_in, _ = Flux.nfan(out, in)
  b = (rand(out) .- 1/2) .* 2 ./ sqrt(fan_in) .|> Float32
  Dense(W, bias && b, σ)
end

Except that's not quite right, for the weights?
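
One possible resolution, inferred from torch.nn.init.calculate_gain rather than settled in this thread: kaiming_uniform_ with a = √5 uses the leaky_relu gain √(2/(1+a²)) = √(1/3), so gain = sqrt(1/3) rather than sqrt(2/5) would reproduce PyTorch's weight bound:

fan_in = 10                        # example input dimension
gain   = sqrt(2 / (1 + 5))         # calculate_gain("leaky_relu", √5) = √(1/3)
bound  = gain * sqrt(3 / fan_in)   # Flux.kaiming_uniform's bound; equals 1/√fan_in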
