
Better initialization support #670

Closed
domluna opened this issue Mar 8, 2019 · 5 comments

Comments

@domluna
Contributor

domluna commented Mar 8, 2019

This is very nice, thanks!

It would be useful to open an issue to discuss the need for the Linear layer here. Hopefully we can make the builtins more flexible so this kind of thing is less necessary.

Originally posted by @MikeInnes in FluxML/model-zoo#115 (comment)

The primary reason for creating a new Linear type was that the bias initializer only receives the output dimension. That is intuitive, but it is a problem for bias initializations that depend on more than the output dimension. For example, the default nn.Linear layer in PyTorch scales the initialization of the bias by the input dimension. Relevant code:

def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
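
Since Flux's initb hook is called with only the output dimension, a bias like the above has to close over the input dimension by hand. A minimal sketch of that workaround (the pytorch_bias helper is hypothetical, introduced here just for illustration):

# Hypothetical helper: a PyTorch-style bias, uniform on (-1/√fan_in, 1/√fan_in)
pytorch_bias(fan_in, out) = (rand(Float32, out) .- 0.5f0) .* (2f0 / sqrt(Float32(fan_in)))

in_dim, out_dim = 10, 5
initb = out -> pytorch_bias(in_dim, out)   # capture the input dimension manually
b = initb(out_dim)                         # initb itself only ever sees out_dim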
@domluna changed the title from "This is very nice, thanks!" to "Better initialization support" on Mar 8, 2019
@MikeInnes
Member

I see, so I think there are two issues we can solve here. Firstly, we could use better initialisation for the Dense layers anyway; that would be an easy patch.

Secondly, in terms of making initialisation more flexible, would the syntax Dense(W, b) be sufficient here? If so, I think this is largely a documentation issue, which I'm recording in #671.
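
For reference, a minimal sketch of that syntax in use, assuming the Dense(W, b [, σ]) method and the usual shapes (W is out×in, b has length out):

using Flux

in_dim, out_dim = 10, 5
W = 0.01f0 .* randn(Float32, out_dim, in_dim)   # any custom weight init
b = zeros(Float32, out_dim)                     # any custom bias init
layer = Dense(W, b, relu)                       # wrap the precomputed parameters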

@domluna
Contributor Author

domluna commented Mar 10, 2019

That should definitely be documented.

It just occurred to me how multiple dispatch allows for a much more flexible API, whereas Python frameworks like TensorFlow or PyTorch are limited to a single constructor. For people coming from those ecosystems it's nice to have a familiar API, but we should investigate further how we can make initialization more Julian.
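
A minimal sketch of what that dispatch-based flexibility can look like; MyDense and its methods are hypothetical, for illustration only:

struct MyDense{M,V,F}
    W::M
    b::V
    σ::F
end

# One method takes explicit parameters...
MyDense(W::AbstractMatrix, b::AbstractVector) = MyDense(W, b, identity)

# ...another takes dimensions plus initializer functions, so no single
# constructor has to anticipate every use case:
MyDense(in::Integer, out::Integer; initW = randn, initb = zeros) =
    MyDense(initW(out, in), initb(out), identity)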

@dnabanita7
Contributor

It looks interesting! Is anybody working on this? I would like to take this up.

@darsnack
Member

The constructor issues here are already addressed. #1243 fixes the rescaling issue, #1423 fixes the missing constructor from the docstring, and #1440 will simplify any issues with initW/initb and the docstrings. I think this can be safely closed.

@mcabbott
Member

mcabbott commented Feb 13, 2021

Would it be worth including in the docs something like this?

"For more complicated initialisation, it's recommended to just write a function. For example, this should match Pytorch's nn.Linear layer defaults:

function pydense(in, out, σ=identity; bias=true)
  # weights: Flux's kaiming_uniform, cf. PyTorch's kaiming_uniform_(a=√5) above
  W = Flux.kaiming_uniform(out, in, gain=sqrt(2/5))
  # bias: uniform on (-1/√fan_in, 1/√fan_in), as in reset_parameters above
  fan_in, _ = Flux.nfan(out, in)
  b = (rand(out) .- 1/2) .* 2 ./ sqrt(fan_in) .|> Float32
  Dense(W, bias && b, σ)
end

Except that's not quite right, for the weights?
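
One possible resolution, inferred from torch.nn.init.calculate_gain rather than settled in this thread: kaiming_uniform_ with a = √5 uses the leaky_relu gain √(2/(1+a²)) = √(1/3), so gain = sqrt(1/3) rather than sqrt(2/5) would reproduce PyTorch's weight bound:

fan_in = 10                        # example input dimension
gain   = sqrt(2 / (1 + 5))         # calculate_gain("leaky_relu", √5) = √(1/3)
bound  = gain * sqrt(3 / fan_in)   # Flux.kaiming_uniform's bound; equals 1/√fan_in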
