Better initialization support #670
I see, so I think there are two issues we can solve here. Firstly, we could use better initialisation for the Dense layers anyway; that would be an easy patch. Secondly, in terms of making initialisation more flexible, would the syntax …
That should definitely be documented. It just occurred to me how multiple dispatch allows for a much more flexible API, whereas TensorFlow or PyTorch in Python is limited to a single constructor. For people coming from those ecosystems it's nice to have a familiar API, but we should investigate further how we can make initialization more Julian.
It looks interesting! Is anybody working on this? I would like to take this up.
Would it be worth including in the docs something like this? "For more complicated initialisation, it's recommended to just write a function. For example, this should match PyTorch's defaults:"

```julia
function pydense(in, out, σ=identity; bias=true)
    W = Flux.kaiming_uniform(out, in, gain=sqrt(2/5))
    fan_in, _ = Flux.nfan(out, in)
    b = (rand(out) .- 1/2) .* 2 ./ sqrt(fan_in) .|> Float32
    Dense(W, bias && b, σ)
end
```

Except that's not quite right, for the weights?
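One way to sanity-check the bias line above: PyTorch draws the `nn.Linear` bias from U(−1/√fan_in, 1/√fan_in), and `(rand(out) .- 1/2) .* 2 ./ sqrt(fan_in)` produces the same distribution. A minimal NumPy sketch, just for illustration (the name `pytorch_style_bias` is mine, not a real API):

```python
import numpy as np

def pytorch_style_bias(out_features, fan_in, rng=None):
    """Sample a bias from U(-1/sqrt(fan_in), 1/sqrt(fan_in)),
    the same distribution as the Julia line
    (rand(out) .- 1/2) .* 2 ./ sqrt(fan_in) .|> Float32."""
    rng = rng or np.random.default_rng(0)
    bound = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-bound, bound, size=out_features).astype(np.float32)

b = pytorch_style_bias(4, 16)  # bound here is 1/sqrt(16) = 0.25
```

Note that the bound depends on `fan_in` (the input dimension), which is exactly why the stock `Dense` bias initializer, which only sees the output dimension, can't express it.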
This is very nice, thanks!

It would be useful to open an issue to discuss the need for the `Linear` layer here. Hopefully we can make the builtins more flexible so this kind of thing is less necessary.

Originally posted by @MikeInnes in FluxML/model-zoo#115 (comment)
The primary need for making a new type, `Linear`, was that the bias initializer only takes in the output dimension. That is intuitive, but problematic when a bias initialization relies on more than the output dimension. For example, the default `nn.Linear` layer in PyTorch scales the initialization of the bias by the input dimension. Relevant code:
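For context, PyTorch's `nn.Linear.reset_parameters` initializes the weight with `kaiming_uniform_(a=sqrt(5))` and the bias with a uniform bound of `1/sqrt(fan_in)`; both bounds depend on the input dimension. A rough NumPy sketch of that logic (the helper name `linear_reset_parameters` is mine, not PyTorch API):

```python
import math
import numpy as np

def linear_reset_parameters(in_features, out_features, rng=None):
    """Mimic PyTorch nn.Linear default init with plain NumPy."""
    rng = rng or np.random.default_rng(0)
    fan_in = in_features
    # Weight: kaiming_uniform with a=sqrt(5). The leaky_relu gain is
    # sqrt(2 / (1 + a^2)) = sqrt(1/3), so the bound works out to
    # gain * sqrt(3 / fan_in) = 1 / sqrt(fan_in).
    gain = math.sqrt(2.0 / (1.0 + 5.0))
    w_bound = gain * math.sqrt(3.0 / fan_in)
    W = rng.uniform(-w_bound, w_bound, size=(out_features, in_features))
    # Bias: U(-1/sqrt(fan_in), 1/sqrt(fan_in)) -- note it uses the
    # *input* dimension, which a Flux bias init (given only `out`) can't see.
    b_bound = 1.0 / math.sqrt(fan_in)
    b = rng.uniform(-b_bound, b_bound, size=out_features)
    return W, b
```

This is why any initializer API that hands the bias function only the output dimension can't reproduce PyTorch's default exactly.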