add kaiming initialization and relevant docstrings #1243
Conversation
For future reference:
So neither library uses the ReLU rescaling factor sqrt(2) by default. Let's keep this in mind if we are going to change the default init.
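For comparison, a minimal sketch of the two scaling rules under discussion (illustrative only; the fan_in/fan_out values are made up):

```julia
# Standard deviations implied by the two schemes for a hypothetical 100×400 weight
# matrix (fan_in = 400, fan_out = 100); sqrt(2) is the ReLU gain mentioned above.
fan_in, fan_out = 400, 100

glorot_std  = sqrt(2 / (fan_in + fan_out))  # Glorot/Xavier: no activation gain
kaiming_std = sqrt(2 / fan_in)              # Kaiming/He: includes the sqrt(2) ReLU gain
```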
bump
LGTM, although the references to the other functions are a bit verbose. We should also add a section for initialisation in the docs (in a future PR). Thanks @johnnychen94
We should add a bit in the docs about how to use the initializations in regular layers |
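Something along these lines would probably do (a sketch only; kaiming_uniform/kaiming_normal are unexported, and the exact layer keyword names depend on the Flux version, here assumed to be init for Conv and initW for Dense):

```julia
using Flux

# Pass an initializer when constructing a layer (keyword names are version dependent;
# this assumes the API at the time of this PR).
c = Conv((3, 3), 1 => 16, relu; init = Flux.kaiming_uniform)
d = Dense(10, 5, relu; initW = Flux.kaiming_normal)
```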
This might be a future PR too, but could we consider taking
* updates the docstring of glorot initialization
* add a method for nfan(::Tuple) for robustness consideration, otherwise nfan((100, 400)) would return (1, (100, 400)), which isn't correct.

Co-authored-by: Aniket Das <aniketd@iitk.ac.in>
Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
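For reference, a rough sketch of the helper this commit touches (names mirror Flux's internal nfan; the conv-kernel method is omitted and the actual implementation may differ):

```julia
# Fan-in/fan-out helper (illustrative sketch of the dense-style methods only).
nfan()            = 1, 1                # no dimensions, e.g. a scalar
nfan(n)           = 1, n                # a single dimension, e.g. a bias vector
nfan(n_out, n_in) = n_in, n_out         # a Dense-style weight matrix
nfan(dims::Tuple) = nfan(dims...)       # splat tuples: nfan((100, 400)) == nfan(100, 400)

nfan((100, 400))  # (400, 100) rather than the incorrect (1, (100, 400))
```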
Rebased commits with no content changes.
bors r+
Build succeeded.
PyTorch's default is
This is an updated version of #425.

Distributions is not used because it always generates Array{Float64, N} instead of Array{Float32, N}.

A method for nfan(::Tuple) is added for robustness; otherwise nfan((100, 400)) would return (1, (100, 400)), which isn't correct.

These methods are not exported because glorot_* aren't, either.

If this gets merged, we could switch from glorot_uniform to kaiming_uniform for Conv, since people nowadays mostly use relu, but that belongs to another PR.

closes #425
closes #424
Co-authored-by: Aniket Das <aniketd@iitk.ac.in>
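As a quick sanity check of the above (a sketch; kaiming_normal is unexported, so it is reached through the Flux module, and exact numbers vary by RNG):

```julia
using Flux, Statistics

W = Flux.kaiming_normal(1000, 200)   # 1000×200 weight matrix, fan_in = 200
eltype(W)                            # Float32, unlike Distributions' Float64 default
std(W), sqrt(2 / 200)                # empirical std should be close to the He target
```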
PR Checklist
Final review from @MikeInnes or @dhairyagandhi96 (for API changes).