-
-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dense
keyword handling, and docstring
#1440
Conversation
should we add |
I think simpler is better here. I can imagine wanting a different scaling for For example, there is not a single use of this in the model zoo. Although there are a few cases using |
agree, changing the bias init is very rare, so since we have to modify either Conv or Dense for consistency, better go with the simplest interface. One more thing should be changed though: if the bias is not explicitly passed to the constructor, it should be created with the same eltype of the weights, while currently is always Float32. This would solve #1421 |
Hadn't seen the issue. It sounds like we should change |
me neither, just found it scrolling open issues
The interface could be |
Oh I didn't look before I wrote this, sorry. I went for |
Adding a size check to I am sure I've suggested this before, but what thoughts on If we do this, I think |
I'm unsure about this: on one hand, Dense(2 => 3) is clearer, but Dense(2, 3) has been there forever and is much nicer to write. Since I can foresee an endless discussion about this, let's leave it out of this PR |
OK. Then perhaps this is done, I think,
|
also recurrent layers have |
Oh I didn't see the RNN ones. Not this PR, I think. |
Worth noting that |
Yes, maybe I forgot to say that. Although I think of that as a plus, Dense treats the first dimension as channels. Is that what you meant too, or do you see a conflict of meanings? |
I meant that while we're on the topic of inconsistencies between Dense, Conv and RNN constructors, might as well note that the syntax for specifying # of input and output features(/channels) is also inconsistent. e.g. PyTorch does separate |
Ok! I am very happy to make that change here. Or in another PR, this one is done I think, and perhaps better to do things one by one. |
Yeah, ultimately it's more of a cosmetic change and I'm not qualified to comment on what the ROI for breaking the interface might be. We can revisit this in another issue :) |
Bump? |
Thoughts on including #670 (comment), btw? Doesn't have to be now obviously but isn't a bad fit. I just didn't quite get the |
I would say that since we support the ability to do that kind of initialization, this PR is still good to go. Things like the snippet you linked would be nice for an overhauled initialization section in the docs. That snippet addresses some particular conventions for No strong feelings though. |
OK, yes maybe best not to delay this with any further additions. |
Yes, I think we should have that snippet somewhere to help reproducibility, not material for this PR though. Maybe we should have a "moving from pytorch" or a "differences with pytorch" section in the docs at some point |
another bump for this, needs just an @DhairyaLGandhi's approval for merge, let's please prioritize the v0.12 stuff |
@DhairyaLGandhi close your eyes and hit approve, we are going to be fine :) |
bors r+ |
1440: `Dense` keyword handling, and docstring r=CarloLucibello a=mcabbott Closes #1422, by killing the `initW` keyword, in favour of `init` as used by the Conv layers. Also fixes "in×out weight matrix" which was incorrect. And makes `Dense(rand(2,3), bias)` work like `Dense(3,2; bias)`, which again is like the Conv layers. Edit -- also closes #1421 now, ensuring that the bias vectors of both Conv and Dense layers match the eltype of the weights. ### PR Checklist - [x] Tests are added - [x] Entry in NEWS.md - [x] Documentation, if applicable - [ ] Final review from `@dhairyagandhi96` (for API changes). Co-authored-by: Michael Abbott <me@escbook>
bors r+ |
1440: `Dense` keyword handling, and docstring r=CarloLucibello a=mcabbott Closes #1422, by killing the `initW` keyword, in favour of `init` as used by the Conv layers. Also fixes "in×out weight matrix" which was incorrect. And makes `Dense(rand(2,3), bias)` work like `Dense(3,2; bias)`, which again is like the Conv layers. Edit -- also closes #1421 now, ensuring that the bias vectors of both Conv and Dense layers match the eltype of the weights. ### PR Checklist - [x] Tests are added - [x] Entry in NEWS.md - [x] Documentation, if applicable - [ ] Final review from `@dhairyagandhi96` (for API changes). Co-authored-by: Michael Abbott <me@escbook>
Yeah, not comfortable with this change. I'm also not clear on what the exact benefit is, it seems like the field names etc change but was anyone asking them to be? If all you wanted was the ability to call into dense with different symbols just do, Which is much more acceptable. |
bors r+ |
Build succeeded: |
Closes #1422, by killing the
initW
keyword, in favour ofinit
as used by the Conv layers.Also fixes "in×out weight matrix" which was incorrect.
And makes
Dense(rand(2,3), bias)
work likeDense(3,2; bias)
, which again is like the Conv layers.Edit -- also closes #1421 now, ensuring that the bias vectors of both Conv and Dense layers match the eltype of the weights.
PR Checklist
@dhairyagandhi96
(for API changes).