Fix weight sharing #2866
Conversation
Force-pushed from 7feef51 to b1d42c8 (compare)
This looks good to me once #2836 is merged. The switch for test data to accommodate the accumulation checks is fine by me. Thanks for sorting this out Jeff!
This makes progress on #1211. @jeffdonahue I think this addresses the https://github.com/BVLC/caffe/pull/546/files#r16817721 issue by the …
```diff
@@ -458,7 +461,7 @@ template <typename Dtype>
 void SGDSolver<Dtype>::ClipGradients() {
   const Dtype clip_gradients = this->param_.clip_gradients();
   if (clip_gradients < 0) { return; }
-  const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
+  const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params();
   Dtype sumsq_diff = 0;
   for (int i = 0; i < net_params.size(); ++i) {
```
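For context, here is a sketch of what the full ClipGradients routine looks like once both of its loops run over learnable_params. It is reconstructed from the hunk above plus the surrounding solver code, so details may differ from the actual file, and it assumes the Caffe Blob/Net/Solver headers rather than being standalone:

```cpp
// Sketch of SGDSolver<Dtype>::ClipGradients() after the change above:
// both the norm-accumulation loop and the rescaling loop iterate over
// learnable_params(), which holds one Blob per learnable parameter.
template <typename Dtype>
void SGDSolver<Dtype>::ClipGradients() {
  const Dtype clip_gradients = this->param_.clip_gradients();
  if (clip_gradients < 0) { return; }
  const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params();
  Dtype sumsq_diff = 0;
  for (int i = 0; i < net_params.size(); ++i) {
    sumsq_diff += net_params[i]->sumsq_diff();  // squared L2 norm of each diff
  }
  const Dtype l2norm_diff = std::sqrt(sumsq_diff);
  if (l2norm_diff > clip_gradients) {
    const Dtype scale_factor = clip_gradients / l2norm_diff;
    for (int i = 0; i < net_params.size(); ++i) {
      net_params[i]->scale_diff(scale_factor);  // rescale gradients in place
    }
  }
}
```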
This could loop over learnable_params now, as could the loop on 477.

Now that this is a loop over learnable_params, the condition on param_owners()[i] is wrong.
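To make the two index spaces concrete, a minimal sketch of the distinction being discussed (this is my reading, not code from the PR; params(), param_owners(), and learnable_params() are the real Net accessors, while the loop bodies are placeholders):

```cpp
// Old style: params() has one entry per parameter *copy*, so shared copies
// must be skipped by checking param_owners() (a value < 0 means "this copy
// is the owner").
const vector<shared_ptr<Blob<Dtype> > >& params = this->net_->params();
const vector<int>& param_owners = this->net_->param_owners();
for (int i = 0; i < params.size(); ++i) {
  if (param_owners[i] >= 0) { continue; }  // shared copy; owner is handled instead
  // ... use params[i] ...
}

// New style: learnable_params() already holds exactly one Blob per learnable
// parameter, so no ownership check is needed -- and indexing param_owners()
// with this i would be wrong, since the two vectors use different index spaces.
const vector<Blob<Dtype>*>& learnable = this->net_->learnable_params();
for (int i = 0; i < learnable.size(); ++i) {
  // ... use learnable[i] ...
}
```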
agreed... see the above diff :)
In which we learn I have a short term memory of 1 line... sorry haha.
@shelhamer later pointed out that the remaining check on param_owners
below was wrong -- just fixed that, thanks!
This is very nice. Does the learnable_params depend on HDF5 snapshots in #2836? If not, can we just separate learnable_params out and merge independently? Thanks!
Force-pushed from b1d42c8 to d8ed02a (compare)
@raingo if you want, you should be able to cherry-pick my last commit (you probably also need my first two commits to avoid conflicts in …)
… params

- Params now share diffs as well as data (works due to layers accumulating gradients into param diffs, rather than overwriting)
- It's now required that any shared params with specified lr_mult's, decay_mult's match
- TestGradientBasedSolver checks that behavior remains correct with shared weights
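As a rough illustration of what "share diffs as well as data" means at the Blob level: ShareData and ShareDiff are the existing Blob methods, but ShareWithOwner below is a hypothetical helper, not the actual net-setup code.

```cpp
#include "caffe/blob.hpp"

using caffe::Blob;

// Hypothetical helper: make a shared parameter blob alias its owner's storage.
// Sharing the diff as well as the data means every layer's Backward pass
// accumulates its gradient contribution into a single buffer, so the solver
// only has to update (and apply momentum/decay to) one copy per parameter.
void ShareWithOwner(Blob<float>* shared, const Blob<float>& owner) {
  shared->ShareData(owner);  // parameter values alias the owner's data
  shared->ShareDiff(owner);  // gradients alias the owner's diff as well
}
```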
Force-pushed from d8ed02a to d5b42bf (compare)
Rebased now that #2836 is merged, and unrelated changes to …
I took a last glance and didn't catch anything to change, so go ahead and merge.
Great, thanks for the review @shelhamer!
On further thought I think that … The harm in changing it could be if there is downstream code, such as solvers or serializers, that made use of … What's your take @jeffdonahue?
Yeah, I think I agree. Downstream code that is actually aware of weight sharing and conditions on …
This fixes a couple issues with weight sharing:

… the params_ vector. Someone can do the math to figure out exactly how it was incorrect if they want to; I'll just say that the added tests definitely fail without this fix.

The one possible downside is that you can no longer specify different lr_mult's for shared parameters; but using this "feature" was probably a bad idea before, and if you were also using momentum or weight decay it was probably behaving in an incorrect/unexpected way.

This is based on @erictzeng's PR #2836; adding just the last commit on top of that.
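For illustration, the requirement that shared params' lr_mult and decay_mult agree could be enforced with a check along these lines during net setup. This is a hypothetical, simplified sketch; the actual check lives in Caffe's parameter-appending code and operates on ParamSpec messages:

```cpp
#include <string>
#include <glog/logging.h>

// Hypothetical sketch: when a parameter is shared, any explicitly specified
// lr_mult / decay_mult must agree with the owner's, since there is now only
// one diff (and one solver update) per learnable parameter.
void CheckSharedParamMultipliers(const std::string& param_name,
                                 float owner_lr_mult, float lr_mult,
                                 float owner_decay_mult, float decay_mult) {
  CHECK_EQ(owner_lr_mult, lr_mult)
      << "Shared param '" << param_name << "' has mismatched lr_mult.";
  CHECK_EQ(owner_decay_mult, decay_mult)
      << "Shared param '" << param_name << "' has mismatched decay_mult.";
}
```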