
On-the-fly net resizing, without reallocation (where possible) #594

Merged: 19 commits, Sep 18, 2014

Conversation

@longjon (Contributor) commented Jul 3, 2014

This PR allows nets to change their input sizes in-place, reusing allocated memory for blobs and buffers. This allows, for example:

  • building a net with a large batch size, and using it for batches of any smaller size, without memory or significant computational cost, and
  • building a large convolutional net, and using it for inputs of any smaller size, again at no cost.

Net gets a new method Reshape that provides this functionality. One first resizes the input blobs manually, then calls Net::Reshape with no arguments. (This has the unfortunate property that one has to momentarily "break" the net before calling Reshape, but it avoids the awkwardness of having Reshape accept a vector of 4-tuples.)
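
In code, usage looks roughly like the following (a minimal C++ sketch, assuming an already-initialized net with a single resizable input blob; ForwardPrefilled stands in for whichever forward call you normally use):

    #include "caffe/net.hpp"

    // Minimal usage sketch (illustrative shapes): reshape the input blob in
    // place, then let Net::Reshape propagate the new size through the net.
    void forward_at_smaller_size(caffe::Net<float>* net) {
      // Momentarily "break" the net by resizing its input blob...
      net->input_blobs()[0]->Reshape(1, 3, 227, 227);
      // ...then resize every layer's tops and buffers, bottom-up.
      net->Reshape();
      // Forward now runs at the new size, reusing the memory already allocated.
      net->ForwardPrefilled();
    }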

Net::Reshape just calls Layer::Reshape for each layer in turn, bottom-up. Since reshaping doesn't make sense for all layers, and layers may constrain acceptable new sizes, Layer::Reshape is opt-in; only layers that implement it can be reshaped. (Neuron layers can all be reshaped, most of them in a trivial way, so an implementation of NeuronLayer::Reshape is provided.)

Note that reshaping is intended only for cases where the existing parameters can continue to be used without modification. Reshaping is not intended for use with data layers. This PR provides reshapability for essentially only the layers needed for a Krizhevsky-style net.

Many layers use internal buffers, which are sometimes implemented as Blobs, sometimes as shared_ptr<Blob>s, and sometimes as SyncedMemorys. This PR uniformizes some of these to be just Blobs in order to simplify the implementation. As far as I can tell, no shared_ptrs were removed that needed to be shared_ptrs; someone let me know if this is not true (@jeffdonahue?).

This PR includes a simple implementation of part of #355 in 33959e5f9a4c4342ee797fca71d607d74f792483. Unlike #355, the implementation is entirely within Blob::Reshape, and does not touch SyncedMemory. The disadvantage is that it doesn't call realloc when enlarging blobs; this could be added in a later patch if desired. (This PR does not address sharing blobs between train and test nets.)
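
The gist of that commit can be sketched as follows (an illustrative reconstruction, not the verbatim diff; capacity_ is the name used here for the tracked allocation size):

    // Sketch of a reallocation-avoiding Blob::Reshape: allocations only grow,
    // and only when the requested element count exceeds what is already held.
    template <typename Dtype>
    void Blob<Dtype>::Reshape(int num, int channels, int height, int width) {
      num_ = num; channels_ = channels; height_ = height; width_ = width;
      count_ = num_ * channels_ * height_ * width_;
      if (count_ > capacity_) {  // enlarge: fresh allocation (no realloc yet)
        capacity_ = count_;
        data_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));
        diff_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));
      }
      // Shrinking (or an equal-size reshape) reuses the existing allocation,
      // which is why Reshape can no longer be used to free memory.
    }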

Although I haven't used it a lot yet, this PR is fully-baked: tests are included, it's usable from Python, it builds everything with -Wall -Werror, and passes tests and lint. There is a clean, linear history; if the reader feels overwhelmed by the changes, I suggest reading it one commit at a time (in commit order, not github order).

@kloudkl (Contributor) commented Jul 3, 2014

Sounds like a generalization of @sguada's #108.

@kloudkl (Contributor) commented Jul 3, 2014

Does this partially resolve #557? The reshapable net still requires all images in a batch to be the same size. To be more general, the involved blobs would have to reshape themselves for each image on demand; therefore the reshaping should not be initiated by the Net or the Layer.

@longjon (Contributor, Author) commented Jul 3, 2014

@kloudkl, I believe this is orthogonal to both #108 and #557. Let me clarify.

#108 is a particular kind of layer, where reshaping is applied to data during a forward pass; this is about reshaping between passes, applied to entire nets.

#557, if I understand correctly, is about data layers that read variously sized images from disk, but still produce fixed-size top blobs. This patch does not address that issue, and actually cannot be used with data layers. It's really meant for data supplied through the Python or MATLAB wrapper (or custom C++ code).

One might imagine a data layer that produces variously sized top blobs, fed into a network with blobs without definite sizes. That should be a straightforward extension of this patch: just call Layer::Reshape between forward calls in the forward pass, instead of explicitly calling Net::Reshape. That also removes the awkwardness I mentioned above of having to "break" the net before calling Reshape; instead, just reshape the inputs and go!

I won't do that right away though.

@sguada (Contributor) commented Jul 3, 2014

@longjon thanks, it is a nice PR; you even cleaned up the code quite a bit, especially all the temporary data :)

I will check it with the Matlab wrapper.

#557 is only meant to allow different-size images as inputs, but data layers will still produce fixed-size blobs, so it is complementary to this PR.

@kloudkl #108 is a layer, and therefore doesn't change sizes dynamically the way this PR does.

@@ -272,6 +272,11 @@ struct CaffeNet {
    return output_blob_names;
  }

  void reshape(int num, int channels, int height, int width) {
    net_->input_blobs()[0]->Reshape(num, channels, height, width);
Contributor:

If there is more than one input blob, how could it be resized?

@longjon (Contributor, Author) replied:

For the Python wrapper, I just implemented the common case where one wants to resize the first input blob (usually there is only one input blob, or the second blob is a fixed-size label). If people feel it should be included, I can implement the general case, or it can be done in a future PR.
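
For reference, the general case could look something like the sketch below (purely hypothetical and not part of this PR; reshape_named and its error handling are illustrative, written against the Net accessors for blob names and input indices):

    // Hypothetical general-case wrapper method: reshape an input blob chosen
    // by name rather than assuming the first one. Not part of this PR.
    void reshape_named(const std::string& name, int num, int channels,
                       int height, int width) {
      const std::vector<std::string>& blob_names = net_->blob_names();
      const std::vector<int>& input_indices = net_->input_blob_indices();
      for (size_t i = 0; i < input_indices.size(); ++i) {
        if (blob_names[input_indices[i]] == name) {
          net_->input_blobs()[i]->Reshape(num, channels, height, width);
          return;
        }
      }
      LOG(FATAL) << "No input blob named " << name;
    }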

@shelhamer (Member) commented:

@longjon nice job on a long-wished-for feature.

I'm a little uncomfortable with the restriction to a single (resizable) input blob; I'd vote for the general case now instead of later. Since Caffe does DAGs, I'd vote for that as well, rather than ending up in a state where some features are only supported for certain classes of models.

@longjon (Contributor, Author) commented Jul 11, 2014

(@shelhamer) Here's my plan for this PR:

  • add a name parameter to the Python reshape method, giving it the full power of the C++ interface (and check that that name is an input layer) [reshaping is supported in Python in the same way as in C++; directly reshape input blobs and proceed with forward]
  • switch to the implementation under "One might imagine" above, which I've already written
  • add checks to that implementation so that (1) reshaping and immediately calling backward is an error instead of producing bad results, and (2) there is no performance hit from an inefficient implementation of Layer::Reshape that is not being used [(1) has been abandoned, and there are no inefficient implementations of Reshape]
  • support DAGs by adding reshape support to SplitLayer, which I've already written and am using
  • check each layer to make sure reshape is supported for exactly those layers for which reshaping makes sense [reshaping is supported for all layers]
  • submit a partner PR with a DataLayer that generates images of different sizes, which I've also already implemented (done in #1313, "Reshape single input batches for inputs of varying dimension")

I expect/hope all that will happen by the end of next week. [ha!]

@shelhamer (Member) commented:

@longjon please resurrect and complete this as outlined in #594 (comment). This'll settle plenty of workarounds with varying inputs, aspect ratio, and sequences.

@longjon (Contributor, Author) commented Aug 31, 2014

@shelhamer, yes, I'll rebase this soon and add the promised features, which I already use quite a bit. There is one design issue that has kept me from hastily pushing this forward, but I'm ready with a proposed solution, coming soon.

@longjon (Contributor, Author) commented Sep 13, 2014

Rebased!

As mentioned above, I've also switched to an implementation where Layer::Reshape is called between Forward calls, so that layers can reshape their top blobs in their forward passes without any special extra work. Even though this leads to some redundant computation when layers are not being reshaped, there is no effect on performance. (E.g., for 20 passes of the reference caffenet with batch dimension 256, reshaping takes a total of 1.5 ms, compared to ~19 s for forward and backward with cuDNN.)
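
In sketch form, the forward loop now behaves like this (approximate signatures for this era of the code, not the verbatim implementation):

    // Each layer reshapes its tops immediately before its own forward pass,
    // so upstream size changes propagate without an explicit Net::Reshape.
    template <typename Dtype>
    Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
      Dtype loss = 0;
      for (int i = start; i <= end; ++i) {
        layers_[i]->Reshape(bottom_vecs_[i], &top_vecs_[i]);
        loss += layers_[i]->Forward(bottom_vecs_[i], &top_vecs_[i]);
      }
      return loss;
    }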

Reshape is supported for all layers. (Of course, layers with parameters will error out if you try to reshape to a size incompatible with their parameters.)

The layer interface is changed slightly. LayerSetUp is split into LayerSetUp and Reshape, with the former being called once for one-time set up, and the latter being called before every forward pass to update the sizes of the top blobs and any internal buffers. This does not mean that more code needs to be written per layer; it just needs to be organized a little differently. Making the split was a trivial matter for most layers. Reshape is made mandatory instead of LayerSetUp, and docs are updated accordingly.
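
As an illustration of the split, a hypothetical layer might look like this (ExampleLayer is made up, signatures are approximate for this era, and Forward/Backward are omitted):

    // Hypothetical layer showing the LayerSetUp / Reshape split: one-time work
    // in LayerSetUp, shape-dependent work in Reshape (run before every forward).
    template <typename Dtype>
    class ExampleLayer : public Layer<Dtype> {
     public:
      explicit ExampleLayer(const LayerParameter& param) : Layer<Dtype>(param) {}

      virtual void LayerSetUp(const std::vector<Blob<Dtype>*>& bottom,
                              std::vector<Blob<Dtype>*>* top) {
        // Runs once: read layer parameters, create any learnable blobs.
      }

      virtual void Reshape(const std::vector<Blob<Dtype>*>& bottom,
                           std::vector<Blob<Dtype>*>* top) {
        // Runs before every forward pass: size the top (and any internal
        // buffers) from the current bottom shape.
        (*top)[0]->Reshape(bottom[0]->num(), bottom[0]->channels(),
                           bottom[0]->height(), bottom[0]->width());
      }
      // Forward_cpu / Backward_cpu omitted for brevity.
    };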

To make use of net reshaping:

  • if you want to write a layer that produces blobs of varying sizes, just call Blob::Reshape as necessary in the forward pass
  • if you want to change the size of an input blob, just call Blob::Reshape on that blob and continue with the forward pass

It is up to the user to not Reshape an input blob and then immediately call Backward (or to otherwise backprop to a just-reshaped blob, e.g. by some combination of From/To calls). There is no way to avoid this without some extra mechanism coordinating reshapes, and in practice this has not been an issue.

Reshaping can be done in pycaffe with the same interface as in C++; just call Blob.reshape on input blobs. With #1020, layers can be written in Python that produce top blobs of varying sizes.

This is ready for (re-)review.

Commit messages from the PR's commits (timeline excerpts):

  • "This allows nets to be reshaped very quickly (essentially for free) as long as sufficient memory has been allocated. Calling Blob::Reshape in order to free up memory becomes impossible; however, this is not a normal use case (and deleting blobs does free memory)."
  • "Note that calling Reshape when no reshape is necessary should be effectively a no-op, so this is not a performance regression."
  • "This will make it possible to add reshaping to cuDNN layers."
@shelhamer (Member) commented:

Thanks Jon! This is not only a long-awaited improvement but a model PR with orderly and clear description and history.

mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
On-the-fly net resizing, without reallocation (where possible)
shelhamer added a commit to shelhamer/caffe that referenced this pull request Oct 16, 2014
share the im2col / col2im buffers among convolution layers by making the
buffer a static member.

@longjon deserves all the credit for the reshaping BVLC#594 and this patch.
@czhsuccess commented:

It seems that this PR does not support the MATLAB wrapper, am I right? @longjon

@longjon (Contributor, Author) commented Nov 1, 2014

As far as I know, support for manual reshaping of the input has not been added to the MATLAB wrapper, though it might or might not work fine with layers that produce tops of various sizes. @sguada, do you know the full story?

RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014
On-the-fly net resizing, without reallocation (where possible)
@shelhamer mentioned this pull request Dec 15, 2014
@longjon deleted the layer-reshaping branch Dec 30, 2014
shelhamer added a commit to shelhamer/caffe that referenced this pull request Mar 3, 2015 (and again on Mar 3, 2015, Mar 4, 2016, and in jonlong-symbio/caffe on Jan 18, 2017)
share the im2col / col2im buffers among convolution + deconvolution
layers by making the buffer a static member.

@longjon deserves all the credit for the reshaping BVLC#594 and this patch.