Python layer for rapidly writing nets in Python #1020
Conversation
Looks like a helpful addition. Once it's ready I'll be eager to help with CMake.
This will be awesome whenever it's ready. I was just thinking this layer could be even more useful if a prefetch thread could call it as a data provider (in fact this is how all external data is provided in Alex Krizhevsky's cuda-convnet, which honestly makes it more flexible/extensible than our current set of data layers) -- it could be very useful if someone wanted to extend @longjon's python layer to work that way. Or maybe that would be better implemented in some other way? (It's getting a bit meta with a Python wrapper calling a C++ library which itself sometimes calls Python code...) I guess you can technically already write your data provider in Python, editing the input blobs, but for training you'd have to reimplement the solver logic, and it would be done serially between passes rather than in a separate thread... so maybe we're back to the idea of having a solver callback? Anyway, now I'm just rambling; hopefully someone else thinks this would be useful and has some more coherent thoughts on a good design.
@jeffdonahue having a prefetch hook to Python / MATLAB / anything not C++ is a fine idea, but I'm not sure what it should look like either. I do entirely agree it would be useful. While we have the solver callback now, the data processing and solving are done in alternation instead of simultaneously by prefetching.

Perhaps a PythonLayer and PythonData split is worthwhile, since the layer interface is setup / forward / backward whereas for data it's really about setup / prefetch. I'm not certain a layer interface for prefetch is entirely right, although that is part of the data / transformation layer conversation and more broadly a question of what to do about phases and stages. Should there be privileged stages like PREFETCH and DEPLOY?

Now I've done my rambling too. Should we meet up for a brew session on this? The cold brew returns tomorrow so there's occasion for Caffe conversation.
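A rough sketch of the prefetching idea under discussion, assuming the pycaffe blob API (`top[0].data`); the class and the `load_batch` stand-in are invented for illustration, not code from this PR:

```python
import threading
import numpy as np

def load_batch():
    """Stand-in for real I/O and preprocessing (hypothetical)."""
    return np.random.randn(32, 3, 224, 224).astype(np.float32)

class PrefetchingDataLayer(object):
    """Hypothetical Python data layer that loads the next batch on a
    background thread while the net consumes the current one."""

    def setup(self, bottom, top):
        self.thread = threading.Thread(target=self._prefetch)
        self.thread.start()

    def _prefetch(self):
        self.next_batch = load_batch()

    def forward(self, bottom, top):
        self.thread.join()                  # wait for the prefetched batch
        top[0].data[...] = self.next_batch  # hand it to the net
        self.thread = threading.Thread(target=self._prefetch)
        self.thread.start()                 # begin loading the next one

    def backward(self, top, propagate_down, bottom):
        pass  # data layers have nothing to backpropagate
```

One caveat with an in-process thread like this is the GIL: it overlaps I/O-bound loading with solving but not CPU-bound Python preprocessing, which may be part of why a dedicated prefetch interface is worth discussing.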
I agree with @shelhamer that this will also involve thinking about transformation/augmentation (and synthetic generation more generally). Finding a good design could also be useful later for exploring the transformation sampling space dynamically by monitoring loss or accuracy.
This is necessary to allow Python to access Blobs from the layer interface, which takes raw pointers. (It does, however, mean that Python layers must not hold onto Blobs beyond their layer calls.)
This is needed for passing propagate_down to Python layers.
This option links the standard caffe binaries, not just pycaffe, against the Python libraries, making it possible to embed Python in caffe.
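To illustrate the Blob-lifetime caveat in the first note above, a hypothetical sketch (the class and attribute names are invented):

```python
class HoldingLayer(object):
    """Hypothetical layer showing what the lifetime rule forbids."""

    def forward(self, bottom, top):
        # Unsafe: bottom[0] wraps a raw Blob* owned by the net, so keeping
        # the wrapper beyond this call risks a dangling reference.
        self.kept_blob = bottom[0]

        # Safe: copy the data out if it is needed across calls.
        self.kept_data = bottom[0].data.copy()
        top[0].data[...] = bottom[0].data
```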
Hey Jon, I was thinking of trying this out. I didn't actually test this, but is the example right? Should it not be something like: …
Oh dear, @jeffdonahue, of course you're correct. I've fixed the example.
I rebased and tried building this. I get a bunch of multiple definition errors when trying to compile pycaffe. Everything compiles correctly if I check out the second-to-last commit (before the PythonLayer is added) and comment out the Python object being added to `OBJS` in the Makefile.
So I think the multiple definition errors come from the fact that Caffe builds with the PyCaffe object and PyCaffe builds with the Caffe object, leading to multiple definitions of things in the PyCaffe object? Not sure though...
(Sorry, never mind -- it works when I compile from your non-rebased branch so I must have introduced a problem while rebasing.)
@jeffdonahue, I suspect this is due to the addition of … The Python layer doesn't really need to statically link against pycaffe; the problem is that dynamically loading …

In my local code, I link statically against …

So I think you've gathered by now that the linking situation is a confusing mess, and I don't have a straightforward answer right now. Actually my local version is very much in flux at the moment, so stay tuned, use the older branch if it suits your needs, and let me know if you run into more problems... or solutions.
@Yangqing it'd be great to have your thoughts on the linking. The registry does keep the layer code neat, but the Python layer makes layer prototyping a breeze and gives a lot of flexibility. Lacking the C++ toolchain knowledge, it could take me a lot of cups of coffee to figure this out.
Hmm, let me take a look tonight and see if I can find the problem... it seems that we may have mixed a few things in the source code (compiled and linked the same cc file twice, maybe?) and it's most likely not caused by the registry code.
So I think I've found the problem. See my comments in the code prefixed with "[Compilation]" for details. Overall, my feeling is that we should really put all the PythonLayer related things into python_layer.hpp/cpp, and avoid the loop of having caffe depending on _caffe and _caffe depending again on caffe - I think this causes the double definition problem.
```
@@ -103,6 +103,9 @@ OBJ_BUILD_DIR := $(BUILD_DIR)/src/$(PROJECT)
 LAYER_BUILD_DIR := $(OBJ_BUILD_DIR)/layers
 UTIL_BUILD_DIR := $(OBJ_BUILD_DIR)/util
 OBJS := $(PROTO_OBJS) $(CXX_OBJS) $(CU_OBJS)
+ifeq ($(USE_PYTHON_LAYER), 1)
+OBJS += python/$(PROJECT)/_$(PROJECT).o
+endif
```
[Compilation] I think this causes the multiple definition problem: python/caffe/_caffe.o is going to be linked into libcaffe.a because of this, and then when we make pycaffe, python/caffe/_caffe.cpp (which _caffe.o comes from) gets linked again - causing multiple definitions. We should remove this line (together with other changes, see below).
OK I've finished my pass. @shelhamer @longjon please take another look. It does not seem to be the registration problem - mostly because we indeed linked the same cpp file twice. Should be relatively easy to fix. Happy to chip in if any further problems emerge.
Great, thanks for the investigation @Yangqing. Note this isn't the latest version of this code.
It's a little more subtle than that. Before I go into the agonizing details though, do note that this PR is now quite out-of-date, and was not intended as a final linking solution.

The reason for linking against … This could be done just by having Python layer … The hack that I used to work around this was: link statically against …

Actually now I think the double protobuf loading was probably due to the fact that the …

In any case I've already made a lot of changes which include not needing to link against …
Upgraded and replaced by #1703. |
This PR depends on #1014, and shouldn't be merged before that one.

Caffe is fast, but adding new layers is a multistep process (see #684), and is subject to all of the pitfalls of development in a low-level, compiled language. For quickly trying new ideas, speed of development may be more important than runtime speed.
This PR addresses this gap by allowing layers to be written in Python. One adds layers to the net prototxt that look like the following:
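(The prototxt snippet did not survive here; the following is a minimal sketch of such a definition, written in the syntax of the eventually merged Python layer (#1703). The module, layer, and blob names are placeholders.)

```
layer {
  name: "mylayer"
  type: "Python"
  bottom: "data"
  top: "output"
  python_param {
    module: "mymodule"  # Python module to import (must be on PYTHONPATH)
    layer: "MyLayer"    # class within that module
  }
}
```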
Then, one implements layers in Python with more or less the same interface as in C++, as below.
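(The Python example is likewise missing; here is a sketch of the interface as it ended up after merge: a class with setup / reshape / forward / backward methods operating on lists of blobs. The pass-through behavior is only for illustration.)

```python
import caffe

class MyLayer(caffe.Layer):
    """A pass-through layer sketching the Python layer interface."""

    def setup(self, bottom, top):
        pass  # one-time configuration, e.g. parsing parameters

    def reshape(self, bottom, top):
        top[0].reshape(*bottom[0].data.shape)  # match the input shape

    def forward(self, bottom, top):
        top[0].data[...] = bottom[0].data

    def backward(self, top, propagate_down, bottom):
        if propagate_down[0]:
            bottom[0].diff[...] = top[0].diff
```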
In order to make this work, the main caffe binaries need to be linked against the Python libraries. This is solved by adding a build option, `WITH_PYTHON_LAYER`, as well as a couple of `ifdef`s. These changes are probably not quite ready for merge; in particular, this should break the cmake build system, and the build option is not being tested in Travis yet. (@akosiorek or others, if you're eager to make this work with cmake, PRs against this branch are welcome.)

Layers with learnable parameters are not supported yet.
In theory, this means you ought to be able to write layers in Theano (making use of Theano's symbolic differentiation), embed them in caffe nets, and solve using caffe. I haven't tried that yet, but it might be worth adding some Python-level helper code for that later on.
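To make the Theano idea a bit more concrete: a hypothetical, untested sketch, using the post-merge interface and tanh as a stand-in for any differentiable expression. `T.Lop` computes the gradient with respect to the input given the gradient with respect to the output.

```python
import caffe
import theano
import theano.tensor as T

class TheanoTanhLayer(caffe.Layer):
    """Hypothetical layer wrapping a Theano expression, with the backward
    pass derived by Theano's symbolic differentiation."""

    def setup(self, bottom, top):
        x = T.tensor4('x', dtype='float32')
        y = T.tanh(x)
        dy = T.tensor4('dy', dtype='float32')  # gradient w.r.t. the output
        dx = T.Lop(y, x, dy)                   # gradient w.r.t. the input
        self.forward_fn = theano.function([x], y)
        self.backward_fn = theano.function([x, dy], dx)

    def reshape(self, bottom, top):
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        top[0].data[...] = self.forward_fn(bottom[0].data)

    def backward(self, top, propagate_down, bottom):
        if propagate_down[0]:
            bottom[0].diff[...] = self.backward_fn(bottom[0].data, top[0].diff)
```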
This hasn't really been tested at all (but does build and run).