New Python/C++ interface #2750

Closed
wangkuiyi opened this issue Jul 5, 2017 · 10 comments

@wangkuiyi
Collaborator

I read and followed this article http://intermediate-and-advanced-software-carpentry.readthedocs.io/en/latest/c++-wrapping.html, which compares the following interfacing technologies:

  1. Manual wrapping. I followed the official Python document https://docs.python.org/2/extending/extending.html for some example programs. It involves some complex boilerplate code: parsing the arguments in each C function, building and returning a Python object in each C function, and maintaining the method list.

  2. SWIG. It seems to be a general method that can generate bindings for various client languages, but the result is not "native" enough. Also, it takes some time to learn the interface language (*.i files).

  3. ctypes. This requires us to respecify the return type and other metadata of each C function on the Python side, again.

  4. SIP. This is the Qt community's version of SWIG. We also need to learn an interfacing language.

  5. Boost.Python. This is a set of C++ templates that simplifies manual wrapping. We no longer need to write a C wrapper function for each C++ function. Only a few extra lines in addition to the original C++ code are required to build a .so that can be called from Python.
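
To make the "no wrapper function per C++ function" point concrete, here is a rough, stdlib-only sketch of the underlying template idea. It is not how Boost.Python is actually implemented: strings stand in for Python objects, and the names ErasedFn, from_text, to_text, wrap, and add are made up for this illustration. It assumes a C++14 compiler (for std::index_sequence) and only handles functions that return a value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A uniform, type-erased call signature. In a real binding layer the
// strings would be Python objects; strings keep this sketch stdlib-only.
using ErasedFn = std::function<std::string(const std::vector<std::string>&)>;

// Convert one textual argument into the C++ type the function expects.
template <typename T>
T from_text(const std::string& text) {
  std::istringstream in(text);
  T value{};
  in >> value;
  return value;
}

// Convert the C++ return value back into text.
template <typename T>
std::string to_text(const T& value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

// Argument types are deduced from the function pointer, so the compiler
// generates the conversion code instead of us writing it per function.
template <typename R, typename... Args, std::size_t... I>
ErasedFn wrap_impl(R (*fn)(Args...), std::index_sequence<I...>) {
  return [fn](const std::vector<std::string>& args) -> std::string {
    if (args.size() != sizeof...(Args)) return "arity error";
    return to_text(fn(from_text<Args>(args[I])...));
  };
}

template <typename R, typename... Args>
ErasedFn wrap(R (*fn)(Args...)) {
  return wrap_impl(fn, std::index_sequence_for<Args...>{});
}

// Ordinary C++ functions that know nothing about the wrapper.
const char* greet() { return "hello, world"; }
int add(int a, int b) { return a + b; }

// Example use: one line per exported function, no hand-written glue.
inline void usage_example() {
  ErasedFn wrapped_add = wrap(add);
  assert(wrapped_add({"2", "3"}) == "5");
}
```

The point of the sketch is only that the per-function argument parsing and result building from item 1 can be generated from the function's type.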

I personally prefer Boost.Python. Here is an example for your reference:

Suppose that we already have C++ functions like:

char const* greet() {
   return "hello, world";
}

only the following few lines are required to build the Python-callable .so file:

#include <boost/python.hpp>

BOOST_PYTHON_MODULE(hello_ext) {
  boost::python::def("greet", greet);
}

@Superjomn
Contributor

Superjomn commented Jul 6, 2017

Boost.Python is simpler than SWIG, but it seems to support only Python, while SWIG supports wrappers for more languages.

For sure, given the time and workload, we may only support Python for a long time, but some trade-offs need to be considered:

  • If we use Boost.Python, it is hard to support wrappers for multiple languages.
  • Boost.Python is a more native wrapper for Python than SWIG, so if we use Boost.Python, it is easier to build a Python-first system, in other words, better Python APIs than TF/Caffe2 (something like defining custom ops/layers in Python?).

In short: Boost.Python for creating better Python APIs like PyTorch, SWIG for multiple language wrappers.

@typhoonzero
Contributor

Boost.Python. This is some C++ templates that simplify the manual wrapping. We no longer need to write a C wrapper function for each C++ function. Only a few extra lines in addition to the original C++ code are required to build a .so that can be called from Python.

  1. We need multi-language wrappers, at least Python and Go, because we'll need to call Tensors and Ops on the pserver side to do remote parameter optimization.
  2. A C wrapper is also needed for high-performance online inference.

So I think we can't avoid making a C wrapper in the end. Once we have it, implementing a Python extension is simple.
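
As an illustration of what such a C wrapper usually looks like, here is a minimal sketch of the common opaque-handle pattern. The core::Tensor class and every demo_tensor_* name are hypothetical and are not Paddle's actual API; the sketch only assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace core {
// Hypothetical C++ core class; a stand-in for a real tensor type.
class Tensor {
 public:
  explicit Tensor(std::size_t size) : values_(size, 0.0f) {}
  std::size_t size() const { return values_.size(); }
  float& at(std::size_t i) { return values_[i]; }
  const float& at(std::size_t i) const { return values_[i]; }

 private:
  std::vector<float> values_;
};
}  // namespace core

extern "C" {

// Opaque handle: callers in C, Go, or Cython only ever see a pointer.
typedef struct demo_tensor demo_tensor;

// Returns NULL on failure; C++ exceptions must not cross the C boundary.
demo_tensor* demo_tensor_create(std::size_t size) {
  try {
    return reinterpret_cast<demo_tensor*>(new core::Tensor(size));
  } catch (...) {
    return nullptr;
  }
}

std::size_t demo_tensor_size(const demo_tensor* handle) {
  return reinterpret_cast<const core::Tensor*>(handle)->size();
}

// Status codes replace exceptions: 0 on success, -1 on a bad index.
int demo_tensor_set(demo_tensor* handle, std::size_t index, float value) {
  core::Tensor* tensor = reinterpret_cast<core::Tensor*>(handle);
  if (index >= tensor->size()) return -1;
  tensor->at(index) = value;
  return 0;
}

int demo_tensor_get(const demo_tensor* handle, std::size_t index, float* out) {
  const core::Tensor* tensor = reinterpret_cast<const core::Tensor*>(handle);
  if (out == nullptr || index >= tensor->size()) return -1;
  *out = tensor->at(index);
  return 0;
}

// Whoever created the handle must destroy it exactly once.
void demo_tensor_destroy(demo_tensor* handle) {
  delete reinterpret_cast<core::Tensor*>(handle);
}

}  // extern "C"

// Example caller, written against the C functions only.
inline void usage_example() {
  demo_tensor* t = demo_tensor_create(4);
  assert(t != nullptr);
  assert(demo_tensor_set(t, 1, 2.5f) == 0);
  demo_tensor_destroy(t);
}
```

Because the exported functions use only C types, the same header could be consumed from any language with a C foreign-function interface; the cost is that every C++ method needs a hand-written function like these.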

@wangkuiyi
Collaborator Author

What is supposed to be included in the Go API?

In my mind, the Go API differs from the Python API significantly. In particular, we don't need Go packages like paddle.op, paddle.layer, paddle.scope, paddle.variable. Is this correct? @typhoonzero

@jacquesqiao
Member

There is another library, https://github.com/pybind/pybind11, that works just like Boost.Python but is much more lightweight. If we consider Boost.Python, we can also have a look at this library.

@typhoonzero
Contributor

What is supposed to be included in the Go API?

At least paddle.variable or paddle.tensor, I think. The current Go pserver and optimizer use an independent implementation of tensor, at paddle/optimizer/tensor.h. It would be better to use the new implementation so that we don't have two "tensor"s in the code base.

We don't need Go packages like paddle.op, paddle.layer, paddle.scope, paddle.variable indeed. Making the parameter server an "op", as TensorFlow does, isn't what we intend to do.

@dzhwinter
Contributor

dzhwinter commented Jul 7, 2017

Consider that we just write operators in C++ and generate their Python bindings. We cannot find a proper language-binding generator library or technique. Maybe it is too hard to generate ops for Go at present.
I think we can just build a core system in C++ with a strong binding to Python. Other languages invoke functions through a C-API binding.
Agree with @jacquesqiao. According to pybind11's doc, pybind11 is similar to Boost.Python, but with a lean header-only implementation for C++11-capable compilers. It may be a better choice if we only consider the Python/C++ binding.

ctypes is not a good choice. Not only do we need to write every function again on the Python side, but it also makes the Python binding tedious to maintain and upgrade. E.g., MXNet chose ctypes in the very beginning, but they maintain another implementation in Cython nowadays.

@reyoung
Collaborator

reyoung commented Jul 10, 2017

Whether to use a C-API depends on whether any other languages need to invoke the Paddle C++ core.

I am not sure whether a Python API alone is enough. At least there are several needs for us to provide a C-API.

  • Android/iOS API. Provide a Java/Objective-C wrapper for Paddle.
  • Go API. Needed for our Parameter Server.

Also, I think pybind11 is better than Boost.Python because Paddle is written in C++11.

But if we have a C-API for Paddle, wrapping that C-API for Python is extremely easy with Cython:

cdef extern from "math.h":
    double sin(double x)

@reyoung
Collaborator

reyoung commented Jul 10, 2017

Also, pybind11 and Boost.Python have a major defect. They require the Python version used for compiling and the Python version used for running to be EXACTLY the SAME. It means that if Paddle is compiled with Python 2.7.2 but run with Python 2.7.3, an error will be raised.

See video here

@reyoung
Collaborator

reyoung commented Jul 10, 2017

@wangkuiyi and all,

I wrote two demos, one using pybind11 and the other using Cython + C-API. They are:

The conclusion is:

  • If we don't want a C-API, pybind11 is simpler. Personally, I think pybind11 has a great interface design. I barely had to look at its documentation to develop this demo. But there is one thing to be careful about: the ownership of C++ objects.

  • If we have a C-API, using Cython is extremely easy. Just include the header and write some interface code. Cython has a great interface, too (see here). However, developing a C-API is tedious.
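
On the ownership caveat: a binding layer has to know whether an object returned from C++ now belongs to the caller or is still owned by another C++ object, otherwise the Python side may free something C++ still uses, or never free it at all. pybind11 expresses this choice through its return value policies. The sketch below is plain C++ with no binding library, and Scope and Variable are hypothetical stand-ins rather than Paddle's real classes; it only shows the two cases a wrapper must tell apart.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical types for illustration only.
struct Variable {
  explicit Variable(std::string n) : name(std::move(n)) {}
  std::string name;
};

class Scope {
 public:
  // Non-owning result: the Scope keeps ownership, so the pointer is only
  // valid while this Scope is alive. A wrapper must not delete it.
  Variable* find_or_create(const std::string& name) {
    std::unique_ptr<Variable>& slot = vars_[name];
    if (!slot) slot.reset(new Variable(name));
    return slot.get();
  }

  std::size_t size() const { return vars_.size(); }

 private:
  std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
};

// Owning result: the caller becomes the sole owner and is responsible
// for the Scope's lifetime.
std::unique_ptr<Scope> make_scope() {
  return std::unique_ptr<Scope>(new Scope());
}

// Example use: the Variable pointer must not outlive the Scope.
inline void usage_example() {
  std::unique_ptr<Scope> scope = make_scope();
  Variable* w = scope->find_or_create("w");
  assert(w->name == "w");
}  // scope is destroyed here, and its Variables with it
```

Encoding the distinction in the signatures (std::unique_ptr for a transfer, a raw pointer for a borrowed object) is one way to keep the intent visible to whoever writes the binding code.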

@reyoung
Collaborator

reyoung commented Jul 11, 2017

Fixed by #2793

@reyoung reyoung closed this as completed Jul 11, 2017