
Ensure Drake's fork of pybind11 does not interfere with downstream users #7856

Closed
1 of 2 tasks
EricCousineau-TRI opened this issue Jan 25, 2018 · 20 comments
Assignees: EricCousineau-TRI
Labels: component: pydrake (Python API and its supporting Starlark macros), priority: medium, type: bug

Comments

@EricCousineau-TRI
Contributor

EricCousineau-TRI commented Jan 25, 2018

Per conversation with @gizatt and @jamiesnape:

Two situations that may affect downstream users (see examples below for illustration):
(a) there may be an ABI incompatibility between our pybind11 version and any upstream versions that could cause segfaults, and
(b) downstream C++ libraries (which use Drake C++ in its public interface) that want to generate Python bindings should use our version of pybind11.

  • The proper solution to (a) is ensuring that pybind11 is robust against loading bindings that use different pybind11 versions, but do not mix C++ APIs.
    I've asked a quick question on pybind11's Gitter (may transform it to an issue) on how well different versions of pybind11 play with each other when being loaded by the interpreter:
    https://gitter.im/pybind/Lobby?at=5a69f79ac95f22546de3e53c
    (It would really suck to force all libraries using pybind11 to be compiled with the same version, if their C++ APIs do not mix.)

UPDATE: If pybind11's version does not matter, and the combination of internals version + type is truly the only indicator of the structure of the internals, then the solution is for me to ensure our fork uses a different number than pybind11.

  • A solution to (b) is to redistribute our fork of pybind11, reverting Do not install pybind11 headers #7805. The motivating factor is less convenience, and more that there should be no incompatibility. Resolved by exposing the information in Python at runtime. An alternative is to parse .../pybind11/repository.bzl.

Ultimately, this would not be an issue if we could use upstream pybind11, either by:

  • Working around the features we want that motivate our fork, and using pybind11 in its present state. I do not believe this is an option at the moment.
  • Upstreaming our features. This is in progress, with the outcome indeterminate (I am not sure whether these features are useful to the broader community).

Example of (a)

A user has their own library custom (which does not consume Drake C++) that they've written pybind11 bindings for, naming it pycustom. They want to write a Python script that consumes both pycustom and pydrake.
They may get a segfault due to an ABI incompatibility in pybind11's internals; they're being penalized for something they didn't even care about in the first place.
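To make the mechanism concrete: at the time of this issue (pybind11 2.2-era), pybind11 stashed its shared "internals" struct in a capsule on the builtins module, under a key encoding the internals version and compiler ABI tag; two extension modules only share that struct (and its type registry) if they agree on the key. This hedged sketch just lists any such keys in the current process; the list is empty if no pybind11 extension module has been imported, and the exact storage location differs in newer pybind11 releases.

```python
# Hedged sketch: list pybind11 "internals" capsule keys registered on builtins.
# An entry looks roughly like '__pybind11_internals_v3_gcc_libstdcpp__'; two
# modules built against mismatched keys silently get separate internals, which
# is one way the segfault scenario in example (a) can arise.
import builtins

internals_keys = [k for k in vars(builtins) if k.startswith("__pybind11_internals")]
print(internals_keys)
```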

Example of (b)

A spartan user wants to write Python bindings of C++ functionality which consumes Drake in its public interface.

In this case, they must consume the same version of pybind11, such that they can access the registered C++ types to mix Drake C++, pydrake, their library's C++ API, and their Python bindings.

@EricCousineau-TRI
Contributor Author

@clalancette @basicNew @mikaelarguedas From looking at the discussion in #7805, it looks like you may be part of situation (b).

Can I ask if you have run into any such issues?
(Right now, my evaluation is speculation based on having dealt with the internals of pybind11. I believe it to be accurate, but have not fully tested it yet.)

@jamiesnape
Contributor

The solution to (b) is to redistribute our fork of pybind11, reverting #7805. The motivating factor is less convenience, and more that there should be no incompatibility.

Or they use ExternalProject_Add on that repo per the suggestion in #7805.

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Jan 25, 2018

Or they use ExternalProject_Add on that repo per the suggestion in #7805.

My concern is that they then have the same issue as the original problem with consuming Drake as a Bazel external: they now must ensure that they are in lock-step with Drake's version of pybind11 (introducing instability).

That being said, I'd be OK with a CMake macro that says "give me Drake's version of pybind11" (which could wrap ExternalProject_Add, or just provide the URL (w/ mirrors), Git SHA, and checksum), as long as Drake provides this information.

If there are complications in providing that macro, then I feel we should provide it as part of the Bazel install.

@jamiesnape
Contributor

My concern is that they then have the same issue as the original problem with consuming Drake as a Bazel external: they now must ensure that they are in lock-step with Drake's version of pybind11 (introducing instability).

Effectively upgrading pybind11 automatically with drake may introduce instabilities in their Python bindings, so really you end up choosing one instability over the other.

@clalancette
Contributor

@EricCousineau-TRI Right, delphyne did run into the problem that now that drake doesn't install the pybind11 headers, we can no longer consume them. For now, I have a PR open to vendor pybind11 into delphyne, using exactly the same commit version that drake is using.

@EricCousineau-TRI
Contributor Author

Effectively upgrading pybind11 automatically with drake may introduce instabilities in their Python bindings, so really you end up choosing one instability over the other.

If Drake is being consumed directly (which I believe is the case for (b)-style situations), then I would prefer the instability from upgrading, because without these upgrades, problems in interfacing a downstream C++ library with Drake C++, and having that work with pydrake and their pybind11 bindings, may be harder to catch.

In making my fork, I ensure that I do not break any pre-existing unit tests on Linux (Windows is another story, but that can be fixed if need be).
If the fork does break backwards compatibility, it's generally something esoteric (e.g. what was counted as a copy becomes a move, like this change, or the number of references to a Python class might increase due to a custom __del__ injection).

For now, I have a PR open to vendor pybind11 into delphyne, using exactly the same commit version that drake is using.

Gotcha. I would like it if y'all didn't have to monitor and bump the version yourselves, if you don't need to fork off of what Drake is using.

@EricCousineau-TRI EricCousineau-TRI self-assigned this Jan 25, 2018
@clalancette
Contributor

Gotcha. I would like it if y'all didn't have to ensure that you monitored and bumped the version yourself, if you don't need to fork off of what Drake is using.

Agreed. While I understand the point of view that Drake shouldn't necessarily be providing everything that any project might need to build, I think this might be a case where things are pretty intimate between drake and its consumers. For instance, what would happen if Drake were using pybind11-version1 for the Drake bindings, and Delphyne were using pybind11-version2 for its own bindings (which called into the drake bindings)? Would that continue to work OK if version2 had some ABI-incompatible changes? I'm guessing it would have problems, but I don't really know the details here.

@jamiesnape
Contributor

jamiesnape commented Jan 29, 2018

The other issue, given that pybind11 is becoming fairly popular, is what happens if a third package is using pybind11-version3, say. Does that rule out using it with Drake's Python bindings, Delphyne's, or those of any project that builds its bindings with pybind11? If pybind11 versioning is that sensitive, it starts to get problematic.

@EricCousineau-TRI
Contributor Author

@RussTedrake mentioned wanting to start adding some custom C++ code to the underactuated code. Placing the CMake solution for (b) at medium priority (can raise it if so desired).

@jamiesnape
Contributor

He just needs to use ExternalProject_Add. It is a CMake project already and that is the CMake solution for (b).

@EricCousineau-TRI
Contributor Author

He just needs to use ExternalProject_Add. It is a CMake project already and that is the CMake solution for (b).

Sorry, to clarify: I meant that for increased stability, it'd be nice if they could use CMake variables, rather than manually synchronizing with //tools/workspace/pybind11:repository.bzl when using ExternalProject_Add.

@jamiesnape
Contributor

With the caching of find_package results and the fact that ExternalProject_Add executes during the build rather than at configure time, that is not likely to work well.

@EricCousineau-TRI
Contributor Author

Is there a way to overcome that? (That seems like a pretty bad defect to me.)
Can we instead provide a CMake file to include, such that modification of that file will trigger a rebuild?

@jamiesnape
Contributor

jamiesnape commented Feb 19, 2018

(That seems like a pretty bad defect to me.)

Not really, you don't want to search around the whole file system every time you run CMake.

Can we instead provide a CMake file to include, such that modification of that file will trigger a rebuild?

That would not trigger a rebuild of a pybind11 external project, only a parent project. Moreover, in the (fairly common) case that drake is pulled in as an external project too, the variable would be a parallel (instead of parent) scope anyway, so would not be useful.

@EricCousineau-TRI
Contributor Author

EricCousineau-TRI commented Feb 20, 2018

I'm not sure if I fully understood all of your points, so I hacked together these two PRs:
#8107
RobotLocomotion/drake-external-examples#95

I had meant that there'd be a macro included directly into the downstream project. If the URL / SHA256 is changed, will ExternalProject invalidate the current target and rebuild it?
Or do we need additional instrumentation for that?

Even if it doesn't properly invalidate the external project based on Drake's version information, I'm fine if a clean build provides the correct results, and if we tackle hermeticity in CMake later.

@clalancette
Contributor

@EricCousineau-TRI Sorry I didn't see this earlier.

To be totally clear, even if this functionality is available in drake, delphyne probably will not use it. We are currently using https://github.com/dirk-thomas/vcstool to manage our dependencies, not CMake ExternalProject. I think the functionality that you are developing may be useful regardless, but I just wanted to be clear that we wouldn't be using it in the short-term.

@EricCousineau-TRI
Contributor Author

No worries! If anything, would it be possible to write a quick script to compare your vcs config file and ensure it's consistent with drake/tools/workspace/pybind11/repository.bzl? (Possibly just a quick Python script to handle it?)
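A minimal sketch of that "quick Python script" idea: extract the pinned pybind11 commit from a vcstool-style config and from a repository.bzl-style file, and fail if they disagree. The file contents below are made-up stand-ins (the commit hash, field names, and rule layout are hypothetical); a real script would read the actual files from disk.

```python
# Hedged sketch: compare the pybind11 commit pinned in a vcstool config
# against the one pinned in Drake's repository.bzl. Both snippets here are
# illustrative placeholders, not Drake's real file contents.
import re

vcs_config = """
repositories:
  pybind11:
    type: git
    url: https://github.com/RobotLocomotion/pybind11.git
    version: 48e2a7b4958a2e62ab994e2f46e1e4b4e2f8e2aa
"""

repository_bzl = """
pybind11_repository(
    name = "pybind11",
    commit = "48e2a7b4958a2e62ab994e2f46e1e4b4e2f8e2aa",
)
"""

vcs_commit = re.search(r"version:\s*([0-9a-f]{40})", vcs_config).group(1)
bzl_commit = re.search(r'commit\s*=\s*"([0-9a-f]{40})"', repository_bzl).group(1)

if vcs_commit != bzl_commit:
    raise SystemExit(f"pybind11 pin mismatch: {vcs_commit} vs {bzl_commit}")
print("pybind11 pins match:", vcs_commit)
```

Wired into CI, such a check would catch a drifting pybind11 pin before it causes the runtime incompatibilities discussed above.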

@clalancette
Contributor

@EricCousineau-TRI Good idea. I've opened an issue on our issue tracker about it: maliput/delphyne#267

@EricCousineau-TRI
Contributor Author

@sloretz might've hit this here:
sloretz/mre_crash_drake_addsystem@8017550

From Slack:

Figured out the cause and created an example. In short if the example is built using the version of pybind11 shipped with Drake then there's no crash, but if it's built with the version shipped with ROS then there is a crash.
sloretz/mre_crash_drake_addsystem#1
I'm not sure if this should have been obvious to me that it wouldn't work. I'm guessing there's an ABI difference with some pybind11 type wrapping the LeafSystem<double> class, but if so my mental model of pybind11 isn't complete enough to know what that type would be.

Dunno the best solution in the case of ROS (where ideally we don't have to rebuild too much...), but at a minimum, Drake's fork should perturb the internal pybind11 version numbers to force a (hopefully) less mysterious error about the incompatibility.
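The "perturb the numbers" idea can be modeled in a few lines. This is a simplified stand-in for pybind11's PYBIND11_INTERNALS_ID string (the real macro also folds in compiler, stdlib, and debug-build tags); the point is that a fork which bumps the version component looks up a different capsule, so a mismatch degrades to "type not registered" errors rather than silent ABI corruption.

```python
# Hedged model of how pybind11's internals capsule key embeds the internals
# version; the abi_tag value here is an illustrative placeholder.
def internals_id(internals_version, abi_tag="_gcc_libstdcpp"):
    return f"__pybind11_internals_v{internals_version}{abi_tag}__"

# A fork using a perturbed number (e.g. 103 instead of 3) will never collide
# with upstream's internals struct, making the incompatibility explicit.
assert internals_id(3) != internals_id(103)
```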

@jwnimmer-tri
Collaborator

We shouldn't have a fork anyway. Duplicate of #5842.

@jwnimmer-tri closed this as not planned (duplicate) on Nov 2, 2024