Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Memory leaks when casting raw pointers to python types #2742

Open
bstaletic opened this issue Dec 24, 2020 · 6 comments
Open

[BUG] Memory leaks when casting raw pointers to python types #2742

bstaletic opened this issue Dec 24, 2020 · 6 comments
Labels
bug casters Related to casters, and to be taken into account when analyzing/reworking casters

Comments

@bstaletic
Copy link
Collaborator

bstaletic commented Dec 24, 2020

Issue description

Once again I've been playing with my toys - valgrind, pybind11@ #2741 and CPython 3.9.1

Good news: We're almost there. Almost a clean run.
Bad news: Some casters leak like a broken faucet.

More specifically, py::cast(new int{}) leaks and so do all the other casters that copy the data - list_caster from stl.h is another example. Another thing that leaks is new CustomClass{} with py::return_value_policy::move.

While py::cast(new int{}); could be argued that it's a user error, that argument doesn't take into account the following:

m.def("f", []{ return new int{};});
m.def("g", []{ return new Class{};}, py::return_value_policy::move);

Even our docs have return new Example(); and "don't mix things that don't play well together" in this case is very subtle of a contract to introduce after so many years of people assuming that it just works (tm).

Reproducible example code

#include <pybind11/pybind11.h>
struct s{};
PYBIND11_MODULE(test, m) {
    pybind11::class_<s>(m, "s");
    m.def("leak1", []{ return new int{}; });
    m.def("no_leak", []{ return new s{}; });
    m.def("leak2", []{ return new s{}; }, pybind11::return_value_policy::move);
}

Calling all three functions results in these valgrind errors:

==7672== HEAP SUMMARY:
==7672==     in use at exit: 811,198 bytes in 5,907 blocks
==7672==   total heap usage: 32,402 allocs, 26,495 frees, 4,805,059 bytes allocated
==7672== 
==7672== 1 bytes in 1 blocks are definitely lost in loss record 2 of 1,971
==7672==    at 0x483ADEF: operator new(unsigned long) (vg_replace_malloc.c:342)
==7672==    by 0x5B58637: pybind11_init_test(pybind11::module_&)::{lambda()#3}::operator()() const (foo.cpp:7)
==7672==    by 0x5B59BD5: s* pybind11::detail::argument_loader<>::call_impl<s*, pybind11_init_test(pybind11::module_&)::{lambda()#3}&, , pybind11::detail::void_type>(pybind11_init_test(pybind11::module_&)::{lambda()#3}&, std::integer_sequence<unsigned long>, pybind11::detail::void_type&&) && (cast.h:2022)
==7672==    by 0x5B594A9: std::enable_if<!std::is_void<s*>::value, std::is_void>::type pybind11::detail::argument_loader<>::call<s*, pybind11::detail::void_type, pybind11_init_test(pybind11::module_&)::{lambda()#3}&>(pybind11_init_test(pybind11::module_&)::{lambda()#3}&) && (cast.h:1994)
==7672==    by 0x5B5915C: void pybind11::cpp_function::initialize<pybind11_init_test(pybind11::module_&)::{lambda()#3}, s*, , pybind11::name, pybind11::scope, pybind11::sibling, pybind11::return_value_policy>(pybind11_init_test(pybind11::module_&)::{lambda()#3}&&, s* (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::return_value_policy const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call) const (pybind11.h:184)
==7672==    by 0x5B591C3: void pybind11::cpp_function::initialize<pybind11_init_test(pybind11::module_&)::{lambda()#3}, s*, , pybind11::name, pybind11::scope, pybind11::sibling, pybind11::return_value_policy>(pybind11_init_test(pybind11::module_&)::{lambda()#3}&&, s* (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::return_value_policy const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) (pybind11.h:161)
==7672==    by 0x5B67099: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (pybind11.h:717)
==7672==    by 0x498F13D: cfunction_call (methodobject.c:539)
==7672==    by 0x493C463: _PyObject_MakeTpCall (call.c:191)
==7672==    by 0x4A4962F: _PyObject_VectorcallTstate (abstract.h:116)
==7672==    by 0x4A4CF66: PyObject_Vectorcall (abstract.h:127)
==7672==    by 0x4A4CF66: call_function (ceval.c:5072)
==7672==    by 0x4A5946C: _PyEval_EvalFrameDefault (ceval.c:3487)
==7672== 
==7672== 4 bytes in 1 blocks are definitely lost in loss record 3 of 1,971
==7672==    at 0x483ADEF: operator new(unsigned long) (vg_replace_malloc.c:342)
==7672==    by 0x5B58601: pybind11_init_test(pybind11::module_&)::{lambda()#1}::operator()() const (foo.cpp:5)
==7672==    by 0x5B59B65: int* pybind11::detail::argument_loader<>::call_impl<int*, pybind11_init_test(pybind11::module_&)::{lambda()#1}&, , pybind11::detail::void_type>(pybind11_init_test(pybind11::module_&)::{lambda()#1}&, std::integer_sequence<unsigned long>, pybind11::detail::void_type&&) && (cast.h:2022)
==7672==    by 0x5B593D5: std::enable_if<!std::is_void<int*>::value, std::is_void>::type pybind11::detail::argument_loader<>::call<int*, pybind11::detail::void_type, pybind11_init_test(pybind11::module_&)::{lambda()#1}&>(pybind11_init_test(pybind11::module_&)::{lambda()#1}&) && (cast.h:1994)
==7672==    by 0x5B58D48: void pybind11::cpp_function::initialize<pybind11_init_test(pybind11::module_&)::{lambda()#1}, int*, , pybind11::name, pybind11::scope, pybind11::sibling>(pybind11_init_test(pybind11::module_&)::{lambda()#1}&&, int* (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call) const (pybind11.h:183)
==7672==    by 0x5B58DAF: void pybind11::cpp_function::initialize<pybind11_init_test(pybind11::module_&)::{lambda()#1}, int*, , pybind11::name, pybind11::scope, pybind11::sibling>(pybind11_init_test(pybind11::module_&)::{lambda()#1}&&, int* (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) (pybind11.h:161)
==7672==    by 0x5B67099: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (pybind11.h:717)
==7672==    by 0x498F13D: cfunction_call (methodobject.c:539)
==7672==    by 0x493C463: _PyObject_MakeTpCall (call.c:191)
==7672==    by 0x4A4962F: _PyObject_VectorcallTstate (abstract.h:116)
==7672==    by 0x4A4CF66: PyObject_Vectorcall (abstract.h:127)
==7672==    by 0x4A4CF66: call_function (ceval.c:5072)
==7672==    by 0x4A5946C: _PyEval_EvalFrameDefault (ceval.c:3487)

In case people find it easier to parse the embedded interpreter version:

#include <pybind11/pybind11.h>
#include <pybind11/embed.h>

int main() {
    pybind11::cpp_function("f", []{ return new int{}; })(); // Remember to call the function
}

According to @YannickJadoul's analysis, this happens because the affected casters don't end up owning the returned pointer. The casters copy the data and end up owning the copied data, but leak the returned pointer.

@YannickJadoul
Copy link
Collaborator

Sososo, as far as I can see, this is because the type_caster for int copies anyway and ignores the return_value_policy. After all, even when take_ownership is passed (or the default automatic, with a pointer), the pybind11 object doesn't take ownership of the pointer, because its content is copied.
Arguably, users should know these casters are special (in a similar way as an int*/int& not actually mutating the Python object), and should know that return_value_policies don't make sense for ints. There have been bug reports before about reference_internal for such custom caster types not keeping the object alive.
There's something to be said for @bstaletic's complaints, though, and we should at the very least warn in debug mode, I think, if the user passes a return_value_policy that seems to indicate the object's ownership is transferred.

Similar for return_value_policy::move: the object is moved out, but here, pybind11 never promises to take ownership, anyway. So arguably, this is a mistake? But not-very-obvious one, yes, so at least some docs or diagnostics might be nice. pybind11's own tests make this mistake, so...

@bstaletic
Copy link
Collaborator Author

In total, pybind11's tests leak about 5MB in total. The rvp::move was found after valgrind complained about this line:

https://github.com/pybind/pybind11/blob/master/tests/test_copy_move.cpp#L217

@bstaletic bstaletic added bug casters Related to casters, and to be taken into account when analyzing/reworking casters labels Dec 24, 2020
@YannickJadoul
Copy link
Collaborator

So, current thoughts:

  • Returning a raw pointer and expecting pybind11 to take ownership when you specify return_value_policy::move seems wrong. Except for that one test, I don't believe pybind11 suggest anywhere that this would work? Also, we can't delete it (if case users use it correctly), and we can't fail on seeing return_value_policy::move (again, in case a user uses it correctly, passing a pointer but expecting the object to be moved-out of).

  • Returning a raw int/std::tuple/.. pointer to a custom, copying caster, with automatic or take_ownership, hmmmm, already more tricky. Again, I would argue this is at least not fully unexpected, since these kind of custom casters that copy do not behave like others. Then again, it should then just not accept return_value_policy::take_ownership but ignore it.
    The main issue here is the following, though: suppose we say "OK, we'll take ownership, copy that int (or whatever, let's go with int here), then delete it. How are you deleting it? You didn't have a chance to register a custom holder class that tells you how to match the new/malloc/my_custom_alloc/... ?
    So I'd argue we can't really take ownership. Maybe we should just raise an error if a pointer is returned and it seems like someone's expecting pybind11 to take ownership? Maybe only in debug, maybe at registration, maybe when the function gets called. But then what about the cases where a int * was returned as a reference, not meaning to transfer ownership? We'd break existing code.

@bstaletic
Copy link
Collaborator Author

Returning a raw pointer and expecting pybind11 to take ownership when you specify return_value_policy::move

Note that rvp::copy also leaks if the returned pointer is owning.

Again, I would argue this is at least not fully unexpected, since these kind of custom casters that copy do not behave like others

That's something you and I understand. For most users it's a very nice automagic conversion that just works and does what the user wants. I doubt a lot of users actually understand the distinction you're drawing here.

"OK, we'll take ownership, copy that int (or whatever, let's go with int here), then delete it. How are you deleting it?

That is a big problem. My idea would be an extra argument for passing on a deleter. If no deleter was passed, keep doing what we're doing now.

But then what about the cases where a int * was returned as a reference, not meaning to transfer ownership? We'd break existing code.

Yup... Deleter argument, anyone?

@YannickJadoul
Copy link
Collaborator

Note that rvp::copy also leaks if the returned pointer is owning.

Same story as move, though? pybind11 never said it's taking ownership, so it's not.

That's something you and I understand. For most users it's a very nice automagic conversion that just works and does what the user wants. I doubt a lot of users actually understand the distinction you're drawing here.

Yeah, OK, but .. well, it is important to know, also in cases where you take that argument.

"OK, we'll take ownership, copy that int (or whatever, let's go with int here), then delete it. How are you deleting it?

That is a big problem. My idea would be an extra argument for passing on a deleter. If no deleter was passed, keep doing what we're doing now.

But then what about the cases where a int * was returned as a reference, not meaning to transfer ownership? We'd break existing code.

Yup... Deleter argument, anyone?

That also won't fix existing code, though. How is this different from e.g. adding a note to the docs, and offering "wrap it in a unique_ptr, in this case"?

@bstaletic
Copy link
Collaborator Author

Same story as move, though? pybind11 never said it's taking ownership, so it's not.

Yeah, just making sure it's mentioned for others to read it.

That also won't fix existing code, though. How is this different from e.g. adding a note to the docs, and offering "wrap it in a unique_ptr, in this case"?

That's a good point. I could imagine a way that could save a few instructions with my proposal, but I agree that unique_ptr is simpler and more idiomatic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug casters Related to casters, and to be taken into account when analyzing/reworking casters
Projects
None yet
Development

No branches or pull requests

2 participants