Plug leaking function_records in cpp_function initialization in case of exceptions (found by Valgrind in #2746) #2756
(force-pushed from 76e7605 to f5ec1cc)
So, this should be ready to review:
The one question I now have is whether this fix is worth it. As far as I know, all of these leaks are actually errors by the pybind11 user, and can't/shouldn't happen during actual use if the library using pybind11 is correctly designed and programmed. So do we want to incur this overhead?
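For readers unfamiliar with this class of leak, here is a minimal sketch of a record that leaks when initialization throws, and of a scope guard that plugs the leak. It is not pybind11's code: `Record`, `destroy_record`, `dup_cstr`, `initialize` and the `live_records` counter are all hypothetical names made up for illustration, and the sketch copies each string as it is stored, so the guard can always free what the record holds.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for a record that owns heap-allocated C strings.
struct Record {
    char *name = nullptr;
    char *doc = nullptr;
};

// Example-only instrumentation: number of records currently alive.
static int live_records = 0;

Record *make_record() {
    ++live_records;
    return new Record();
}

// Frees the owned strings, then the record. std::free(nullptr) is a no-op.
void destroy_record(Record *rec) {
    if (!rec) return;
    std::free(rec->name);
    std::free(rec->doc);
    delete rec;
    --live_records;
}

// Deleter so that std::unique_ptr cleans up through destroy_record.
struct RecordDeleter {
    void operator()(Record *rec) const { destroy_record(rec); }
};
using UniqueRecord = std::unique_ptr<Record, RecordDeleter>;

// Copies a C string with malloc so that destroy_record may free it.
char *dup_cstr(const char *s) {
    std::size_t n = std::strlen(s) + 1;
    char *copy = static_cast<char *>(std::malloc(n));
    if (!copy) throw std::bad_alloc();
    std::memcpy(copy, s, n);
    return copy;
}

// Initialization that may throw halfway. The guard owns the record until
// release(), so an exception on any path still runs destroy_record.
Record *initialize(const char *name, const char *doc, bool fail) {
    UniqueRecord rec(make_record());
    rec->name = dup_cstr(name);
    if (fail) throw std::runtime_error("simulated initialization error");
    rec->doc = dup_cstr(doc);
    return rec.release();  // success: ownership passes to the caller
}

// Returns how many records are still alive after a failed initialization.
int live_after_failed_init() {
    try {
        initialize("f", "doc", true);
    } catch (const std::runtime_error &) {
        // The record was already cleaned up during stack unwinding.
    }
    return live_records;
}
```

The cost of the guard is only paid on the exceptional path, which is the trade-off being asked about here.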
Hi @YannickJadoul, Is this sentence in the PR top comment still true?
What do you think about the diff below, to make it easier for the compiler/optimizer to see that a runtime bool works here?
Nope that's what I fixed, just today :-) Thanks, I fixed the original message!
Hmm, not sure what you mean. In my version it is a compile-time bool, which should be easier to optimize, no? That being said, your version should result in a smaller size overhead (at the cost of that extra runtime check), if we make sure that [...]
The overhead for the runtime check is almost certainly not measurable.
Alright, yes. Given other decisions in pybind11, prioritizing space over a tiny bit of performance makes sense to me! I've pulled this to runtime. (I have kept the default argument [...])
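The trade-off discussed above can be sketched as follows. This is only an illustration under assumed names (`Record`, `destroy_ct`, `destroy_rt`, `free_name` and the `strings_freed` counter are hypothetical, not pybind11's API): the template variant is instantiated once per flag value, while the runtime variant is a single function with one extra branch and a default argument.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical record; not pybind11's function_record.
struct Record {
    char *name = nullptr;
};

// Example-only instrumentation: how many strings have been freed.
static int strings_freed = 0;

inline void free_name(Record *rec) {
    if (rec->name) {
        std::free(rec->name);
        ++strings_freed;
    }
}

// Compile-time flag: the branch is resolved per instantiation, but the
// compiler emits one copy of the function for each value that is used.
template <bool FreeStrings>
void destroy_ct(Record *rec) {
    if (FreeStrings) free_name(rec);
    delete rec;
}

// Runtime flag: a single function body, at the cost of one branch.
// The default argument keeps existing call sites unchanged.
void destroy_rt(Record *rec, bool free_strings = true) {
    if (free_strings) free_name(rec);
    delete rec;
}

// Helper for the example: wraps an existing string in a new record.
Record *make_record(char *name) {
    Record *rec = new Record();
    rec->name = name;
    return rec;
}
```

Passing `false` is meant for the case where the record only borrows its strings, so the caller remains responsible for freeing them.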
@henryiii, @rwgk, @EricCousineau-TRI, can I gently ping? This is the last thing standing between us and rebasing and merging #2746, and soon having Valgrind checks run on all PRs :-)
Hi @YannickJadoul, a few days ago I offered to run this PR through Google's global testing system, but you discouraged it. I'd happily approve this PR after convincing myself that the tests come back clean. Fully absorbing all the nuances of this very hairy change would cost me more time than I can give it. I trust that you and @bstaletic have done a great job. Given that we have comprehensive testing including sanitizers in place, getting more heads to look at the details doesn't seem like a productive division of labor. Please let me know when this is a good time to globally test this PR again.
(force-pushed from 9942614 to 5febe3a)
Well, you asked that for #2746, and I informed you that this code was already tested (except for the last few additional commits, which were made not because tests failed, but because reviewing the code found a logical corner case for leaks that wasn't covered by tests; not at Google either, since you said ASAN/LSAN came back clean?). Also, #2746 isn't supposed to merge this functionality; I'll rebase it on top of this, and #2746 will be a bare-bones change to CI and CMake. It's the central part of our code, so the successful route is tested by all our tests. I don't believe more tests would tell us much more, since the nature of this PR is fixing leaks in exceptional cases (most often programming mistakes, actually). Though I'm of course not stopping you from running tests! (They're just not the right tool to fully assess this PR, I think.) I was mainly asking for one or two more reviews and approvals, so we can merge this and get full Valgrind runs in CI (like @bstaletic's, but he was involved in creating this, so I guess that only counts as half an approval?). I'm sure that before 2.6.2 is released, you'll still run Google's tests anyway?
(force-pushed from 5febe3a to a7ebdd5)
I look at it as a full approval: you still have two heads looking at the same thing.
Running tests for one thing at a time has a lot of value: easier to find root causes.
Also: what the hell is going on with our CI/GitHub Actions?
We're not running [...]
That's a good argument, yes. But as argued, in this case, I'm not sure the machines can tell us anything more than we already know :-/ (unless there are specialized tests that try to stress test pybind11's exceptions?)
Anyway, go ahead, @rwgk. But given that you already tested the logic (all that was added since is an extra cleanup), I'm not expecting anything there. But if it convinces you we can merge this, then why not.
(force-pushed from a7ebdd5 to be90910)
OK, fixed by rebasing onto #2790. The remaining failures are still #2774, so unrelated.
Huh, that's funny. So it's not flagging the one issue we couldn't really track down in #2746, and had to work around? (c183120) Maybe/hopefully it's a Valgrind fluke, then, but it was quite consistent, so not sure I'm believing that :-/
I'm happy to get this in.
Nope. I ran ASAN with pretty strict options, but MSAN only with defaults. I could play more with the options ... later! Let's get this in as-is ASAP IMO, pending only the results of the global testing run.
Thanks, @henryiii :-)
No worries. It's the one thing @bstaletic and I couldn't figure out. So the plan is to get Valgrind in, with the workaround, then undo the workaround in another PR that can serve as a bug report. So I propose we keep discussion of this one thing for that one undoing-the-workaround PR?
…nt in destruct(function_record *)
(force-pushed from be90910 to 1c20521)
The global testing came back clean. @YannickJadoul, please merge! This PR was already extremely useful for my work on #2672, to sanitize my new code (I found a couple of bugs already).
FYI: After rolling out this PR Google-internal, our testing system discovered this (very minor) leak: [...]
Interesting; if you do find out anything relevant or need to link back to these changes, I'm very interested.
Description
Another leak fix taken out of #2746.
I am not entirely happy with this solution yet, mostly because we are not reusing destruct(function_record *) (kind of fixed that). I tried adding this as the unique_ptr's deleter, but the issue is that the char *s only get copied along the way, and are not owned before those copies. Also, rec->data can still leak as rec->free_data is not called after an exception (fixed that).
Suggested changelog entry: