Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update 'inactive_override_cache' key #2772

Conversation

EricCousineau-TRI
Copy link
Collaborator

@EricCousineau-TRI EricCousineau-TRI commented Jan 7, 2021

Description

Avoid false cache hits for virtual overrides by using __qualname__ as the hashing key. Change storage to std::string to avoid memory management mishaps.

Supersedes #2092 - just merge conflict resolution; I'll address any further changes.
Resolves #1922

Suggested changelog entry:

Avoid false cache hits for virtual overrides by using ``__qualname__`` as the hashing key. Change storage of key to use ``std::string`` to avoid memory management mishaps.

\cc @BetsyMcPhail

@EricCousineau-TRI
Copy link
Collaborator Author

EricCousineau-TRI commented Jan 7, 2021

@YannickJadoul / @rwgk Would one of y'all be up for reviewing this?

Instead of just the function name, use '__qualname__.name'.
@EricCousineau-TRI EricCousineau-TRI force-pushed the feature-override-cache-qualname branch from 0e080f5 to 24e6f88 Compare January 7, 2021 23:06
@@ -97,7 +97,7 @@ struct internals {
type_map<type_info *> registered_types_cpp; // std::type_index -> pybind11's type information
std::unordered_map<PyTypeObject *, std::vector<type_info *>> registered_types_py; // PyTypeObject* -> base type_info(s)
std::unordered_multimap<const void *, instance*> registered_instances; // void * -> instance*
std::unordered_set<std::pair<const PyObject *, const char *>, override_hash> inactive_override_cache;
std::unordered_set<std::pair<const PyObject *, std::string>, override_hash> inactive_override_cache;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er... mayhaps this change requires a bump on the internal version.

I can't think of an easy to way to keep const char* around, as I'm unaware of what happens with Python strings 'n such...
Ideas?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely needs to bump that, yes :-/ That does complicate things slightly, release-wise?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, not going to do that until the next release, at least. We should bundle all such changes together and then do a bump, I think. Also, I thought standard library templated classes were not valid in an ABI (@jpivarski)? I take it this is why we have so many issues with the ABI? Could we redesign this to just be simple classes and do our own memory management here?

Copy link
Collaborator Author

@EricCousineau-TRI EricCousineau-TRI Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that could be done! I'll check out Boris's suggestion in Betsy's original PR, tho, and see if that gives us const char* 🤞

From Boris:

This should probably use detail::get_fully_qualified_tp_name (from #2520) [...]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dern, looking at #2520, it's std::string. But yeah, I can see about doing some naive memory management.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR: for now, if we can keep the ABI the same, that's best. Eventually we will need a bump. In the future, maybe we can design internals that work cross-compiler?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Cool, I'll re-open and re-scope #2773 to capture this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made an "abi break" - very-brighly colored - label ( abi break ). Let's use this to indicate PRs that break the ABI and should be pooled together?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henryiii Agreed that it would be amazing if we can get rid of these and use something that's basically shared between all compilers (guess that's POD types and extern "C"?). Just curious if ríght now, we have a problem when using STL in the internals, or whether that's at least +- safely worked around.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current workaround is fine, but something we'd like to possibly redesign to enable wider compatibility in the future, perhaps. In the current design, std::string should be fine, I think it's not worse than unordered map or anything else. I know std::vector is not allowed, so guessing pretty much everything else is out. Somewhere I think @bstaletic just listed the options, which I think is what I was thinking; either we do everything ourselves, or we pay a cost. (Though I feel like we can probably do pretty well). But also haven't really looked at this part of the code much at all, mostly pretty general observations ATM.

@YannickJadoul
Copy link
Collaborator

@YannickJadoul / @rwgk Would one of y'all be up for reviewing this?

Yes! Tomorrow or during the weekend good enough?

@YannickJadoul YannickJadoul self-requested a review January 7, 2021 23:17
@YannickJadoul
Copy link
Collaborator

Note to self and others: also look at review comments in #2092, and in particular @bstaletic's #2092 (comment) (so @bstaletic should probably also review this again).

@EricCousineau-TRI
Copy link
Collaborator Author

Note to self and others: also look at review comments in #2092 [...]

🤦 Good point! I'll take a gander there!

Yes! Tomorrow or during the weekend good enough?

Yup, sounds awesome!

@EricCousineau-TRI
Copy link
Collaborator Author

EricCousineau-TRI commented Jan 8, 2021

Hm... mystified by this error: https://github.com/pybind/pybind11/pull/2772/checks?check_run_id=1665921586#step:11:70

  capfd = <_pytest.capture.CaptureFixture object at 0x000002417F72AD00>
  
      def test_err(capfd):
          msg = "Something that should not show up in log"
          stream = StringIO()
          with redirect_stderr(stream):
              m.raw_err(msg)
          stdout, stderr = capfd.readouterr()
          assert stdout == ""
  >       assert stderr == msg
  E       AssertionError: assert '' == 'Something th...how up in log'
  E         - Something that should not show up in log
  
  test_iostream.py:132: AssertionError

Gonna re-run to see if it's a flake, as the failing test (test_iostream.py, test_err + test_iostream.cpp, raw_err) don't seem like they'd be related to this change.

@rwgk
Copy link
Collaborator

rwgk commented Jan 8, 2021

Yes, it's one of the two known thread-related issues: #2754

Comment on lines +85 to 97
// Adapted from boost::hash_combine(). See also:
// https://stackoverflow.com/a/27952689/7829525.
inline void hash_combine(size_t& lhs, size_t rhs ) {
lhs ^= rhs + 0x9e3779b9 + (lhs << 6) + (lhs >> 2);
}

struct override_hash {
inline size_t operator()(const std::pair<const PyObject *, const char *>& v) const {
inline size_t operator()(const std::pair<const PyObject *, std::string>& v) const {
size_t value = std::hash<const void *>()(v.first);
value ^= std::hash<const void *>()(v.second) + 0x9e3779b9 + (value<<6) + (value>>2);
hash_combine(value, std::hash<std::string>()(v.second));
return value;
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this at least doesn't regress the hash like #2092 does. However this is still not what Boost's hash_combine does. Boost's equivalent would be:

size_t seed = 0;
size_t lhs = std::hash<const void*>()(v.first); // Same as here
size_t rhs = std::hash<std::string>()(v.second); // Same as here
seed ^= lhs + 0x9e3779b9 + (seed << 6) + (seed >> 2); // This line is missing!
seed ^= rhs + 0x9e3779b9 + (seed << 6) + (seed >> 2);

Right now, instead of the first seed ^= lhs... we're just using seed = lhs. With initial seed being 0, boost's hash_combine would really do just lhs += 0x9e3779b9; before the lhs ^= rhs + ... line.

Full disclaimer, I do not claim to be an expert on good hashing functions. I'm just noticing the slight difference. I don't actually know if this should be addressed. I would however trust Boost's implementation on the grounds of "boost::hash_combine was written by someone who cares about hashes more than I do". Also, the boost::hash_combine has had this magic number since the beginning, in 2005.

@rwgk
Copy link
Collaborator

rwgk commented Jan 8, 2021

@henryiii wrote: These macros are evil;

I agree, and want to add a meta comment regarding the version bump: We are paying an enormous tax for this feature: Stifling progress.
Using std::string here is clearly the right thing to do, but we're stuck, on something so simple.
What are we gaining for that tax?
What's worth more?

  • Ability to improve/fix our product?
  • Enable interoperability of extensions compiled with different pybind11 versions?
    Clearly, both have a value, but do we have the right balance?
    Small attempt to reason out (the best I understand the situation, please correct me if needed):
  • The version isolation is awesome, an absolute keeper.
  • Users need to worry about version mismatches only if they need interoperability.
  • How much of a hardship is it to require that extensions needing interoperability have to be built with similar-enough pybind11 versions?

If the consensus or de-facto conclusion is that the hardship is worse than stifling progress, maybe we're at a stage where we need to branch, to maintain and develop concurrently? Like Python is already developing 3.10 while still supporting 3.6.

@YannickJadoul
Copy link
Collaborator

If the consensus or de-facto conclusion is that the hardship is worse than stifling progress, maybe we're at a stage where we need to branch, to maintain and develop concurrently? Like Python is already developing 3.10 while still supporting 3.6.

A few things: Python has a much larger maintenance than pybind11. I think we're current happy if the number of issues and PRs stops going up.
Though, apart from that, currently, I don't think there's anything stopping us from merging things into master and still releasing a cherry-picked/backported 2.6.2. There's just not really been any urgent things, I think, that need to get merged into master immediately? So it's been easier to say "let's wait until we know 2.7 is next". (Maybe I'm wrong here, but that's been my impression.)

At any rate, any ABI change is at least a minor bump, and any ABI breakage needs to be rare (I know from Gitter that people do make different modules that are supposed to interact, and see @henryiii's comment about conda-forge), so it's a good plan to label and "collect" ABI-breaking PRs, and not end up with an exploding ABI version counter.

@henryiii
Copy link
Collaborator

henryiii commented Jan 8, 2021

Breaking the ABI will hit quite a few projects. PyTorch might not be able to load previously JITed code, for example - they would probably need to update PyBind11 past an ABI break exactly on a specific release. And pybind11's distribution is really scattered; some projects pin via submodule, some use PyPI, some use Conda, some use brew - and there’s no way to override it.

Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the original bug report at #1922, this sounded very familiar. I remembered why and found this PR: #2564

So I think (not sure), that this might already be fixed. See

auto &cache = internals.inactive_override_cache;
for (auto it = cache.begin(), last = cache.end(); it != last; ) {
if (it->first == (PyObject *) tinfo->type)
it = cache.erase(it);
else
++it;
}
delete tinfo;
}
(introduced by #2564)

Did you have a way of reproducing the original issue, @EricCousineau-TRI? If this already fixed it, maybe we can forget about the whole breaking ABI discussion here (until the next time ;-) ), or at least discuss it from a different perspective (maybe the full type name would still be useful? The combine_hash should definitely be kept, though it wouldn't be a change that breaks the ABI anymore, I think?).

EDIT: Note that #2564 still had a few issues/rough edges, btw!

inline void hash_combine(size_t& lhs, size_t rhs ) {
lhs ^= rhs + 0x9e3779b9 + (lhs << 6) + (lhs >> 2);
}

struct override_hash {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the change to this hash is probably also breaking ABI?

@EricCousineau-TRI
Copy link
Collaborator Author

EricCousineau-TRI commented Jan 22, 2021

Did you have a way of reproducing the original issue, @EricCousineau-TRI?

Yup, though I think it's a bit picky about which machine, etc. Here's my plan to try the repro:

@EricCousineau-TRI
Copy link
Collaborator Author

Oooh, nice! Yup, @YannickJadoul, your PR does fix this! So yeah, I'll close this PR, and carve out the combine_hash mebbe (and I can ensure it has the same functionality to preserve ABI).

Thanks!

@YannickJadoul
Copy link
Collaborator

Ah, nice! Thanks for testing, @EricCousineau-TRI! :-) And sorry for not seeing this earlier, or finding these existing PRs/issues together with that fix; I didn't know you and @BetsyMcPhail had already put in so much effort :-/

carve out the combine_hash mebbe (and I can ensure it has the same functionality to preserve ABI).

Sounds good to me, yes!

@YannickJadoul
Copy link
Collaborator

Also, it's worth checking if (when we would break ABI), this solution would be better than the current solution? If so, we could replace the current fix, next time we are break ABI anyway?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get_type_overload may encounter false cache hits if derived instances are GC'd
6 participants