Update 'inactive_override_cache' key #2772

EricCousineau-TRI · 2021-01-07T23:00:34Z

Description

Avoid false cache hits for virtual overrides by using __qualname__ as the hashing key. Change storage to std::string to avoid memory management mishaps.

Supersedes #2092 - just merge conflict resolution; I'll address any further changes.
Resolves #1922

Suggested changelog entry:

Avoid false cache hits for virtual overrides by using ``__qualname__`` as the hashing key. Change storage of key to use ``std::string`` to avoid memory management mishaps.

\cc @BetsyMcPhail

EricCousineau-TRI · 2021-01-07T23:02:06Z

@YannickJadoul / @rwgk Would one of y'all be up for reviewing this?

Instead of just the function name, use '__qualname__.name'.

EricCousineau-TRI · 2021-01-07T23:08:50Z

include/pybind11/detail/internals.h

@@ -97,7 +97,7 @@ struct internals {
    type_map<type_info *> registered_types_cpp; // std::type_index -> pybind11's type information
    std::unordered_map<PyTypeObject *, std::vector<type_info *>> registered_types_py; // PyTypeObject* -> base type_info(s)
    std::unordered_multimap<const void *, instance*> registered_instances; // void * -> instance*
-    std::unordered_set<std::pair<const PyObject *, const char *>, override_hash> inactive_override_cache;
+    std::unordered_set<std::pair<const PyObject *, std::string>, override_hash> inactive_override_cache;


Er... mayhaps this change requires a bump on the internal version.

I can't think of an easy to way to keep const char* around, as I'm unaware of what happens with Python strings 'n such...
Ideas?

Definitely needs to bump that, yes :-/ That does complicate things slightly, release-wise?

Ouch, not going to do that until the next release, at least. We should bundle all such changes together and then do a bump, I think. Also, I thought standard library templated classes were not valid in an ABI (@jpivarski)? I take it this is why we have so many issues with the ABI? Could we redesign this to just be simple classes and do our own memory management here?

Yeah, I think that could be done! I'll check out Boris's suggestion in Betsy's original PR, tho, and see if that gives us const char* 🤞

From Boris:

This should probably use detail::get_fully_qualified_tp_name (from #2520) [...]

Dern, looking at #2520, it's std::string. But yeah, I can see about doing some naive memory management.

TLDR: for now, if we can keep the ABI the same, that's best. Eventually we will need a bump. In the future, maybe we can design internals that work cross-compiler?

Ah, I see. Cool, I'll re-open and re-scope #2773 to capture this.

I made an "abi break" - very-brighly colored - label ( abi break ). Let's use this to indicate PRs that break the ABI and should be pooled together?

@henryiii Agreed that it would be amazing if we can get rid of these and use something that's basically shared between all compilers (guess that's POD types and extern "C"?). Just curious if ríght now, we have a problem when using STL in the internals, or whether that's at least +- safely worked around.

The current workaround is fine, but something we'd like to possibly redesign to enable wider compatibility in the future, perhaps. In the current design, std::string should be fine, I think it's not worse than unordered map or anything else. I know std::vector is not allowed, so guessing pretty much everything else is out. Somewhere I think @bstaletic just listed the options, which I think is what I was thinking; either we do everything ourselves, or we pay a cost. (Though I feel like we can probably do pretty well). But also haven't really looked at this part of the code much at all, mostly pretty general observations ATM.

YannickJadoul · 2021-01-07T23:17:25Z

@YannickJadoul / @rwgk Would one of y'all be up for reviewing this?

Yes! Tomorrow or during the weekend good enough?

YannickJadoul · 2021-01-07T23:23:34Z

Note to self and others: also look at review comments in #2092, and in particular @bstaletic's #2092 (comment) (so @bstaletic should probably also review this again).

EricCousineau-TRI · 2021-01-07T23:46:29Z

Note to self and others: also look at review comments in #2092 [...]

🤦 Good point! I'll take a gander there!

Yes! Tomorrow or during the weekend good enough?

Yup, sounds awesome!

include/pybind11/detail/internals.h

EricCousineau-TRI · 2021-01-08T05:09:29Z

Hm... mystified by this error: https://github.com/pybind/pybind11/pull/2772/checks?check_run_id=1665921586#step:11:70

  capfd = <_pytest.capture.CaptureFixture object at 0x000002417F72AD00>
  
      def test_err(capfd):
          msg = "Something that should not show up in log"
          stream = StringIO()
          with redirect_stderr(stream):
              m.raw_err(msg)
          stdout, stderr = capfd.readouterr()
          assert stdout == ""
  >       assert stderr == msg
  E       AssertionError: assert '' == 'Something th...how up in log'
  E         - Something that should not show up in log
  
  test_iostream.py:132: AssertionError

Gonna re-run to see if it's a flake, as the failing test (test_iostream.py, test_err + test_iostream.cpp, raw_err) don't seem like they'd be related to this change.

rwgk · 2021-01-08T06:15:15Z

Yes, it's one of the two known thread-related issues: #2754

bstaletic · 2021-01-08T10:07:08Z

include/pybind11/detail/internals.h

+// Adapted from boost::hash_combine(). See also:
+// https://stackoverflow.com/a/27952689/7829525.
+inline void hash_combine(size_t& lhs, size_t rhs ) {
+    lhs ^= rhs + 0x9e3779b9 + (lhs << 6) + (lhs >> 2);
+}
+
 struct override_hash {
-    inline size_t operator()(const std::pair<const PyObject *, const char *>& v) const {
+    inline size_t operator()(const std::pair<const PyObject *, std::string>& v) const {
        size_t value = std::hash<const void *>()(v.first);
-        value ^= std::hash<const void *>()(v.second) + 0x9e3779b9 + (value<<6) + (value>>2);
+        hash_combine(value, std::hash<std::string>()(v.second));
        return value;
    }
 };


Okay, this at least doesn't regress the hash like #2092 does. However this is still not what Boost's hash_combine does. Boost's equivalent would be:

size_t seed = 0; size_t lhs = std::hash<const void*>()(v.first); // Same as here size_t rhs = std::hash<std::string>()(v.second); // Same as here seed ^= lhs + 0x9e3779b9 + (seed << 6) + (seed >> 2); // This line is missing! seed ^= rhs + 0x9e3779b9 + (seed << 6) + (seed >> 2);

Right now, instead of the first seed ^= lhs... we're just using seed = lhs. With initial seed being 0, boost's hash_combine would really do just lhs += 0x9e3779b9; before the lhs ^= rhs + ... line.

Full disclaimer, I do not claim to be an expert on good hashing functions. I'm just noticing the slight difference. I don't actually know if this should be addressed. I would however trust Boost's implementation on the grounds of "boost::hash_combine was written by someone who cares about hashes more than I do". Also, the boost::hash_combine has had this magic number since the beginning, in 2005.

rwgk · 2021-01-08T15:07:04Z

@henryiii wrote: These macros are evil;

I agree, and want to add a meta comment regarding the version bump: We are paying an enormous tax for this feature: Stifling progress.
Using std::string here is clearly the right thing to do, but we're stuck, on something so simple.
What are we gaining for that tax?
What's worth more?

Ability to improve/fix our product?
Enable interoperability of extensions compiled with different pybind11 versions?
Clearly, both have a value, but do we have the right balance?
Small attempt to reason out (the best I understand the situation, please correct me if needed):
The version isolation is awesome, an absolute keeper.
Users need to worry about version mismatches only if they need interoperability.
How much of a hardship is it to require that extensions needing interoperability have to be built with similar-enough pybind11 versions?

If the consensus or de-facto conclusion is that the hardship is worse than stifling progress, maybe we're at a stage where we need to branch, to maintain and develop concurrently? Like Python is already developing 3.10 while still supporting 3.6.

YannickJadoul · 2021-01-08T18:20:39Z

If the consensus or de-facto conclusion is that the hardship is worse than stifling progress, maybe we're at a stage where we need to branch, to maintain and develop concurrently? Like Python is already developing 3.10 while still supporting 3.6.

A few things: Python has a much larger maintenance than pybind11. I think we're current happy if the number of issues and PRs stops going up.
Though, apart from that, currently, I don't think there's anything stopping us from merging things into master and still releasing a cherry-picked/backported 2.6.2. There's just not really been any urgent things, I think, that need to get merged into master immediately? So it's been easier to say "let's wait until we know 2.7 is next". (Maybe I'm wrong here, but that's been my impression.)

At any rate, any ABI change is at least a minor bump, and any ABI breakage needs to be rare (I know from Gitter that people do make different modules that are supposed to interact, and see @henryiii's comment about conda-forge), so it's a good plan to label and "collect" ABI-breaking PRs, and not end up with an exploding ABI version counter.

henryiii · 2021-01-08T19:05:33Z

Breaking the ABI will hit quite a few projects. PyTorch might not be able to load previously JITed code, for example - they would probably need to update PyBind11 past an ABI break exactly on a specific release. And pybind11's distribution is really scattered; some projects pin via submodule, some use PyPI, some use Conda, some use brew - and there’s no way to override it.

YannickJadoul

Reading the original bug report at #1922, this sounded very familiar. I remembered why and found this PR: #2564

So I think (not sure), that this might already be fixed. See

pybind11/include/pybind11/detail/class.h

Lines 220 to 229 in 98f1bbb

    
               auto &cache = internals.inactive_override_cache; 
        
               for (auto it = cache.begin(), last = cache.end(); it != last; ) { 
        
                   if (it->first == (PyObject *) tinfo->type) 
        
                       it = cache.erase(it); 
        
                   else 
        
                       ++it; 
        
               } 
        
               delete tinfo; 
        
           }

(introduced by #2564)

Did you have a way of reproducing the original issue, @EricCousineau-TRI? If this already fixed it, maybe we can forget about the whole breaking ABI discussion here (until the next time ;-) ), or at least discuss it from a different perspective (maybe the full type name would still be useful? The combine_hash should definitely be kept, though it wouldn't be a change that breaks the ABI anymore, I think?).

EDIT: Note that #2564 still had a few issues/rough edges, btw!

YannickJadoul · 2021-01-10T14:03:05Z

include/pybind11/detail/internals.h

+inline void hash_combine(size_t& lhs, size_t rhs ) {
+    lhs ^= rhs + 0x9e3779b9 + (lhs << 6) + (lhs >> 2);
+}
+
 struct override_hash {


Note: the change to this hash is probably also breaking ABI?

EricCousineau-TRI · 2021-01-22T15:09:45Z

Did you have a way of reproducing the original issue, @EricCousineau-TRI?

Yup, though I think it's a bit picky about which machine, etc. Here's my plan to try the repro:

(Result: FAIL) From the issue, I will try this issue as-is to show that it fails: EricCousineau-TRI@9726086
(Result: FAIL) Before fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564: 59113a1
(Result: SUCCESS) After fix: test_factory_constructors.py failure triggered by test_register_duplicate_class #2564, before this PR patch: 45f0265
(Result: SUCCESS) After this patch: 8624889

EricCousineau-TRI · 2021-01-22T15:24:36Z

Oooh, nice! Yup, @YannickJadoul, your PR does fix this! So yeah, I'll close this PR, and carve out the combine_hash mebbe (and I can ensure it has the same functionality to preserve ABI).

Thanks!

YannickJadoul · 2021-01-25T00:21:10Z

Ah, nice! Thanks for testing, @EricCousineau-TRI! :-) And sorry for not seeing this earlier, or finding these existing PRs/issues together with that fix; I didn't know you and @BetsyMcPhail had already put in so much effort :-/

carve out the combine_hash mebbe (and I can ensure it has the same functionality to preserve ABI).

Sounds good to me, yes!

YannickJadoul · 2021-01-25T00:22:22Z

Also, it's worth checking if (when we would break ABI), this solution would be better than the current solution? If so, we could replace the current fix, next time we are break ABI anyway?

This was referenced Jan 7, 2021

Update 'inactive_overload_cache' key #2092

Closed

Child Class Method Not Being Called as Expected, Base Class Method Called Instead #1552

Closed

Update 'inactive_override_cache' key

24e6f88

Instead of just the function name, use '__qualname__.name'.

EricCousineau-TRI force-pushed the feature-override-cache-qualname branch from 0e080f5 to 24e6f88 Compare January 7, 2021 23:06

EricCousineau-TRI commented Jan 7, 2021

View reviewed changes

YannickJadoul self-requested a review January 7, 2021 23:17

YannickJadoul requested a review from bstaletic January 7, 2021 23:23

EricCousineau-TRI commented Jan 7, 2021

View reviewed changes

include/pybind11/detail/internals.h Outdated Show resolved Hide resolved

This was referenced Jan 8, 2021

[BUG] Should steer pybind11's internal ABI away from STL types to make them work cross-compiler #2773

Open

[BUG] 🐍 3.10-dev (ubuntu, mac, ~windows): TypeError: 'str' object is not callable #2774

Open

fixup! address review

7e991aa

bstaletic reviewed Jan 8, 2021

View reviewed changes

YannickJadoul added the abi break label Jan 8, 2021

YannickJadoul reviewed Jan 10, 2021

View reviewed changes

EricCousineau-TRI closed this Jan 22, 2021

EricCousineau-TRI mentioned this pull request Jan 22, 2021

get_type_overload may encounter false cache hits if derived instances are GC'd #1922

Closed

BetsyMcPhail mentioned this pull request Mar 2, 2021

geometry_test:test_unimplemented_rendering fails sporadically on Mac RobotLocomotion/drake#14686

Closed

rwgk mentioned this pull request Feb 10, 2023

FWD pybind11 google/pybind11clif#2772

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update 'inactive_override_cache' key #2772

Update 'inactive_override_cache' key #2772

EricCousineau-TRI commented Jan 7, 2021 •

edited

Loading

EricCousineau-TRI commented Jan 7, 2021 •

edited

Loading

EricCousineau-TRI Jan 7, 2021

YannickJadoul Jan 7, 2021

henryiii Jan 7, 2021

EricCousineau-TRI Jan 7, 2021 •

edited

Loading

EricCousineau-TRI Jan 7, 2021

henryiii Jan 8, 2021

EricCousineau-TRI Jan 8, 2021

YannickJadoul Jan 8, 2021

YannickJadoul Jan 8, 2021

henryiii Jan 8, 2021

YannickJadoul commented Jan 7, 2021

YannickJadoul commented Jan 7, 2021

EricCousineau-TRI commented Jan 7, 2021

EricCousineau-TRI commented Jan 8, 2021 •

edited

Loading

rwgk commented Jan 8, 2021

bstaletic Jan 8, 2021

rwgk commented Jan 8, 2021

YannickJadoul commented Jan 8, 2021

henryiii commented Jan 8, 2021

YannickJadoul left a comment •

edited

Loading

YannickJadoul Jan 10, 2021

EricCousineau-TRI commented Jan 22, 2021 •

edited

Loading

EricCousineau-TRI commented Jan 22, 2021

YannickJadoul commented Jan 25, 2021

YannickJadoul commented Jan 25, 2021

	auto &cache = internals.inactive_override_cache;
	for (auto it = cache.begin(), last = cache.end(); it != last; ) {
	if (it->first == (PyObject *) tinfo->type)
	it = cache.erase(it);
	else
	++it;
	}

	delete tinfo;
	}

Update 'inactive_override_cache' key #2772

Update 'inactive_override_cache' key #2772

Conversation

EricCousineau-TRI commented Jan 7, 2021 • edited Loading

Description

Suggested changelog entry:

EricCousineau-TRI commented Jan 7, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

EricCousineau-TRI Jan 7, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

YannickJadoul commented Jan 7, 2021

YannickJadoul commented Jan 7, 2021

EricCousineau-TRI commented Jan 7, 2021

EricCousineau-TRI commented Jan 8, 2021 • edited Loading

rwgk commented Jan 8, 2021

Choose a reason for hiding this comment

rwgk commented Jan 8, 2021

YannickJadoul commented Jan 8, 2021

henryiii commented Jan 8, 2021

YannickJadoul left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

EricCousineau-TRI commented Jan 22, 2021 • edited Loading

EricCousineau-TRI commented Jan 22, 2021

YannickJadoul commented Jan 25, 2021

YannickJadoul commented Jan 25, 2021

EricCousineau-TRI commented Jan 7, 2021 •

edited

Loading

EricCousineau-TRI commented Jan 7, 2021 •

edited

Loading

EricCousineau-TRI Jan 7, 2021 •

edited

Loading

EricCousineau-TRI commented Jan 8, 2021 •

edited

Loading

YannickJadoul left a comment •

edited

Loading

EricCousineau-TRI commented Jan 22, 2021 •

edited

Loading