Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeyReferenceToPersistent.__hash__ should be checked #9

Open
ale-rt opened this issue May 14, 2019 · 3 comments
Open

KeyReferenceToPersistent.__hash__ should be checked #9

ale-rt opened this issue May 14, 2019 · 3 comments

Comments

@ale-rt
Copy link
Member

ale-rt commented May 14, 2019

Probably it should contain more attributes to identify in a more unique way the object in the BTree

@mauritsvanrees
Copy link
Member

I ran into a problem that seems to be caused by __hash__ and similar methods not being unique enough. For one customer we have a ZODB with several similar Plone Sites, most of which have been created by using one Plone Site as template and copying it, or making a zexp and importing it.

We had already noticed that it was needed to reregister all objects with the intid catalog, and we have a script for that, basically calling intids.getId and if this fails call intids.register. But today I noticed that some objects were still missing an intid. Okay, I ran the script again, and they were fixed. But then I ran it again, and it found yet more objects that needed to be fixed.

Basic problem is that getId(obj) gives a KeyError, so I call register(obj), but this finds an existing intid for a similar object. A new getId call fails.

Several rabbit holes later, it looks like fixing __hash__ and a few others helps, adding self.path in the mix:

    def __hash__(self):
        return hash((self.dbname,
                     self.object._p_oid,
                     self.path
                     ))

    def __cmp__(self, other):
        # XXX This makes no sense on Python 3
        if self.key_type_id == other.key_type_id:
            return cmp((self.dbname, self.oid, self.path), (other.dbname, other.oid, other.path))
        return cmp(self.key_type_id, other.key_type_id)

    def _get_cmp_keys(self, other):
        if self.key_type_id == other.key_type_id:
            return (self.dbname, self.oid, self.path), (other.dbname, other.oid, other.path)
        return self.key_type_id, other.key_type_id

I don't know yet if this is good enough, but for the moment it helps.

(There is more wrong in these sites, like paths to objects in other Plone Sites ending up in the intid catalog, probably left-overs from a copy or zexp.)

For now I just dump the above code, so I don't lose it.

Since the five.intid key references inherit from zope.keyreference, this PR that drops the cmp methods may be interesting. Already merged in 2018, but not released yet.

@ale-rt Did you ever get around to working on fixing this, or do you maybe have a script that cleans up the intid catalog or reregisters objects?

@ale-rt
Copy link
Member Author

ale-rt commented Mar 18, 2021

I think adding self.path to the hash is definitely a step forward.

Since I opened this issue I never find myself in trouble with five.intid.

Anyway I am sharing what I have.
Ploneintranet contains some helpers: https://github.com/quaive/ploneintranet/pull/2660. for whoever does not have permissions to see it this is the code:

diff --git a/news/2659.added b/news/2659.added
new file mode 100644
index 000000000..e41d29c37
--- /dev/null
+++ b/news/2659.added
@@ -0,0 +1 @@
+Provide views to inspect the health and fix the intid persistent utility
diff --git a/src/ploneintranet/admintool/browser/configure.zcml b/src/ploneintranet/admintool/browser/configure.zcml
index 5b4975918..a96db7f45 100644
--- a/src/ploneintranet/admintool/browser/configure.zcml
+++ b/src/ploneintranet/admintool/browser/configure.zcml
@@ -60,4 +60,13 @@
       layer="ploneintranet.admintool.browser.interfaces.IPloneintranetAdmintoolLayer"
       />
 
+  <!-- XXX this should probably be moved to something like ploneintranet.management -->
+  <browser:page
+      name="pi-intids"
+      for="Products.CMFPlone.interfaces.siteroot.IPloneSiteRoot"
+      permission="cmf.ManagePortal"
+      class=".intids.ManageIntIds"
+      allowed_attributes="check fix"
+      />
+
 </configure>
diff --git a/src/ploneintranet/admintool/browser/intids.py b/src/ploneintranet/admintool/browser/intids.py
new file mode 100644
index 000000000..82e090a68
--- /dev/null
+++ b/src/ploneintranet/admintool/browser/intids.py
@@ -0,0 +1,102 @@
+# coding=utf-8
+# XXX this should probably be moved to something like ploneintranet.management
+from plone import api
+from Products.Five import BrowserView
+from zope.component import getUtility
+from zope.intid.interfaces import IIntIds
+
+
+class ManageIntIds(BrowserView):
+    """ Admin view that takes care of the intid persistent utility
+    """
+
+    def log(self, line):
+        self.request.response.appendBody(line)
+        self.request.response.appendBody("\n")
+
+    @property
+    def intids(self):
+        return getUtility(IIntIds)
+
+    def _path(self, persistentkey):
+        obj = api.content.get(UID=persistentkey.object.UID())
+        try:
+            return "/".join(obj.getPhysicalPath())
+        except Exception:
+            pass
+
+    @property
+    def keys_with_a_broken_path(self):
+        return [
+            key
+            for key in self.intids.ids
+            if not self.context.unrestrictedTraverse(key.path, None)
+        ]
+
+    @property
+    def refs_with_a_broken_path(self):
+        return [
+            ref
+            for ref in self.intids.refs.values()
+            if not self.context.unrestrictedTraverse(ref.path, None)
+        ]
+
+    def fix_broken_paths(self):
+        for key in self.keys_with_a_broken_path:
+            proper_path = self._path(key)
+            if proper_path:
+                line = "Fixed key {}: {} -> {}".format(
+                    key.object, key.path, proper_path
+                )
+                key.path = proper_path
+            else:
+                line = "Error key {}: {} -> {}".format(
+                    key.object, key.path, proper_path
+                )
+            self.log(line)
+
+        for ref in self.refs_with_a_broken_path:
+            proper_path = self._path(ref)
+            if proper_path:
+                line = "Fixed ref {}: {} -> {}".format(
+                    ref.object, ref.path, proper_path
+                )
+                ref.path = proper_path
+            else:
+                line = "Error ref {}: {} -> {}".format(
+                    ref.object, ref.path, proper_path
+                )
+            self.log(line)
+
+    def check(self):
+        """ Check the intid catalog and return a summary
+        """
+        keys_with_a_broken_path = self.keys_with_a_broken_path
+        self.log(
+            "Number of keys with broken path: {}".format(
+                len(keys_with_a_broken_path)
+            )
+        )
+        for key in keys_with_a_broken_path:
+            line = "\t{}: {} -> {}".format(
+                key.object, key.path, self._path(key)
+            )
+            self.log(line)
+        refs_with_a_broken_path = self.refs_with_a_broken_path
+        self.log(
+            "\nNumber of references with broken path: {}".format(
+                len(refs_with_a_broken_path)
+            )
+        )
+
+        for ref in refs_with_a_broken_path:
+            line = "\t{}: {} -> {}".format(
+                ref.object, ref.path, self._path(ref)
+            )
+            self.log(line)
+
+    def fix(self):
+        """ Fix the intid catalog and return a summary
+        """
+        self.fix_broken_paths()
+        return
diff --git a/src/ploneintranet/admintool/tests/test_intids.py b/src/ploneintranet/admintool/tests/test_intids.py
new file mode 100644
index 000000000..804eb130c
--- /dev/null
+++ b/src/ploneintranet/admintool/tests/test_intids.py
@@ -0,0 +1,47 @@
+# -*- coding: utf-8 -*-
+from plone import api
+from ploneintranet.admintool.testing import IntegrationTestCase
+
+
+class TestViews(IntegrationTestCase):
+    """Test installation of this  product into Plone."""
+
+    def setUp(self):
+        """Custom shared utility setup for tests."""
+        self.portal = self.layer["portal"]
+        self.request = self.layer["request"]
+
+    def test_intid_check(self):
+        """
+        """
+        view = api.content.get_view(
+            "pi-intids", self.portal, self.request.clone()
+        )
+
+        # Everything is good at the beginning
+        self.assertListEqual(view.keys_with_a_broken_path, [])
+        self.assertListEqual(view.refs_with_a_broken_path, [])
+
+        # Now we break one object
+        obj = self.portal.profiles
+        intid = view.intids.getId(obj)
+        view.intids.refs[intid].path = "/plone/broken"
+        self.assertListEqual(view.keys_with_a_broken_path, [])
+        self.assertListEqual(
+            view.refs_with_a_broken_path, [view.intids.refs[intid]]
+        )
+        view.check()
+
+        # We see that calling check will report the user what needs to be fixed
+        self.assertIn(
+            "UserProfileContainer at profiles", view.request.response.body
+        )
+
+        # Now we see that the view fixes it
+        view.fix()
+        self.assertIn(
+            "Fixed ref <UserProfileContainer at profiles>",
+            view.request.response.body,
+        )
+        self.assertListEqual(view.keys_with_a_broken_path, [])
+        self.assertListEqual(view.refs_with_a_broken_path, [])
diff --git a/src/ploneintranet/suite/tests/test_setup.py b/src/ploneintranet/suite/tests/test_setup.py
index 65543ef6f..f8b29332f 100644
--- a/src/ploneintranet/suite/tests/test_setup.py
+++ b/src/ploneintranet/suite/tests/test_setup.py
@@ -55,6 +55,15 @@ class TestSetup(unittest.TestCase):
         self.assertNotIn(
             'robinson_crusoe', self.portal.groups.intranet.members)
 
+    @unittest.skip("Enable this after we have the patch from https://github.com/plone/five.intid/pull/8")  # noqa: E501
+    def test_intids_ok(self):
+        """ Check that after everything is set up we have
+        a proper initid catalog
+        """
+        view = self.portal.restrictedTraverse('pi-intids')
+        self.assertListEqual(view.keys_with_a_broken_path, [])
+        self.assertListEqual(view.refs_with_a_broken_path, [])
+
 
 class TestUninstall(unittest.TestCase):
     """Test that ploneintranet.suite is properly uninstalled."""

In short this checks that all the objects can be traversed.
If they are not, lookup for the object by UID and fix the path.
I am not really sure it fits your use case anyway.

Maybe https://github.com/collective/collective.relationhelpers has something better.

CC @pbauer

mauritsvanrees added a commit that referenced this issue Mar 18, 2021
This makes references more unique, fixing inconsistencies.
Fixes #9
@mauritsvanrees
Copy link
Member

I have changed my mind. I think the hash is fine.

Let's say we add path to the hash, as I tried in PR #19. And we change the path of key. Then this means the hash of this key changes. This in turn can mean that the key can no longer be found in the intids BTrees.

The general idea is that hashes must be stable across the lifetime of an object. Otherwise it cannot be used as key in a dictionary (or BTree). See the Python docs:

In other words, I think your temporary workaround from 2019 (released withfive.intid 1.2.3 in Plone 5.2.0) is exactly the right fix: when an object is moved, we remove its key and intid from the intids utility, and add them fresh.

If we ever do change the __hash__ method, we should probably create an upgrade step that rebuilds the intid catalog. I have just created code for that, as I need it. In my case I actually need it both with and without the new hash method I tried, as there was too much inconsistency in the intid structure. Cause is either liberal use of zexps, or the fact that this site started in 5.1, so before your fix was available.

Code for rebuilding the intids, with extra checks and comments:

    id_keys_missing_from_id = [key for key in intids.ids if key not in intids.ids]
    if id_keys_missing_from_id:
        print(
            f"{len(id_keys_missing_from_id)} keys from intids.ids are missing from intid.ids. "
            f"This sounds weird, but may happen when the hash method changes."
        )
    if id_keys_missing_from_id:
        print("Repopulating BTrees.")
        # The refs and ids should be a mirror of each other.
        # There might be inconsistencies between refs and ids,
        # so let's take the refs as the original and rebuild from there.
        intid_refs = list(intids.refs.items())
        intids.ids.clear()
        intids.refs.clear()
        for key, value in intid_refs:
            intids.refs[key] = value
            intids.ids[value] = key
        print("Done repopulating BTrees.")
        # We check again.
        id_keys_missing_from_id = [key for key in intids.ids if key not in intids.ids]
        if id_keys_missing_from_id:
            print(
                f"ERROR: {len(id_keys_missing_from_id)} keys from intids.ids are missing from intid.ids. "
                f"This is after rebuilding the BTrees, so something is wrong."
            )

Conclusion: I think this issue can be closed (though we could cleanup some comments in the code then).

mauritsvanrees added a commit to plone/plone.api that referenced this issue Mar 19, 2021
mister-roboto pushed a commit to plone/buildout.coredev that referenced this issue Mar 22, 2021
Branch: refs/heads/master
Date: 2021-03-19T16:40:49+01:00
Author: ale-rt (ale-rt) <alessandro.pisa@gmail.com>
Commit: plone/plone.api@98ef9c9

Demonstrate that the intids catalog gets broken after moving an object

Files changed:
M src/plone/api/tests/test_content.py
Repository: plone.api

Branch: refs/heads/master
Date: 2021-03-19T16:48:25+01:00
Author: Maurits van Rees (mauritsvanrees) <maurits@vanrees.org>
Commit: plone/plone.api@b1cb5ec

initids -&gt; intids

Files changed:
M src/plone/api/tests/test_content.py
Repository: plone.api

Branch: refs/heads/master
Date: 2021-03-19T16:50:19+01:00
Author: Maurits van Rees (mauritsvanrees) <maurits@vanrees.org>
Commit: plone/plone.api@de0ab8c

Removed 4.3 code from tests.

Files changed:
M src/plone/api/tests/test_content.py
Repository: plone.api

Branch: refs/heads/master
Date: 2021-03-19T16:50:19+01:00
Author: Maurits van Rees (mauritsvanrees) <maurits@vanrees.org>
Commit: plone/plone.api@a0c7876

Tests: verify that intid keys can be found back.

Files changed:
M src/plone/api/tests/test_content.py
Repository: plone.api

Branch: refs/heads/master
Date: 2021-03-19T17:06:35+01:00
Author: Maurits van Rees (mauritsvanrees) <maurits@vanrees.org>
Commit: plone/plone.api@44bb079

Added news snippet for added intid tests.

See plone/plone.api#430
and plone/five.intid#9 (comment)

Files changed:
A news/430.bugfix
Repository: plone.api

Branch: refs/heads/master
Date: 2021-03-22T11:02:14+01:00
Author: Maurits van Rees (mauritsvanrees) <m.van.rees@zestsoftware.nl>
Commit: plone/plone.api@403b152

Merge pull request #456 from plone/maurits/verify-intids-in-tests

Verify intids in tests

Files changed:
A news/430.bugfix
M src/plone/api/tests/test_content.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants