On cluster join, simulation erroneously simulates owner as being local #62
Assuming that I'm right that this indeed is a problem (this is typically where @GregDThomas points out why I'm wrong), I'm seeing at least a couple of paths to fix this:
I think you may have misinterpreted what's happening. Or I have, probably more likely; I'm not quite sure what is happening. At what point does the By my reading, I think what is happening is; Maybe. It's been a while!
I'm grateful that you're at least trying, even if it has been a while. :) I very much appreciate your feedback.
I'm not sure what you're referring to here. Are you certain of what you say under item C? Specifically, I think that at this point, the caches are explicitly no longer local - they have already been switched to the

The purpose of the entryListener is to record what is happening on all other cluster nodes, not what's happening on the local node. This is to ensure that when a cluster node unexpectedly leaves the cluster, we can use that recorded data to 'clean up'. From what I've seen so far, the data structures that are maintained through entryListeners do not even include data that's "owned" by the local node. That is why I was triggered when I saw the local node being used as the source of these events. These simulated events are only being processed locally.
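For reference, the bookkeeping described above can be sketched as follows. This is a minimal, hypothetical model (plain strings stand in for Hazelcast's NodeID and EntryEvent types, and the class name is invented); the real ClusterListener tracks richer data, but the key point is the same: events sourced from the local node are deliberately ignored, so only remote-owned data is recorded for later cleanup.

```java
import java.util.*;

// Hypothetical model of the per-remote-node bookkeeping: record which
// cluster node added which cache keys, so that entries owned by a node
// that abruptly leaves the cluster can be cleaned up afterwards.
class RemoteEntryTracker {
    private final String localNodeId;
    // keys grouped by the remote node that added them
    private final Map<String, Set<String>> keysByRemoteNode = new HashMap<>();

    RemoteEntryTracker(String localNodeId) {
        this.localNodeId = localNodeId;
    }

    // Invoked for every 'entry added' event, real or simulated.
    void entryAdded(String sourceNodeId, String key) {
        if (localNodeId.equals(sourceNodeId)) {
            return; // only data owned by *other* nodes is tracked
        }
        keysByRemoteNode.computeIfAbsent(sourceNodeId, n -> new HashSet<>()).add(key);
    }

    // Invoked when a cluster node leaves: returns the keys to clean up.
    Set<String> nodeLeft(String nodeId) {
        Set<String> orphaned = keysByRemoteNode.remove(nodeId);
        return orphaned == null ? Collections.emptySet() : orphaned;
    }
}
```

Under this model, a simulated event whose source is the local node would be silently dropped, which is exactly why the attribution of the source matters.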
No, not at all sure. I trust your judgement! Re. solution number 1, my concern is that every cache entry must be of the same shape so that the real owner can be found. Perhaps a variation on number 2: each node simply sends a single message (for each Cache) containing the keys in the cache that it owns. The new node can then fetch the values from the Cache, and simulate the entry for the correct owner.
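A rough sketch of that variation, with hypothetical names throughout (there is no KeyOwnershipMessage in the plugin today, and the message/recording shapes are assumptions): each existing node sends one message per cache listing only the keys it owns, and the joining node attributes those keys to their real owner before simulating the events.

```java
import java.util.*;

// Hypothetical message: on join, each existing node sends one of these
// per cache, listing only the keys that node owns.
class KeyOwnershipMessage {
    final String cacheName;
    final String ownerNodeId;
    final Set<String> ownedKeys;

    KeyOwnershipMessage(String cacheName, String ownerNodeId, Set<String> ownedKeys) {
        this.cacheName = cacheName;
        this.ownerNodeId = ownerNodeId;
        this.ownedKeys = ownedKeys;
    }
}

class JoiningNode {
    // keys recorded per cache, per remote owner
    final Map<String, Map<String, Set<String>>> recorded = new HashMap<>();

    // Process one ownership message: attribute each key to its real owner,
    // instead of to the local node (the bug under discussion).
    void onOwnershipMessage(KeyOwnershipMessage msg) {
        Map<String, Set<String>> byOwner =
            recorded.computeIfAbsent(msg.cacheName, c -> new HashMap<>());
        byOwner.computeIfAbsent(msg.ownerNodeId, n -> new HashSet<>())
               .addAll(msg.ownedKeys);
        // a real implementation would also fetch the values from the
        // clustered cache and feed simulated EntryEvents to its listeners
    }
}
```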
I think that we're largely in agreement.

As for your concern about solution number 1: that is only a problem if the entries are maintained in a singular data structure (for all caches), as is (somewhat) the case in the Hazelcast plugin. When we move this to Openfire (#50), it is only natural to have a data structure 'per use-case', where every use case can have its own shape.

As for two: somewhat, yes. The secondary broadcasts would be a lot lighter (only including ownership and minimal data), but it's not ideal. I'd use this only if we can't / won't record "node ownership" of a particular data type (which might not always be relevant to the data type).

@evdherberg and I are working on #50 as we type. As soon as we push, you'll see updates on his branch, here: https://github.com/evdherberg/Openfire/tree/test-cluster-refactor - So far, we haven't needed option 2 yet, so that's good. This is very much a work in progress (and some refactoring can be expected), but I'm interested in any feedback that you already have on the approach we've taken. (The corresponding changes in the Hazelcast plugin: https://github.com/evdherberg/openfire-hazelcast-plugin/tree/improve-clustering)
Love seeing all the code being removed from the plugin! OK, so it's being moved, but hopefully the new code will be better. (Glad to see the plugin is getting some love, at least; I've not touched it for what seems like forever.)
As a thought, is it worth doing a minor release of the HZ plugin now, with no changes other than the max server version in the plugin.xml? (I forget the exact property name.) The current version of the plugin won't work on the next major version of Openfire, and it would be good to head people off at the pass from installing it.
For a limited subset of data that is stored in (clustered) caches, ClusterListener keeps track of which cluster node added which data. It uses a Hazelcast EntryListener for this, which receives an EntryEvent for every entry modification. The implementation listens for those events, grabs the NodeID of the originating node and the data of interest, and uses that to track what needs to be tracked.

When a node initially joins a cluster, it can be presented with a ClusteredCache that already contains data (added by remote nodes). In order to synchronize this preexisting data with the locally stored subset, EntryEvents are simulated for these existing entries. This allows the tracking code to simply process a bunch of "entry added" events, which should take care of things.
The related code is in this method:
openfire-hazelcast-plugin/src/java/org/jivesoftware/openfire/plugin/util/cache/ClusterListener.java
Lines 165 to 185 in da9b8e6
My concern here is that in line 175, the local cluster node is used as the 'source' of the event. I believe that this leads to a problem, as it results in the local node being regarded as the node that added the data, instead of the node that originally added the data. This is a problem, as that nodeId is used later to determine what data needs to be processed if a particular cluster node becomes unavailable.
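To illustrate the concern, here is a minimal, hypothetical model (plain strings in place of Hazelcast's NodeID and EntryEvent, and invented class names): because the simulated events name the local node as their source, an entry that was actually added by a remote node gets attributed to the wrong node, and nothing is found to clean up when that remote node later leaves the cluster.

```java
import java.util.*;

// Minimal model of the ownership tracking fed by (simulated) entry events.
class OwnershipTracker {
    final Map<String, Set<String>> keysByNode = new HashMap<>();

    void entryAdded(String sourceNodeId, String key) {
        keysByNode.computeIfAbsent(sourceNodeId, n -> new HashSet<>()).add(key);
    }

    // Keys that would be cleaned up when 'nodeId' leaves the cluster.
    Set<String> keysToCleanUpFor(String nodeId) {
        return keysByNode.getOrDefault(nodeId, Collections.emptySet());
    }
}

class SimulationDemo {
    static Set<String> cleanupAfterRemoteLeaves() {
        OwnershipTracker tracker = new OwnershipTracker();
        // Pre-existing cache entry, originally added by remote node-B.
        // The join-time simulation attributes it to the local node ("node-A"):
        tracker.entryAdded("node-A", "session-owned-by-B");
        // When node-B later leaves, no keys are found to clean up:
        return tracker.keysToCleanUpFor("node-B");
    }
}
```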
(As an aside: this all assumes that only one node will modify data, and all other nodes will only read it. That's probably true for a lot of caches, but far from a guarantee.)