-
Notifications
You must be signed in to change notification settings - Fork 1
Test listeners for state.json and prs state update #165
base: release/8.8
Are you sure you want to change the base?
Conversation
@@ -104,6 +104,8 @@ public class Overseer implements SolrCloseable { | |||
|
|||
private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); | |||
|
|||
public volatile OverseerMessageListener testOverseerMessageListener = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏼 nice to validate overseer activities to catch unexpected calls. I assume this is only added temporarily for verification purpose?
Also we might want to consider to go even lower level (sub in a testing Zk client), as some of the existing zk ops on state.json is done directly to ZK bypassing this overseer processing. Having such test zk client can even cast a wider net to ensure we don't miss any sneaky 🐟
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do we see any issue if that remains there (or any other way we can capture this)? We want to write tests for each prs-workflow (captured in other doc)
- As of now, we want all state.json update from overseer!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I don't think it's a huge issue, personally I would rather not see any testing code in the execution flow if possible. If such listener is useful, perhaps we can consider introducing it as an actual listener (non-test) with proper setter/getter method/init in constructor as final field etc
- Yes, adding such listener might be sufficient to catch most state.json updates. However, there are also direct writes of state.json, such as this that bypasses both
Overseer
andZkStateReader
. Therefore we might want to go lower level atZkClient
so we do not miss any of those writes. It might be critical to ensure:- No direct writes to
state.json
outside of overseer - No writes of
state.json
in any form - whether via overseer the class or not, from data node
- No direct writes to
So we might need ZkClient
level interceptor as well to verify the above
8af9e51
to
bcd4e5c
Compare
bcd4e5c
to
f7b4f2c
Compare
@@ -80,6 +80,7 @@ | |||
|
|||
public class ZkStateReader implements SolrCloseable { | |||
public static final int STATE_UPDATE_DELAY = Integer.getInteger("solr.OverseerStateUpdateDelay", 2000); // delay between cloud state updates | |||
public volatile PerReplicaStateUpdateListener testPerReplicaStateUpdateListener = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can move this to PerReplicaStatesOps
as a private static volatile PerReplicaStateUpdateListener listener;
add a public static setter method there, then we don't need to modify the method signature of PerReplicaStatesOps#persist
(and all the callers 😊 ). In PerReplicaStatesOps#persist
we can simply check listener != null
as you have done similarly in ClusterStateUpdater
with testOverseerMessageListener
?
We can then set it PerReplicaStatesOps.setListener(...)
from FSPRSTest#testPRSCollectionCreation
instead, which makes it default as null for normal execution flow? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ZkStateReader is singleton class at coreContainer level. that's why I kept variable at that level. I didn't want static variable as we create multiple coreContainer in test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static variable is also guaranteed a singleton (within the same classloader - well same as ZkStateReader singleton anyway)and it's closer to where it's used (PerReplicaStateOps), which method persist
is defined.
The major benefit of that is it avoids any method signature change, hence we do not have to change any of the 5 invocations 😊 . And it's also consistent with your Overseer
change with the user of such listener is defined at the user level and no need for method signature changes for the actual invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see what u mean now! 💡 since in ur test cases u started 2 JettySolrRunner
so u r setting it per runner/coreContainer, as it is necessary to make such differentiation even within the same classloader/JVM!
Can we embed such testPerReplicaStateUpdateListener
into the ZkClient instead? My concern is that the new method signature public void persist(String znode, SolrZkClient zkClient, PerReplicaStateUpdateListener stateUpdateListener)
is not entirely clear on how to obtain such stateUpdateListener
- dev probably wouldn't know there's an instance embedded in ZkStateReader
that easily. If we can hide that as much as we can (put it in zkClient), then dev probably wouldn't need to deal with that? 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Let's keep ZkClient for separate PR if require (Or we can use zkversion of node in some cases ).
- I'm not sure it is using classloader. I think it just creating different instances
- ok, persist api issue. This is api argument to listen the events. Unfortunately, I'm not finding any better way to solve this issue as PerReplicaStatesOps instance is created for each update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting such listener as a part of ZkClient should solve 3. here? TBH, ZkClient has already provided whole bunch of other "info" so adding a PRS op listener shouldn't be too bad? 😅
And we can even keep the current setter in ZkStateReader
but instead of setting it to the current field, simply do zkClient.setPrsListener(...)
?
.process(cluster.getSolrClient()); | ||
cluster.waitForActiveCollection(COLL, 1, 1); | ||
|
||
assertTrue("overseer node should not update prs state " + overseerNodePrsListener, overseerNodePrsListener.events.size() == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment here to indicate that it is still being updated by overseer node for now? otherwise reader might get confused with the error message and the actual check condition (which does allow 1 update)
cluster.waitForActiveCollection(COLL, 1, 1); | ||
|
||
assertTrue("overseer node should not update prs state " + overseerNodePrsListener, overseerNodePrsListener.events.size() == 1); | ||
assertTrue("data node should have 3 prs state update " + dataNodePrsListener , dataNodePrsListener.events.size() == 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice that we can compare the data node update vs os node update now! 💪🏼
This has major advantage over my other test testZkNodeVersions
which checks the znode version and child version but does not know who updates it.
We can consider removing testZkNodeVersions
in the future. However, there's one thing that testZkNodeVersions
catches while this test case does not - there are direct updates on state.json performed by overseer node, which NOT caught by dataNodeOverseerListener as it does not go through the overseer q 😞 (so it's actually 2 updates to state.json from overseer instead of 1). This is the update : https://github.com/fullstorydev/lucene-solr/blob/release/8.8/solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java#L323 . in order to catch it we might need to instrument even lower level (zkclient) as discussed earlier
…eplicaStatesOps#persist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
As discussed, changing method signature persist
is hard to get around unless we do refactoring, for example having a PerReplicaStatesOps factory.
We can probably improve those in the future!
Eventually need to write test for each prs workflow https://docs.google.com/document/d/17fCMTMV3P1XoB8HatbVsuwavuOorAVLdOinCR6k_HYs/edit