Now overseer node doesn't update prs state #157
base: release/8.8
Conversation
Hi @hiteshk25. The fix looks correct. However, I tested it and I still see the timeout. Maybe there are other places we need to fix. I'll do further testing.
…e in the end so that other nodes get a single refresh message
Cool to see that we are calling the update once per collection instead of per replica! Some minor thoughts (not directly related to this PR) for @noblepaul :
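As a rough illustration of the "once per collection instead of per replica" point above, here is a minimal sketch of grouping replica state changes before writing. It is not Solr code: `batch_by_collection`, `apply_updates`, and the `write_collection` callback are hypothetical names, and the model simply assumes one write produces one refresh message for the other nodes.

```python
from collections import defaultdict


def batch_by_collection(replica_updates):
    """Group per-replica state changes so each collection is written once.

    replica_updates: iterable of (collection, replica, state) tuples.
    Returns {collection: {replica: state}}; a later update to the same
    replica overwrites the earlier one.
    """
    batched = defaultdict(dict)
    for collection, replica, state in replica_updates:
        batched[collection][replica] = state
    return dict(batched)


def apply_updates(replica_updates, write_collection):
    """Call write_collection(collection, states) once per collection.

    Returns the number of writes issued, which in this simplified model
    is also the number of refresh messages other nodes would see.
    """
    batched = batch_by_collection(replica_updates)
    for collection, states in batched.items():
        write_collection(collection, states)
    return len(batched)


# Three replica changes across two collections (made-up names).
updates = [
    ("coll1", "replica1", "down"),
    ("coll1", "replica2", "down"),
    ("coll2", "replica1", "active"),
]
writes = []
count = apply_updates(updates, lambda c, s: writes.append((c, s)))
```

With the three updates above, the callback runs twice (once for `coll1`, once for `coll2`) rather than three times.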
Even the updates to ZK done by
This is actually true. We can use the same mechanism for non-PRS collections too. As PRS was newly introduced, we didn't want to affect the critical path.
Thank you @noblepaul ! I want to make sure I understand your answer correctly: 😊
I assume you are referring to the underlying
I could see that when state updates are guarded by
There's no bad If I understand correctly, using direct Are there any strong reasons that we cannot use
Look at the following set of ops
In the above example, until 1.d is done the client should not assume that for non-PRS collections, it sends as many add-replica ops as there are replicas. It is sub-optimal. However, there is a new way to update the internal clusterstate without relying on a message as done in MODIFY
Thank you for the explanations! The flow appears to be simpler for non-PRS collections, where every state.json update is handled by a single thread and a single updater (Overseer$ClusterStateUpdater). Once other code can modify state.json directly, outside of that updater, we will need more handling and workarounds (for example, to force a sync), and there could be unforeseen edge cases (race conditions etc.). @noblepaul, it would be great for me to understand a bit more about the higher-level design regarding:
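To make the race-condition concern above concrete, here is a small sketch of two writers updating the same versioned node. It is an in-memory model, not Solr or ZooKeeper code: `VersionedNode` and `BadVersion` are hypothetical names, loosely modelled on the fact that ZooKeeper's `setData` accepts an expected version and rejects a write on mismatch. Whether and where Solr relies on this for state.json is not established in this thread.

```python
class BadVersion(Exception):
    """Raised when a write carries a stale expected version."""


class VersionedNode:
    """In-memory stand-in for a node whose writes are version-checked."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def read(self):
        # Return the data together with the version it was read at.
        return self.data, self.version

    def set_data(self, data, expected_version):
        # Reject the write if someone else has written since our read.
        if expected_version != self.version:
            raise BadVersion(
                f"expected {expected_version}, node is at {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


node = VersionedNode({"replica1": "down"})

# Two writers read the same version before either one writes.
_, version_seen_by_updater = node.read()
_, version_seen_by_data_node = node.read()

# The data node writes first and succeeds.
node.set_data({"replica1": "active"}, version_seen_by_data_node)

# The second writer still holds the old version, so its write is rejected
# instead of silently overwriting the newer state.
try:
    node.set_data({"replica1": "down"}, version_seen_by_updater)
    stale_write_rejected = False
except BadVersion:
    stale_write_rejected = True
```

With a single updater thread this conflict cannot arise. With several writers, each one has to handle the rejected write, typically by re-reading and retrying.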
In general, PRS state should be updated by the data node only. I tested this fix 10 times on playpen and was unable to reproduce the collection creation issue with PRS. Without the fix, I was able to reproduce it in 2/3 times.
We need to look more thoroughly at the PRS-related code.