Fix #4730: AgentScheduler doesn't assign the leader on load #4774
■ @fluidframework/base-host: No change
⯅ @fluid-example/bundle-size-tests: +82 Bytes
Baseline commit: c432b2b |
I'd describe the problem slightly differently: the "removeMember" quorum handler has a this.isActive() check to ensure that a client on a read-only connection does not attempt to submit an op. This results in no action, because we process leave ops before we get connected.
Ah, right, because when we load, we would have processed the
Hold on. Is it "read only" mode or "view only" mode that you are talking about? "Read only" mode has no impact, because that client will never write. So the next client that comes along with write permission will still see the "Leave" message, process it to clear the leader, and trigger reassignment (whether it is initially connected as view only or not). So the issue isn't about "read" mode? This may be unique to the detached create flow then, where the "fake" client used while detached never gets a "Leave" message. The fix is probably still good.
@vladsud, a separate, related thought: does that mean that if you are the first client, you always immediately go from the default view mode to write mode because of leader election? I assume the server would also downgrade the client back to view-only mode after some inactivity, so if you are the only client, will you keep reconnecting from view-only to write mode?
Ok. I am completely wrong here. I tried reverting the change in I got confused when I was writing my test case, because with detached create, the resulting container is connected in view-only mode, whereas on load, the current default is to connect in "write" mode (see The fact is that you will clear and reassign when we either see the
Closing this PR for now. It is not strictly necessary, but avoiding the clear-then-assign sequence would avoid an additional round trip, and that may be done later.
AgentScheduler only cleans up values in the consensus collection if there are multiple clients. The last client to leave does not clear the tasks assigned in the AgentScheduler, and relies on the next load to clear them. However, on load we try to assign unassigned tasks (including leader election), and that happens before the cleanup. As a result, the first client does not assign a leader, leaving the session without one. A leader is assigned only if the session disconnects and reconnects, or a second client joins.
The fix is to assign a task on load if no one is assigned to it, or if the client assigned to it is no longer in the quorum (it already left). In that case, we do not try to clear it.
This should fix issue #4730.
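The on-load rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions: TaskState, tasksToClaimOnLoad, and claimOnLoad are hypothetical names invented for this sketch, not the actual AgentScheduler API, and the real change works against the consensus collection rather than an in-memory map.

```typescript
// Hypothetical, simplified model; none of these names come from AgentScheduler.
type ClientId = string;

interface TaskState {
    // Task id -> client currently assigned to it, or null if unassigned.
    assignments: Map<string, ClientId | null>;
    // Client ids currently present in the quorum.
    quorumMembers: Set<ClientId>;
}

// Returns the tasks a loading client should try to pick up:
// unassigned tasks, plus tasks whose assignee is no longer in the quorum.
function tasksToClaimOnLoad(state: TaskState): string[] {
    const claimable: string[] = [];
    state.assignments.forEach((assignee, taskId) => {
        const unassigned = assignee === null;
        const assigneeLeft = assignee !== null && !state.quorumMembers.has(assignee);
        if (unassigned || assigneeLeft) {
            claimable.push(taskId);
        }
    });
    return claimable;
}

// Assigns the claimable tasks directly to the loading client, overwriting a
// stale assignment instead of clearing it first and assigning afterwards.
function claimOnLoad(state: TaskState, self: ClientId): void {
    for (const taskId of tasksToClaimOnLoad(state)) {
        state.assignments.set(taskId, self);
    }
}
```

Overwriting the stale entry in one step, rather than clearing it and then assigning, is what would save the extra round trip mentioned in the conversation.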
Also, LocalDocumentDeltaConnection.close will need to disconnect the client.