-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Close connection if server rejects writes while maintaining write connection #1722
Conversation
…neciton Fix Fluid debugger;
@@ -812,6 +812,9 @@ export class DeltaManager extends EventEmitter implements IDeltaManager<ISequenc | |||
if (this.readonlyPermissions) { | |||
this.close(createWriteError("WriteOnReadOnlyDocument")); | |||
} | |||
if (this.connectionMode === "write") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that we never retry on nack now? If not, then what is the scenario?
Seems like if we do not have write permissions, then readonlyPermissions
is true, and we get "WriteOnReadOnlyDocument". If we do have write permissions, then connectionMode
is "write", and we get "ServerRejectsWrites".
Is the last case of checking for autoReconnect
still relevant then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm not sure when nack can happen beyond those two cases - read-only permissions and view-only connection.
I was thinking to wrap both IFs with IF (target === -1), but did not have it for first case, so not sure when target !== -1 can happen. @GaryWilber, @tanviraumi , can you please comment?
I believe checking for this.autoReconnect is still relevant, because if it's neither read-only permissions nor view-only mode (as optimization on top of r/w permissions), then we can reconnect, and we are getting nack because someone is generating an op (presumably because of user actions). In such case, if reconnect is off, that very likely points to a bug in host code that deals with user presence. Bots will not hit it, because they start with "write" mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the server can NACK a client if they send an op that has a reference seq# below the minimum seq#. We should reconnect in that case.
In reply to: 404885819 [](ancestors = 404885819)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the right way to differentiate between those cases? Is testing for target telling anything?
I'd really hate to go with solution that starts counting number of disconnects per period of time as the only signal that something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we trying to prevent reconnection attempts when a "read-only permission" user sends an op? So what happens when a user switches from read to write? We have never formalized Nacks very well but in r11s, alfred sends you a nack when you can't write. That means either you don't have read permission or you joined as readonly. In both cases, the value is -1.(https://github.com/microsoft/FluidFramework/blob/master/server/routerlicious/packages/lambdas/src/utils/messageGenerator.ts#L15)
Deli nacks you on different occasions. And in all those occasions client should always reconnect. The value in all those cases is the current min sequence number (https://github.com/microsoft/FluidFramework/blob/master/server/routerlicious/packages/lambdas/src/deli/lambda.ts#L641). The cases are:
- There is a gap so kafka might have missed something. Client should reconnect to resend the op.
- refseq < minseq.
- Stale client.
- Tried to summarize without a summary scope. This is more future proofing and r11s/Push always provides a summary claim with the write claim. May be we should not reconnect in this case.
I am pretty sure Deli behaves similarly in Push and r11s. Not sure about the alfred part of Push but @GaryWilber will know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to add tenantId-documentId here just to be safe...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea it's sent to a specific room, but the above image is what's received on the websocket. Unfortunately the client doesn't know what room that was sent to.
I believe rooms are purely a server-side concept to aid in sending messages to specific clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gary and I talked. If we need an urgent fix, you can check with -2 and Gary can starting send them from push. We will work on a better way to handle all cases in the meantime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should got with right fix. I can wait of make other temp fix if needed (track rate of disconnects). Can you please share bug / issue links to track?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forking Fluid debugger fix into #1773 |
+Fix Fluid debugger