Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close connection if server rejects writes while maintaining write connection #1722

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import {
ISignalMessage,
ITokenClaims,
IVersion,
ScopeType,
} from "@microsoft/fluid-protocol-definitions";
import { debug } from "./debug";
import { ReplayController } from "./replayController";
Expand Down Expand Up @@ -230,7 +231,7 @@ export class ReplayDocumentDeltaConnection extends EventEmitter implements IDocu

private static readonly claims: ITokenClaims = {
documentId: ReplayDocumentId,
scopes: [],
scopes: [ScopeType.DocRead, ScopeType.DocWrite],
tenantId: "",
user: {
id: "",
Expand Down
5 changes: 4 additions & 1 deletion packages/loader/container-loader/src/deltaManager.ts
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ import {
MessageType,
ScopeType,
} from "@microsoft/fluid-protocol-definitions";
import { createIError, createWriteError, createNetworkError } from "@microsoft/fluid-driver-utils";
import { createIError, createWriteError, createNetworkError, createFatalError } from "@microsoft/fluid-driver-utils";
import { ContentCache } from "./contentCache";
import { debug } from "./debug";
import { DeltaConnection } from "./deltaConnection";
Expand Down Expand Up @@ -812,6 +812,9 @@ export class DeltaManager extends EventEmitter implements IDeltaManager<ISequenc
if (this.readonlyPermissions) {
this.close(createWriteError("WriteOnReadOnlyDocument"));
}
if (this.connectionMode === "write") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that we never retry on nack now? If not, then what is the scenario?
Seems like if we do not have write permissions, then readonlyPermissions is true, and we get "WriteOnReadOnlyDocument". If we do have write permissions, then connectionMode is "write", and we get "ServerRejectsWrites".
Is the last case of checking for autoReconnect still relevant then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm not sure when nack can happen beyond those two cases - read-only permissions and view-only connection.
I was thinking to wrap both IFs with IF (target === -1), but did not have it for first case, so not sure when target !== -1 can happen. @GaryWilber, @tanviraumi , can you please comment?

I believe checking for this.autoReconnect is still relevant, because if it's neither read-only permissions nor view-only mode (as optimization on top of r/w permissions), then we can reconnect, and we are getting nack because someone is generating an op (presumably because of user actions). In such case, if reconnect is off, that very likely points to a bug in host code that deals with user presence. Bots will not hit it, because they start with "write" mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the server can NACK a client if they send an op that has a reference seq# below the minimum seq#. We should reconnect in that case.


In reply to: 404885819 [](ancestors = 404885819)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the right way to differentiate between those cases? Is testing for target telling anything?
I'd really hate to go with solution that starts counting number of disconnects per period of time as the only signal that something went wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we trying to prevent reconnection attempts when a "read-only permission" user sends an op? So what happens when a user switches from read to write? We have never formalized Nacks very well but in r11s, alfred sends you a nack when you can't write. That means either you don't have read permission or you joined as readonly. In both cases, the value is -1.(https://github.com/microsoft/FluidFramework/blob/master/server/routerlicious/packages/lambdas/src/utils/messageGenerator.ts#L15)

Deli nacks you on different occasions. And in all those occasions client should always reconnect. The value in all those cases is the current min sequence number (https://github.com/microsoft/FluidFramework/blob/master/server/routerlicious/packages/lambdas/src/deli/lambda.ts#L641). The cases are:

  1. There is a gap so kafka might have missed something. Client should reconnect to resend the op.
  2. refseq < minseq.
  3. Stale client.
  4. Tried to summarize without a summary scope. This is more future proofing and r11s/Push always provides a summary claim with the write claim. May be we should not reconnect in this case.

I am pretty sure Deli behaves similarly in Push and r11s. Not sure about the alfred part of Push but @GaryWilber will know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to add tenantId-documentId here just to be safe...

Copy link
Contributor

@GaryWilber GaryWilber Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it's sent to a specific room, but the above image is what's received on the websocket. Unfortunately the client doesn't know what room that was sent to.
I believe rooms are purely a server-side concept to aid in sending messages to specific clients.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gary and I talked. If we need an urgent fix, you can check with -2 and Gary can starting send them from push. We will work on a better way to handle all cases in the meantime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should got with right fix. I can wait of make other temp fix if needed (track rate of disconnects). Can you please share bug / issue links to track?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.close(createFatalError("ServerRejectsWrites"));
}
if (!this.autoReconnect) {
this.logger.sendErrorEvent({ eventName: "NackWithNoReconnect", target, mode: this.connectionMode });
}
Expand Down
19 changes: 19 additions & 0 deletions packages/loader/driver-definitions/src/error.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,33 @@

// tslint:disable: no-unsafe-any
export enum ErrorType {
// Some error, most likely an exception caught by runtime and propagated to container as critical error
generalError,

// Some non-categorized (below) networking error
genericNetworkError,

// Access denied - user does not have enough privileges to open a file, or continue to operate on a file
accessDeniedError,

// File not found, or file deleted during session
fileNotFoundError,

// Throttling error from server. Server is busy and is asking not to reconnect for some time
throttlingError,

// Service error. Not used
serviceError,

// Summarizing error. Currently raised on summarizing container only.
// Work is planned to propagate these errors to main container.
summarizingError,

// User does not have write permissions to a file, but is changing content of a file.
// That might be indication of some component error - components should not generate ops in readonly mode.
writeError,

// Some fatal server error (usually 500).
fatalError,
}

Expand Down
5 changes: 3 additions & 2 deletions packages/loader/driver-utils/src/network.ts
Original file line number Diff line number Diff line change
Expand Up @@ -116,8 +116,6 @@ class WriteError extends ErrorWithProps implements IWriteError {
}
}

export const createWriteError = (errorMessage: string) => (new WriteError(errorMessage) as IError);

/**
* Fatal error class - when the server encountered a fatal error
*/
Expand Down Expand Up @@ -151,3 +149,6 @@ export function createNetworkError(
}
return new GenericNetworkError(errorMessage, statusCode, canRetry, online);
}

export const createWriteError = (errorMessage: string) => (new WriteError(errorMessage) as IError);
export const createFatalError = (errorMessage: string) => (new FatalError(errorMessage) as IError);