More telemetry, asserts and some recovery for cases where client is not up to date #7422
@@ -137,44 +137,55 @@ export class OdspDeltaStorageWithCache implements IDocumentDeltaStorageService {
let opsFromCache = 0;
let opsFromStorage = 0;

const stream = requestOps(
Note: there are no functional changes here (code was moved around and re-indented), other than the asserts added on lines 183+.
@@ -135,9 +135,9 @@ export class OpsCache {
break;
}
if (messages.length === 0) {
if (op.sequenceNumber > from + 1) {
Note: that's definitely a bug. It treated "from" as an exclusive boundary, while all the rest of the code has been switched to treat "from" as inclusive. I added asserts in other layers to catch issues like this, so that we can find them sooner (and to validate this code change as well).
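To make the boundary mismatch concrete, here is a minimal sketch of the two readings of "from". It is not the actual PR code: `Op`, `hasGapExclusive`, `hasGapInclusive`, and `assertContiguousFrom` are hypothetical names, and it assumes the inclusive counterpart of the condition shown in the hunk is `op.sequenceNumber > from`.

```typescript
// Hypothetical minimal op shape; only the field used in the hunk is modeled.
interface Op {
    sequenceNumber: number;
}

// Exclusive reading (the condition shown in the hunk): the first op is
// expected at from + 1, so an op at from + 1 is NOT reported as a gap.
function hasGapExclusive(firstOp: Op, from: number): boolean {
    return firstOp.sequenceNumber > from + 1;
}

// Inclusive reading (assumed fix): the first op is expected at `from`
// itself, so an op at from + 1 means the op at `from` is missing.
function hasGapInclusive(firstOp: Op, from: number): boolean {
    return firstOp.sequenceNumber > from;
}

// Hypothetical assert for a consuming layer: ops must start at `from`
// (inclusive) and be contiguous from there.
function assertContiguousFrom(ops: Op[], from: number): void {
    let expected = from;
    for (const op of ops) {
        if (op.sequenceNumber !== expected) {
            throw new Error(
                `Expected sequence number ${expected}, got ${op.sequenceNumber}`,
            );
        }
        expected++;
    }
}
```

With `from = 10` and a first cached op at sequence number 11, the exclusive check reports no gap while the inclusive check does, which is the kind of silently skipped op an assert in a downstream layer would surface.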
// actionably to report gaps in this range.
this.enqueueMessages(pendingSorted, `${reason}_pending`, true /* allowGaps */);

// See issue #7312 for more details
Note: this is the main change in behavior.
This code gets hit a lot (in stress tests) in "read" mode, and that condition is described in the comments.
I hit it only once for a "write" connection, and that is where I believe we have the key to better understanding the underlying problem described in #7312. This new code will help "fix" about 50% of the cases where we do not see a join op for a long time, but the telemetry we get in such cases should hold the key to why we run into these problems and, hopefully, a path to a full fix.
⯅ @fluidframework/base-host: +632 Bytes
⯅ @fluid-example/bundle-size-tests: +665 Bytes
Baseline commit: d4b1448
This PR has been split into 4 individual PRs, closing
More progress on understanding Issue #7312: