Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Client Side Only Op Processing Controller #4718

Merged

Conversation

anthony-murphy
Copy link
Contributor

This change improves the op processing controller to run more reliably when only used client side. This will allow it to run our tests against other servers like r11s, and odsp, as we no longer need to leverage server data to orchestrate op processing.

related #4655 #4660

@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Jan 5, 2021

@fluidframework/base-host: -23 Bytes
Metric NameBaseline SizeCompare SizeSize Diff
main.js 166.79 KB 166.76 KB -23 Bytes
Total Size 166.79 KB 166.76 KB -23 Bytes
@fluid-example/bundle-size-tests: No change
Metric NameBaseline SizeCompare SizeSize Diff
container.js 191.17 KB 191.17 KB No change
map.js 45.84 KB 45.84 KB No change
matrix.js 144.52 KB 144.52 KB No change
odspDriver.js 193.22 KB 193.22 KB No change
sharedString.js 158.46 KB 158.46 KB No change
Total Size 733.2 KB 733.2 KB No change

Baseline commit: 6eae286

Generated by 🚫 dangerJS against df161bb

@github-actions github-actions bot requested a review from vladsud January 5, 2021 02:06
private readonly lastInboundPerClient = new Map<string, ISequencedDocumentMessage>();
private pendingWriteConnection = false;

public expectingInboundFrom(outbound: DeltaManagerMonitor): boolean {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have a comment above it describing what it tries to achieve.
Glancing through code, I'm not sure I'm getting all the details...

// check if we are waiting to see a message from outbound
const lastInboundForOutbound = this.lastInboundPerClient.get(outbound.clientId);
if (lastInboundForOutbound !== undefined) {
return outbound.lastOutbound.clientSequenceNumber > lastInboundForOutbound.clientSequenceNumber;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this check can lie due to noop coalescing on the server. I.e. it can result in false positive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure how that would happen. no-op are excluded from lastOutbound and server NoOp should have a null client id so won't be tracked in lastInboundPerClient.


// has pending work will be true for outbound until it receives it's own seq
// this check ensures the other client has seen the same ops as the outbound
return outbound.latestSequenceNumber > this.latestSequenceNumber;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to answer slightly different question - "is there any work", not "is there work from given client".
I.e. should it return false once we got here, similar to outbound.lastOutbound === undefined?

}

// check if we are waiting to see a message from outbound
const lastInboundForOutbound = this.lastInboundPerClient.get(outbound.clientId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes there are no disconnects, i.e. clientId does not change through tests, right? Is this important condition? I.e. should this class assert if disconnects happened (as part of UT mimicking disconnection logic) as this code would start lying in some cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we clear lastOutbound on disconnect. i think this can lead to missing outbound work if the sending client disconnects with ops in flight, but that was already the case for this class. i don't think we can assert on disconnects, as this should work against real servers, so disconnects can happen

// If we are not connected, we assume that we are trying to, and classify as pending work
return this.clientId === undefined || this.pendingCount !== 0;
return !this.deltaManager.disposed
&& (this.pendingWriteConnection || this.pendingCount !== 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE pendingWriteConnection: There is really nothing requiring IRuntime to resubmit an op on a nack.
I.e. runtime may chose to not submit anything on reconnection. Great example of that would be potential improvement in Consensus Registry DDS, where it would not resent an op where it knows for sure it would be ignored by all clients (today we resend everything, i.e. we do not do any optimizations).

Can we make this logic more generic and not rely on any particular behavior of IRuntime around nacks?

Copy link
Contributor Author

@anthony-murphy anthony-murphy Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i poked around a bit here when making this change, but this transition is hard to track reliably. open to other options if you have ideas about how to make this better

const systemLeaveMessage = message as ISequencedDocumentSystemMessage;
const clientId = JSON.parse(systemLeaveMessage.data) as string;
this.lastInboundPerClient.delete(clientId);
}

if (this.clientId === undefined) {
// Ignore message when we are not connected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know if it matters, but logic below comparing of message.clientId !== this.clientId does not correctly account how disconnect/reconnect work, because we register for wrong event in constructor:

deltaManager.on("connect", (details) => this.connect(details.clientId));

clientId should be switched only when Container raises "connected" event, which happens only after inbouding all ops up to own join op. Delta Manager's "connect" fires too early and is not propagated to IRuntime, it should only be used by hosts (exposed from Container as "connect" event) for connected/disconnected banner logic.

I've opened an issue somewhere (assigned to Matt) to reevaluate slit of logic between Container & DeltaManager, as it's should be in one place (today, while DeltaManager sort of does not know about this, it actually does - there are couple places where it does unnatural things to account for this discrepancy).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. the op processing controller is in a bit of a weird place. in general, i'd like to move away from it, as it's not reliable for a lot of the reasons you call out. in general i'd rather see tests await more deterministic side effects but refactoring for that is not something i think we have time for currently.

@github-actions github-actions bot requested a review from vladsud January 5, 2021 17:46
@anthony-murphy anthony-murphy merged commit a12bc56 into microsoft:main Jan 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants