Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client should move to "connected" state after it has been added to audience #7275

Closed
Tracked by #7995
agarwal-navin opened this issue Aug 27, 2021 · 11 comments
Closed
Tracked by #7995
Assignees
Labels
ado Issue is tracked in the internal Azure DevOps backlog area: loader Loader related issues area: runtime Runtime related issues status: stale
Milestone

Comments

@agarwal-navin
Copy link
Contributor

agarwal-navin commented Aug 27, 2021

[Update]: See #7275 (comment) below, there's a service change that makes this much easier to achieve, so we're waiting for that.

Currently, there are no guarantees around when a client is added to the audience. For write mode clients, they are added to audience most definitely before they move to "connected" state. However, this is not true to read mode clients which may or may not have been added to audience by the time the client moves to "connected" state.

We should make this flow consistent for all clients - If a client is in "connected" state, it is already in the audience. We can implement this as follows:

  • Add a "filter" between Container and its ConnectionStateHandler that delays moving to "connected" state (by 2 seconds or so) if self is not in the audience. Having a "filter" will isolate the logic into a separate layer without complicating existing layers. If the timer expires and self isn't still in audience, we move to "connected" state but log a telemetry event.
  • Swap the order of raising connected state and adding previous clients to audience as per the following code in Container:
    deltaManager.on("connect", (details: IConnectionDetails, opsBehind?: number) => {
              this.connectionStateHandler.receivedConnectEvent(
                  this._deltaManager.connectionMode,
                  details,
                  opsBehind,
              );
    
              // Back-compat for new client and old server.
              this._audience.clear();
    
              for (const priorClient of details.initialClients ?? []) {
                  this._audience.addMember(priorClient.clientId, priorClient.client);
              }
          });
    
    This will ensure that existing clients are also added to the audience before a client moves to "connected" state.
  • Monitor the telemetry to gather data around how often we move to "connected" state without self being in audience (the telemetry mentioned above). Depending on this, we need to decide on next steps. For example, if a client is never added to the audience, it may break features such as presence that are built on top of it. So, we may need to rethink our assumptions and come up with a plan such as pulling audience from server it something is missing.
  • Update the audience end to end test as per the resolution here - https://github.com/microsoft/FluidFramework/blob/6e86fa8cbe29730212bc9bba2face540ef4a7883/packages/test/test-end-to-end-tests/src/test/audience.spec.ts
@tyler-cai-microsoft
Copy link
Contributor

tyler-cai-microsoft commented Dec 17, 2021

When we call getCurrentTimestamp in the ContainerRuntime, we would want to pull the connected timestamp from the audience and store it in a variable such as connectedTimestamp once we are in a connected state.
#8580 (comment)

Follow up issue: #8584

@vladsud
Copy link
Contributor

vladsud commented Jan 10, 2022

@GaryWilber, @tanviraumi , do we have guarantees that initial clients always contain self? If yes, this feature is already implemented.
If no, I do not think we can assume that client will always see itself in audience as signals can be lost. And thus, there is no way to implement it (other than wait and bail out on timeout)

@vladsud vladsud modified the milestones: January 2022, February 2022 Jan 10, 2022
@GaryWilber
Copy link
Contributor

@vladsud There is not guarantee for that in Push. I'm not sure if r11s is the same.

Push grabs the client list before adding the new client and will send that as initialClients. Then it sends the room join signal for the new client.
Would you want us to update the logic to always include the new client (self) in initialClients?

@vladsud
Copy link
Contributor

vladsud commented Jan 10, 2022

I'd say that based on feedback from our parners, they have really hard time reasoning about order of events and how / when to find self in audience. So, any simplification we can make here is welcome, but needs to be balanced with complexity of service.
I'd say that we have much bigger problem of potentially losing such signals that we talked many times and I think we will need to come back to that later. But if we can ensure that at least self-presence in audience is guaranteed (at the moment), it's substantial jump in quality of service, even if it happens async (as client can wait for such event). But I think your suggestion is the only way to make it guaranteed, so yes, if possible, that would be great addition

@vladsud
Copy link
Contributor

vladsud commented Jan 30, 2022

We can't start working here until we have service design (and implementation) that ensures client always gets its own join signal. Discussions are ongoing about changes in PUSH RE join/leave signal processing.

@vladsud vladsud removed their assignment Feb 17, 2022
@markfields
Copy link
Member

Moving to April. Meeting with Gary and Tanvir next week to catch up on progress. Design was reviewed earlier this month and implementation is underway for ODSP, and we will discuss any work required for FRS as well.

@markfields
Copy link
Member

See #1733 for additional back-story / context

@markfields
Copy link
Member

markfields commented Apr 11, 2022

Blocked on service work #9191 and related ODSP and FRS changes to switch to the new model for managing audience. Moving to "next" milestone (we'll still track this work via parent epic anyway)

Once that's in, we'll need to update the logic in the client (loader layer) to listen for the Join signal for both read/write and only issue connected event at that point.

@markfields markfields modified the milestones: April 2022, Next Apr 11, 2022
@markfields markfields added the area: loader Loader related issues label Apr 11, 2022
@markfields
Copy link
Member

#9191 is in, @GaryWilber how can we track progress on the ODSP and FRS work to switch to the new model?

@GaryWilber
Copy link
Contributor

Waiting on #9925 now

@curtisman curtisman added the ado Issue is tracked in the internal Azure DevOps backlog label Jun 20, 2022
@ghost ghost added the status: stale label Dec 18, 2022
@ghost
Copy link

ghost commented Dec 18, 2022

This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

@ghost ghost closed this as completed Dec 26, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ado Issue is tracked in the internal Azure DevOps backlog area: loader Loader related issues area: runtime Runtime related issues status: stale
Projects
None yet
Development

No branches or pull requests

6 participants