Not summarizing on first connection #4730

Closed
arinwt opened this issue Jan 5, 2021 · 11 comments
Labels: area: runtime (Runtime related issues), bug (Something isn't working)

Comments

@arinwt
Contributor

arinwt commented Jan 5, 2021

There seems to be an issue where the first connection to a document isn't summarizing.

Maybe related to detached creation flow?

To reproduce, create a new Fluid document (or collaborative meeting) and enter some text. Wait 15+ seconds, then refresh. It will not have summarized.

@ghost ghost added the triage label Jan 5, 2021
@leeviana
Contributor

leeviana commented Jan 5, 2021

This was also determined not to be caused by snapshot caching, since it repros across multiple refreshes if you don't give it time to summarize in between.

@curtisman curtisman added area: runtime Runtime related issues and removed triage labels Jan 8, 2021
@curtisman curtisman added this to the January 2021 milestone Jan 8, 2021
@curtisman curtisman added the bug Something isn't working label Jan 8, 2021
@curtisman
Member

curtisman commented Jan 8, 2021

I was writing a unit test for the "leader" event and noticed odd behavior: a container created using detached creation loses leadership immediately, via AgentScheduler.clearRunningTasks in AgentScheduler.initialize(), when the AgentScheduler transitions from the detached to the attached state.

        if (this.runtime.attachState === AttachState.Detached) {
            this.runtime.waitAttached().then(() => {
                this.clearRunningTasks();
            }).catch((error) => {
                this.sendErrorEvent("AgentScheduler_clearRunningTasks", error);
            });
        }

So the first created container never becomes a leader. That may be why the summarizer never ran. The result is that a new document doesn't summarize without reloading or having another client join the session. @jatgarg re: detached create flow

@curtisman
Member

curtisman commented Jan 8, 2021

After debugging further, I think it is a problem with the AgentScheduler itself, not necessarily specific to the detached create flow. In the detached create flow case, it is correct to lose leadership when trying to attach the container to storage. The bug is that leadership isn't reassigned after attach.

The problem is with how AgentScheduler cleans up values in the consensus collection. With multiple clients, a task's clientId is cleaned up by the other clients on the removeMember event from the quorum. The last client doesn't clean up after itself, and relies on the next client that connects to clean it up when the AgentScheduler is reloaded (in AgentScheduler.initializeCore). That cleanup is done after the attempt to reassign the task.

In the detached create flow case, AgentScheduler goes through multiple state transitions when we attach the container (disconnect and then connect), and after some op is sent to get the container out of "view only" mode, the runtime finally connects. The AgentScheduler then receives the connect event, becomes active, and calls AgentScheduler.initializeCore again. However, at that time the leader task is still assigned to the previous clientId, so we won't reassign it.
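
A minimal sketch of the ordering problem described above, assuming the consensus collection can be modeled as a plain map from task id to owning clientId. This is not the actual AgentScheduler code: `TaskOwners`, `tryClaim`, `clearStaleOwners`, `leaderAfterReconnect`, and the client ids are all hypothetical names used for illustration.

```typescript
// Simplified stand-in for the consensus collection: task id -> owning clientId.
type TaskOwners = Map<string, string | null>;

// Try to claim a task. This only succeeds if nobody currently owns it.
function tryClaim(owners: TaskOwners, task: string, clientId: string): boolean {
    const owner = owners.get(task) ?? null;
    if (owner !== null) {
        return false;
    }
    owners.set(task, clientId);
    return true;
}

// Release every task whose owner is no longer a connected client.
function clearStaleOwners(owners: TaskOwners, connected: Set<string>): void {
    owners.forEach((owner, task) => {
        if (owner !== null && !connected.has(owner)) {
            owners.set(task, null);
        }
    });
}

// Model a reconnect: the same container comes back with a new clientId while
// the "leader" task is still recorded against its previous clientId.
function leaderAfterReconnect(order: "claimFirst" | "clearFirst"): string | null {
    const owners: TaskOwners = new Map<string, string | null>([["leader", "old-client-id"]]);
    const connected = new Set<string>(["new-client-id"]);
    if (order === "claimFirst") {
        // Ordering described in the comment: the claim sees the stale owner and fails,
        // and the cleanup that follows leaves the task unowned.
        tryClaim(owners, "leader", "new-client-id");
        clearStaleOwners(owners, connected);
    } else {
        // Cleaning up first lets the claim succeed.
        clearStaleOwners(owners, connected);
        tryClaim(owners, "leader", "new-client-id");
    }
    return owners.get("leader") ?? null;
}

console.log(leaderAfterReconnect("claimFirst")); // null
console.log(leaderAfterReconnect("clearFirst")); // "new-client-id"
```

In this model, the only difference between the two outcomes is whether the stale entry is cleared before or after the claim attempt.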

@vladsud
Contributor

vladsud commented Jan 9, 2021

@curtisman, leader election is not used to determine the summarizer. That is not a consensus process, it's a calculation (the oldest client in the quorum is the summarizer). So the bug in AgentScheduler does not affect the summarizer (or this issue).
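
To illustrate the "calculation, not consensus" point, here is a rough sketch that picks the oldest client in the quorum. The `QuorumMember` shape, the `joinSequenceNumber` field, and `electSummarizer` are assumptions made for this example; the real Fluid quorum types and election code may look different.

```typescript
// Hypothetical shape of a quorum member; the real Fluid types differ.
interface QuorumMember {
    clientId: string;
    // Sequence number of the join op; a lower value means the client joined earlier.
    joinSequenceNumber: number;
}

// Every client runs the same calculation over the same quorum and reaches
// the same answer, so no extra consensus round is needed.
function electSummarizer(members: QuorumMember[]): string | undefined {
    let oldest: QuorumMember | undefined;
    for (const member of members) {
        if (oldest === undefined || member.joinSequenceNumber < oldest.joinSequenceNumber) {
            oldest = member;
        }
    }
    return oldest?.clientId;
}

const quorum: QuorumMember[] = [
    { clientId: "client-b", joinSequenceNumber: 12 },
    { clientId: "client-a", joinSequenceNumber: 3 },
];
console.log(electSummarizer(quorum)); // "client-a"
```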

@curtisman
Member

Yep, I jumped to a conclusion too quickly. This is unrelated.

@curtisman
Member

@arinwt I am not able to repro this directly in the app itself. After creating a new document and typing something, it always summarizes after 15s (I see GenerateSummary*, SummaryOp/SummaryAck messages using localStorage.debug="fluid:*").

Are there other repro steps I should try?
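
For anyone checking the same thing, a small sketch of filtering captured log output for the summary events named above. The event names come from the comment; the log line format, the sample lines, and the `summaryLines` helper are assumptions for illustration only.

```typescript
// In the browser console, the comment above enables logging with:
//   localStorage.debug = "fluid:*"
// The pattern below matches the event names mentioned there.
const SUMMARY_EVENT = /GenerateSummary\w*|SummaryOp|SummaryAck/;

// Keep only the log lines that mention a summary-related event.
function summaryLines(lines: string[]): string[] {
    return lines.filter((line) => SUMMARY_EVENT.test(line));
}

// Hypothetical log lines; real output will be formatted differently.
const sampleLog: string[] = [
    "fluid:telemetry OpPerf",
    "fluid:telemetry GenerateSummary_end",
    "fluid:telemetry SummaryAck",
];
console.log(summaryLines(sampleLog).length); // 2
```

An empty result after typing and waiting would suggest that no summary was attempted in that session.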

@leeviana
Contributor

I reproed this in OWA with the meeting notes flow. I created a new meeting notes section, waited for document attach, typed, and then waited 15+ seconds for a summary to happen.

Then I closed and reopened the same meeting over and over again. As long as I didn't leave the new meeting open for 15 seconds (which would have triggered another summary because OWA now adds an op on reopen), I could easily see that the collab space was replaying ops from document attach onward - no summary was taken.

@vladsud
Contributor

vladsud commented Jan 11, 2021

I'd recommend leaving it longer (up to a minute) and seeing what happens. In my testing I quite often see summaries taking longer than 15 seconds for completely idle clients. It's possible that some ops are sent here and there that delay the timer (without any user interaction). As a general statement, the system does not provide any specific guarantee about when summaries run.
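
A rough sketch of why stray ops could push a summary past the 15 second mark, assuming the summary is attempted only after no op has arrived for a fixed idle window. `earliestSummaryTime` and the timestamps are hypothetical, and the actual summarizer heuristics may take other factors into account.

```typescript
// Assumption: a summary is attempted once no op has arrived for `idleMs`.
// Each new op restarts the idle window, so the latest op decides the timing.
function earliestSummaryTime(opTimesMs: number[], idleMs: number): number | undefined {
    if (opTimesMs.length === 0) {
        return undefined; // no ops, so nothing to summarize
    }
    return Math.max(...opTimesMs) + idleMs;
}

const IDLE_MS = 15_000;

// The user types for two seconds and then stops.
console.log(earliestSummaryTime([0, 1_000, 2_000], IDLE_MS)); // 17000

// The same typing, plus two ops sent without user interaction.
console.log(earliestSummaryTime([0, 1_000, 2_000, 10_000, 20_000], IDLE_MS)); // 35000
```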

@leeviana
Contributor

I had left it up to a minute and more :) I have also verified on subsequent loads that, after typing and waiting 15 seconds, a summary DID happen.

@leeviana
Contributor

I saw similar behavior in Teams as well, verified by using teams.microsoft.com with debugPurple = 1 and filtering logs by 'summary'. I waited for much more than 15 seconds and nothing came in.

@curtisman
Member

I have tested with OWA (meetings), Teams on the Web (chat and meetings), and the editing apps. All seem to summarize around the 15s mark. All of them show version 0.31.1. I worked with @leeviana to repro, and she was able to repro with version 0.29 of the loader, but once it updated to 0.31 it no longer repros. It seems the issue has been fixed; I will close this for now.

5 participants