-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change elected summarizer when not summarizing for too long #6335
Conversation
■ @fluidframework/base-host: No change
⯅ @fluid-example/bundle-size-tests: +1.63 KB
Baseline commit: 9758a97 |
@@ -200,20 +257,14 @@ export class SummaryManager extends EventEmitter implements IDisposable { | |||
return { shouldSummarize: false, stopReason: "parentShouldNotSummarize" }; | |||
} else if (this.disposed) { | |||
return { shouldSummarize: false, stopReason: "disposed" }; | |||
} else if (this.orderedClients.getSummarizerCount() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if (this.orderedClients.getSummarizerCount() > 0) {
based on the comment i get why we need to remove shouldStart, but is the case handled elsewhere? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it that now refreshSummarizer will detect this state based on this.electedSummarizerManager.electedClientId, so while getShouldSummarizeState will return true, it won't be electect so it it won't run? if so, i wonder if we should rename getShouldSummarizeState to canSummarizeState, or supportsSumarizingState, as whether it should or not is really controlled by election
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or you could move the newSummarizerClientId logic and event emitting from refreshSummarizer to getShouldSummarizeState
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just no longer wait for the summarizer client(s) to leave the quorum.
@@ -320,7 +371,7 @@ export class SummaryManager extends EventEmitter implements IDisposable { | |||
|
|||
private tryRestart(): void { | |||
const shouldSummarizeState = this.getShouldSummarizeState(); | |||
if (shouldSummarizeState.shouldSummarize && shouldSummarizeState.shouldStart) { | |||
if (shouldSummarizeState.shouldSummarize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldSummarize
is only true if it is the client that should summarize (i.e. it is elected). So this still holds the same as before. Previously shouldStart
could be false if waiting for the previous lingering summarizer client to leave the quorum.
Still reviewing, but i think we should wait and check this in after the 0.41 release, so it has the full amount of time to bake in our stress runs. also sorry it took sooooo long to review here, been having trouble finding a solid chuck of time to really dig in here In reply to: 858809955 In reply to: 858809955 |
) { } | ||
public getAllEligibleClients(): ITrackedClient[] { | ||
return this.orderedClientCollection.getAllClients().filter(this.isEligibleFn); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if it would be easier to just keep a map in this class as well as it will also remove the need for this._eligibleCount. i tend to dislike any kind of ref counting, as it's easy to break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I have some question comments, but no major concerns. Please don't merge until after 0.41 cuts to ensure this has time to bake in stress.
@@ -604,11 +606,13 @@ export class ContainerRuntime extends TypedEventEmitter<IContainerRuntimeEvents> | |||
}; | |||
const chunks = await tryFetchBlob<[string, string[]][]>(chunksBlobName) ?? []; | |||
const metadata = await tryFetchBlob<IContainerRuntimeMetadata>(metadataBlobName); | |||
const electedSummarizer = await tryFetchBlob<ISerializedElection>(electedSummarizerBlobName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const electedSummarizer = await tryFetchBlob(electedSummarizerBlobName);
do we want a new blob here? would it make sense in metadata? is it fine due to aggregation? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. I've done it both ways for now (in a separate commit).. open to suggestions. @vladsud
this._hasAnySummarizersInQuorum = this.clientElection.getSummarizerCount() > 0; | ||
} | ||
|
||
public serialize(): ISerializedElection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the whole serialize/deserialize flow seems over complicated with deserialization decoupled in orderedClientElection and loadOrderedClients. can we just add a static create funciton to this class that takes the serialize form, and whatever else is needed, rather than spreading across muliple classes and funcitons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I can simplify it, which will probably make it easier to understand, but the intention in creating & loading both the OrderedClientCollection and all OrderedClientElections in a separate function is that it ensures they are instantiated together. It is important that they are created synchronously together, and this gives that direction via code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to just call all the constructors plainly in ContainerRuntime to show the difference, but I think it makes more sense to have the separate load function. The comparison is easy to see by looking at only this commit: 2a1dfb2
packages/runtime/container-runtime/src/summarizerClientElection.ts
Outdated
Show resolved
Hide resolved
packages/runtime/container-runtime/src/summarizerClientElection.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd still like the deserialization code cleaned up, but otherwise looks good.
Some high level thoughts:
|
Fixes #5222
Fixes #1791
The goal of the change is to have all clients unanimously and deterministically change their elected summarizer client whenever too many ops pass without a summary ack. In order to ensure eventual consistency here, the decision is purely based on inbound op processing: we use the sequence numbers of summary acks and other ops to figure it out.
The current behavior is that the oldest interactive client is responsible for summarizing. If 3000 inbound ops are seen since the last summary ack, that client will be marked as invalid, and the next oldest client will be elected. If no summary ack is seen since that client is elected, we use the first inbound op since they were elected as the baseline for the sequence number comparison.
Implementation Details
OrderedClientCollection - doubly linked list which tracks all the clients in the Quorum, sorted by their join sequence number.
OrderedClientElection - this is an adapter over the OrderedClientCollection which allows the caller to specify an
isEligibleFn
function to filter out non-interactive clients (or other criteria) and track a client as an "elected" client. It provides APIs to increment and reset the elected client. The rule is that the oldest eligible client is the elected client, and when increment is called the next oldest eligible client is elected instead (which means it should only be called deterministically for all clients to achieve eventual consistency). At any point, the elected client can be reset back to the oldest eligible client again, which should also be deterministically called.The above 2 classes achieve something similar to the old leader election or task picking, but without requiring any additional ops or signals to be used. This is because the information is based on other ops already being passed around: join op sequence numbers, summaryAck sequence numbers, and other op sequence numbers.
ElectedSummarizerManager - this is a helper class used by SummaryManager to handle electing which client is responsible for generating summaries. It leverages the above OrderedClientElection to achieve this, and decides to reelect after some number of ops have passed since the last successful summary. If all clients are elected, it directly resets and goes back to electing the oldest client again.
Added tests for all 3 above classes.