fluid:telemetry:InitialElectedClientNotFound errors in telemetry #6948
Comments
We should definitely fix the prefix here. It looks like the raw logger is getting passed rather than the summarizer's or containerRuntime's sub-logger.
After reviewing the telemetry, it looks like the bug is attributable to an older summary (at seq This needs to be fixed in the scribe lambda. I think rejecting summaries that are older than the current protocol state is a must; this may be a regression from #934. I still see similar code that now lives in SummaryWriter, but I'm not sure what is used with other server implementations. Additionally, I think we should fail faster by recording the reference sequence number of the runtime in the .metadata blob and failing immediately on load if it doesn't match the one in the .protocol tree. Opened issue #7002 for this, and PR #7015 to address it.
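To make the two checks proposed above concrete, here is a minimal sketch. All names here (`SummarySnapshot`, `shouldRejectSummary`, `validateSnapshotOnLoad`, and the field names) are hypothetical and are not taken from the scribe lambda, SummaryWriter, or the runtime. It also assumes that "older than the current protocol state" means the summary's reference sequence number is lower than the protocol's current sequence number.

```typescript
// Hypothetical shape; these field names are assumptions for illustration only.
interface SummarySnapshot {
    // Reference sequence number the runtime would record in the .metadata blob.
    metadataSequenceNumber: number;
    // Sequence number stored in the .protocol tree.
    protocolSequenceNumber: number;
}

// Server-side idea: reject a summary that is older than the current protocol state.
function shouldRejectSummary(
    summaryReferenceSequenceNumber: number,
    currentProtocolSequenceNumber: number,
): boolean {
    return summaryReferenceSequenceNumber < currentProtocolSequenceNumber;
}

// Client-side idea: fail immediately on load if the two sequence numbers disagree.
function validateSnapshotOnLoad(snapshot: SummarySnapshot): void {
    if (snapshot.metadataSequenceNumber !== snapshot.protocolSequenceNumber) {
        throw new Error(
            `Summary mismatch: .metadata is at ${snapshot.metadataSequenceNumber}, ` +
            `.protocol is at ${snapshot.protocolSequenceNumber}`,
        );
    }
}
```

The server-side check stops a stale summary from being accepted at all, while the load-time check turns a silent inconsistency into an immediate, attributable failure.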
Is there something particular about the way the stress test issues summary nacks that could be bypassing the original fix in the scribe lambda? Or is it more likely that something (a race?) is causing that fix not to work?
This seems to have dropped off the radar. No hits in current Kusto data.
There are no events in Prod, our scalability tests or ODSP scalability tests.
We do see these events in automation:
union office_fluid_ffautomation_*
| where Data_eventName contains "InitialElectedClientNotFound"
| summarize count() by Data_eventName
I see 3977 cases.
Plus this event should have some reasonable prefix, i.e. fluid:telemetry:OrderedClientElection:InitialElectedClientNotFound
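For illustration, a minimal sketch of how a namespaced sub-logger could produce that prefix. `Logger`, `TelemetryEvent`, and `createChildLogger` are hypothetical names written for this example, not the actual Fluid telemetry API; the only point is that the component should log through a child logger rather than the raw one.

```typescript
// Hypothetical minimal logger types, defined here only for this sketch.
interface TelemetryEvent {
    eventName: string;
    [key: string]: unknown;
}

interface Logger {
    send(event: TelemetryEvent): void;
}

// Wraps a parent logger and prepends a namespace to every event name.
function createChildLogger(parent: Logger, namespace: string): Logger {
    return {
        send(event: TelemetryEvent): void {
            parent.send({ ...event, eventName: `${namespace}:${event.eventName}` });
        },
    };
}

// Collect event names so the resulting prefix can be inspected.
const sentEventNames: string[] = [];
const rawLogger: Logger = {
    send: (event) => {
        sentEventNames.push(event.eventName);
    },
};

const baseLogger = createChildLogger(rawLogger, "fluid:telemetry");
const electionLogger = createChildLogger(baseLogger, "OrderedClientElection");

// Logged through the sub-logger, the event picks up both namespaces.
electionLogger.send({ eventName: "InitialElectedClientNotFound" });
```

Logging the same event through `baseLogger` directly would yield `fluid:telemetry:InitialElectedClientNotFound`, which matches the unprefixed name reported in this issue.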