Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add telemetry on number of ops we are hoarding for delay-loaded data stores / DDSs #4616

Closed
vladsud opened this issue Dec 16, 2020 · 0 comments · Fixed by #6838
Closed

Add telemetry on number of ops we are hoarding for delay-loaded data stores / DDSs #4616

vladsud opened this issue Dec 16, 2020 · 0 comments · Fixed by #6838
Assignees
Labels
focus Items that engineers are focusing on now, but may not have any (coding) outcome in current milestone good first issue Good for newcomers perf telemetry
Milestone

Comments

@vladsud
Copy link
Contributor

vladsud commented Dec 16, 2020

Today, we hoard all the ops for unrealized data stores / DDSs. It may take a long time until we need to load data store / DDS and thus it's possible number of ops we need to process might be huge. It would be great to have statistics on two fronts:

  1. Max of total number of ops we are hoarding in a session (if possible, of approximation). This will tell the impact on memory footprint and potential need to stop doing it and "switch" to using more recent snapshots if possible.
  2. Number of ops we have to process when realizing data store / DDS, i.e. impact on perf (time) directly.

Here are some references to the code:
FluidDataStoreContext.pending is the array that tracks all ops for data store if data store is not realized.
We process these ops in bindRuntime() and it would be great at least add telemetry event here with number of ops we process if it's higher than some number (otherwise event would be too noisy).
Ops are added in process() method - similar it might be great to record every time it is multiple of 1000 or something like that.

Similar logic exists in RemoteChannelContext. Methods are processOp() & loadChannel() respectively.

I'd skip LocalChannelContext for that task as it's a corner case (serializing and loading back detached container, attaching that container, with some channels not yet realized?)

@vladsud vladsud added bug Something isn't working perf focus Items that engineers are focusing on now, but may not have any (coding) outcome in current milestone labels Dec 16, 2020
@vladsud vladsud added this to the January 2021 milestone Dec 16, 2020
@curtisman curtisman added telemetry and removed bug Something isn't working labels Jan 8, 2021
@curtisman curtisman modified the milestones: January 2021, February 2021 Jan 11, 2021
@curtisman curtisman modified the milestones: February 2021, March 2021 Feb 19, 2021
@curtisman curtisman modified the milestones: March 2021, Next Feb 27, 2021
@vladsud vladsud added the good first issue Good for newcomers label May 27, 2021
@vladsud vladsud modified the milestones: Next, August 2021 Jul 12, 2021
@andre4i andre4i modified the milestones: August 2021, July 2021 Jul 21, 2021
@andre4i andre4i linked a pull request Jul 21, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
focus Items that engineers are focusing on now, but may not have any (coding) outcome in current milestone good first issue Good for newcomers perf telemetry
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants