Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GC] Make getAttachGCData required on IFluidDataStoreChannel and make implementation consistent #21347

Merged
merged 2 commits into from
Jun 11, 2024

Conversation

agarwal-navin
Copy link
Contributor

@agarwal-navin agarwal-navin commented Jun 7, 2024

getAttachGCData was added to IFluidDataStoreChannel in 2.0.0-rc.2.0 4 months ago via #19275. This PR does the following:

  • Makes getAttachGCData required on IFluidDataStoreChannel.
  • Makes getAttachGCData logic consistent across channel collection, data store context, data store runtime and channel contexts. I found it confusing to follow the existing flow due to the implementation being different for various layers. This is how it works now:
    • ChannelCollection and FluidDataStoreRuntime both implements getAttachGCData which iterates through all it's children and gets their GC data by calling getAttachGCData.
    • FluidDataStoreContext and LocalChannelContext both implement getAttachGCData which returns the GC data for this node.
    • Decouples getAttachSummary and getAttachGCData in ChannelCollection:generateAttachMessage.

AB#8199

@agarwal-navin agarwal-navin requested review from a team, pragya91, markfields, jatgarg, tyler-cai-microsoft, kian-thompson and rajatch-ff and removed request for a team June 7, 2024 17:58
@github-actions github-actions bot added area: runtime Runtime related issues public api change Changes to a public API base: main PRs targeted against main branch labels Jun 7, 2024
@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Jun 7, 2024

@fluid-example/bundle-size-tests: +1.79 KB
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 457.21 KB 457.66 KB +461 Bytes
azureClient.js 557.47 KB 557.91 KB +456 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 259.68 KB 260.13 KB +459 Bytes
fluidFramework.js 367.51 KB 367.51 KB No change
loader.js 134.78 KB 134.78 KB No change
map.js 39.83 KB 39.83 KB No change
matrix.js 142.48 KB 142.48 KB No change
odspClient.js 525.54 KB 525.98 KB +456 Bytes
odspDriver.js 98.29 KB 98.29 KB No change
odspPrefetchSnapshot.js 42.11 KB 42.11 KB No change
sharedString.js 159.28 KB 159.28 KB No change
sharedTree.js 367.49 KB 367.49 KB No change
Total Size 3.23 MB 3.23 MB +1.79 KB

Baseline commit: 15a26ae

Generated by 🚫 dangerJS against 1330409

.filter(
([key, _]) =>
// Take GC data of bounded data stores only.
!this.contexts.isNotBound(key),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr - I would strongly encourage you to use the visitor pattern (similar example) to share the structure of this do-while loop between both functions here.

Reasons/Thoughts:

  • The logic should be the same, but people won't necessarily know to update both places
  • To that point... you are already diverging the logic on this line. Shouldn't it check the same 3 conditions as above?
  • Are there other places the logic differs, and can we keep it the same so we can share the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a change for this but it looks slightly risky. Let me do this in a follow-up PR.

} while (notBoundContextsLength !== this.contexts.notBoundLength());

// Get the outbound routes (aliased data stores) and add a GC node for this channel.
builder.addNode("/", Array.from(this.aliasedDataStores));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're here, do we have a way to get the absolute path of this channel collection? For nested data stores, it won't be "/".

Fine to track as a follow-up, if someone would just tell us where!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best way would be to create a handle for this channel and use it's path. Agreed on a follow-up.

@@ -526,9 +528,14 @@ export class ChannelCollection implements IFluidDataStoreChannel, IDisposable {

/** Package up the context's attach summary etc into an IAttachMessage */
private generateAttachMessage(localContext: IFluidDataStoreContextInternal): IAttachMessage {
const { attachSummary } = localContext.getAttachData(/* includeGCData: */ true);
// Get the attach summary.
const attachSummary = localContext.getAttachSummary();
const type = localContext.packagePath[localContext.packagePath.length - 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh funny, this code could have got it from the getAttachData call but didn't.

Having callers do that instead of returning it is good, it frees up this change. I think I never considered that!

@@ -244,6 +241,8 @@ describe("Data Store Context Tests", () => {
dataStoreAttributes.isRootDataStore,
"Local DataStore root state does not match",
);
const type =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth adding a free function somewhere to do this, it's worth naming and codifying in one place IMO.

Copy link
Member

@markfields markfields left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One refactor suggestion, otherwise looks good :)

@agarwal-navin agarwal-navin merged commit f08aa82 into microsoft:main Jun 11, 2024
30 checks passed
agarwal-navin added a commit that referenced this pull request Jun 14, 2024
…sitor pattern (#21379)

Follow up from this comment -
#21347 (comment)

During summarization or attachments, both GC and summarize runs and they
iterate over data stores and channel contexts during summarization. The
logic to filter and validate is duplicated for both which makes it prone
to get diverged.
This PR updates the following set of function to use the
[visitor](https://en.wikipedia.org/wiki/Visitor_pattern) pattern to
iterate over data stores / channel contexts:
- `summarize` and `getGCData`.
- `getAttachSummary` and `getAttachGCData`. 

One logic change in this PR:
- It updates `ChannelCollection::getAttachSummary` to use a visisted set
to ensure that a node is only summarized oncen similar to
`FluidDataStoreRuntime::getAttachSummary`. It earlier used the summary
tree to do this.

#8257
@agarwal-navin agarwal-navin deleted the gcGetAttachData branch July 15, 2024 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: runtime Runtime related issues base: main PRs targeted against main branch public api change Changes to a public API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants