opt: optimize cluster identification #3309

Open
tediou5 wants to merge 8 commits into main from tmp/opt/optimize-cluster-identification

Contributor

@tediou5 tediou5 commented Dec 11, 2024

Second attempt to close #2900

The first commit has no functional effect; it simply renames cache_id to piece_cache_id.

For the cache, everything is straightforward: the controller just records the corresponding relationships. Things are a little more complicated for the farmer. First, we check the identify message to see whether the farmer is newly discovered and whether its fingerprint has changed. Based on the result, we decide whether to use the stream to fetch details.
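The farmer-side decision can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `KnownFarmers`, `IdentifyOutcome`, `on_identify`, and the use of a `u64` fingerprint keyed by a string ID are all hypothetical names and types chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical result of processing one identify message.
#[derive(Debug, PartialEq, Eq)]
enum IdentifyOutcome {
    /// The farmer was not known before, so its details must be fetched.
    NewlyDiscovered,
    /// The farmer was known, but its fingerprint differs from the stored one.
    FingerprintUpdated,
    /// Nothing changed; there is no need to fetch details again.
    Unchanged,
}

impl IdentifyOutcome {
    /// Whether the (potentially expensive) details stream should be requested.
    fn needs_details_stream(&self) -> bool {
        !matches!(self, IdentifyOutcome::Unchanged)
    }
}

/// Hypothetical controller-side record: farmer ID -> last seen fingerprint.
#[derive(Default)]
struct KnownFarmers {
    fingerprints: HashMap<String, u64>,
}

impl KnownFarmers {
    /// Record the fingerprint from an identify message and classify the message.
    fn on_identify(&mut self, farmer_id: &str, fingerprint: u64) -> IdentifyOutcome {
        match self.fingerprints.insert(farmer_id.to_string(), fingerprint) {
            None => IdentifyOutcome::NewlyDiscovered,
            Some(previous) if previous != fingerprint => IdentifyOutcome::FingerprintUpdated,
            Some(_) => IdentifyOutcome::Unchanged,
        }
    }
}
```

With this shape, repeated identify messages from an unchanged farmer are cheap, and the stream is only requested when `needs_details_stream()` returns true.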

The final commit ensures compatibility with the previous approach.

(I’m really sorry: I actually finished this a long time ago, but then forgot about it.)

Code contributor checklist:

@tediou5 tediou5 changed the title Tmp/opt/optimize cluster identification opt: optimize cluster identification Dec 11, 2024
Contributor Author

tediou5 commented Dec 16, 2024

@nazar-pc Additionally, I found during development that this part of the code is not easy to test, and some scenarios are hard to cover (like farm FingerprintUpdated). At the very least, I need to start three components to do so: nats, the controller, and a farmer or cache. Maybe I could first submit a PR that extracts the update logic for caches and farms and covers enough test cases?
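One possible shape for such extracted logic is a pure function that compares the known farms with the freshly identified ones and returns the resulting changes, so it can be unit tested without nats or a running controller. Everything here is an assumption made for illustration: `FarmChange`, `diff_farms`, the `u8` farm index, and the `u64` fingerprint are hypothetical and not taken from the codebase.

```rust
use std::collections::BTreeMap;

/// Hypothetical change derived from comparing known and identified farms.
#[derive(Debug, PartialEq, Eq)]
enum FarmChange {
    Added(u8),
    FingerprintUpdated(u8),
    Removed(u8),
}

/// Compare two `farm index -> fingerprint` maps without touching the network.
/// Changes for identified farms come first (in index order), then removals.
fn diff_farms(known: &BTreeMap<u8, u64>, identified: &BTreeMap<u8, u64>) -> Vec<FarmChange> {
    let mut changes = Vec::new();
    for (index, fingerprint) in identified {
        match known.get(index) {
            None => changes.push(FarmChange::Added(*index)),
            Some(old) if old != fingerprint => {
                changes.push(FarmChange::FingerprintUpdated(*index));
            }
            // Same fingerprint as before: nothing to do.
            Some(_) => {}
        }
    }
    for index in known.keys() {
        if !identified.contains_key(index) {
            changes.push(FarmChange::Removed(*index));
        }
    }
    changes
}
```

A test can then build two small maps and assert on the returned `Vec<FarmChange>`, which covers cases such as FingerprintUpdated directly.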

Member

@nazar-pc nazar-pc left a comment

Thanks for the contribution, and sorry it took this long to read into it. This is certainly the right direction, but it will cause issues for the way maintenance of caches and farms is done: it prevents the loop with select! from actually looping quickly, which was carefully avoided before.

I only left comments on the cache side, but similar comments apply to the farmer side as well.

I also don't fully understand why the thing we are trying to address here was sort of added back at the end; I'm confused.

And please rebase after further updates (if any) and squash changes to the same part of the codebase; it'll be easier to review that way.

@nazar-pc nazar-pc requested review from teor2345 and removed request for shamil-gadelshin January 14, 2025 02:07
Contributor Author

@tediou5 tediou5 left a comment

Yes, let me adjust the code. Once I've completed my modifications, I'll take care of those annoying merges.

Member

@teor2345 teor2345 left a comment

This looks good to me once Nazar's comments have been addressed.

@tediou5 tediou5 force-pushed the tmp/opt/optimize-cluster-identification branch from b633d33 to beb798c Compare January 15, 2025 09:18
Contributor Author

tediou5 commented Jan 15, 2025

No changes were made; I just squashed the changes through a rebase. Additionally, I removed ClusterCacheIdentifyPieceCacheBroadcast (and its farmer counterpart): they were reintroduced in a separate commit, and I simply dropped that commit. Nazar's comments will be addressed in subsequent commits.

@tediou5 tediou5 force-pushed the tmp/opt/optimize-cluster-identification branch 3 times, most recently from 4246b58 to cda20e5 Compare January 20, 2025 05:49
Contributor Author

tediou5 commented Jan 20, 2025

I’ve rearranged the commit order to make squashing easier later.

@nazar-pc I finished the cache implementation (it’s relatively straightforward), so you can review it for any potential issues.

91b3dd4: When a new cache appears, the system will collect the stream in the background and update KnownCaches once it’s done.
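The background-collection idea can be sketched as below. This is only an illustration under stated assumptions: the real code is asynchronous, while this sketch swaps in `std::thread` and an `mpsc` channel so it stays self-contained. `CacheDetails`, `on_new_cache`, `apply_completed`, and the `collect` closure (standing in for draining the details stream) are hypothetical, and this `KnownCaches` is a simplified stand-in for the real type.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical details obtained by draining a cache's details stream.
#[derive(Debug, PartialEq, Eq)]
struct CacheDetails {
    piece_cache_ids: Vec<u32>,
}

/// Simplified stand-in for the controller's KnownCaches.
struct KnownCaches {
    caches: HashMap<String, CacheDetails>,
    completed_tx: Sender<(String, CacheDetails)>,
    completed_rx: Receiver<(String, CacheDetails)>,
}

impl KnownCaches {
    fn new() -> Self {
        let (completed_tx, completed_rx) = channel();
        Self { caches: HashMap::new(), completed_tx, completed_rx }
    }

    /// Start collecting details for a newly seen cache without blocking the caller.
    fn on_new_cache<F>(&self, cache_id: String, collect: F) -> JoinHandle<()>
    where
        F: FnOnce() -> Vec<u32> + Send + 'static,
    {
        let completed_tx = self.completed_tx.clone();
        thread::spawn(move || {
            let details = CacheDetails { piece_cache_ids: collect() };
            // Sending only fails if the receiver is gone; the result is then unused.
            let _ = completed_tx.send((cache_id, details));
        })
    }

    /// Apply every finished collection and return how many caches were updated.
    /// Uses try_recv, so it never blocks the maintenance loop.
    fn apply_completed(&mut self) -> usize {
        let mut applied = 0;
        while let Ok((cache_id, details)) = self.completed_rx.try_recv() {
            self.caches.insert(cache_id, details);
            applied += 1;
        }
        applied
    }
}
```

The point of the split is that the main loop only ever calls the non-blocking `apply_completed`, so a slow stream from one cache does not stall maintenance of the others.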

Before making changes to the farmer, perhaps I could submit a separate PR to add or remove farms in parallel? It doesn’t look too complex right now (and may even simplify the implementation).

@tediou5 tediou5 requested a review from nazar-pc January 20, 2025 05:59
@tediou5 tediou5 force-pushed the tmp/opt/optimize-cluster-identification branch from cda20e5 to 75b0755 Compare January 20, 2025 10:13
Contributor Author

tediou5 commented Jan 20, 2025

The farmer's work turned out to be simpler than I imagined, and it's also done.

1704ed5 is refactoring and moving code, with no actual changes.

75ddfe6 is the actual modification, but the logic hasn’t changed much after refactoring: it’s just split into two parts, with no other changes.

@tediou5 tediou5 requested a review from teor2345 January 20, 2025 10:18
Member

@teor2345 teor2345 left a comment

These all seem fine to me, but Nazar knows this area much better than I do.

@tediou5 tediou5 force-pushed the tmp/opt/optimize-cluster-identification branch from 75b0755 to 3078439 Compare February 20, 2025 07:25
Contributor Author

tediou5 commented Feb 20, 2025

Just a rebase on #3354.


Successfully merging this pull request may close these issues.

Optimize farm and cache identification
3 participants