[API/Worker] - Improve Sync Task Performance #1549

Open
adrian-codecov opened this issue Apr 10, 2024 · 0 comments

The sync task often takes a long time because it syncs all repos for all orgs upon a) login or b) pressing the sync button. This also means we consume more of our provider rate limit. This issue is an attempt to improve performance altogether.

Some context from a previous conversation:

"
The current REST approach loops over the N repos that were synced in the sync_repos task and performs the sync_language task for each, calling the provider N times. These are repos we're guaranteed to have in our DB. The main difference with the GQL approach is that it uses an owner to fetch M repos from the provider, where N and M could differ (N being the repos synced in our DB, M being the repos in the provider). Because of that, we'd also have to make M DB calls to ensure each repo we see in the provider is also in our DB before we update languages. We'd be trading some rate-limited requests for some DB ones; imo not terrible, but the task/code would be a bit different. It would also be a special case for gh, separate from gl/bb. Not opposed to the refactoring, just that it's not a "replace rest w/ gql" change.

Other interesting things, mostly related to the sync_repos task + consequently to the sync_languages task:

TLDR, the sync_languages task currently gets the affected repoids from the sync_repo task. The sync_repo task has 2 variants: one for owners that have integrations (A), one for owners that don't (B). While they both sync repos, variant B syncs a list of all repos a user has access to. As an example, I have access to the repos from these owners (this is local data): {('github', 'adrian-codecov'), ('github', 'aviquez96'), ('github', 'codecov'), ('github', 'temp-org-test-123'), ('github', 'terry-codecov'), ('github', 'Turing-Corp')}. If you dig further into list_repos(), you'll see it behaves differently depending on whether you supply a username, e.g. "codecov", in which case it only looks for repos for that org, or you don't, in which case it fetches all the repos from all the owners you have available. Now, I've played around a little bit locally, and when you specify a username it's substantially faster. I believe our current implementation does what it does because back in the day we used to have that "all repos" page, but we've since gotten rid of it and always specify an org, and the sync code never caught up with that change. I think this is a HUGE opportunity for performance improvement around the sync task in general, and it might be that we just missed it in the echoes.

To be super explicit, when I click resync here, I'm resyncing for all of these owners even though I have Codecov selected. That imo is arguably undesired behavior.
Screenshot 2024-03-01 at 8 20 45 PM.

Now, maybe syncing "everything" is still desired functionality, and we can work around that if we want to, but I think this can be an opportunity to optimize calls here. I'm not sure if enterprise customers have only 1 org, in which case this wouldn't help too much, but if they have several, that would be $$

Lastly! Referring to variant A of our sync_repo task, the one that syncs based on whether you're using an integration: I have reason to believe we're not leveraging that functionality at all. Can expand more on this too, but basically all of the places that call the sync task have using_integration = False afaik
"

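To make the N vs. M trade-off from the quoted conversation concrete, here is a rough Python sketch. Everything in it is hypothetical: `CallCounter`, `sync_languages_rest`, `sync_languages_gql`, and the 100-repo page size are illustrative stand-ins, not the actual worker tasks or provider client. It only counts how many provider and DB calls each approach would make.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    """Tallies calls so the two approaches can be compared (illustrative only)."""
    provider_calls: int = 0
    db_calls: int = 0


def sync_languages_rest(synced_repo_names, provider_languages, counter):
    """REST-style: one provider call per repo we already synced into the DB (N calls)."""
    updated = {}
    for name in synced_repo_names:
        counter.provider_calls += 1  # one request per repo
        updated[name] = provider_languages.get(name, [])
    return updated


def sync_languages_gql(provider_languages, db_repo_names, counter, page_size=100):
    """GQL-style: paged calls per owner, then one DB lookup per provider repo (M lookups).

    page_size is an assumption made for this sketch.
    """
    names = list(provider_languages)
    updated = {}
    for start in range(0, len(names), page_size):
        counter.provider_calls += 1  # one page of repos per request
        for name in names[start:start + page_size]:
            counter.db_calls += 1  # confirm the provider repo exists in our DB
            if name in db_repo_names:
                updated[name] = provider_languages[name]
    return updated
```

With 2 repos in the DB and 3 in the provider, the REST-style path makes 2 provider calls and no extra DB lookups, while the GQL-style path makes 1 provider call and 3 DB lookups. That is the "trading rate limit requests for DB ones" point, under the assumptions above.
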
Worker

  • pass specific org to the git_list fns
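
A minimal sketch of what the Worker item could look like, assuming the listing function takes an optional `username` as described in the conversation. The `list_repos` and `sync_repos` below are simplified, hypothetical stand-ins that operate on an in-memory dict (with made-up repo names), not the real provider adapter or task signatures.

```python
from typing import Dict, List, Optional, Tuple


def list_repos(
    accessible: Dict[str, List[str]], username: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Return (owner, repo) pairs.

    With a username, only that owner's repos are listed; without one,
    every repo from every accessible owner is listed (the slow path).
    """
    if username is not None:
        return [(username, repo) for repo in accessible.get(username, [])]
    return [(owner, repo) for owner, repos in accessible.items() for repo in repos]


def sync_repos(
    accessible: Dict[str, List[str]], selected_org: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Forward the selected org, if any, so only that org is listed and synced."""
    return list_repos(accessible, username=selected_org)


# Made-up example data: two owners the user can access.
accessible = {"codecov": ["worker", "api"], "adrian-codecov": ["sandbox"]}
scoped = sync_repos(accessible, selected_org="codecov")  # only codecov's repos
everything = sync_repos(accessible)  # all repos from all owners
```

Keeping `selected_org` optional preserves the "sync everything" behavior for any caller that still wants it.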

API

  • accept params with specific org for select
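
On the API side, one possible shape is a small helper that turns request params into kwargs for the sync task. This is a sketch under assumptions: the `org` param name, the `build_sync_kwargs` helper, and the `username` kwarg are all hypothetical; only `using_integration = False` comes from the conversation above.

```python
def build_sync_kwargs(params, accessible_owners):
    """Build kwargs for the sync task from request params (hypothetical helper).

    If no org is supplied, fall back to the current behavior of syncing
    everything. If one is supplied, it must be an owner the user can access.
    """
    kwargs = {"using_integration": False}
    org = params.get("org")  # "org" is an assumed param name
    if org is None:
        return kwargs
    if org not in accessible_owners:
        raise ValueError(f"user has no access to owner {org!r}")
    kwargs["username"] = org
    return kwargs
```

For example, `build_sync_kwargs({"org": "codecov"}, {"codecov", "adrian-codecov"})` would scope the sync to `codecov`, while an empty params dict leaves the sync unscoped.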