Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚡ fetch org repositories in parallel #4970

Merged
merged 4 commits into from
Dec 12, 2024
Merged

Conversation

afiune
Copy link
Contributor

@afiune afiune commented Dec 11, 2024

Introducing an internal go package called workerpool that we can use to send parallel
requests when needed.

⚡ For this change, I am making the fetching of repositories for an organization faster.

Tested this code with an organization that has around 3k repositories

Before (~2 Minutes)

TRC logger.FuncDur> func=provider.github.repositories took=102803.621667

After (~5 seconds)

TRC logger.FuncDur> func=provider.github.repositories took=4567.576542

Signed-off-by: Salim Afiune Maya <afiune@mondoo.com>
Signed-off-by: Salim Afiune Maya <afiune@mondoo.com>
break
}

// send as many request as workers we have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we limit by number of workers? Doesn't this artificially throttle things

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Let me check that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaym I updated the PR

Copy link
Contributor

github-actions bot commented Dec 11, 2024

Test Results

3 169 tests  +7   3 168 ✅ +7   1m 57s ⏱️ +27s
  373 suites +1       1 💤 ±0 
   28 files   ±0       0 ❌ ±0 

Results for commit 4765ace. ± Comparison against base commit 8fafbd8.

♻️ This comment has been updated with latest results.

This will help us submit as many requests as we want without knowing
about the workers.

Signed-off-by: Salim Afiune Maya <afiune@mondoo.com>
@afiune afiune force-pushed the afiune/workerpool+github branch from 599b1bc to 93ca08d Compare December 11, 2024 22:15
}

func (c *collector[R]) RequestsRead() int64 {
return c.requestsRead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there might be a data race here on requestsRead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! Thanks for pointing it out, Jay.


// PendingRequests returns the number of pending requests.
func (p *Pool[R]) PendingRequests() int64 {
return p.requestsSent - p.collector.RequestsRead()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not really familiar with the atomics api in go, but i would expect some sort of synchronized read on requestsSent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atomic helps to do thread safe operations, yes, I missed the read calls, they are now there! 🙌🏽

Signed-off-by: Salim Afiune Maya <afiune@mondoo.com>
@afiune
Copy link
Contributor Author

afiune commented Dec 12, 2024

Used the data race detector from go to validate.

$ go test -race go.mondoo.com/cnquery/v11/internal/workerpool -v
=== RUN   TestPoolSubmitAndRetrieveResult
--- PASS: TestPoolSubmitAndRetrieveResult (0.01s)
=== RUN   TestPoolHandleErrors
--- PASS: TestPoolHandleErrors (0.01s)
=== RUN   TestPoolMultipleTasksWithErrors
--- PASS: TestPoolMultipleTasksWithErrors (0.01s)
=== RUN   TestPoolHandlesNilTasks
--- PASS: TestPoolHandlesNilTasks (0.00s)
=== RUN   TestPoolProcessing
--- PASS: TestPoolProcessing (0.06s)
=== RUN   TestPoolClosesGracefully
--- PASS: TestPoolClosesGracefully (0.11s)
=== RUN   TestPoolWithManyTasks
--- PASS: TestPoolWithManyTasks (15.03s)
PASS
ok  	go.mondoo.com/cnquery/v11/internal/workerpool	16.715s

@afiune afiune merged commit 949f0ce into main Dec 12, 2024
16 checks passed
@afiune afiune deleted the afiune/workerpool+github branch December 12, 2024 17:28
@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants