Parallel package catalog processing #1355
This seems like a great idea! I wonder if it could be simplified a bit (see some initial comments). Also, I wonder if there would be a way to extract this to some more "generic" function like runInParallel(workers, func()...)
...
A friendly reminder that you'll need to |
Thanks for the initial review @kzantow (I think I may take this out of draft now). This is the concurrency pattern that I have been following; I assume it is idiomatic? To summarise the approach / requirements: I think the
Are we aligned with the general steps? I'm open to exploring other concurrency patterns, however the waitgroup seems advantageous here due to wdyt? |
My preference would be to introduce only an application configuration option first and not a CLI option. That is, keep the |
Open question / asking for folks' opinions: why introduce a number of logical concurrent workers? We could lean on the default GOMAXPROCS value and simply start goroutines for all catalogers without setting an artificial limit.
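A rough sketch of that alternative, one goroutine per task with no explicit worker limit, is below. The runAll function and the string-returning tasks are hypothetical stand-ins for catalogers. Note that GOMAXPROCS caps how many OS threads execute Go code simultaneously, not how many goroutines exist, so I/O-heavy tasks can all be in flight at once.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runAll (hypothetical) starts one goroutine per task and waits for all of
// them. Scheduling across CPUs is left entirely to the Go runtime.
func runAll(tasks []func() string) []string {
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = task()
		}(i, task)
	}
	wg.Wait()
	return results
}

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	tasks := []func() string{
		func() string { return "python" },
		func() string { return "javascript" },
	}
	fmt.Println(runAll(tasks))
}
```

The trade-off is that there is no upper bound on concurrent file or memory use, which is one reason a configurable worker count may still be worth having.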
Do we want to remove the flag entirely?
This pr currently setting My (non maintainer) pov:
|
Signed-off-by: mikcl <mikesmikes400@gmail.com>
Any updates from the syft team on @wagoodman's open question and the interface comments above?
Signed-off-by: mikcl <mikesmikes400@gmail.com>
just bumping in case there are any new updates on this from syft team. cc @kzantow |
Sorry for the radio silence -- I thought a little more about this. I agree having a more explicit config option ( I do think we should make this an app-config-only update and still not introduce a CLI flag. That is, I see that you hid the flag, but ideally it wouldn't be on the CLI at all. I can help with those updates if you'd like? |
Signed-off-by: mikcl <mikesmikes400@gmail.com>
thanks for the update and thoughts on the direction @wagoodman
|
hny!, what will it take to get this across the line? |
Is this still progressing? happy with waiting, would be good to get any feedback if possible. understand that there may be other priorities so will leave this open for now :) |
Sorry for the radio silence -- I'm going to push a couple minor tweaks, then I think it'll be go. Nice work @Mikcl ! |
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
nice enhancement @Mikcl !
Signed-off-by: mikcl <mikesmikes400@gmail.com>
Head branch was pushed to by a user without write access
Thanks for the logging changes, looks good (some tests failed due to the logging change, made a commit to update it). I feel like it is useful to This last commit should fix the tests that failed. Please feel free to auto-enable-merge / merge when available if approved changes. |
this is green now 🚀 Thanks for your patience on this PR @wagoodman |
My bad, thanks for the fix! |
* catalog: run cataloggers concurrently
  Signed-off-by: mikcl <mikesmikes400@gmail.com>
* frontend: expose workers as a configurable option
  Signed-off-by: mikcl <mikesmikes400@gmail.com>
* fixup! frontend: expose workers as a configurable option
  Signed-off-by: mikcl <mikesmikes400@gmail.com>
* update logging statements
  Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* test: assert for debug logging
  Signed-off-by: mikcl <mikesmikes400@gmail.com>

Signed-off-by: mikcl <mikesmikes400@gmail.com>
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Co-authored-by: Alex Goodman <alex.goodman@anchore.com>
Closes #1353
This introduces a new option, SYFT_PARALLELISM=N, available as an environment variable and as a syft config file value, which will use at most N workers to process the package catalogers in parallel. The default will be 1.
Uses the sync library to create a wait group, waiting for all package catalogers to finish before proceeding.
No change to the behaviour of syft today, as the default is to create one worker.
Example performance benchmark comparison
Created an example directory with venv and node_modules
1 worker:
13.29s user 1.75s system 114% cpu 13.179 total
4 workers:
14.16s user 1.94s system 174% cpu 9.235 total
For larger file systems and CPUs with more cores, this may be useful :)
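As a minimal sketch of the approach in this description (at most N workers, a wait group, default of 1), something like the following could work. The helpers parseWorkers and catalogAll, and the integer-returning cataloger functions, are hypothetical and are not syft's actual API. Only the SYFT_PARALLELISM name and the default of 1 come from the PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
)

// parseWorkers (hypothetical) turns the raw setting into a worker count,
// falling back to 1 when the value is empty, not a number, or below 1.
func parseWorkers(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// catalogAll (hypothetical) runs every cataloger with at most `workers`
// running at once and returns the summed package count once all are done.
func catalogAll(workers int, catalogers []func() int) int {
	if workers < 1 {
		workers = 1
	}
	sem := make(chan struct{}, workers) // counting semaphore for worker slots
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for _, c := range catalogers {
		wg.Add(1)
		sem <- struct{}{} // blocks while all worker slots are in use
		go func(c func() int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this cataloger ends
			n := c()
			mu.Lock()
			total += n
			mu.Unlock()
		}(c)
	}
	wg.Wait() // wait for all catalogers before proceeding
	return total
}

func main() {
	workers := parseWorkers(os.Getenv("SYFT_PARALLELISM"))
	catalogers := []func() int{
		func() int { return 3 },
		func() int { return 5 },
		func() int { return 2 },
	}
	fmt.Println("workers:", workers, "packages:", catalogAll(workers, catalogers))
}
```

With the variable unset the sketch uses a single worker, so the catalogers run one after another, which mirrors the stated default.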