-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Taught import to be able to use less than all-available workers. #4160
Conversation
Tested and works. In draft because I want some eyes on this before we decide it's the Right Way to go. Note that we are separating this from actually adding IMPORT_WORKERS_PERCENT to settings.py to facilitate backporting the actual fix, from the Y-version of "added a config-option". |
If we can provide it as a hotfix for the one user that really needs it, we can take a bit more time with choosing the correct approach. Since the first import is obviously the biggest and most problematic, they should be freed up once they're through it. Also I'm fine with this as an initial approach regardless, but I would maybe keep it in tech preview so that we can possibly change the design later? e.g. a basic task priority system might be useful, but not something I think is justified quite yet. |
pulpcore/app/tasks/importer.py
Outdated
# pulp can continue to work while doing an import. We accomplish this by creating a | ||
# reserved-resource string for each repo-import-task based on that repo's index in | ||
# the dispatch loop, mod number-of-workers-to-consume. | ||
import_workers_percent = int(settings.get("IMPORT_WORKERS_PERCENT", 25)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is adding it as a setting.
edit: I mean there is no semantic difference between doing this and adding the (default) setting to pulpcore/app/settings.py.
Also i was wondering if we should add it to the import viewset so you can make the admin (only persona to use imports) decide on a case by case basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, also it is better to keep default settings all in one place. We're basically putting the default in the code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What i was saying is that i thought we didn't want to make it a setting in this backportable hotfix, no?
Also i'm not sure if we want a setting in the end at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a fix, now, that is backportable. Changing the API isn't. I would rather have a flat %age in this PR if necessary, and then add configuration in a separate one, if that's what it will take to provide this to alleviate the current user-pain.
I think "provide a view param" as an option is a fine idea though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What i was saying is that i thought we didn't want to make it a setting in this backportable hotfix, no?
With the code looking for a setting (that we haven't advertised yet), we make it at least possible for installations with the fix to adjust their worker-use if absolutely necessary. Without it, Support has to tell customers "upgrade or patch the python code", which is...rude.
Is this a perfect solution? No. It's a pragmatic one that is trying to thread the needle of "fix a problem, while still making it possible for users to still give import All The Workers, in a way that can be backported to (very) previous versions of pulpcore."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also i'm not sure if we want a setting in the end at all.
Honestly, we do. In a lot of import-use-cases, when you're importing, you're importing All The Things, and aren't doing anything else in your system. In that workflow, maximum-concurrency is desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a fix, now, that is backportable
For hotfix purposes, this patch is fine as-is IMO. It's a little easier if they can use an official upstream release, but not actually necessary?
dest_repo_name = _get_destination_repo_name(importer, src_repo["name"]) | ||
|
||
# pulpcore-worker limiter | ||
worker_rsrc = f"import-worker-{index % import_workers}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a global worker limit. (Sounds fine...)
But all import tasks will start with dispatching "import-worker-0". Can we add some random offset here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth the complication? It's an issue exactly and only if you start multiple imports at the same time, which is a never-happen case for the admin.
I don't think we can provide something as a hotfix that isn't released. Also, "first and biggest" isn't necessarily a good indicator - "first import" of a set of repositories, but users import different sets at different times. Also, the less often imports happen, the larger they are, and observed-user-behavior is "imports quarterly at best" is A Thing.
I think we def need to consider a more global task-system review to provide control over the tasking system in general, and some way to provide "run this now" tasks that don't imply "cancel everything in waiting first". We're having discussions right now in that arena. This change is not, however, a change to the tasking-system - it's just a change to "the way pulpimport arranges for concurrency to happen". It's a very specific change to a single workflow, that has no impact on that workflow other than causing it to not-hog All The Workers. Especially since this is internal to pulpimport code, it's not something that needs tech-preview - we're not exposing an API or a workflow, the code's behavior is just changing, and we might change it differently going forward. |
So with this way of accessing the setting, you say we are free to delete/change it later? |
Yeah, I think adding it to settings.py in this PR is the right way to go. Working on updates now. |
d2c5f00
to
c1d8ced
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last ask.
Feel free to merge.
pulpcore/app/tasks/importer.py
Outdated
# pulp can continue to work while doing an import. We accomplish this by creating a | ||
# reserved-resource string for each repo-import-task based on that repo's index in | ||
# the dispatch loop, mod number-of-workers-to-consume. | ||
import_workers_percent = int(settings.get("IMPORT_WORKERS_PERCENT", 25)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import_workers_percent = int(settings.get("IMPORT_WORKERS_PERCENT", 25)) | |
import_workers_percent = settings.IMPORT_WORKERS_PERCENT |
I am not sure I feel comfortable with this approach. Once backported this setting even if undocumented will be available in the settings.py file accessible for manipulation and changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per my comment
As i tried to explain: Having it in the settings file is not what makes the setting officially available. And there is absolutely no difference between having the default in the settings.py or buried deep in the code. It is till "undocumented". |
Given that I understood your reply, that's why I am asking to not have the setting here at all - neither in settings.py nor burried in the code, so simply sticking to |
Great discussion all, thanks! Changing percent-used-by-default, based on this summing-up from matrix: "I'm a big fan of what I think your discussion leads to - "default to all-workers, allow users to set a percentage if they want/need to"." |
IMPORT_WORKERS_PERCENT is configurable in settings. We will document/expose this in a future PR, to keep this one maximally backportable. Default behavior remains "all workers". fixes pulp#4068.
c1d8ced
to
43acefc
Compare
43acefc
to
ccf30a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
If you give me a hotfix on top of 6.12.4.1, I'll be able to test in the LAB Satellite of my current CU. |
That's the next step to get it downstream. |
Backport to 3.18: 💔 cherry-picking failed — conflicts found❌ Failed to cleanly apply b5cb1d1 on top of patchback/backports/3.18/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160 Backporting merged PR #4160 into main
🤖 @patchback |
Backport to 3.21: 💚 backport PR created✅ Backport PR branch: Backported as #4184 🤖 @patchback |
Backport to 3.22: 💔 cherry-picking failed — conflicts found❌ Failed to cleanly apply b5cb1d1 on top of patchback/backports/3.22/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160 Backporting merged PR #4160 into main
🤖 @patchback |
Backport to 3.23: 💔 cherry-picking failed — conflicts found❌ Failed to cleanly apply b5cb1d1 on top of patchback/backports/3.23/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160 Backporting merged PR #4160 into main
🤖 @patchback |
Backport to 3.28: cherry-pick succeededBackport PR branch: PR branch created, proceeding with making a PR. 🤖 @patchback |
IMPORT_WORKERS_PERCENT is configurable in settings. We will
document/expose this in a future PR, to keep this one maximally
backportable. Default behavior remains "all workers".
fixes #4068.