Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taught import to be able to use less than all-available workers. #4160

Merged
merged 1 commit into from
Jul 28, 2023

Conversation

ggainey
Copy link
Contributor

@ggainey ggainey commented Jul 26, 2023

IMPORT_WORKERS_PERCENT is configurable in settings. We will
document/expose this in a future PR, to keep this one maximally
backportable. Default behavior remains "all workers".

fixes #4068.

@ggainey ggainey requested review from ipanova, lubosmj and mdellweg July 26, 2023 20:08
@ggainey
Copy link
Contributor Author

ggainey commented Jul 26, 2023

Tested and works. In draft because I want some eyes on this before we decide it's the Right Way to go.

Note that we are separating this from actually adding IMPORT_WORKERS_PERCENT to settings.py to facilitate backporting the actual fix, from the Y-version of "added a config-option".

@dralley
Copy link
Contributor

dralley commented Jul 26, 2023

Tested and works. In draft because I want some eyes on this before we decide it's the Right Way to go.

If we can provide it as a hotfix for the one user that really needs it, we can take a bit more time with choosing the correct approach. Since the first import is obviously the biggest and most problematic, they should be freed up once they're through it.

Also I'm fine with this as an initial approach regardless, but I would maybe keep it in tech preview so that we can possibly change the design later?

e.g. a basic task priority system might be useful, but not something I think is justified quite yet.

# pulp can continue to work while doing an import. We accomplish this by creating a
# reserved-resource string for each repo-import-task based on that repo's index in
# the dispatch loop, mod number-of-workers-to-consume.
import_workers_percent = int(settings.get("IMPORT_WORKERS_PERCENT", 25))
Copy link
Member

@mdellweg mdellweg Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding it as a setting.
edit: I mean there is no semantic difference between doing this and adding the (default) setting to pulpcore/app/settings.py.

Also i was wondering if we should add it to the import viewset so you can make the admin (only persona to use imports) decide on a case by case basis.

Copy link
Contributor

@dralley dralley Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, also it is better to keep default settings all in one place. We're basically putting the default in the code here.

Copy link
Member

@mdellweg mdellweg Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What i was saying is that i thought we didn't want to make it a setting in this backportable hotfix, no?
Also i'm not sure if we want a setting in the end at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a fix, now, that is backportable. Changing the API isn't. I would rather have a flat %age in this PR if necessary, and then add configuration in a separate one, if that's what it will take to provide this to alleviate the current user-pain.

I think "provide a view param" as an option is a fine idea though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What i was saying is that i thought we didn't want to make it a setting in this backportable hotfix, no?

With the code looking for a setting (that we haven't advertised yet), we make it at least possible for installations with the fix to adjust their worker-use if absolutely necessary. Without it, Support has to tell customers "upgrade or patch the python code", which is...rude.

Is this a perfect solution? No. It's a pragmatic one that is trying to thread the needle of "fix a problem, while still making it possible for users to still give import All The Workers, in a way that can be backported to (very) previous versions of pulpcore."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also i'm not sure if we want a setting in the end at all.

Honestly, we do. In a lot of import-use-cases, when you're importing, you're importing All The Things, and aren't doing anything else in your system. In that workflow, maximum-concurrency is desirable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a fix, now, that is backportable

For hotfix purposes, this patch is fine as-is IMO. It's a little easier if they can use an official upstream release, but not actually necessary?

dest_repo_name = _get_destination_repo_name(importer, src_repo["name"])

# pulpcore-worker limiter
worker_rsrc = f"import-worker-{index % import_workers}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a global worker limit. (Sounds fine...)
But all import tasks will start with dispatching "import-worker-0". Can we add some random offset here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth the complication? It's an issue exactly and only if you start multiple imports at the same time, which is a never-happen case for the admin.

@ggainey
Copy link
Contributor Author

ggainey commented Jul 27, 2023

If we can provide it as a hotfix for the one user that really needs it, we can take a bit more time with choosing the correct approach. Since the first import is obviously the biggest and most problematic, they should be freed up once they're through it.

I don't think we can provide something as a hotfix that isn't released. Also, "first and biggest" isn't necessarily a good indicator - "first import" of a set of repositories, but users import different sets at different times. Also, the less often imports happen, the larger they are, and observed-user-behavior is "imports quarterly at best" is A Thing.

Also I'm fine with this as an initial approach regardless, but I would maybe keep it in tech preview so that we can possibly change the design later?

e.g. a basic task priority system might be useful, but not something I think is justified quite yet.

I think we def need to consider a more global task-system review to provide control over the tasking system in general, and some way to provide "run this now" tasks that don't imply "cancel everything in waiting first". We're having discussions right now in that arena. This change is not, however, a change to the tasking-system - it's just a change to "the way pulpimport arranges for concurrency to happen". It's a very specific change to a single workflow, that has no impact on that workflow other than causing it to not-hog All The Workers.

Especially since this is internal to pulpimport code, it's not something that needs tech-preview - we're not exposing an API or a workflow, the code's behavior is just changing, and we might change it differently going forward.

@mdellweg
Copy link
Member

So with this way of accessing the setting, you say we are free to delete/change it later?
I think the same goes for a setting in settings.py but not in the docs, making it particularly "undocumented".

@ggainey
Copy link
Contributor Author

ggainey commented Jul 27, 2023

So with this way of accessing the setting, you say we are free to delete/change it later? I think the same goes for a setting in settings.py but not in the docs, making it particularly "undocumented".

Yeah, I think adding it to settings.py in this PR is the right way to go. Working on updates now.

@ggainey ggainey force-pushed the 4068_throttle_import branch from d2c5f00 to c1d8ced Compare July 28, 2023 01:20
@ggainey ggainey marked this pull request as ready for review July 28, 2023 01:22
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last ask.
Feel free to merge.

# pulp can continue to work while doing an import. We accomplish this by creating a
# reserved-resource string for each repo-import-task based on that repo's index in
# the dispatch loop, mod number-of-workers-to-consume.
import_workers_percent = int(settings.get("IMPORT_WORKERS_PERCENT", 25))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import_workers_percent = int(settings.get("IMPORT_WORKERS_PERCENT", 25))
import_workers_percent = settings.IMPORT_WORKERS_PERCENT

@ipanova
Copy link
Member

ipanova commented Jul 28, 2023

So with this way of accessing the setting, you say we are free to delete/change it later? I think the same goes for a setting in settings.py but not in the docs, making it particularly "undocumented".

Yeah, I think adding it to settings.py in this PR is the right way to go. Working on updates now.

I am not sure I feel comfortable with this approach. Once backported this setting even if undocumented will be available in the settings.py file accessible for manipulation and changes.
@ggainey sorry for back and forth but can we in this PR drop this change https://github.com/pulp/pulpcore/pull/4160/files#diff-4e1146f075426039ac8e1350e88fe5736a3e50872a42d101bfc8bc1b7f483574R306 and this line hardcode to import_workers_percent =25 https://github.com/pulp/pulpcore/pull/4160/files#diff-06badc442eb525dc6f2de7f4e19a7762ce20a972b79b6a90cf1ee80211053ab5R516 and in separate followup PR bring back the setting https://github.com/pulp/pulpcore/pull/4160/files#diff-4e1146f075426039ac8e1350e88fe5736a3e50872a42d101bfc8bc1b7f483574R306 and this line https://github.com/pulp/pulpcore/pull/4160/files#diff-06badc442eb525dc6f2de7f4e19a7762ce20a972b79b6a90cf1ee80211053ab5R516 + document setting+ in CHANGES/.feature

Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per my comment

@mdellweg
Copy link
Member

mdellweg commented Jul 28, 2023

As i tried to explain: Having it in the settings file is not what makes the setting officially available. And there is absolutely no difference between having the default in the settings.py or buried deep in the code. It is till "undocumented".
My worry is that we make it there-to-stay even if not official the moment we tell an support person to use it.

@ipanova
Copy link
Member

ipanova commented Jul 28, 2023

As i tried to explain: Having it in the settings file is not what makes the setting officially available. And there is absolutely no difference between having the default in the settings.py or buried deep in the code. It is till "undocumented". My worry is that we make it there-to-stay even if not official the moment we tell an support person to use it.

Given that I understood your reply, that's why I am asking to not have the setting here at all - neither in settings.py nor burried in the code, so simply sticking to import_workers_percent = 25 will do for the backport purpose. And in follow up PR will come the rest.

@ggainey
Copy link
Contributor Author

ggainey commented Jul 28, 2023

Great discussion all, thanks!

Changing percent-used-by-default, based on this summing-up from matrix:

"I'm a big fan of what I think your discussion leads to - "default to all-workers, allow users to set a percentage if they want/need to"."

IMPORT_WORKERS_PERCENT is configurable in settings. We will
document/expose this in a future PR, to keep this one maximally
backportable. Default behavior remains "all workers".

fixes pulp#4068.
@ggainey ggainey force-pushed the 4068_throttle_import branch from c1d8ced to 43acefc Compare July 28, 2023 14:20
CHANGES/4068.bugfix Outdated Show resolved Hide resolved
@ggainey ggainey force-pushed the 4068_throttle_import branch from 43acefc to ccf30a6 Compare July 28, 2023 14:25
@ggainey ggainey changed the title Taught import to throttle at 25% of available workers. Taught import to be able to use less than all-available workers. Jul 28, 2023
Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@ElCoyote27
Copy link

If you give me a hotfix on top of 6.12.4.1, I'll be able to test in the LAB Satellite of my current CU.

@ipanova
Copy link
Member

ipanova commented Jul 28, 2023

If you give me a hotfix on top of 6.12.4.1, I'll be able to test in the LAB Satellite of my current CU.

That's the next step to get it downstream.

@ggainey ggainey merged commit b5cb1d1 into pulp:main Jul 28, 2023
@patchback
Copy link

patchback bot commented Jul 28, 2023

Backport to 3.18: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply b5cb1d1 on top of patchback/backports/3.18/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160

Backporting merged PR #4160 into main

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/pulp/pulpcore.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.18/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160 upstream/3.18
  4. Now, cherry-pick PR Taught import to be able to use less than all-available workers. #4160 contents into that branch:
    $ git cherry-pick -x b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67
    If it'll yell at you with something like fatal: Commit b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Taught import to be able to use less than all-available workers. #4160 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.18/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Jul 28, 2023

Backport to 3.21: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.21/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160

Backported as #4184

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Jul 28, 2023

Backport to 3.22: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply b5cb1d1 on top of patchback/backports/3.22/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160

Backporting merged PR #4160 into main

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/pulp/pulpcore.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.22/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160 upstream/3.22
  4. Now, cherry-pick PR Taught import to be able to use less than all-available workers. #4160 contents into that branch:
    $ git cherry-pick -x b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67
    If it'll yell at you with something like fatal: Commit b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Taught import to be able to use less than all-available workers. #4160 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.22/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Jul 28, 2023

Backport to 3.23: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply b5cb1d1 on top of patchback/backports/3.23/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160

Backporting merged PR #4160 into main

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/pulp/pulpcore.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.23/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160 upstream/3.23
  4. Now, cherry-pick PR Taught import to be able to use less than all-available workers. #4160 contents into that branch:
    $ git cherry-pick -x b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67
    If it'll yell at you with something like fatal: Commit b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Taught import to be able to use less than all-available workers. #4160 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.23/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Jul 28, 2023

Backport to 3.28: cherry-pick succeeded

Backport PR branch: patchback/backports/3.28/b5cb1d19a130b67aff5c818e1a8b1ce1e8655a67/pr-4160

PR branch created, proceeding with making a PR.

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@ggainey ggainey deleted the 4068_throttle_import branch February 2, 2024 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PulpImport can overload the task-queue
6 participants