Optimize ACS stage #4274
base: main
Conversation
RemoteArtifact.objects.acs()
.filter(**{f"{checksum_type}__in": batch_checksums[checksum_type]})
.only("url", checksum_type)
.select_related("remote")
TODO:
- figure out how to get rid of this select if possible - currently we set the hydrated remote object below so I'm not sure if we can get away with just using the PK? Unless we fetch separately and cache?
- sanity check overall logic
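The per-checksum-type grouping that feeds the query above needs no ORM and can be sketched in plain Python. This is an illustrative sketch, not pulpcore code: the name `batch_checksums` comes from the diff, everything else (`build_batch_checksums`, the sample dicts) is invented for the example.

```python
from collections import defaultdict

# Group the checksums present on a batch of artifacts by checksum type, so
# each type can then be fetched with a single indexable `__in` query instead
# of one big AND-ed/OR-ed clause.
CHECKSUM_TYPES = ("sha256", "sha512")

def build_batch_checksums(artifacts):
    """Map checksum type -> list of checksum values found in this batch."""
    grouped = defaultdict(list)
    for artifact in artifacts:
        for checksum_type in CHECKSUM_TYPES:
            value = artifact.get(checksum_type)
            if value:
                grouped[checksum_type].append(value)
    return dict(grouped)

batch_checksums = build_batch_checksums(
    [{"sha256": "aaa"}, {"sha512": "bbb"}, {"sha256": "ccc"}]
)
```

One query per checksum type then runs against a column index, rather than the database evaluating a large disjunction row by row.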
Force-pushed from a4ac583 to e9d81e5
acs_query = AlternateContentSource.objects.filter(pulp_domain=self.domain)
acs_exists = await acs_query.aexists()
async for batch in self.batches():
    acs_query = AlternateContentSource.objects.filter(pulp_domain=self.domain)
    acs_exists = await acs_query.aexists()
+1
Idea: Is it faster to not batch at all without ACS? Or should we move this check out to the point where we can decide whether the pipeline should include this stage?
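The hoisting under discussion can be shown in miniature without Django. In this sketch, `FakeQuery` and `run_stage` are invented stand-ins; the point is just that the existence check is loop-invariant, so evaluating it once removes one query per batch.

```python
import asyncio

class FakeQuery:
    """Stand-in for AlternateContentSource.objects.filter(...); counts how
    many times the existence check hits the 'database'."""
    def __init__(self):
        self.calls = 0

    async def aexists(self):
        self.calls += 1
        return False  # pretend no ACS is configured in this domain

async def run_stage(batches, query, hoisted):
    if hoisted:
        acs_exists = await query.aexists()  # evaluated once, outside the loop
    for batch in batches:
        if not hoisted:
            acs_exists = await query.aexists()  # N+1: one query per batch
        if not acs_exists:
            continue  # no ACS: pass content through unchanged

naive = FakeQuery()
asyncio.run(run_stage([[1], [2], [3]], naive, hoisted=False))

once = FakeQuery()
asyncio.run(run_stage([[1], [2], [3]], once, hoisted=True))
```

Moving the decision even further out, into pipeline construction, would let the stage be skipped entirely when no ACS exists.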
Or should we move this check out to the point where we can decide whether the pipeline should include this stage?
Is commit temp3 what you're looking for?
Exactly.
Also not grokking the concept of ACS properly... Are they neither tied to a repository nor a remote? Is that extremely generic exists() query the best we can do?
Looking at this line (pulpcore/pulpcore/app/models/acs.py, line 31 in 20afd1e):

remote = models.ForeignKey("Remote", null=True, on_delete=models.PROTECT)

Why do we combine null=True with on_delete=models.PROTECT? If we could say an ACS is always tied to a remote, I would understand a lot more.
I'm not sure, I think Pavel wrote all of this. @dkliban do you know?
For that matter, I wonder why it (according to the docs) only has an effect with on_demand remotes? I would think that you'd want the ACS to be available during immediate syncs too?
I don't think the download_policy should matter.
ACS is not tied to any repo or remote, yet it should have a remote that it is going to use during the refresh action.
The idea was to always create that remote with the on-demand policy, so the refresh endpoint takes as little time as possible by creating only RemoteArtifacts (look at it as indexing content only). Those RemoteArtifacts could be used later on during repo syncs.
Each ACS is assumed to be global (per domain), so every rpm sync will check against the rpm-typed ACS content and prefer its binary data over the original sync remote.
Force-pushed from 504a221 to 4b1930d
d_artifact.urls = [
    existing_ras_dict[checksum]["url"]
] + d_artifact.urls
d_artifact.remote = existing_ras_dict[checksum]["remote"]
After reading through the code I am mega-confused about how ACS even works, and reading the workflow docs did not clear it up for me: https://docs.pulpproject.org/pulpcore/workflows/alternate-content-sources.html
I think there is a big problem with just overwriting the d_artifact.remote field with the one from an existing RA, which could be from a different remote! It means all the current URLs on the d_artifact could be rendered useless, since they won't have the correct remote to build the downloader from. I have a feeling this feature is not working entirely how we intend it to.
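The concern can be reproduced in plain Python. This is a hypothetical miniature mirroring the diff's two assignments (the class and all the values are invented for illustration): after the substitution, the artifact's URL list mixes two sources while the single `remote` field matches only one of them.

```python
class DeclarativeArtifact:
    """Toy stand-in for pulpcore's declarative artifact."""
    def __init__(self, urls, remote):
        self.urls = urls
        self.remote = remote

# One pre-indexed ACS RemoteArtifact, keyed by checksum, as in the diff.
existing_ras_dict = {
    "abc123": {"url": "https://acs.example/pkg.rpm", "remote": "acs-remote"},
}

d_artifact = DeclarativeArtifact(
    urls=["https://origin.example/pkg.rpm"], remote="origin-remote"
)

checksum = "abc123"
if checksum in existing_ras_dict:
    # Mirrors the diff: prepend the ACS URL and overwrite the remote.
    d_artifact.urls = [existing_ras_dict[checksum]["url"]] + d_artifact.urls
    d_artifact.remote = existing_ras_dict[checksum]["remote"]

# d_artifact.urls now spans two remotes, but d_artifact.remote references
# only the ACS one, so a downloader built for the origin URL would carry
# the wrong remote's configuration and credentials.
```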
Yeah, this isn't my strong suit either. If I understand correctly, the basic idea is to preferentially use a different remote instead of the original one for downloads. I believe what might make it OK is that when it comes to saving remote artifacts, it's ultimately going to end up saving new remote artifacts rather than modifying existing ones. But in terms of the download process, it will be the new ones that are used?
I think! It would be worthwhile to verify those assumptions and document the actual details while I'm working on this code anyway.
There's a lot of discussion in the original plan.io issue : https://pulp.plan.io/issues/7832
And some questions/answers in the hackmd : https://hackmd.io/@pulp/acs
if getattr(d_artifact.artifact, checksum_type):
    checksum = getattr(d_artifact.artifact, checksum_type)
    if checksum in existing_ras_dict:
        d_artifact.urls = [
Bringing back for transparency some of the matrix conversation @dralley and I had.
I see a few issues with the current design:
1) The original intention for this line was for the declarative artifact to accept a list of URLs which the downloader would try in a loop. Those URLs would come from a mirrorlist, and the remote would either not need credentials at all or the credentials would be the same for every URL in that list. I am not sure how worthwhile it is for the ACS stage to have a list of URLs - this would be a mix of them, one coming from the original remote, most likely needing credentials, and another from the ACS (with or without credentials, or different credentials?). The remote which is going to be used is the one from the ACS.
So maybe this needs to be reverted back to d_artifact.url = existing_ras_dict[checksum]["url"]
2) Another problem is that in case the ArtifactDownload stage fails (namely the download of binary data from the substituted ACS source), there is no fallback to the original remote downloader + URL.
It is expected in normal circumstances that the ACS source is available, because that's the preferred and fastest way to get the binary data, but no one can guarantee normal circumstances 100%.
3) We should first iterate through all available ACSes to download from and then fall back to the original remote, but instead we provide just one (this is currently a pipeline limitation, where a declarative artifact can reference just one remote).
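The fallback behavior points 2) and 3) argue for might look like the sketch below. This is plain Python with hypothetical names (`download_with_fallback`, `flaky_fetch`), not the pulpcore downloader API: each (url, remote) candidate is tried in order, ACS sources first, and the error is only raised if every candidate fails.

```python
def download_with_fallback(candidates, fetch):
    """Try each (url, remote) pair in order; fetch() raises on failure."""
    last_error = None
    for url, remote in candidates:
        try:
            return fetch(url, remote)
        except Exception as exc:
            last_error = exc  # remember the failure, try the next source
    raise last_error

def flaky_fetch(url, remote):
    # Simulate the ACS source being unavailable.
    if "acs" in url:
        raise ConnectionError("ACS source down")
    return f"bytes-from:{url}"

result = download_with_fallback(
    [
        ("https://acs.example/pkg.rpm", "acs-remote"),        # preferred ACS
        ("https://origin.example/pkg.rpm", "origin-remote"),  # original remote
    ],
    flaky_fetch,
)
```

Because each candidate carries its own remote, this shape also sidesteps the single-remote limitation of the declarative artifact mentioned in point 3).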
This pull request has been marked 'stale' due to lack of recent activity. If there is no further activity, the PR will be closed in another 30 days. Thank you for your contribution!
This pull request has been closed due to inactivity. If you feel this is in error, please reopen the pull request or file a new PR with the relevant details.
This pull request is no longer marked for closure.
This pull request has been marked 'stale' due to lack of recent activity. If there is no further activity, the PR will be closed in another 30 days. Thank you for your contribution!
This pull request has been closed due to inactivity. If you feel this is in error, please reopen the pull request or file a new PR with the relevant details.
This pull request is no longer marked for closure.
Lift some queries that were performed per-batch outside of the loop, resolving an N+1 where N=number of batches. Also use .iterator() [noissue]
Instead of performing a gigantic AND-ed OR clause query, break up the list of remote artifacts by checksum type and perform one IN query per type of checksum, which ought to be easily indexable. [noissue]
Lift the ACS domain check out of the stage entirely. [noissue]
ACS uses a lot of memory, apparently: https://bugzilla.redhat.com/show_bug.cgi?id=2219885