Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial source file and source indices support #295

Merged
merged 2 commits into from
Oct 23, 2023
Merged

Conversation

masselstine
Copy link
Contributor

fixes #8775
https://pulp.plan.io/issues/8775

Allows the upload of source files (orig.tar, debian.tar) and Debian
Source Control files (dsc files), as well as synchronizing with a
remote's source indices (sync_sources=True).

The source_files referenced in a dsc_file must be uploaded before the
dsc_file so a typical workflow might be

--
apt-get source bc

http --form post $BASE/pulp/api/v3/content/deb/source_files/
file@"bc_1.07.1.orig.tar.gz" name=bc_1.07.1.orig.tar.gz
http --form post $BASE/pulp/api/v3/content/deb/source_files/
file@"bc_1.07.1-2build1.debian.tar.xz" name=bc_1.07.1-2build1.debian.tar.xz
http --form post $BASE/pulp/api/v3/content/deb/dsc_files/
file@"bc/bc_1.07.1-2build1.dsc"

Attempting to upload the dsc file before its source files will result
validation errors.

Once uploaded source_files and dsc_files can be inspected via their
pulp_href or using the respective endpoints

http get $BASE/pulp/api/v3/content/deb/dsc_files/
http get $BASE/pulp/api/v3/content/deb/source_files/

Synchronizing with a remote with "sync_sources=True" will synchronize
the Sources files as well as download all associated source_files and
dsc_files, performing a high level of data validation and creating all
required associations. As with uploading inspecting contents after a
sync is complete can be done with the same endpoints as above.

To remain spec compliant with https://wiki.debian.org/DebianRepository
"md5" must be present in the ALLOWED_CONTENT_CHECKSUMS. In all cases
the use of md5 is supplemental so security concerns around this
addition, at least for use with this feature, should be minimal.

TODO: publishing - this work has been started and will be completed
over the coming days.

@masselstine masselstine changed the title Initial source file and source indices support WIP: Initial source file and source indices support Jun 15, 2021
@pulpbot
Copy link
Member

pulpbot commented Jun 15, 2021

Attached issue: https://pulp.plan.io/issues/8775

@quba42
Copy link
Collaborator

quba42 commented Jun 17, 2021

I had to approve running the tests...

It looks like the linter wants you to add a file named:

CHANGES/8775.feature

containing a short changelog entry, perhaps something like:

Added support for the synchronization and upload of source packages.

Feel free to amend the wording since you most likely know more about Debian source packages than I do. 😉

@quba42 quba42 added the .feature CHANGES/<issue_number>.feature label Jun 17, 2021
@mdellweg
Copy link
Member

I always figured, the dsc and the tarball to be part of a single two artifact content unit. Have you considered that?

@masselstine
Copy link
Contributor Author

I always figured, the dsc and the tarball to be part of a single two artifact content unit. Have you considered that?

AFAIK the orig.tar can be used by multiple revisions of a DSC. For example upstream releases foo version 1.2.3 and a package maintainer produces a Dsc foo_1.2.3-1.dsc, the package maintainer then refines their package and produces foo_1.2.3-2.dsc but the upstream project content doesn't change, so the orig.tar remains the same. As such we definitely can't include the orig.tar in the MultiArtifact. NOTE: (native) packages do not require an orig.tar and since 3.0 (quilt) there can be an arbitrary number of orig.tar.

This leaves just the debian.tar (format v3) or diff.tar (format v1). I agree that this does appear to be ripe for handling as a MultiArtifact. I believe that like the orig.tar that this file is optional but I can't actually dig up this information, would this cause any issue with using MultiArtifact? I admittedly had no experience with Django going into this and it could be this inexperience but I fail to see an way to implement this. The artifact handling should be easy enough, my difficulties are more about the additional meta-data handling that a simple artifact simply doesn't have. I have to go through my code today and cleanup the linter warning and such but I am 100% open to some assistance with taking what I have and making adjustments.

@evgeni
Copy link
Member

evgeni commented Jun 18, 2021

I always figured, the dsc and the tarball to be part of a single two artifact content unit. Have you considered that?

AFAIK the orig.tar can be used by multiple revisions of a DSC. For example upstream releases foo version 1.2.3 and a package maintainer produces a Dsc foo_1.2.3-1.dsc, the package maintainer then refines their package and produces foo_1.2.3-2.dsc but the upstream project content doesn't change, so the orig.tar remains the same. As such we definitely can't include the orig.tar in the MultiArtifact. NOTE: (native) packages do not require an orig.tar and since 3.0 (quilt) there can be an arbitrary number of orig.tar.

Isn't it even harder? I remember that foo-1.2.3-1 and foo-1.2.3-2 can't have different orig tar gzs? Or was that a different tool, not the archive enforcing that?

@masselstine
Copy link
Contributor Author

I always figured, the dsc and the tarball to be part of a single two artifact content unit. Have you considered that?

AFAIK the orig.tar can be used by multiple revisions of a DSC. For example upstream releases foo version 1.2.3 and a package maintainer produces a Dsc foo_1.2.3-1.dsc, the package maintainer then refines their package and produces foo_1.2.3-2.dsc but the upstream project content doesn't change, so the orig.tar remains the same. As such we definitely can't include the orig.tar in the MultiArtifact. NOTE: (native) packages do not require an orig.tar and since 3.0 (quilt) there can be an arbitrary number of orig.tar.

Isn't it even harder? I remember that foo-1.2.3-1 and foo-1.2.3-2 can't have different orig tar gzs? Or was that a different tool, not the archive enforcing that?

It makes sense but I have failed in my search of the documentation and may have to revert to reading code. For as good as the Debian docs can be, they talk around the topic of required/optional/enforced when it comes to the files included in the Debian Source Control file. At any rate, I am near 100% sure the only file that could be included in a MultiArtifact approach would be the debian.tar/diff.tar. Any orig.tar need to exist on their own.

@masselstine
Copy link
Contributor Author

I always figured, the dsc and the tarball to be part of a single two artifact content unit. Have you considered that?

AFAIK the orig.tar can be used by multiple revisions of a DSC. For example upstream releases foo version 1.2.3 and a package maintainer produces a Dsc foo_1.2.3-1.dsc, the package maintainer then refines their package and produces foo_1.2.3-2.dsc but the upstream project content doesn't change, so the orig.tar remains the same. As such we definitely can't include the orig.tar in the MultiArtifact. NOTE: (native) packages do not require an orig.tar and since 3.0 (quilt) there can be an arbitrary number of orig.tar.

Isn't it even harder? I remember that foo-1.2.3-1 and foo-1.2.3-2 can't have different orig tar gzs? Or was that a different tool, not the archive enforcing that?

It makes sense but I have failed in my search of the documentation and may have to revert to reading code. For as good as the Debian docs can be, they talk around the topic of required/optional/enforced when it comes to the files included in the Debian Source Control file. At any rate, I am near 100% sure the only file that could be included in a MultiArtifact approach would be the debian.tar/diff.tar. Any orig.tar need to exist on their own.

BTW I found an example of a package which does not have a debian.tar, 'apt'
http://ftp.ca.debian.org/debian/pool/main/a/apt/

So based on documentation and examples there can be 0, 1 or more orig.tar and these can be shared by multiple revisions of .dsc and .debian.tar/diff.tar and the debian.tar/diff.tar are optional.

I am about to push another update to the WIP with the final TODO items taken care of, code formatted and sanitized. Anyone can feel free to apply the work to their repo and give it a whirl and provide feedback. I will review the MultiArtifact over the weekend given the above constraints to hopefully be able to better answer this question.

@mdellweg
Copy link
Member

Not sure anymore, how useful it is, but you can first upload all the files as artifacts, and then upload a DSC file (which contains checksums for the others) and then the content unit is built containing all the artifacts.
Artifacts (aka tar-files) will be shared by multiple content units just fine.
The association which source package belongs to a repo version would imho be easier.

@masselstine
Copy link
Contributor Author

Not sure anymore, how useful it is, but you can first upload all the files as artifacts, and then upload a DSC file (which contains checksums for the others) and then the content unit is built containing all the artifacts.
Artifacts (aka tar-files) will be shared by multiple content units just fine.
The association which source package belongs to a repo version would imho be easier.

I am looking into this over the weekend and have it as a TODO on the latest commit log, so I am definitely exploring this. I can see how the associations would be easier so I agree it is worth the exploration.

@masselstine
Copy link
Contributor Author

Not sure anymore, how useful it is, but you can first upload all the files as artifacts, and then upload a DSC file (which contains checksums for the others) and then the content unit is built containing all the artifacts.
Artifacts (aka tar-files) will be shared by multiple content units just fine.
The association which source package belongs to a repo version would imho be easier.

I am looking into this over the weekend and have it as a TODO on the latest commit log, so I am definitely exploring this. I can see how the associations would be easier so I agree it is worth the exploration.

I didn't get the time over the weekend to look into this as much as I wanted, but did have some time today. In the process I discovered a few flaws in my approach and some ways to get us closer to using MulitpleArtifact if we choose to go that route. I have dropped the 'source_files' endpoint and now upload non-DSC source files as standard artifacts using the /pulp/api/v3/artifacts/ endpoint. I still have DscFile model and endpoint which gets associated with the artifacts based on the sha256 and size.

I am still not convinced MultipleArtifact is the way to go, but again this takes us closer to it so it can be explored. I will push an update tomorrow.

@quba42
Copy link
Collaborator

quba42 commented Jun 22, 2021

The linting can be somewhat nit-picky:

Run flake8 --config flake8.cfg
./pulp_deb/app/models/content.py:481:1: D107 Missing docstring in __init__
./pulp_deb/app/serializers/content_serializers.py:950:1: D102 Missing docstring in public method

@masselstine
Copy link
Contributor Author

The linting can be somewhat nit-picky:

Run flake8 --config flake8.cfg
./pulp_deb/app/models/content.py:481:1: D107 Missing docstring in __init__
./pulp_deb/app/serializers/content_serializers.py:950:1: D102 Missing docstring in public method

Yes it is. I have flake8 setup locally and it didn't catch this. I have corrected it but I have also changed a lot of code. I will push an update up tonight and see what comes up once you release the linter on it :)

@masselstine
Copy link
Contributor Author

Not sure anymore, how useful it is, but you can first upload all the files as artifacts, and then upload a DSC file (which contains checksums for the others) and then the content unit is built containing all the artifacts.
Artifacts (aka tar-files) will be shared by multiple content units just fine.
The association which source package belongs to a repo version would imho be easier.

I am looking into this over the weekend and have it as a TODO on the latest commit log, so I am definitely exploring this. I can see how the associations would be easier so I agree it is worth the exploration.

I didn't get the time over the weekend to look into this as much as I wanted, but did have some time today. In the process I discovered a few flaws in my approach and some ways to get us closer to using MulitpleArtifact if we choose to go that route. I have dropped the 'source_files' endpoint and now upload non-DSC source files as standard artifacts using the /pulp/api/v3/artifacts/ endpoint. I still have DscFile model and endpoint which gets associated with the artifacts based on the sha256 and size.

I am still not convinced MultipleArtifact is the way to go, but again this takes us closer to it so it can be explored. I will push an update tomorrow.

OK. I have dug deeper into this than I had planned. The MultipleArtifact, as far as I can tell, is for handling one or more Artifact+Content where the Content is homogeneous. We could treat all source files (.dsc, *tar) the same, ie. with one Content type, but we would still require a second Content type to host the .dsc file/paragraph meta-data. During synchronization or when associating multiple artifacts to the Content we would have to review all the Artifacts to figure out which one is the .dsc. On top of all this I see no way to reassemble to things when it comes time to publish a DSC paragraph. I will be the first to admit that I may be missing the forest for the trees, but I see no way to treat this in a single Content type. Just review the MultipleArtifactContentSerializer create() method

for relative_path, artifact in artifacts.items(): models.ContentArtifact.objects.create( artifact=artifact, content=content, relative_path=relative_path )

What I have done is made it so you can upload non-DSC files as an artifact and when you do upload the DSC file/or associate it via the dsc_file endpoint, the serialization code validates that the required artifacts are present and during create() it automatically associates the artifacts to the SourceFile content type (via ContentArtifact). This ensure the Artifact can't be deleted and I am working through the release and publishing workflows to ensure they can be handled correctly. Once I am happy with the changes I will post the code.

@quba42
Copy link
Collaborator

quba42 commented Jun 23, 2021

OK. I have dug deeper into this than I had planned. The MultipleArtifact, as far as I can tell, is for handling one or more Artifact+Content where the Content is homogeneous.

I am not sure if this is sufficient for the DSC file case, but we do have something similar for the ReleaseFile content unit, which is associated with multiple artifacts (any number of Release, InRelease, and Release.gpg), but one of them is retrievable as the "main artifact":

def main_artifact(self):

It is then up to the sync/upload code that creates or updates content units of this type to make sure that the correct artifact from the the list of artifacts can be retrieved as the "main artifact". Without having looked at it in too much detail, I could imagine something similar so as to keep the DSC artifact easily accessible from within a single multi artifact content unit. If that does not work, then it is probably better to create separate content units and associate them using foreign key relations.

@masselstine
Copy link
Contributor Author

OK. I have dug deeper into this than I had planned. The MultipleArtifact, as far as I can tell, is for handling one or more Artifact+Content where the Content is homogeneous.

I am not sure if this is sufficient for the DSC file case, but we do have something similar for the ReleaseFile content unit, which is associated with multiple artifacts (any number of Release, InRelease, and Release.gpg), but one of them is retrievable as the "main artifact":

def main_artifact(self):

It is then up to the sync/upload code that creates or updates content units of this type to make sure that the correct artifact from the the list of artifacts can be retrieved as the "main artifact". Without having looked at it in too much detail, I could imagine something similar so as to keep the DSC artifact easily accessible from within a single multi artifact content unit. If that does not work, then it is probably better to create separate content units and associate them using foreign key relations.

I don't believe this is the same. These fall into the homogeneous bucket as InRelease, Release, Release.gpg are the same across Content, Serializers and ViewSets.

@mdellweg
Copy link
Member

FWIW you will subclass the serializer anyway, and you can use the validate step to lookup and add all the other source file artifacts there and then ’super(). validate’ the rest. I have a strong suspicion this can work.
Btw I wouldn't call InRelease and Release homogenous.

@masselstine
Copy link
Contributor Author

As with many things I got busy with other work. I just wanted to drop a quick not to let folks know that this has not be forgotten. I should have time later this week to get back to this. Sorry for the delay.

@masselstine
Copy link
Contributor Author

FWIW you will subclass the serializer anyway, and you can use the validate step to lookup and add all the other source file artifacts there and then ’super(). validate’ the rest. I have a strong suspicion this can work.
Btw I wouldn't call InRelease and Release homogenous.

AFAICT I can overwrite the create() and deferred_validate() but I can't overwrite the field level validation in ContentArtifactsField.run_validation(). So an artifacts dict needs to be passed and can't be delayed until the dsc file contents are serialised. This provides a horrible UX as no only are all artifact URLs required but relative paths given instead of being derived. I can possibly kill the per-field validation but yet again that is defeating the point of using MultipleArtifact in the first place. I would really like to have someone take a stab at this as this might be chalked up to my lack of experience with Django.

@mdellweg
Copy link
Member

OK, i think, i know now where the confusion kicks in.
What you might want to do is something like this: https://github.com/pulp/pulp_gem/blob/master/pulp_gem/app/serializers.py#L48
The difference there being that the "spec_artifact" is produced in the serialitzation process, while for the dsc case it can rather be an artifact lookup.

And just to be sure, the serializers we are talking of here are really DRF (https://www.django-rest-framework.org/api-guide/serializers/), not plain django.

@masselstine
Copy link
Contributor Author

OK, i think, i know now where the confusion kicks in.
What you might want to do is something like this: https://github.com/pulp/pulp_gem/blob/master/pulp_gem/app/serializers.py#L48
The difference there being that the "spec_artifact" is produced in the serialitzation process, while for the dsc case it can rather be an artifact lookup.

And just to be sure, the serializers we are talking of here are really DRF (https://www.django-rest-framework.org/api-guide/serializers/), not plain django.

Thanks for pointing me at this. It actually comes very close to what I have attempted but there are a few differences that I will review to see if they resolve the complications I have hit. I had tried to find other examples in some of the other plugins but came up blank.

@quba42
Copy link
Collaborator

quba42 commented Aug 1, 2021

Some things we will need to do (possibly as follow on tasks once this is merged):

  • Update the plugin docs
  • Add some basic test coverage (will require new fixtures)
  • Is there a use case for synchronizing only source, but not binary packages from a remote repo? (Currently not possible)

I am just leaving these here for now, so I don't forget about it later. 😉

@daviddavis
Copy link
Contributor

@masselstine I haven't really worked with the Pulp fixtures before so I am not sure why there are duplicate fixtures in pulp-smash and pulpcore. I think @quba42 @hstct or @mdellweg might know.

@quba42
Copy link
Collaborator

quba42 commented Jun 26, 2023

I think others know the details better than me, but my understanding is that fixtures are moving into pulpcore, and we are working to get rid of all remaining dependencies to pulp-smash. => i.e. pulp-smash is deprecated.

@masselstine
Copy link
Contributor Author

I think others know the details better than me, but my understanding is that fixtures are moving into pulpcore, and we are working to get rid of all remaining dependencies to pulp-smash. => i.e. pulp-smash is deprecated.

Alright. I am going to have to prepare and submit my fixtures changes to pulpcore in that case then, as well as figure out how to adapt my Vagrant development box to use the pulpcore fixtures instead of pulp-smash in order to ensure everything is in order. Please standby, again.

@mdellweg
Copy link
Member

I think others know the details better than me, but my understanding is that fixtures are moving into pulpcore, and we are working to get rid of all remaining dependencies to pulp-smash. => i.e. pulp-smash is deprecated.

Alright. I am going to have to prepare and submit my fixtures changes to pulpcore in that case then, as well as figure out how to adapt my Vagrant development box to use the pulpcore fixtures instead of pulp-smash in order to ensure everything is in order. Please standby, again.

Yes, it's about stopping to use pulp-smash altogether. The issue is, until you do not even install smash anymore, there is no way to tell pytest which one to pick up. So we need these fixes in both, i believe.

@masselstine
Copy link
Contributor Author

I think others know the details better than me, but my understanding is that fixtures are moving into pulpcore, and we are working to get rid of all remaining dependencies to pulp-smash. => i.e. pulp-smash is deprecated.

Alright. I am going to have to prepare and submit my fixtures changes to pulpcore in that case then, as well as figure out how to adapt my Vagrant development box to use the pulpcore fixtures instead of pulp-smash in order to ensure everything is in order. Please standby, again.

Yes, it's about stopping to use pulp-smash altogether. The issue is, until you do not even install smash anymore, there is no way to tell pytest which one to pick up. So we need these fixes in both, i believe.

I put down my keyboard on the weekend when I wasn't quickly able to determine how the pulpcore fixtures were being picked up vs. the pulp-smash ones. I was hoping someone would have some wisdom to share on the subject. Obviously I won't plan on reverting my change I already have merged into pulp-smash, but I do need to test the change when applied to pulpcore. I am not picky on how, so if removing pulp-smash from my dev env does the trick, that will be what I do.

@mdellweg
Copy link
Member

mdellweg commented Jun 26, 2023

Removing pulp_smash from the plugin ci will need a bit more work than just not installing it, I'm sorry. But if we have a pulp-smash fixture with a bugfix, we may want the same in pulpcore for sure.

Sorry for the inconvenience. When we planned to susset pulp-smash we didn't anticipate for it to take years.

@masselstine
Copy link
Contributor Author

Removing pulp_smash from the plugin ci will need a bit more work than just not installing it, I'm sorry. But if we have a pulp-smash fixture with a bugfix, we may want the same in pulpcore for sure.

Sorry for the inconvenience. When we planned to susset pulp-smash we didn't anticipate for it to take years.

Up to this point I was only able to work for 15 minutes at a time with the CI container on GH in a tmate session. Now that I know the root cause I will be able to reproduce this in my dev env and test changes to the pulpcore fixtures. Attempting to just set the Content-Type as I did in pulp-smash raised another issue, so further work needs to be done before I can submit a fix to pulpcore. I will go ahead and raise a defect and try to get a fix out soon, and link it to this PR.

@masselstine
Copy link
Contributor Author

Removing pulp_smash from the plugin ci will need a bit more work than just not installing it, I'm sorry. But if we have a pulp-smash fixture with a bugfix, we may want the same in pulpcore for sure.
Sorry for the inconvenience. When we planned to susset pulp-smash we didn't anticipate for it to take years.

Up to this point I was only able to work for 15 minutes at a time with the CI container on GH in a tmate session. Now that I know the root cause I will be able to reproduce this in my dev env and test changes to the pulpcore fixtures. Attempting to just set the Content-Type as I did in pulp-smash raised another issue, so further work needs to be done before I can submit a fix to pulpcore. I will go ahead and raise a defect and try to get a fix out soon, and link it to this PR.

For anyone following along. It is possible to reproduce this issue using the vagrant development box. It comes down to pytest plugin load order. In the vagrant box pulp_smash is coming after pulpcore, in the CI test runs pulp_smash is coming before pulpcore. The last plugin included gets the last word, or in this case the implementation of functions such as add_recording_route().

This probably deserves a defect of its own as a somewhat random plugin load order such as this is going to result in extremely frustrating problem solving such as what has had to be done here. We should most likely attempt to enforce a known order.

Should you want to reproduce the issue being hit in CI run the following (note the -p option)

pytest -v -r sx --color=yes -p pulp_smash --suppress-no-test-exit-code pulp_deb/tests/functional/api/test_sync.py::test_sync -m parallel

The -p will act as an early load mechanism for the pulp_smash plugin, an thus pulpcore will be forced to be loaded after pulp_smash. And since pulpcore gets the last word and has yet to have been patched to properly handle Accept-Encoding (see #742) we hit the checksum mismatch failure.

@quba42
Copy link
Collaborator

quba42 commented Jul 6, 2023

@masselstine Can we simply add your fix for pulp-smash to pulpcore as well? Is there already a PR for that?

I suspect that the long term solution to the load order issue you describe will be to drop pulp-smash (which is planned, but we are not their yet)...

@mdellweg
Copy link
Member

mdellweg commented Jul 6, 2023

@masselstine Can we simply add your fix for pulp-smash to pulpcore as well? Is there already a PR for that?

I suspect that the long term solution to the load order issue you describe will be to drop pulp-smash (which is planned, but we are not their yet)...

I absolutely agree. Dropping pulp-smash was the plan all along and we of thought the transition would be easier if we copied the testing facilities over to pulpcore instead of giving them new names. Just the fact that the transition is taking much too long and the implementations started to diverge in the meantime is what troubles you. I am very sorry about that.
Not sure i really want to suggest that: Whatever is in the local conftest.py will win over every single pytest plugin...
So in theory reimplementing the add_recording_route fixture there may or may not solve your problem.

@masselstine
Copy link
Contributor Author

@quba42 yes. Now that I have a local reproducer I can prepare the same or similar fix for pulpcore. The change is as valid there as it was in pulp-smash, ie. no web server should be ignoring the Accept-Encoding of a client. With the reproducer available I can ensure my change is fully vetted before I send it for merge.

Look for me to raise the issue and send a PR over the next week or so to the pulpcore project.

adamsanaglo pushed a commit to adamsanaglo/pulpcore that referenced this pull request Jul 13, 2023
This PR adds the initial support for deb source packages, only the support for uploading and listing the sources packages is added in this PR, A later PR will add the support for publishing source packages.
This PR is based on the [upstream PR](pulp/pulp_deb#295), Creating a patch based on the changes, and adding the patch to our docker container image.
Here's a few examples of how the ccli can be used to manipulate source packages:

```
- pmc package upload ../../hello_sources/hello_2.10-2ubuntu2.dsc --source-artifact ../../hello_sources/
- pmc package deb_src list
- pmc package show --details content-deb-source_packages-f63f0696-919d-41e2-b023-fd7c9c995b19
```

A later PR will update the documentation, once we have the other part of the changes merged.

Related work items: #13082573
@masselstine
Copy link
Contributor Author

Folks. Sorry for the silence. Summer vacations and other obligations. I will be back on this shortly.

@hstct
Copy link
Contributor

hstct commented Aug 10, 2023

Folks. Sorry for the silence. Summer vacations and other obligations. I will be back on this shortly.

@masselstine no worries, but just a heads up. We have two PRs I expect to get merged possibly next week. One that will remove the pulp-smash dependency (#851) and another one that will refactor the tests (#857). These will probably be the last changes to the tests for (hopefully) a while so you will need to adapt yours a little bit.

I actually thought on helping out and port your change from pulp-smash to pulpcore and I have the change locally but had to work on other things and couldn't quite test it yet. I can continue to work on this eventually (but will take some time). Let me know if I should or if you will continue working on it. However whoever manages to open up a PR for pulpcore it would great if we can convince the team to backport this change to at least pulpcore 3.28 so we don't have to change the plugin requirements yet again 🙂

@masselstine
Copy link
Contributor Author

Folks. Sorry for the silence. Summer vacations and other obligations. I will be back on this shortly.

@masselstine no worries, but just a heads up. We have two PRs I expect to get merged possibly next week. One that will remove the pulp-smash dependency (#851) and another one that will refactor the tests (#857). These will probably be the last changes to the tests for (hopefully) a while so you will need to adapt yours a little bit.

I actually thought on helping out and port your change from pulp-smash to pulpcore and I have the change locally but had to work on other things and couldn't quite test it yet. I can continue to work on this eventually (but will take some time). Let me know if I should or if you will continue working on it. However whoever manages to open up a PR for pulpcore it would great if we can convince the team to backport this change to at least pulpcore 3.28 so we don't have to change the plugin requirements yet again 🙂

I am not fuss on who ports the change, so @hstct if you want to go for it, feel free. I will most likely get to this next week should you not get something sent by then.

@hstct
Copy link
Contributor

hstct commented Sep 26, 2023

@masselstine I finally got around to open the PR in pulpcore. pulp/pulpcore#4494

I tested it locally and hopefully this will resolve the test issue.

I had to build my own test data with source packages since we updated the /debian/ test data repository at some point. I got a workflow for creating a dummy source package so in that case we don't need to include a random ubuntu package to the repository.

@masselstine
Copy link
Contributor Author

@masselstine I finally got around to open the PR in pulpcore. pulp/pulpcore#4494

I tested it locally and hopefully this will resolve the test issue.

I had to build my own test data with source packages since we updated the /debian/ test data repository at some point. I got a workflow for creating a dummy source package so in that case we don't need to include a random ubuntu package to the repository.

Thank you so much for preparing that PR. I have been completely absent here. I wanted to get things done but simply had no time. At any rate, no excuse and I apologize for this.

I will find time to rebase my change and see about using your new dummy source package. Thanks for hopefully unsticking this for the last time (I will believe it when I see it). I assume once your PR is merged we might have to do some version updates to ensure the CI tests use the updates?

@hstct
Copy link
Contributor

hstct commented Sep 26, 2023

Thank you so much for preparing that PR. I have been completely absent here. I wanted to get things done but simply had no time. At any rate, no excuse and I apologize for this.

No worries :)

I will find time to rebase my change and see about using your new dummy source package.

I will open a PR with updated test data probably tomorrow which will include a source package repo.

Thanks for hopefully unsticking this for the last time (I will believe it when I see it). I assume once your PR is merged we might have to do some version updates to ensure the CI tests use the updates?

The change needs to be backported and released in pulpcore but that usually happens regularly nowadays. Otherwise we need to update the pulpcore requirements. I'm hoping for the first option but both will work.

@daviddavis
Copy link
Contributor

Awesome, looks like pulp/pulpcore#4494 is merged and has been backported to 3.28. 🎉

@hstct
Copy link
Contributor

hstct commented Sep 28, 2023

Awesome, looks like pulp/pulpcore#4494 is merged and has been backported to 3.28. 🎉

That's true but it needs some maintenance in pulp deb before the CIs are running again properly :)

As well I opened up a PR that adds source package test data #904

This will add two source packages to the standard test repository (/debian/) so the test included here probably need slight adaption.

@hstct
Copy link
Contributor

hstct commented Oct 16, 2023

@masselstine the test data PR #904 is merged. Let's hope the github CI will not find anything else to complain 🤞

fixes pulp#409
pulp#409
(was https://pulp.plan.io/issues/8775)

Allows the addition of Debian Source Packages, DSC files and any
associate source files (.orig.tar.gz, .debian.tar.xz, ...), as well as
synchronizing with a remote's source indices (sync_sources=True).
Publishing a distribution will make available DSC files, sources and
source indices in the repository, compliant with the Debian
rempository format.

The source files referenced in a dsc_file must be uploaded as
artifacts before the dsc_file so a typical workflow might be

--
apt-get source bc

http --form post $BASE/pulp/api/v3/artifacts/ \
  file@"/tmp/bc_1.07.1-2build1.debian.tar.xz"
http --form post $BASE/pulp/api/v3/artifacts/ \
  file@"/tmp/bc_1.07.1.orig.tar.gz"
http --form post $BASE/pulp/api/v3/artifacts/ \
  file@"/tmp/bc_1.07.1-2build1.dsc"
http --form post $BASE/pulp/api/v3/content/deb/source_packages/ \
  artifact=/pulp/api/v3/artifacts/3094de7e-da61-4cf6-ae75-a35b74472d9a/
--

Attempting to create the source_packages content from a DSC file
artifact before its source files are present as artifacts will result
validation errors.

Once source_packages content has been created it can be inspected via
its pulp_href or using the source_packages endpoint

http get $BASE/pulp/api/v3/content/deb/source_packages/

Synchronizing with a remote with "sync_sources=True" will synchronize
the Source Indicies files inspecting each contained paragraph to
download all associated DSC files and referced source files (note:
source file download may be deferred if remote policy="on_demand"),
performing a high level of data validation and creating all required
associations. As with uploading inspecting contents after a sync is
complete can be done with the same endpoints as above.

To remain spec compliant with https://wiki.debian.org/DebianRepository
"md5" must be present in the ALLOWED_CONTENT_CHECKSUMS. In all cases
the use of md5 is supplemental so security concerns around this
addition, at least for use with this feature, should be minimal.
[noissue]

related to:
pulp#409
(was https://pulp.plan.io/issues/8775)

The 'test_sync' test which already does sync tests on a remote repo
with and without installer (udeb) packages is extended to perform a
new round of testing which will include source packages.

This is a minimum set of tests for the new source package
support. Future commits will expand the testing of source package
support.

Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
Copy link
Contributor

@hstct hstct left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems everything is in order now 🎆

Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I did the following using this state in oci-env:

alias http='http --auth admin:password'
PULP_URL='http://localhost:5001'

NAME='source-package-test'
REMOTE_OPTIONS=(
  url=http://ftp.de.debian.org/debian/
  distributions=bookworm
  components=contrib
  architectures=amd64
  sync_sources=true
  policy=on_demand
)
http post ${PULP_URL}/pulp/api/v3/remotes/deb/apt/ ${REMOTE_OPTIONS[@]} name=${NAME}
# pulp deb remote create --name=${NAME} --sync-sources ${REMOTE_OPTIONS[@]}
pulp deb repository create --name=${NAME} --remote=${NAME}
pulp deb repository sync --name=${NAME}
pulp deb publication --type=verbatim create --repository=${NAME}
pulp deb distribution create --name=${NAME}_verbatim --base-path=${NAME}_verbatim --repository=${NAME}
pulp deb publication create --repository=${NAME}
pulp deb distribution create --name=${NAME} --base-path=${NAME} --repository=${NAME}

Looking over the resultant Pulp repo version, as well as the published repos, everything looks sane to me. In the interest of getting this done, I think we should merge and handle anything further via follow up tasks. Possible follow up tasks include:

  • Adding a --sync-sources parameter for pulp-cli-deb remote creation.
  • Updating the documentation with something like the above example.

@quba42
Copy link
Collaborator

quba42 commented Oct 23, 2023

Reminder that this PR was first opened on Jun 15, 2021!
Thanks for persevering everyone!

@hstct hstct merged commit 464fc0a into pulp:main Oct 23, 2023
8 checks passed
@hstct
Copy link
Contributor

hstct commented Oct 23, 2023

Great job! 🎆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.feature CHANGES/<issue_number>.feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support "Source" indices and source level packages (*.orig.tar.gz, *.debian.tar.xz, *.dsc)
8 participants