-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial source file and source indices support #295
Conversation
Attached issue: https://pulp.plan.io/issues/8775 |
I had to approve running the tests... It looks like the linter wants you to add a file named:
containing a short changelog entry, perhaps something like:
Feel free to amend the wording since you most likely know more about Debian source packages than I do. 😉 |
11a608a
to
00aad33
Compare
I always figured, the dsc and the tarball to be part of a single two artifact content unit. Have you considered that? |
AFAIK the orig.tar can be used by multiple revisions of a DSC. For example upstream releases foo version 1.2.3 and a package maintainer produces a Dsc foo_1.2.3-1.dsc, the package maintainer then refines their package and produces foo_1.2.3-2.dsc but the upstream project content doesn't change, so the orig.tar remains the same. As such we definitely can't include the orig.tar in the MultiArtifact. NOTE: (native) packages do not require an orig.tar and since 3.0 (quilt) there can be an arbitrary number of orig.tar. This leaves just the debian.tar (format v3) or diff.tar (format v1). I agree that this does appear to be ripe for handling as a MultiArtifact. I believe that like the orig.tar that this file is optional but I can't actually dig up this information, would this cause any issue with using MultiArtifact? I admittedly had no experience with Django going into this and it could be this inexperience but I fail to see an way to implement this. The artifact handling should be easy enough, my difficulties are more about the additional meta-data handling that a simple artifact simply doesn't have. I have to go through my code today and cleanup the linter warning and such but I am 100% open to some assistance with taking what I have and making adjustments. |
Isn't it even harder? I remember that foo-1.2.3-1 and foo-1.2.3-2 can't have different orig tar gzs? Or was that a different tool, not the archive enforcing that? |
It makes sense but I have failed in my search of the documentation and may have to revert to reading code. For as good as the Debian docs can be, they talk around the topic of required/optional/enforced when it comes to the files included in the Debian Source Control file. At any rate, I am near 100% sure the only file that could be included in a MultiArtifact approach would be the debian.tar/diff.tar. Any orig.tar need to exist on their own. |
BTW I found an example of a package which does not have a debian.tar, 'apt' So based on documentation and examples there can be 0, 1 or more orig.tar and these can be shared by multiple revisions of .dsc and .debian.tar/diff.tar and the debian.tar/diff.tar are optional. I am about to push another update to the WIP with the final TODO items taken care of, code formatted and sanitized. Anyone can feel free to apply the work to their repo and give it a whirl and provide feedback. I will review the MultiArtifact over the weekend given the above constraints to hopefully be able to better answer this question. |
Not sure anymore, how useful it is, but you can first upload all the files as artifacts, and then upload a DSC file (which contains checksums for the others) and then the content unit is built containing all the artifacts. |
I am looking into this over the weekend and have it as a TODO on the latest commit log, so I am definitely exploring this. I can see how the associations would be easier so I agree it is worth the exploration. |
I didn't get the time over the weekend to look into this as much as I wanted, but did have some time today. In the process I discovered a few flaws in my approach and some ways to get us closer to using MulitpleArtifact if we choose to go that route. I have dropped the 'source_files' endpoint and now upload non-DSC source files as standard artifacts using the /pulp/api/v3/artifacts/ endpoint. I still have DscFile model and endpoint which gets associated with the artifacts based on the sha256 and size. I am still not convinced MultipleArtifact is the way to go, but again this takes us closer to it so it can be explored. I will push an update tomorrow. |
The linting can be somewhat nit-picky:
|
Yes it is. I have flake8 setup locally and it didn't catch this. I have corrected it but I have also changed a lot of code. I will push an update up tonight and see what comes up once you release the linter on it :) |
OK. I have dug deeper into this than I had planned. The MultipleArtifact, as far as I can tell, is for handling one or more Artifact+Content where the Content is homogeneous. We could treat all source files (.dsc, *tar) the same, ie. with one Content type, but we would still require a second Content type to host the .dsc file/paragraph meta-data. During synchronization or when associating multiple artifacts to the Content we would have to review all the Artifacts to figure out which one is the .dsc. On top of all this I see no way to reassemble to things when it comes time to publish a DSC paragraph. I will be the first to admit that I may be missing the forest for the trees, but I see no way to treat this in a single Content type. Just review the MultipleArtifactContentSerializer create() method
What I have done is made it so you can upload non-DSC files as an artifact and when you do upload the DSC file/or associate it via the dsc_file endpoint, the serialization code validates that the required artifacts are present and during create() it automatically associates the artifacts to the SourceFile content type (via ContentArtifact). This ensure the Artifact can't be deleted and I am working through the release and publishing workflows to ensure they can be handled correctly. Once I am happy with the changes I will post the code. |
I am not sure if this is sufficient for the DSC file case, but we do have something similar for the pulp_deb/pulp_deb/app/models/content.py Line 62 in 784cf71
It is then up to the sync/upload code that creates or updates content units of this type to make sure that the correct artifact from the the list of artifacts can be retrieved as the "main artifact". Without having looked at it in too much detail, I could imagine something similar so as to keep the DSC artifact easily accessible from within a single multi artifact content unit. If that does not work, then it is probably better to create separate content units and associate them using foreign key relations. |
I don't believe this is the same. These fall into the homogeneous bucket as InRelease, Release, Release.gpg are the same across Content, Serializers and ViewSets. |
FWIW you will subclass the serializer anyway, and you can use the validate step to lookup and add all the other source file artifacts there and then ’super(). validate’ the rest. I have a strong suspicion this can work. |
As with many things I got busy with other work. I just wanted to drop a quick not to let folks know that this has not be forgotten. I should have time later this week to get back to this. Sorry for the delay. |
AFAICT I can overwrite the create() and deferred_validate() but I can't overwrite the field level validation in ContentArtifactsField.run_validation(). So an artifacts dict needs to be passed and can't be delayed until the dsc file contents are serialised. This provides a horrible UX as no only are all artifact URLs required but relative paths given instead of being derived. I can possibly kill the per-field validation but yet again that is defeating the point of using MultipleArtifact in the first place. I would really like to have someone take a stab at this as this might be chalked up to my lack of experience with Django. |
OK, i think, i know now where the confusion kicks in. And just to be sure, the serializers we are talking of here are really DRF (https://www.django-rest-framework.org/api-guide/serializers/), not plain django. |
Thanks for pointing me at this. It actually comes very close to what I have attempted but there are a few differences that I will review to see if they resolve the complications I have hit. I had tried to find other examples in some of the other plugins but came up blank. |
b2f3d3c
to
872a505
Compare
Some things we will need to do (possibly as follow on tasks once this is merged):
I am just leaving these here for now, so I don't forget about it later. 😉 |
@masselstine I haven't really worked with the Pulp fixtures before so I am not sure why there are duplicate fixtures in pulp-smash and pulpcore. I think @quba42 @hstct or @mdellweg might know. |
I think others know the details better than me, but my understanding is that fixtures are moving into pulpcore, and we are working to get rid of all remaining dependencies to pulp-smash. => i.e. pulp-smash is deprecated. |
Alright. I am going to have to prepare and submit my fixtures changes to pulpcore in that case then, as well as figure out how to adapt my Vagrant development box to use the pulpcore fixtures instead of pulp-smash in order to ensure everything is in order. Please standby, again. |
Yes, it's about stopping to use pulp-smash altogether. The issue is, until you do not even install smash anymore, there is no way to tell pytest which one to pick up. So we need these fixes in both, i believe. |
I put down my keyboard on the weekend when I wasn't quickly able to determine how the pulpcore fixtures were being picked up vs. the pulp-smash ones. I was hoping someone would have some wisdom to share on the subject. Obviously I won't plan on reverting my change I already have merged into pulp-smash, but I do need to test the change when applied to pulpcore. I am not picky on how, so if removing pulp-smash from my dev env does the trick, that will be what I do. |
Removing pulp_smash from the plugin ci will need a bit more work than just not installing it, I'm sorry. But if we have a pulp-smash fixture with a bugfix, we may want the same in pulpcore for sure. Sorry for the inconvenience. When we planned to susset pulp-smash we didn't anticipate for it to take years. |
Up to this point I was only able to work for 15 minutes at a time with the CI container on GH in a tmate session. Now that I know the root cause I will be able to reproduce this in my dev env and test changes to the pulpcore fixtures. Attempting to just set the Content-Type as I did in pulp-smash raised another issue, so further work needs to be done before I can submit a fix to pulpcore. I will go ahead and raise a defect and try to get a fix out soon, and link it to this PR. |
For anyone following along. It is possible to reproduce this issue using the vagrant development box. It comes down to pytest plugin load order. In the vagrant box pulp_smash is coming after pulpcore, in the CI test runs pulp_smash is coming before pulpcore. The last plugin included gets the last word, or in this case the implementation of functions such as add_recording_route(). This probably deserves a defect of its own as a somewhat random plugin load order such as this is going to result in extremely frustrating problem solving such as what has had to be done here. We should most likely attempt to enforce a known order. Should you want to reproduce the issue being hit in CI run the following (note the -p option)
The -p will act as an early load mechanism for the pulp_smash plugin, an thus pulpcore will be forced to be loaded after pulp_smash. And since pulpcore gets the last word and has yet to have been patched to properly handle Accept-Encoding (see #742) we hit the checksum mismatch failure. |
@masselstine Can we simply add your fix for pulp-smash to pulpcore as well? Is there already a PR for that? I suspect that the long term solution to the load order issue you describe will be to drop pulp-smash (which is planned, but we are not their yet)... |
I absolutely agree. Dropping pulp-smash was the plan all along and we of thought the transition would be easier if we copied the testing facilities over to pulpcore instead of giving them new names. Just the fact that the transition is taking much too long and the implementations started to diverge in the meantime is what troubles you. I am very sorry about that. |
@quba42 yes. Now that I have a local reproducer I can prepare the same or similar fix for pulpcore. The change is as valid there as it was in pulp-smash, ie. no web server should be ignoring the Accept-Encoding of a client. With the reproducer available I can ensure my change is fully vetted before I send it for merge. Look for me to raise the issue and send a PR over the next week or so to the pulpcore project. |
This PR adds the initial support for deb source packages, only the support for uploading and listing the sources packages is added in this PR, A later PR will add the support for publishing source packages. This PR is based on the [upstream PR](pulp/pulp_deb#295), Creating a patch based on the changes, and adding the patch to our docker container image. Here's a few examples of how the ccli can be used to manipulate source packages: ``` - pmc package upload ../../hello_sources/hello_2.10-2ubuntu2.dsc --source-artifact ../../hello_sources/ - pmc package deb_src list - pmc package show --details content-deb-source_packages-f63f0696-919d-41e2-b023-fd7c9c995b19 ``` A later PR will update the documentation, once we have the other part of the changes merged. Related work items: #13082573
Folks. Sorry for the silence. Summer vacations and other obligations. I will be back on this shortly. |
@masselstine no worries, but just a heads up. We have two PRs I expect to get merged possibly next week. One that will remove the I actually thought on helping out and port your change from |
I am not fuss on who ports the change, so @hstct if you want to go for it, feel free. I will most likely get to this next week should you not get something sent by then. |
@masselstine I finally got around to open the PR in pulpcore. pulp/pulpcore#4494 I tested it locally and hopefully this will resolve the test issue. I had to build my own test data with source packages since we updated the |
Thank you so much for preparing that PR. I have been completely absent here. I wanted to get things done but simply had no time. At any rate, no excuse and I apologize for this. I will find time to rebase my change and see about using your new dummy source package. Thanks for hopefully unsticking this for the last time (I will believe it when I see it). I assume once your PR is merged we might have to do some version updates to ensure the CI tests use the updates? |
No worries :)
I will open a PR with updated test data probably tomorrow which will include a source package repo.
The change needs to be backported and released in pulpcore but that usually happens regularly nowadays. Otherwise we need to update the pulpcore requirements. I'm hoping for the first option but both will work. |
Awesome, looks like pulp/pulpcore#4494 is merged and has been backported to 3.28. 🎉 |
That's true but it needs some maintenance in pulp deb before the CIs are running again properly :) As well I opened up a PR that adds source package test data #904 This will add two source packages to the standard test repository ( |
@masselstine the test data PR #904 is merged. Let's hope the github CI will not find anything else to complain 🤞 |
fixes pulp#409 pulp#409 (was https://pulp.plan.io/issues/8775) Allows the addition of Debian Source Packages, DSC files and any associate source files (.orig.tar.gz, .debian.tar.xz, ...), as well as synchronizing with a remote's source indices (sync_sources=True). Publishing a distribution will make available DSC files, sources and source indices in the repository, compliant with the Debian rempository format. The source files referenced in a dsc_file must be uploaded as artifacts before the dsc_file so a typical workflow might be -- apt-get source bc http --form post $BASE/pulp/api/v3/artifacts/ \ file@"/tmp/bc_1.07.1-2build1.debian.tar.xz" http --form post $BASE/pulp/api/v3/artifacts/ \ file@"/tmp/bc_1.07.1.orig.tar.gz" http --form post $BASE/pulp/api/v3/artifacts/ \ file@"/tmp/bc_1.07.1-2build1.dsc" http --form post $BASE/pulp/api/v3/content/deb/source_packages/ \ artifact=/pulp/api/v3/artifacts/3094de7e-da61-4cf6-ae75-a35b74472d9a/ -- Attempting to create the source_packages content from a DSC file artifact before its source files are present as artifacts will result validation errors. Once source_packages content has been created it can be inspected via its pulp_href or using the source_packages endpoint http get $BASE/pulp/api/v3/content/deb/source_packages/ Synchronizing with a remote with "sync_sources=True" will synchronize the Source Indicies files inspecting each contained paragraph to download all associated DSC files and referced source files (note: source file download may be deferred if remote policy="on_demand"), performing a high level of data validation and creating all required associations. As with uploading inspecting contents after a sync is complete can be done with the same endpoints as above. To remain spec compliant with https://wiki.debian.org/DebianRepository "md5" must be present in the ALLOWED_CONTENT_CHECKSUMS. In all cases the use of md5 is supplemental so security concerns around this addition, at least for use with this feature, should be minimal.
[noissue] related to: pulp#409 (was https://pulp.plan.io/issues/8775) The 'test_sync' test which already does sync tests on a remote repo with and without installer (udeb) packages is extended to perform a new round of testing which will include source packages. This is a minimum set of tests for the new source package support. Future commits will expand the testing of source package support. Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems everything is in order now 🎆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I did the following using this state in oci-env:
alias http='http --auth admin:password'
PULP_URL='http://localhost:5001'
NAME='source-package-test'
REMOTE_OPTIONS=(
url=http://ftp.de.debian.org/debian/
distributions=bookworm
components=contrib
architectures=amd64
sync_sources=true
policy=on_demand
)
http post ${PULP_URL}/pulp/api/v3/remotes/deb/apt/ ${REMOTE_OPTIONS[@]} name=${NAME}
# pulp deb remote create --name=${NAME} --sync-sources ${REMOTE_OPTIONS[@]}
pulp deb repository create --name=${NAME} --remote=${NAME}
pulp deb repository sync --name=${NAME}
pulp deb publication --type=verbatim create --repository=${NAME}
pulp deb distribution create --name=${NAME}_verbatim --base-path=${NAME}_verbatim --repository=${NAME}
pulp deb publication create --repository=${NAME}
pulp deb distribution create --name=${NAME} --base-path=${NAME} --repository=${NAME}
Looking over the resultant Pulp repo version, as well as the published repos, everything looks sane to me. In the interest of getting this done, I think we should merge and handle anything further via follow up tasks. Possible follow up tasks include:
- Adding a
--sync-sources
parameter for pulp-cli-deb remote creation. - Updating the documentation with something like the above example.
Reminder that this PR was first opened on Jun 15, 2021! |
Great job! 🎆 |
fixes #8775
https://pulp.plan.io/issues/8775
Allows the upload of source files (orig.tar, debian.tar) and Debian
Source Control files (dsc files), as well as synchronizing with a
remote's source indices (sync_sources=True).
The source_files referenced in a dsc_file must be uploaded before the
dsc_file so a typical workflow might be
--
apt-get source bc
http --form post $BASE/pulp/api/v3/content/deb/source_files/
file@"bc_1.07.1.orig.tar.gz" name=bc_1.07.1.orig.tar.gz
http --form post $BASE/pulp/api/v3/content/deb/source_files/
file@"bc_1.07.1-2build1.debian.tar.xz" name=bc_1.07.1-2build1.debian.tar.xz
http --form post $BASE/pulp/api/v3/content/deb/dsc_files/
file@"bc/bc_1.07.1-2build1.dsc"
Attempting to upload the dsc file before its source files will result
validation errors.
Once uploaded source_files and dsc_files can be inspected via their
pulp_href or using the respective endpoints
http get $BASE/pulp/api/v3/content/deb/dsc_files/
http get $BASE/pulp/api/v3/content/deb/source_files/
Synchronizing with a remote with "sync_sources=True" will synchronize
the Sources files as well as download all associated source_files and
dsc_files, performing a high level of data validation and creating all
required associations. As with uploading inspecting contents after a
sync is complete can be done with the same endpoints as above.
To remain spec compliant with https://wiki.debian.org/DebianRepository
"md5" must be present in the ALLOWED_CONTENT_CHECKSUMS. In all cases
the use of md5 is supplemental so security concerns around this
addition, at least for use with this feature, should be minimal.
TODO: publishing - this work has been started and will be completed
over the coming days.