
[question] Backing up an offline Build/Sources of a recipe/package #6876

Closed
prince-chrismc opened this issue Apr 17, 2020 · 41 comments
@prince-chrismc
Contributor

How could I go about creating an entire backup of the `conan install .. --build` process?

To be more precise: the entirety of the Conan cache, including the short_paths location and the build dependencies.

My requirement: in 8 years, be able to bug-fix the source code of an open-source project we are leveraging, re-compile it for a short list of platforms, and then use the result of what Conan generated.

I would like to be able to download the backup, edit the source code, run Conan commands like build, package, export, and upload, and end up with the bug-fixed package I am able to use.

I am installing packages almost exclusively from conan-center-index, with a few provided by the bincrafters remote (until my PRs get merged 😄).

From what I've seen in the docs, there are a large number of issues with the same answer: run a local Artifactory or conan_server to store/host the packages. That is perfect for any real-world use case and everyday requirement, and it meets my needs: I am able to build and deliver my software even if the internet were shut off or, more realistically, there are service disruptions.

After several weeks, we have not been able to develop an internal tool for our needs. Hopefully someone from the Conan.io team can provide some guidance.

@memsharded
Member

This is a use case that keeps coming up, and we have been implementing different strategies according to what users were requesting.

  • First, the sources were embedded in the recipes. You put the conanfile.py in the repo you want to package, use exports or exports_sources, and that "snapshots" the sources inside the recipe. Problem solved: that recipe and package in the server are reproducible forever.
  • Then users said that they didn't want to snapshot the sources inside the package, because it used extra storage in the server. So the source() method, which can "git clone" or tools.download() sources from elsewhere, was added, so that the conanfile.py recipe and the sources could live in different locations.
  • Then users also said that they wanted the same, but with the recipe in the same repo as the source code again, not "snapshotting" and embedding the sources in the recipe, but just capturing the git commit. The scm functionality is able to do that.
  • For recipes in ConanCenter, which point to another location for the sources, the tools.download() or git clone strategies are used. But then users wanted to have their own copy of the sources in-house, in a fork or their own HTTP server, to guarantee their isolation and independence. So we moved the data for those operations to the conandata.yml file, so users could use that file to just replace the URLs to the source code, making it simpler to point to their own location (both the first and the last approach are sketched below).
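For illustration, a minimal, hedged sketch contrasting the first and last strategies (Conan 1.x syntax; `mylib` and its version are made-up placeholders, and each class would live in its own conanfile.py):

```python
from conans import ConanFile, tools

# Strategy 1: snapshot the sources inside the recipe itself.
class SnapshotPkg(ConanFile):
    name = "mylib"
    version = "1.0"
    exports_sources = "src/*"  # the sources travel with the recipe forever

# Strategy 4: recipe and sources live apart; URLs are data in conandata.yml.
class DataDrivenPkg(ConanFile):
    name = "mylib"
    version = "1.0"

    def source(self):
        # conandata.yml provides the url and sha256 for this version, so
        # pointing at an in-house mirror is a data-only change, not a code change
        tools.get(**self.conan_data["sources"][self.version])
```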

There are users of ConanCenter that wouldn't like the idea that those packages were built from sources that do not come from the "official" library git repository or homepage, and they would be unhappy if suddenly the packages were built from a copy hosted by ConanCenter. And yes, on the other hand, there is this request that ConanCenter packages should provide a backup mechanism in case the original source code is removed from the internet.

IMHO, it is a bit out of scope for a package manager 😅 It is very difficult, not to say impossible, to satisfy completely opposite requirements: some want robustness at any price, while others prefer to use the official sources and be able to have their forks.

But I understand this question keeps coming up from time to time. We need to improve the way recipes from ConanCenter are adopted for more enterprise cases. So my suggestion would be:

  • If you depend directly on ConanCenter, and do not have your own packages in your own server, you can rely on the community of contributors. In case some sources are removed from the internet, they will fix the recipes in ConanCenter.
  • If you have your own packages, built from ConanCenter recipes that you build yourself and upload to your own server, and are concerned about reproducibility and sources being removed from the internet:
    • Grab a copy of the source artifacts from the internet and put them in a generic repository in Artifactory, or a fork in your Git server.
    • Change the URL in your conandata.yml to point to your server (see the sketch below).
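As a sketch of that last bullet, an edited conandata.yml might look like this (the internal Artifactory URL is a made-up example; the sha256 is the real one for the upstream zlib 1.2.11 tarball, quoted later in this thread):

```yaml
sources:
  "1.2.11":
    url: "https://artifactory.example.com/artifactory/oss-sources/zlib-1.2.11.tar.gz"
    sha256: "c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1"
```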

Wouldn't this be enough for your use case? Please tell me if this helps.

We will be discussing this idea of a sources backup when we find some time to study the enterprise adoption of recipes from ConanCenter, but this will take some time. Thanks for the feedback!

@prince-chrismc
Contributor Author

I greatly appreciate you taking the time to answer. Conan has become a cornerstone for us; starting to question the way we are working, and overcoming objections like "we are 100% covered in all scenarios", no matter how (un)reasonable the ask, is crucial for us to move forward.

IMHO, it is a bit out of scope for a package manager It is very difficult, not to say impossible to satisfy sometimes completely opposite requirements: some want robustness at any price

I could not agree more. Perhaps a commercial JFrog product could fill this gap? An Artifactory extension to back up sources, of sorts? This would certainly have a market.

Grab a copy of the source artifacts from the internet and put them in a generic repository in Artifactory

👍 It never crossed my mind, but that would certainly fill part of the demands. The challenge with this is knowing how to get the sources. To my knowledge there's no Conan command to download the conandata.yml. I was able to download the recipe with [conan get](https://docs.conan.io/en/latest/reference/commands/consumer/get.html), but re-reading this, could I get the data as well?

A: 💯 Yes!

```
> conan get zlib/1.2.11 conandata.yml
sources:
  1.2.11:
    sha256: c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1
    url: https://zlib.net/zlib-1.2.11.tar.gz
```

a fork in your Git server.

Sadly, there is no easy way to tie a recipe revision to a commit in that fork. It would be very daunting to re-calculate the revision for every commit, or at least very time-consuming.

We absolutely require the rrev; sometimes the packages from an updated recipe are not available right away, and other times the updates require a newer version of Conan which has not been deployed.

You put the conanfile.py in the repo you want to package, do exports or exports_sources, and that "snapshot" the sources inside the recipe.

It would be brilliant if this were a command, conan offline zlib/1.2.11, which produced a tar.gz (for instance) and modified the recipe to use the current directory. Uncompressing that artifact and calling conan create . would get me a local copy I could use in my build tree. Even better, I'd have the sources locally to perform the "dooms day" bug fix before creating a new package.

If you could point me to an example where this was done in a recipe, I would love to try out this idea. Just the repo will be enough; I'll dig through its history. 🚧

With a few calls to conan get, conan source, conan export... I'll have everything locally.

@memsharded
Member

It would be brilliant if this were a command, conan offline zlib/1.2.11, which produced a tar.gz (for instance) and modified the recipe to use the current directory. Uncompressing that artifact and calling conan create . would get me a local copy I could use in my build tree. Even better, I'd have the sources locally to perform the "dooms day" bug fix before creating a new package.

I am thinking that the exports() and exports_sources() methods, which have been requested as a more powerful syntax for the exports and exports_sources attributes, would probably be useful for that. If in those methods we put a tools.download() or a git clone, that will certainly capture the sources next to the recipe in Artifactory. That might be an idea worth exploring; labeling this for discussion with the team.
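A rough sketch of what that might look like in a recipe (the method naming was still under discussion at this point; this uses export_sources(), and the URL is a placeholder):

```python
from conans import ConanFile, tools

class OfflineCapturePkg(ConanFile):
    name = "mylib"
    version = "1.0"

    # Hypothetical method-form export: fetch the upstream tarball at export
    # time, so it gets zipped into conan_sources.tgz and uploaded with the
    # recipe, instead of being re-downloaded later in source().
    def export_sources(self):
        tools.download("https://example.com/mylib-1.0.tar.gz",
                       "mylib-1.0.tar.gz")
```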

@memsharded memsharded added this to the 1.26 milestone Apr 20, 2020
@prince-chrismc
Contributor Author

Thank you very much for taking this onto your plate. Being stuck at home, I have some extra cycles to spare; I'm more than willing to help 😄

@memsharded
Member

Thank you very much for taking this onto your plate. Being stuck at home, I have some extra cycles to spare; I'm more than willing to help 😄

Nice to know. This wouldn't be a very straightforward issue, but if you are willing to give it a try, that would be very welcome!

The heavy part should probably be in conans/client/cmd/export.py, but feel free to ask for guidance any time if you finally try 😄

@solvingj
Contributor

Hi @prince-chrismc , I would like to work with you on this feature. Please let me know if you've started anything already.

@prince-chrismc
Contributor Author

Alright, so I've been playing with this a bit and here is what I've come up with so far. I started with a simple recipe package to do my testing against.

I added an option to conan export which triggers the "offline-ing" process:

  1. modify the recipe (converts it to a local-source variant)
    • adds version
    • adds exports_sources
    • removes the original source command
  2. export the new recipe variant
  3. call the original source() method
  4. move source_subfolder

I did, however, hardcode the 'dst' folder, since I was not able to figure out how to add a new folder to the package_layout object.

Your leads were excellent =)

@prince-chrismc
Contributor Author

@memsharded, it would be nice if you could take a few minutes when you have the time and share your thoughts on "editing recipes" with Conan's API.

Thanks in advance.

@memsharded
Member

Hi @prince-chrismc

I have had a look at the diff, and I have some feedback:

  • It is important NOT to make changes to the conanfile.py recipe. We did this once (for the scm feature) and it was a lot of pain. We have introduced a new way to capture the scm data into the conandata.yml file instead. Please do not modify the recipe.
  • There shouldn't be an --offline argument. The functionality is enabled by defining new methods in the recipe; let's call them do_exports() and do_exports_sources() for the moment, to avoid conflicts with the attributes. We can discuss the naming later.
  • Conan is already capturing and uploading the sources, so whatever is put into the exports_sources folder will be zipped and uploaded to the server in a conan_sources.tgz. There is no need for a new OFFLINE folder.
  • The "recipes" folder cannot be part of a PR. The functionality should be tested in adequate, as minimal as possible, unit tests. As this might require a bit more knowledge, if it is too much, we can help with that.

@solvingj has also been working on this; I suggest you get in touch and share efforts.

Thanks very much for your effort!

@solvingj
Contributor

solvingj commented May 1, 2020

Hi @prince-chrismc, I started looking into this a few days ago and I reviewed your implementation. After speaking with the Conan team, it seemed that adding a new function was being considered as an option, so that obviously led me to a very different implementation from what you had. Thanks very much for exploring the approach that you did; it was very resourceful. As @memsharded said, some of the tactics are unsupportable, but it's good to think outside the box and try things when you are new.

So, in summary, the new approach just lets you do tools.download() inside the new method, and puts the sources inside the export_sources directory. Everything else works exactly the same as exports_sources. If you haven't used that before, it just means that the sources are always uploaded/downloaded with the conanfile.py (in a slightly different folder, however). When the build method is run, they are copied in at the start (the same behavior as if you were using a source() method).

Let us know if this new implementation is likely to work for you based on that info.

@prince-chrismc
Contributor Author

Thank you very much for taking the time to answer.

I am glad I broke so many rules. 🤣 I have been trying to figure out a quick solution for my team to use, which is manually editing the recipes and keeping those in a local repo. Playing in the source also helps with writing recipes for CCI, so it was very worthwhile.

I certainly learnt a lot (considering how tiny this one section is); hopefully I will be more helpful in future work.

Checking out the referenced PR, I understand much better:

The functionality is enabled by defining new methods

I appreciate the positive feedback!

I will take the time to try out the new features to see if I can achieve my goal. Hopefully I can provide some feedback 👍

@solvingj
Contributor

solvingj commented May 2, 2020

If you are not on Slack already, I suggest joining. It's a good place to ask a lot of fundamental questions when you're trying to bootstrap your knowledge of a tool and engineer a real-world solution with it at the same time: https://cpplang.now.sh, #conan channel.

@prince-chrismc
Contributor Author

I was starting my work on an external tool when I discovered python_requires. I made a test, and it was very positive.

Do you think extending a recipe would be a possible solution?

@memsharded
Member

Do you think extending a recipe would be a possible solution?

It is possible to extend a ConanFile class contained in a python_requires with the python_requires_extend functionality. What is not possible is to extend a full recipe/package, because there are things that cannot be "extended", like the packaged sources or references to SCM. So python_requires are limited to pure Python code. python_requires recipes should just be exported with the conan export command, not built into a full package with conan create.
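A minimal sketch of that extension mechanism, assuming a hypothetical base/1.0 python_requires package (Conan 1.x syntax, two separate conanfile.py files shown together):

```python
# base/conanfile.py, published with `conan export . base/1.0@` (no conan create)
from conans import ConanFile

class SharedLogic(object):
    def build(self):
        self.output.info("common build logic inherited from the python_requires")

class PyReq(ConanFile):
    name = "base"
    version = "1.0"

# consumer/conanfile.py: the Python class is extended, not the packaged artifacts
from conans import ConanFile

class Consumer(ConanFile):
    name = "mylib"
    version = "1.0"
    python_requires = "base/1.0"
    python_requires_extend = "base.SharedLogic"
```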

@prince-chrismc
Contributor Author

Ahhh, thank you for the insight!

@memsharded
Member

We have merged #6943, which provides support for these methods, but please read: #6474 (comment)

As the changes might at the moment belong more to the infrastructure of ConanCenter (the complexity of the feature, which involves changing the server side and protocols, would be excessive for the value), it is impossible to do more for 1.26, so removing the milestone atm.

@memsharded memsharded removed this from the 1.26 milestone May 27, 2020
@prince-chrismc
Contributor Author

prince-chrismc commented May 27, 2020

I agree, meeting this use case is very complex. I believe the work from #6943 is a great improvement. I am sure in the future this issue will be closed, but for now removing the 1.26 milestone is good.

export() methods do not have access to settings/options. They are not defined yet at export time, as at that time, only the bare recipe is processed, and profile/settings/options are not even an argument, as the recipe must be common to all configurations

I think the only impediment, for many of the related issues, is the absence of settings/options. Without knowing the version, there is little possibility of downloading the sources.

However, with the increased functionality, the FileCopier in particular, the exported packages have the sources, which removes some of the burden of offline locations.

I suspect this might be a v2.0 idea, but a "reusable" recipe would be the continuation of the work done for 1.26. Having the option to "re-use" the sources that have already been downloaded, or had previously been exported, would make an "installed/created" package usable offline.

@memsharded memsharded added this to the 1.29 milestone Aug 3, 2020
@memsharded
Member

I have a potential idea. I had forgotten about the Conan download cache. If we could build a remote backend for this cache, this could possibly work amazingly. Conan 1.29 is too close; let's try to look into it a bit in the next iteration.

@memsharded memsharded modified the milestones: 1.29, 1.30 Aug 31, 2020
@ytimenkov
Contributor

I don't see how it can work... Usually build servers don't have internet access; everything needs to go through Artifactory anyway.

Another thing: with plain sources it's straightforward to audit the code. If it's hidden behind an obscure hash, it's trickier...

@memsharded
Member

I don't see how it can work... Usually build servers don't have internet access; everything needs to go through Artifactory anyway.

Well, if it is a backup of the original sources (let's say a GitHub release .zip), it should be backed up somewhere else. If the build servers don't have internet access, how will they download the original sources in the first place?

The download cache has all the information, including the URLs, the hashes, etc. Of course, that information needs to be part of the remote backend of the download cache.

Let's clarify what I understood we are trying to provide here:

  • We want to achieve reproducibility of packages, being able to re-build them (possibly many years) later.
  • I assume that the recipes are already backed up. They are source and they will live in git somewhere.
  • The possible point of failure that the OP is pointing out is a recipe that downloads from an external URL, like a download link for a .zip file from a GitHub release of the original open-source library: something that is fully out of the control of the users. The library authors might take down the repo, and then the source URLs would be broken.
  • If the conan create and conan install processes, whenever they download something from the internet, can store the URL and the artifact somewhere they can intercept later, then the problem would be solved.
  • This is already implemented by the file download cache. If we build a backend for this, then problem solved. I am not saying it would be easy, or even that it is possible; we need to check. But conceptually it seems like a nice solution. The remote backend part would store and make the original URL transparent; there is no need to hide it.

@memsharded memsharded modified the milestones: 1.32, 1.33 Nov 18, 2020
@memsharded memsharded modified the milestones: 1.32, 1.33 Nov 29, 2020
@jgsogo
Contributor

jgsogo commented Dec 22, 2020

Hi! We are making some progress related to this feature request. I wanted to share it with you to see if our proposal makes sense or if we need to pivot and use a different approach.

We think that creating a backup of the sources used by a recipe is probably an enterprise feature: for the same reason you want to build all the binaries from sources, you want to store the sources themselves. In a company, this backup is probably performed by only one privileged user (or the CI itself): only one uploads packages to the server, only one backs up the files,... and all the developers consume these uploaded packages and these stored files.


Our first proposal takes advantage of the download cache (#8211).

From the developer's point of view:

  • the developer configures their Conan client (typically they will use a shared conan config install <shared-configuration>):
    • it will contain an entry for storage.sources_backup in `conan.conf`
    • or they activate it manually: conan config set storage.sources_backup=<https://some-remote/storage>
  • the developer builds something from sources: conan install zlib/1.2.11@ --build. Without modifying recipes at all, Conan will look for the sources in the configured https://some-remote/storage before retrieving them from the internet.

Usage is totally transparent for the developers. We could add a safety check to raise an error if any file cannot be found in the internal remote (not implemented right now).

The privileged user (maybe the CI), the one that can upload packages and files to the internal server:

  • configures Conan: they can use the shared configuration with conan config install ...:
    • activates the download cache: conan config set storage.download_cache=<local-folder>
    • activates the remote storage: conan config set storage.sources_backup=<https://some-remote/storage>
  • builds packages from sources: conan install zlib/1.2.11@ --build
  • uploads packages to the remotes to make them available to the other developers.
  • Now an extra step: files need to be uploaded explicitly to the internal server. In the <local-folder> there is a file that will help us. ATM, we will implement a JFrog CLI plugin to provide a command like jfrog conan backup-sources --repo=remote-repo that will upload the files in the local cache to a generic Artifactory repository.

No need to modify recipes at all; the sources will be stored in the internal server (a sketch of the resulting conan.conf entries follows below).
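For reference, the conan config set commands above would presumably leave a conan.conf looking something like this (a sketch only; sources_backup was an experimental entry at this point, and the paths/URL are placeholders):

```
[storage]
path = ~/.conan/data
download_cache = /path/to/local-folder
sources_backup = https://some-remote/storage
```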


We think that requiring this extra step is ok: the backup process can be something explicit, done only by one person (or a centralized server) with enough privileges, and then every user/consumer takes advantage of it. If source files and/or contents need to be audited, then it is ok that the upload is not a built-in feature in the Conan client, and using JFrog CLI is a very convenient way to implement it (handling credentials, other Artifactory-related features,...).

This should work out-of-the-box with recipes from ConanCenter, as they are only downloading tarballs, but the process is not suitable for recipes that clone a repository (that will wait for a second iteration).


What do you think about this approach? Does it satisfy what you have in mind? Is it far away from your scenarios? We are still designing the feature and any feedback is welcome. Thanks!

@prince-chrismc
Contributor Author

I really appreciate the community-focused approach and I enjoy discussing solutions.

✔️ on the extra step; I'd argue it's better, since it separates the local development flow from the release/tag workflow (where backups are required). Backups are certainly not something that is required on every single commit.

This conceptually matches what we have deployed; however, there are a few technical challenges which we encountered:

  • graph resolution from --build all
  • mapping the data from conandata.yml to the remote server (generic Artifactory)

I assume Conan can handle the graph, since it creates it. From our perspective we are trying to leverage the base lockfile wherever possible, but we have not revisited the backups.
However, would the jfrog-cli just upload everything? How does it know what is from the last conan install and what's from last week's run?

Does the download cache preserve the file path of the remote sources?
The first challenge we hit was duplicates, whereby the filename was the same. Since nearly everything comes from GitHub we have not had any problems to date, but as more OSS gets pulled in, two different "internet sources" can have the same path.
Trying it locally, the files were not laid out as I had expected, so I am not sure how it uploaded.

@jgsogo
Contributor

jgsogo commented Dec 23, 2020

Actually, the download cache does most of the job, and it is already working. The download cache uses a hash algorithm that computes the sha256 of url+checksum; the resulting hash is the filename used to store the file in the download-cache folder. Here, Conan caches everything downloaded from the internet: sources, recipe files, package files,... The new internal-server backup/cache will use those same hashed filenames to look for the artifacts in the internal server.
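In other words (a sketch of the scheme just described; Conan's actual concatenation and encoding details may differ):

```python
import hashlib

def cached_filename(url, checksum):
    # The download cache keys each artifact by sha256(url + checksum)
    # and uses the resulting hex digest as the on-disk filename.
    return hashlib.sha256((url + checksum).encode("utf-8")).hexdigest()

# e.g. the zlib tarball quoted earlier in this thread
print(cached_filename(
    "https://zlib.net/zlib-1.2.11.tar.gz",
    "c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1"))
```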

This internal-server cache will be used only by the tools.get/download helpers, the ones that retrieve sources, not for the calls that Conan makes to Conan servers (those listed in conan remote list). I mean, it will try to retrieve from the internal server only files that come from the internet, not files that come from Conan servers.

#8211 adds one file (.cached_files) to the download cache; it contains a mapping between those local hashed filenames and the original URLs. The JFrog CLI command will use this .cached_files file to know which files to upload: only those that come from servers that are not listed with conan remote list. The current implementation will upload all files that match this criterion; of course, the JFrog CLI command could check timestamps and so on if needed (we will make it OSS once we have a first working version ready).

@ytimenkov
Contributor

@jgsogo I think my major disagreement is with your statement:

In a company, this backup is probably performed by only one privileged user (or the CI itself): only one uploads packages to the server, only one backs up the files,...

Even though CI can back up files, it's not practical:

  1. CI simply doesn't have access to the internet. 🤷‍♂️
  2. Even though CI can upload the cache, it's a bit tedious, because for third-party libraries CI only builds a matrix, so there will be a number of similar jobs; it is unclear which one should update the cache or how to address concurrency... So even if there is only one user, it's not a single job.
  3. How to differentiate whether to use the cache or not? If a package is (re)built later and something happens to be downloaded, that is exactly the situation we want to avoid, so as not to have inconsistencies with previously built packages...

Also, the download cache doesn't (nicely) address the case where sources need to be patched.

Other thoughts on the subject (based on how we use Conan):

  1. There is no conan-center remote defined, only private ones.
  2. It is usually a developer who tries the package/recipe first locally: to check both that the package builds and works and that it solves the task.
  3. Based on my previous experience, third-party libraries may go through legal review (or similar) before they can be used in production. (There is a shortcut via a separate "staging"/"uncleared" repo to unblock development, but eventually, before merging the update, all TPS must be approved for use. Some packages may require more thorough scanning if different components or files are licensed differently.) So there must be a convenient way for a human to inspect the final snapshot of the sources.
  4. Based on the above, introducing a recipe is a process to be automated, and patching conandata.yml and uploading sources into a generic repository looked like a good solution.
  5. JFrog CLI is another dependency which needs to be taken into account. So far it has been convenient that Conan could be the only requirement, which bootstraps everything else.
  6. What about non-Artifactory remotes? So far Conan has been open in this regard...
  7. If uploaded files have names like in the cache (sha256 sums), how do we track which package uses which file? This is required to clean up orphaned files (e.g. discarded experiments) or promote packages into a different repository (it will be an unpleasant surprise if the recipe and binaries are moved to a "release" repository but the sources are left behind and get removed).

@prince-chrismc
Contributor Author

#8211 adds one file (.cached_files) to the download cache; it contains a mapping between those local hashed filenames and the original URLs. The JFrog CLI command will use this .cached_files file to know which files to upload: only those that come from servers that are not listed with conan remote list.

This is the detail that I was missing; the .cached_files mechanism certainly addresses the shortcomings we experienced. It's very feasible to start with a clean download_cache and upload everything not from a Conan remote.

The recovery procedure would be as follows:

  • download the entire download_cache to /some/path (there's no easy way to know which sha is required 🤔)
  • configure the client with conan config set storage.download_cache=/some/path
  • unplug all the ethernet connections
  • run conan install conanfile.py [...] --build all; everything should be consumed locally

Correct me if I am wrong!

Also, the download cache doesn't (nicely) address the case where sources need to be patched.

That's very true.

It's far more likely that in 5-7 years we will need to add support for a compiler that a project was not originally compiled with and/or did not support. I think this is a first step where Conan can actually gather everything locally and reuse it at a later time.

How does a consumer "unpack" the ~/.conan cache to edit a recipe? Perhaps that's the next topic.

@jgsogo
Contributor

jgsogo commented Dec 23, 2020

First, answering @prince-chrismc:

This is not a recovery procedure. Every developer can have conan config set storage.sources_backup=<https://some-remote/storage> in their configuration, and Conan will grab the sources from the internal server before going to the internet to get them. There is no drawback to having it activated always.

Patching sources: the Conan workflow doesn't change. Conan will get the sources (from the internal server or from the internet) and then will apply the patches (patches exported together with the recipe, or patches retrieved from the internal server or from the internet).

The implementation of this internal-server cache is pretty straightforward: Conan will try the internal server before going to look for those files on the internet.

So, after someone has uploaded the files to the internal server and the recipes to your internal Conan remote, you can switch off the internet and run conan install zlib/1.2.11@ --profile=any-config -r my-conan-server --build=zlib and it will succeed.

@ytimenkov
Contributor

How does a consumer "unpack" the ~/.conan cache to edit a recipe? Perhaps that's the next topic.

I see that many recipes in CCI just apply all the patches from the source folder. So in the "patching conandata.yml" case it's a matter of dropping in more patches. Of course you need to rebuild, but it's fairly straightforward and the recipe itself is untouched (as I understand it, the main purpose is to avoid modifying conanfile.py's source section). And it can be easily automated (updating YAML doesn't compare to patching Python code).

@jgsogo
Contributor

jgsogo commented Dec 23, 2020

@ytimenkov there are different topics here; I will answer some of them inline:

@jgsogo I think my major disagreement is with your statement:

In a company, this backup is probably performed by only one privileged user (or the CI itself): only one uploads packages to the server, only one backs up the files,...

Even though CI can back up files, it's not practical:

  1. CI simply doesn't have access to the internet. 🤷‍♂️
  2. Even though CI can upload the cache, it's a bit tedious, because for third-party libraries CI only builds a matrix, so there will be a number of similar jobs; it is unclear which one should update the cache or how to address concurrency... So even if there is only one user, it's not a single job.

My main point was that only one person needs to run this backup-upload process and everyone else will benefit from it (download only). For sure, there is a different scenario in every different home. If there is some validation process, the backup should be a final stage of it: after we know these sources can be used, we back them up, everyone (CI included) will use them from our internal server, and we know that they won't be changed.

  3. How to differentiate whether to use the cache or not? If a package is (re)built later and something happens to be downloaded, that is exactly the situation we want to avoid, so as not to have inconsistencies with previously built packages...

If a package is rebuilt and not all the sources/artifacts are downloaded from the internal server (and the recipe hasn't changed), then there was a problem in the validation+backup process done before: maybe they didn't review all the sources or they forgot to upload them. We might consider a feature in Conan to fail if sources are not found in the internal server; it is something easy, but not core to this feature. If you feel it would be useful, say so and we will add it to the backlog (and probably implement it right away if approved).

Also, the download cache doesn't (nicely) address the case where sources need to be patched.

There is nothing different in this scenario: if the patches are retrieved from the internet, they will be cached locally and in the internal server the same way the sources are. This feature is not making a backup of the sources after running the source() method; the backup stores the same tarball that is downloaded from the internet.

Other thoughts on the subject (based on how we use Conan):

  1. There is no conan-center remote defined, only private ones.
  2. It is usually a developer who tries the package/recipe first locally: to check both that the package builds and works and that it solves the task.

Nothing changes: Conan won't find the files in the internal server and will get them from the internet. That is probably what happens right now when some developer tries a recipe locally.

  3. Based on my previous experience, third-party libraries may go through legal review (or similar) before they can be used in production. (There is a shortcut via a separate "staging"/"uncleared" repo to unblock development, but eventually, before merging the update, all TPS must be approved for use. Some packages may require more thorough scanning if different components or files are licensed differently.) So there must be a convenient way for a human to inspect the final snapshot of the sources.
  4. Based on the above, introducing a recipe is a process to be automated, and patching conandata.yml and uploading sources into a generic repository looked like a good solution.

As said before, this will back up the sources as they are downloaded from the internet: same tarball, same checksum. It doesn't require modifying the recipe or the conandata.yml at all. Your company can review the original tarball and approve it or not, but if you need to modify it, you will upload it to some server and you will need to modify the recipe or the conandata.yml to point to the new location.

  5. JFrog CLI is another dependency which needs to be taken into account. So far it has been convenient that Conan could be the only requirement, which bootstraps everything else.
  6. What about non-Artifactory remotes? So far Conan has been open in this regard...

When using the feature as a developer, Conan is doing a GET request to whatever is written in the storage.sources_backup config variable; it can be any URL, and it doesn't need to be Artifactory. Any server will work, and probably other protocols besides HTTP will work too (I haven't tried it, but FTP should work). From the Conan client, there is no lock-in to any technology.

In order to show the full feature, we needed to write something to upload files somewhere. We chose to write it as a JFrog CLI command and use Artifactory (for obvious reasons 😅). We can help with other scripts, but it should be easy enough to write something in Python/bash/ps1 that iterates over the local files and does a PUT request to a server (the filenames don't change); see the sketch below.
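Something along these lines (a sketch only; the cache path, backup URL, and credentials are placeholders, and a real script would filter using the .cached_files mapping described above):

```python
import os
import requests

CACHE_DIR = "/path/to/download_cache"        # the storage.download_cache folder
BACKUP_URL = "https://some-remote/storage"   # generic repository accepting PUT

for name in os.listdir(CACHE_DIR):
    path = os.path.join(CACHE_DIR, name)
    if not os.path.isfile(path) or name == ".cached_files":
        continue
    with open(path, "rb") as f:
        # Filenames are the content hashes, so the local layout maps 1:1
        # onto the server and re-uploads are idempotent.
        resp = requests.put("{}/{}".format(BACKUP_URL, name), data=f,
                            auth=("user", "password"))
        resp.raise_for_status()
```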

  7. If uploaded files have names like in the cache (sha256 sums), how do we track which package uses which file? This is required to clean up orphaned files (e.g. discarded experiments) or promote packages into a different repository (it will be an unpleasant surprise if the recipe and binaries are moved to a "release" repository but the sources are left behind and get removed).

Right now (in the current implementation), during the upload the JFrog CLI command associates each file with its original URL (because Artifactory supports assigning properties to uploaded files); other servers may use other approaches depending on what they support.

So, IMHO, your request depends on the capabilities of the server. How to associate a file in the server with a recipe, or with the original URL, will be very different if you are using Artifactory or an FTP server (or a shared drive); the Conan client only needs access to this shared resource.

I will think about how to do it with Artifactory and the JFrog CLI command, but it won't satisfy other scenarios... and it won't be needed to use this feature (it is just something convenient for running a purge).


Hope this sheds some light on my previous comments; please keep asking if something is not clear or you think some more details can be added around this feature. It is at a very early stage even if it works, and we still need feedback to match expectations.

Thanks @ytimenkov !

@prince-chrismc
Contributor Author

I think you meant to say...

- This is not a recovery procedure.
+ This is not an intended recovery procedure.

🤣 but that's how it's going to be used!

[image: ux]

Summary of the technical challenge

⚡ The legal/contractual obligation to provide support for an LTS is the driving force here; we need the Conan recipes (including patches and sources) in a state where we can edit them in the far future.

When consuming from CCI (or any third-party remote), the binaries are precompiled, so developers do not need to download the sources (the development workflow). Everything goes through Artifactory, so it's always cached on-site and re-using it is a breeze.

What happens when we need to add platform support? What if we need to support a different compiler? What happens if there's a bug?

  • What happens if a binary is missing?
  • What happens if we need to change a binary?

TL;DR

The linked PR fixes the first of the two bullet points above.

How can we guarantee our ability to rebuild packages on demand?

The ideal scenario would be for the outputs of the Conan client to be usable to create new packages.

As a fairly new consumer, I hear you with the preconception:

I assume that the recipes are already backed-up. They are source and they will live in git somewhere.

When I originally posted this issue that was not the case.


Let's take a step back so I can explain how we are handling this with non-Conan OSS.

Today what we have is a cloned/mirrored repository: we have the source code, it's integrated into our legacy build procedure already, and we just commit to that repository and push. This is a very simple and elegant solution; we have everything in our hands and it's very easy to bug-fix third-party dependencies.

[image: old]

Now with the introduction of Conan, and most specifically CCI, we only have the binaries. We cannot easily edit machine code to apply a bug fix. Moreover, we need to create a new binary, upload that new revision, and specify that one to install and build against.

So what we really need is a backup of everything CCI used to create the original package.

How can we obtain a copy of everything needed to build a package offline?

Just purely with the Conan client. No external workflow.

What we currently have:

  • Recipe
  • conandata.yml

What's missing:

  • the exports
  • the source code!

↪️ I think the PR you pointed to and the download_cache get us exactly that, and even better, Conan can re-use that information later when it's needed!


Patching sources: the Conan workflow doesn't change. Conan will get the sources (from the internal server or from the internet) and then will apply the patches (patches exported together with the recipe, or patches retrieved from the internal server or from the internet).

I am looking at this from a "higher" perspective; all of that still applies to how Conan works. However, for an enterprise to fix a bug in a dependency that is consumed from Conan, there's a separate, intentional, manual action that needs to be taken.

How do we add patches to an existing Conan recipe?

We are using zlib/1.2.11#1a67b713610ae745694aa4df1725451d; it applies 3 patches to date and it's in my ~/.conan/data folder. Let's say I discover a CVE for a recursion stack-overflow bug, and a patch is published, but in production this causes a DoS that is seriously affecting us and requires an immediate fix.

Can I edit the files in my Conan cache?

  1. add a new patch to the conandata.yml
  2. C:\Users\Chris\.conan\data\zlib\1.2.11\_\_\export> conan create conanfile.py 1.2.11@prod/critical

The cmake wrapper and patches are missing locally, but can they be recovered via the download cache? I suspect the layout of the cache is not in line with the recipe, which is why this fails.

  3. C:\Users\Chris\my-project> conan create conanfile.py 1.0.2@prod/critical --build=zlib

This again fails from the missing exports.

Can I clone CCI and reverse-lookup the commit from the recipe revision?

Yes, but I regret doing it. That was a while ago... now, with the activity we have, it's near impossible.

Then I can simply do C:\Users\Chris\cci\recipes\zlib\1.2.11> conan create conanfile.py 1.2.11@prod/critical, and even better, with the download cache I have the sources locally and everything works flawlessly.

Can I rebuild a package without needing a copy of CCI?

To date, no. 🤷‍♂️

A separate issue? Probably.

@jgsogo
Contributor

jgsogo commented Dec 24, 2020

I read about desire lines a long time ago and I love them. We can build a new path, or we can move the one we were planning to build. 😃

I get two takeaways from your comment:

  • It would be nice if we could go from a recipe revision to the sources. This is something I'm thinking about in the context of ConanCenter: I want to get a recipe from ConanCenter into my working directory in order to apply some extra patch or modify the recipe a little bit, and create/upload new packages to your internal server... I can imagine scenarios where this is preferable to iterating over the latest commits in the repository looking for the one that matches a specific recipe revision.
    My plan is to start adding another property to Conan packages in Artifactory, the scm commit, so we can go from the recipe revision to the repository... Of course, we need to think about how to publish this information (with Conan, some external tool, or the webpage), so users can fork recipes this way. It will take time, but the seed is there.

  • (Given the previous point, or a recipe you own with all its exports/exports-sources.) In the future, you will need to apply some extra patch to an existing recipe whose sources have been cached using the mechanism described in this PR. IMO it should work out of the box: you create the patch and modify the recipe so it takes the new patch into account, but when you run conan create, Conan will find the existing files in your internal-server cache instead of retrieving them from the internet.
    Files in this internal-server cache are not associated with any recipe revision or reference; each is the same file downloaded from the internet, associated only with its URL. As long as the recipe tries to retrieve the file from the same URL, the cache will work and the file will be fetched from the internal-server cache.
    Your modified recipe will get the original sources from the cache, apply the existing patches and the new one, and you will generate your new packages.

@ytimenkov
Contributor

@jgsogo I think we made some implicit assumptions: it seems you suggest using CCI as a remote, while I thought to continue using our own repo for recipes (we have one similar to the CCI GitHub repo), so the main goal was to avoid modifying the recipe when adding it to that repo (which we do now). Therefore patching conandata.yml at this step was not a problem (currently, besides replacing the recipe's source(), we also find and download the sources and place them where exports_sources can find the tarball).

💡 I also hadn't thought about the sha sums of the source tarballs, which eliminate tampering.

Now I'd like to build on your idea of using the conan-center remote.

  • If uploading the download cache (🙄) is combined with conan upload (for the recipe), it makes perfect sense: before, it grabbed whatever was in exports_sources; now it will do the equivalent thing.
    💡 My major concern was that the download cache contains a lot of files on a developer machine, but if only the relevant pieces are taken, it's fine.
    💡 Also, if an appropriate JSON file were produced as an output, it could solve the open-tooling problem. E.g. if it has a source URL and a destination (or hashsum), metadata could be assigned by external tooling.
    💡 It was not clear to me how the cache will be uploaded and organized, but if it's a simple PUT request to the configured URL with the sha, this can definitely work with any server. Especially if augmented with hooks.
  • To cover the patch scenario, something like conan import (doing the opposite of conan export, probably with a reference revision) could be handy: this should create a working copy suitable for a further conan export.
  • Also, the conan source command could be used to recreate the "source tree to be built" if needed for inspection/scanning. (Again, it should probably accept a reference with a revision.)

I think the major benefit of using the CCI remote directly is that a separate git repo is simply not needed (well, maybe, as you mentioned, the recipe can contain a reference to its original location and revision, but it doesn't play a significant role).

So then my workflow would look like:

  1. (re)enable the CCI remote
  2. add a reference to a new package / update
  3. conan install --build missing|pkg
  4. try it out
  5. conan upload -r my-remote --upload-sources
  6. trigger CI to build all the relevant binary packages for that reference/revision
  7. disable the CCI remote

When promoting (by a tool):

  1. copy the recipe and all binary packages to the "released" Conan repo.
  2. copy the sources (by shas) to the "released" generic repo (I suppose they should somehow be part of the manifest or conandata.yml).

If patching is needed (a sketch of step 6 follows below):

  1. conan import <ref>#rev
  2. conan source orig
  3. conan source patched
  4. hack in patched/
  5. diff -ur orig patched > 999-my.patch
  6. (probably) add the patch to conandata.yml
  7. conan export . ref (and then continue as in the original workflow)
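For step 6, the conandata.yml addition would presumably follow the common CCI pattern (a sketch; the base_path value depends on how the recipe unpacks its sources):

```yaml
patches:
  "1.2.11":
    - patch_file: "patches/999-my.patch"
      base_path: "source_subfolder"
```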

🎅🎄 to everyone.

@jgsogo
Contributor

jgsogo commented Dec 24, 2020

More comments:

  • If I want to promote packages from one repository to another, how do I carry with me those sources that are cached? We thought that the internal-server cache would be one single big bucket; of course, we asked ourselves how to purge that cache of files no longer used. We don't have an answer yet (we have neither the feature nor the problem), but your suggestion is something to take into consideration: maybe it is better to promote those sources together with the reference/package, or to design something around that.

  • The same source file or tarball can be consumed by different references. For example, some recipe might retrieve the same LICENSE file for all its versions (very close to this recipe), or different libraries might download the same tarball (who knows).

    While uploading we can write to the internal-server cache and assign properties... but consumers with read-only permission will fetch files from the cache, and they won't be able to record that a new reference is using a file that is already in the cache.

  • Why decouple the upload process from the Conan client? There are several reasons:

    • we think it is expected that only one privileged user uploads these artifacts to the cache, after they are validated. It is one extra step, but only once, and only for one user whose work is very related to it.
    • servers: when implementing the upload, different servers could require different implementations; we cannot maintain all of them and we didn't want to support only Artifactory. Decoupling it from the Conan client looks like the best way to avoid lock-in.
    • credentials: uploading requires new credentials, and different protocols might require different credentials (it is not the same as the Conan remote server). Here there are two problems: storing them in a safe way, and the ability to model the storage and consumption of those different authentication processes.

    I understand the value of conan upload -r my-remote --upload-sources; in fact, it was the first implementation, but we realized these problems and decided to decouple it, at least for the POC. We can always implement it in the future.

  • Regarding a conan import command doing the opposite of conan export: it is something we have in mind. The problem is that right now the export (or self.copy) operations are not reversible: you cannot know where the files came from, and you cannot restore an existing working directory. This is a feature request with some challenges (it requires a file with the mapping).

Happy Christmas! 🎄

@memsharded
Member

Implemented in #13461 for 2.0.3; closing. Feedback welcome!

@samkearney

Hi, sorry for the late notification, but I am evaluating Conan for enterprise usage and I've been thinking a lot about reproducibility, which brought me to this family of issues. I'm thinking about the approach I will recommend for introducing Conan to our enterprise workflow.

By the way, Conan is an awesome tool and very powerful, kudos to the team.

In general it seems like the approach of using export() / export_sources() in method form is suitable for full source reproducibility. The approach I was thinking of going with to ensure reproducibility is this:

  • For first-party packages, use a recipe-in-source approach and export the sources directly in the recipe.
  • For third-party packages, maintain our own index of recipes and an Artifactory server, similar to what is recommended in the Devops guide
  • Use code review or automated tooling on this index repository to make sure that all recipes are using export_sources() in method form to cache sources with the recipe, and none of them are just creating references to sources on external servers

I'm curious in your professional opinion if this sounds like a valid approach.

I see the method implemented using the download cache, but it is not yet documented, and I can't quite wrap my mind around how it works from the discussion here. Also, on this page we have:

Only documented features in https://docs.conan.io/ are considered part of the public interface of Conan. Private implementation details, and everything not included in the documentation is subject to change.

This indicates to me that I probably shouldn't use this download-cache method yet.

In general I'm happy that the Devops guide page exists, and I'm wondering if it would be possible to extend that page with some discussion of this particular issue and the possible approaches to solving it?

@memsharded
Member

Hi @samkearney

By the way, Conan is an awesome tool and very powerful, kudos to the team.

Thanks for your kind words!

We are about to launch the "backup sources" feature, to automatically store in your own server all the sources downloaded from the internet; I think this is the missing piece you might be looking for. It is already running in production for ConanCenter, and I think it will be launched in the coming weeks.

Maybe the best would be to create a new ticket when this happens, and discuss the details there? Thanks!

@samkearney

@memsharded Thanks for the quick reply! Sounds good, so I'll look for this feature in the Conan release notes and will open a new issue if I still need clarification.
