Avoid downloading large package metadata #3431
In an IRL conversation with @wolfv and others, he mentioned that the infra for avoiding the download of the massive package metadata when running `micromamba` commands is there, but I can't find a way to use it. Is there already a way to do that with a `micromamba` command? To be specific, I want to avoid the download of each channel's full repodata at the start of every command.

Comments
Sadly, no. It's only implemented in …
Is this completed? Or is it a "wontfix" kind of thing?
So at the moment it's not implemented in `mamba`.
For some context, conda/ceps#75 formalizes sharded repodata, an improved indexing solution for channel repodata. It has been accepted, but it needs to be implemented by forges (including conda-forge) and by package managers (such as conda and mamba/micromamba).
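For readers unfamiliar with the scheme, here is a minimal sketch of what a client does under CEP 75 as I read it: instead of one monolithic `repodata.json`, the client fetches a small per-subdir index mapping package names to content hashes, then fetches only the shards it needs. The URL layout, field names, and the prefix.dev mirror used here are assumptions for illustration, not a verified API.

```python
# Minimal CEP 75 client sketch, assuming this layout (verify against the
# CEP and your mirror): a per-subdir index at
# <subdir>/repodata_shards.msgpack.zst mapping package names to SHA256
# digests, and one shard per package at <subdir>/shards/<hex>.msgpack.zst.
# Requires: pip install requests zstandard msgpack
import msgpack
import requests
import zstandard

SUBDIR_URL = "https://prefix.dev/conda-forge/linux-64"  # example mirror


def fetch_msgpack_zst(url: str) -> dict:
    """Download, zstd-decompress, and msgpack-decode one document."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    dctx = zstandard.ZstdDecompressor()
    # max_output_size lets decompression work even when the zstd frame
    # does not embed its decompressed size.
    return msgpack.unpackb(dctx.decompress(resp.content, max_output_size=2**28))


# 1. Fetch the small per-subdir index: package name -> shard digest.
index = fetch_msgpack_zst(f"{SUBDIR_URL}/repodata_shards.msgpack.zst")
shards = index["shards"]

# 2. Fetch only the shard for the package we care about
#    (digests are assumed to be raw bytes here).
digest = shards["python"].hex()
shard = fetch_msgpack_zst(f"{SUBDIR_URL}/shards/{digest}.msgpack.zst")

# 3. A shard holds just that one package's records, in both formats.
records = {**shard.get("packages", {}), **shard.get("packages.conda", {})}
print(f"python: {len(records)} records, without downloading the whole channel")
```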
conda/conda#14060 is the main requirement, which is being implemented in conda/conda-index#161.
I would prioritize it, since UX-wise this is what causes latency and fatigue for users.
Good to mention that the conda-forge and bioconda mirrors (but any channel really) at https://prefix.dev/ fully support sharded repodata. So I think an implementation is not blocked on server support.
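As a quick way to verify that claim for any channel, one can probe for the shard index; the path below follows the assumed CEP 75 layout from the sketch above:

```python
import requests

# Probe a channel subdir for the sharded-repodata index; a 200 means the
# server side is ready. URL layout per the assumptions stated earlier.
url = "https://prefix.dev/conda-forge/noarch/repodata_shards.msgpack.zst"
status = requests.head(url, timeout=30).status_code
print("sharded repodata served" if status == 200 else f"not served (HTTP {status})")
```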
Yes, mamba can implement support for sharded repodata now, without waiting for conda-lock to implement it.
I am not sure about …
I'm not entirely sure how you "feed" the data to libsolv, but we also have this working in rattler with our libsolv backend. We preprocess the input specs and recursively fetch the records for every package name we encounter as a dependency. This is still an eager process, but most of the time it fetches only a fraction of the total number of packages from the channel, and it does include all the records needed for a solve.

As an example, given just the spec "python", we start by fetching all records for python. We then iterate over all dependencies of all python records and see which package names we encounter (for instance …), and repeat for those. You can also implement some smart tricks here to shrink the total search space. Rattler also dispatches requests for different packages in parallel (make sure to use HTTP/2), and we can aggressively cache the individual shards. This is what makes sharded repodata so fast. You could also create …

Let me know if you want more details; I can point you to where this happens in the code.
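To make that traversal concrete, here is a compact sketch of the eager, name-based closure walk in Python. It reuses the hypothetical `fetch_msgpack_zst`, `SUBDIR_URL`, and `shards` names from the sketch above; the real rattler implementation is in Rust and additionally parallelizes the fetches and caches shards on disk.

```python
from collections import deque

# Assumes fetch_msgpack_zst, SUBDIR_URL, and the shards name->digest
# index from the earlier sketch.


def dependency_closure(root: str) -> dict:
    """Fetch only the shards reachable from `root` via dependency names."""
    fetched: dict[str, dict] = {}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if name in fetched or name not in shards:
            continue
        digest = shards[name].hex()
        shard = fetch_msgpack_zst(f"{SUBDIR_URL}/shards/{digest}.msgpack.zst")
        fetched[name] = shard
        for fmt in ("packages", "packages.conda"):
            for record in shard.get(fmt, {}).values():
                # A spec like "libffi >=3.4" contributes the name "libffi";
                # only the first token matters for the walk.
                for dep in record.get("depends", []):
                    queue.append(dep.split()[0])
    return fetched


closure = dependency_closure("python")
print(f"fetched {len(closure)} shards instead of the whole channel")
```

Note that this walk follows names rather than version constraints, so it over-fetches slightly relative to a true solve; that is the "eager" caveat mentioned above.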
As of now in mamba, entire channels' `repodata.json` are downloaded and loaded.
I see. Yeah, that's a bit of a shortcoming, as that won't scale very well. It means reading all the data ever created even if it is not needed at all. I also assume that will use much more memory than needed.