
Avoid downloading large package metadata #3431

Open
adrinjalali opened this issue Sep 4, 2024 · 12 comments
Labels
type::feature-request New feature proposal type::question Further information is requested

Comments

@adrinjalali

In an IRL conversation with @wolfv and others, he mentioned that the infrastructure for avoiding the download of the massive package metadata when running micromamba commands already exists, but I can't find a way to use it. Is there already a way to do that with a micromamba command?

To be specific, this is what I want to avoid:

$ mmamba install jupyter
conda-forge/noarch                                  16.3MB @   7.9MB/s  2.1s
conda-forge/linux-64                                37.6MB @   9.8MB/s  3.9s

...
@wolfv
Member

wolfv commented Sep 4, 2024

Sadly, no. It's only implemented in rattler / pixi for now. There you can point to https://fast.prefix.dev/conda-forge to get a speedy version.
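For illustration, pointing a pixi project at that mirror only takes a channel URL in pixi.toml (a minimal sketch; the project name and platform list here are placeholders):

[project]
name = "example"
channels = ["https://fast.prefix.dev/conda-forge"]
platforms = ["linux-64"]

With that, pixi resolves against the mirror instead of pulling the full repodata.json from the default channel URL.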

@Hind-M Hind-M added the type::question Further information is requested label Sep 19, 2024
@Hind-M Hind-M closed this as completed Sep 19, 2024
@adrinjalali
Author

Is this completed? Or is it a "wontfix" kinda thing?

@Hind-M
Member

Hind-M commented Sep 19, 2024

So at the moment it's not implemented in mamba/micromamba.
It could be at some point, but not in the short/medium term unfortunately.
We can also mark it as a feature request.

@Hind-M Hind-M reopened this Sep 19, 2024
@Hind-M Hind-M added the type::feature-request New feature proposal label Sep 19, 2024
@jjerphan
Member

jjerphan commented Oct 8, 2024

For some context, conda/ceps#75 formalizes sharded repodata, an improved indexing solution for conda channels.

It has been accepted, but it needs to be implemented by forges (including conda-forge) and by package managers (such as conda and mamba/micromamba).
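Roughly, the CEP replaces one monolithic repodata.json with a small per-subdir index that maps package names to content-addressed shards, so a client only downloads the shards it needs. A minimal sketch of fetching a single shard, assuming the file names and index fields from my reading of the CEP (illustrative, not authoritative):

import io
import msgpack            # third-party: pip install msgpack zstandard
import urllib.request
import zstandard

SUBDIR = "https://fast.prefix.dev/conda-forge/linux-64"  # example sharded mirror

def get(url: str) -> dict:
    # Both the index and the shards are zstd-compressed msgpack.
    raw = urllib.request.urlopen(url).read()
    data = zstandard.ZstdDecompressor().stream_reader(io.BytesIO(raw)).read()
    return msgpack.unpackb(data)

# Small index: maps each package name to the sha256 digest of its shard.
index = get(f"{SUBDIR}/repodata_shards.msgpack.zst")
digest = index["shards"]["python"].hex()  # field layout assumed from the CEP

# Fetch only the shard for "python" instead of tens of MB of repodata.json.
shard = get(f"{SUBDIR}/shards/{digest}.msgpack.zst")  # path assumed from the CEP
# The shard holds just this package's records, keyed like repodata.json entries.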

@jjerphan
Member

jjerphan commented Oct 9, 2024

conda/conda#14060 is the main requirement, which is being implemented by conda/conda-index#161.

@jjerphan
Member

I would prioritize it, since UX-wise this is what causes latency and fatigue for users.

@baszalmstra
Contributor

Good to mention that the conda-forge and bioconda mirrors (but any channel, really) at https://prefix.dev/ fully support sharded repodata. So I think an implementation is not blocked on server support.

@jjerphan
Member

Yes, mamba can implement support for sharded repodata now without waiting for conda-lock to implement it.

@jjerphan
Member

I am not sure libsolv is usable with sharded repodata, since AFAIK libsolv has to load all package metadata upfront.

@baszalmstra
Contributor

I'm not entirely sure how you "feed" the data to libsolv, but we also have this working in rattler with our libsolv backend.

We preprocess the input specs and recursively fetch the packages whose names we encounter as dependencies. This is still an eager process, but most of the time it fetches only a fraction of the total number of packages in the channel. It does include all the records needed for a solve.

As an example, given just the spec "python", we start by fetching all records for python. We then iterate over all dependencies of all python records and see which package names we encounter, for instance libsqlite. We fetch all records of libsqlite and do the same there. This crawls the entire search space that libsolv needs to solve the input spec.

You can also implement some smart tricks here to shrink the total search space. Rattler also dispatches requests for different packages in parallel (make sure to use HTTP/2), and we can aggressively cache the individual shards. This is what makes sharded repodata so fast.
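A minimal sketch of that crawl in Python (fetch_records here is a hypothetical stand-in for downloading and parsing one package's shard; the real rattler code is Rust, fetches in parallel, and caches shards):

def crawl(roots, fetch_records):
    """Eagerly collect every record reachable from the root package names."""
    seen = {}                             # package name -> list of record dicts
    todo = set(roots)
    while todo:
        name = todo.pop()
        records = fetch_records(name)     # all records for this name, every version
        seen[name] = records
        for record in records:
            for dep in record.get("depends", []):
                dep_name = dep.split()[0] # "libsqlite >=3.46" -> "libsqlite"
                if dep_name not in seen and dep_name not in todo:
                    todo.add(dep_name)    # newly encountered name: crawl it too
    return seen                           # superset of what libsolv needs

# e.g. crawl({"python"}, fetch_records) touches python, libsqlite, openssl, ...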

You could also create .solv files for the individual shards for possibly even more speed! (We didn't implement this.)

Let me know if you want more details, I can point you to where this happens in the code.

@jjerphan
Member

As far as I remember, as of now in mamba, entire channels' repodata.json files are loaded into libsolv's database, then a problem is constructed and solved.

@baszalmstra
Contributor

I see. Yeah, that's a bit of a shortcoming, as that won't scale very well. It means reading all the data ever created even when it is not needed at all. I also assume that uses much more memory than necessary.
