Per-project EnvForest, or just a global one? #13

njsmith · 2023-01-29T09:26:58Z

Should we have one EnvForest per project, or one per user?

Benefits of one-per-project:

- Deleting the project directory means everything is gone.
- ...and copying the project directory means you copy the environment? Not sure if that matters.
- Potentially easier to detect stale/unneeded cache entries and clean them up (e.g. anything that's no longer mentioned in the project's lockfile can go). With a global cache we're stuck with e.g. time-based reclamation. (Though... I suppose we could keep a best-effort record of where all the projects we're aware of are on disk, and have GC check all the lock files it can find. Then again, if you have a project that you aren't working on anymore, maybe you want its storage reclaimed. And for that matter, people switch branches all the time, and it would be annoying if switching to a different branch and back again meant you had to refill your cache because some packages were briefly unlocked. So I think we'd want some kind of time-based component anyway?)
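
To make the last point concrete, here is a minimal sketch of a GC policy that combines the lockfile check with a time-based grace period. Everything in it is hypothetical (the `CacheEntry` type, the key strings, the 30-day grace period); it is not an existing EnvForest API, just one way the rule could look.

```python
from dataclasses import dataclass

DAY = 86400.0  # seconds


@dataclass(frozen=True)
class CacheEntry:
    key: str          # hypothetical identifier for one installed package tree
    last_used: float  # seconds since the epoch


def entries_to_reclaim(entries, locked_keys, now, grace_seconds):
    """Return keys that no known lockfile mentions AND that have been idle
    for longer than the grace period."""
    return sorted(
        e.key
        for e in entries
        if e.key not in locked_keys and now - e.last_used > grace_seconds
    )


entries = [
    CacheEntry("numpy-abc", last_used=100 * DAY),  # locked, recently used
    CacheEntry("attrs-def", last_used=99 * DAY),   # unlocked, but used yesterday
    CacheEntry("trio-ghi", last_used=10 * DAY),    # unlocked and idle for 90 days
    CacheEntry("old-locked", last_used=1 * DAY),   # idle, but still locked
]
locked = {"numpy-abc", "old-locked"}

reclaimable = entries_to_reclaim(entries, locked, now=100 * DAY, grace_seconds=30 * DAY)
print(reclaimable)  # ['trio-ghi']
```

The grace period is what covers the branch-switching case: an entry that briefly drops out of the lockfile survives as long as it was used recently.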

Benefits of one-per-user:

- Only one copy of any given package, no matter how many projects use it
  - Corollary: spinning up environments for a new project may be delightfully quick, if we already have common packages in cache
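
A rough illustration of the sharing argument, assuming installed trees are keyed by name, version, and artifact hash. The directory layout, paths, and the `entry_path` / `count_entries` helpers are made up for this sketch, and the package versions are only illustrative.

```python
from pathlib import PurePosixPath


def entry_path(root, name, version, digest):
    # Hypothetical layout: one directory per (name, version, artifact hash).
    return PurePosixPath(root) / f"{name}-{version}-{digest[:12]}"


def count_entries(projects, root_for):
    """Count distinct on-disk trees, given a function that maps a project
    to the root of the forest it installs into."""
    return len({
        entry_path(root_for(project), *package)
        for project, packages in projects.items()
        for package in packages
    })


projects = {
    "/home/u/proj-a": [("numpy", "1.24.1", "a" * 64), ("attrs", "22.2.0", "b" * 64)],
    "/home/u/proj-b": [("numpy", "1.24.1", "a" * 64), ("trio", "0.22.0", "c" * 64)],
}

per_project = count_entries(projects, lambda project: f"{project}/.envforest")
per_user = count_entries(projects, lambda project: "/home/u/.cache/envforest")
print(per_project, per_user)  # 4 3
```

With per-project roots the shared numpy build is stored twice; with a single per-user root both projects resolve to the same directory, which is also why a new project can start from a warm cache.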

Interesting side note: if we have a global per-user EnvForest, then it may not make sense to cache artifacts (the ones we have in hash_cache currently), because they're redundant: we only need the artifact once while we're installing it into the EnvForest. Maybe this even applies to locally built wheels? Though, I guess sdists are a bit more complicated... we might want to use an sdist once to get metadata and then a second time to build a wheel (but if those generally happen within a single invocation maybe it's fine?) And in some cases we might want to use an sdist several times to build for several architectures, but that might be rare enough that no-one will really care.

Prior art

I believe conda effectively shares data globally per-user? And has something of a reputation for ending up with huge caches that you have to manually conda clean to reclaim?

IIRC there have been contentious debates about this for pipenv/poetry/etc. We should figure out why people were invested in this on each side. (Does anyone reading this happen to know?)
