Per-project EnvForest, or just a global one? #13

Open
njsmith opened this issue Jan 29, 2023 · 0 comments
Owner

njsmith commented Jan 29, 2023

Should we have one EnvForest per project, or one per user?

Benefits of one-per-project:

  • Deleting the project directory means everything is gone
  • Copying the project directory also copies the environment? Not sure if that matters.
  • Potentially easier to detect stale or unneeded cache entries and clean them up: anything that's no longer mentioned in the project's lockfile can go. With a global cache we're stuck with something like time-based reclamation. (Though I suppose we could keep a best-effort record of where all the projects we're aware of live on disk, and have GC check every lockfile it can find. But if you have a project that you aren't working on anymore, you probably want its storage to be reclaimed. And for that matter, people switch branches all the time, and it would be annoying if switching to a different branch and then back again meant you had to refill your cache because some packages were briefly unlocked. So I think we'd want some kind of time-based component anyway?)
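
To make the GC idea above concrete, here is a minimal sketch of a policy that combines both signals: an entry is kept if any known lockfile still mentions it, or if it was used recently enough that a branch switch might bring it back. Everything here is an assumption for illustration: `collect_garbage`, the shape of `entries`, and the 30-day grace period are hypothetical and not part of the current code.

```python
DAY = 24 * 60 * 60  # seconds


def collect_garbage(entries, locked, now, grace_seconds):
    """Return the cache keys that look safe to delete.

    entries: dict mapping cache key -> last-used timestamp, in seconds
    locked:  set of cache keys mentioned in any lockfile we know about
    """
    stale = []
    for key, last_used in entries.items():
        if key in locked:
            continue  # still referenced by a lockfile: keep
        if now - last_used < grace_seconds:
            continue  # recently used (maybe on another branch): keep for now
        stale.append(key)
    return sorted(stale)


# Hypothetical cache contents: key -> last-used time.
entries = {
    "numpy-1.24.1": 100 * DAY,  # locked, so kept regardless of age
    "attrs-22.2.0": 99 * DAY,   # unlocked, but used one day ago
    "six-1.16.0": 10 * DAY,     # unlocked and untouched for 90 days
}
locked = {"numpy-1.24.1"}

stale = collect_garbage(entries, locked, now=100 * DAY, grace_seconds=30 * DAY)
print(stale)
```

With a per-project EnvForest, `locked` would come from a single lockfile; with a per-user one it would be the union over whatever lockfiles the best-effort project record can still find, which is why the time-based check does most of the work there.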

Benefits of one-per-user:

  • Only one copy of any given package, no matter how many projects use it
    • Corollary: spinning up environments for a new project may be delightfully quick, if we already have common packages in cache
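
A toy model of the sharing benefit, assuming entries are keyed by the package hash from the lockfile. `SharedStore`, `build_environment`, and the hash strings are hypothetical names made up for this sketch; this is not how EnvForest is implemented.

```python
class SharedStore:
    """Hypothetical per-user store: one entry per distinct package hash."""

    def __init__(self):
        self._entries = {}    # package hash -> unpacked package (a stand-in value here)
        self.fetch_count = 0  # how many times we had to download/unpack

    def ensure(self, package_hash, fetch):
        # Only do the expensive fetch when the hash is not already present.
        if package_hash not in self._entries:
            self._entries[package_hash] = fetch(package_hash)
            self.fetch_count += 1
        return self._entries[package_hash]


def build_environment(store, lockfile_hashes, fetch):
    """Assemble an environment as a list of references into the shared store."""
    return [store.ensure(h, fetch) for h in lockfile_hashes]


def fake_fetch(package_hash):
    # Stand-in for downloading and unpacking a wheel.
    return f"unpacked:{package_hash}"


store = SharedStore()
project_a = ["sha256:aaa", "sha256:bbb"]
project_b = ["sha256:bbb", "sha256:ccc"]

env_a = build_environment(store, project_a, fake_fetch)
env_b = build_environment(store, project_b, fake_fetch)
print(store.fetch_count)
```

The two projects pin four packages between them but only three distinct hashes, so the second project reuses the entry the first one already paid for.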

Interesting side note: if we have a global per-user EnvForest, then it may not make sense to cache artifacts (the ones we currently keep in hash_cache), because they're redundant: we only need each artifact once, while we're installing it into the EnvForest. Maybe this even applies to locally built wheels? Sdists are a bit more complicated, though: we might want to use an sdist once to get metadata and then a second time to build a wheel (but if both generally happen within a single invocation, maybe that's fine?). And in some cases we might want to use an sdist several times to build for several architectures, but that might be rare enough that no one will really care.

Prior art

I believe conda effectively shares data globally per-user? And it has something of a reputation for ending up with huge caches that you have to reclaim by manually running conda clean?

IIRC there have been contentious debates about this for pipenv/poetry/etc. We should figure out why people were invested in this on each side. (Does anyone reading this happen to know?)
