opam.ocaml.org repository takes 10+ minutes to download on Windows #5741

Open
jonahbeckford opened this issue Nov 24, 2023 · 8 comments

@jonahbeckford
Contributor

jonahbeckford commented Nov 24, 2023

This is a long-standing issue. But I realize I haven't communicated it so perhaps it could be tracked ...

The following takes a little over 10 minutes on GitLab SaaS Windows hardware (almost the same timing on my desktop):

opam repository set-url default https://opam.ocaml.org --yes --all

Ditto for opam init.


Job: https://gitlab.com/dkml/distributions/dkml/-/jobs/5608126061 (search for repository set-url default)

@kit-ty-kate
Member

This is indeed a long-standing known issue. The problem appears when the cache needs to be rebuilt (when I remove it manually, opam repo list takes 4m21s on my local Windows machine). Rebuilding involves reading the contents of every file in the opam repositories you have set, which, for some reason, is very fast on Unix systems but takes a really long time on Windows.

What is the way to read a lot (~30_000) of files in quick succession on Windows?
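
For anyone who wants to reproduce the measurement outside opam, here is a minimal sketch that creates a tree of small files and times reading all of them. It is written in Python for convenience, not in opam's own OCaml, and it does not answer the question above; it only gives a number to compare between Windows and Unix. The layout (one `opam` file per `pkgN` directory) and the names `make_tree` and `read_tree` are assumptions made for illustration and only loosely mimic an opam repository.

```python
import os
import tempfile
import time
from pathlib import Path


def make_tree(root: Path, n_files: int) -> None:
    """Create n_files small files, one per directory (hypothetical layout)."""
    for i in range(n_files):
        pkg_dir = root / f"pkg{i}"
        pkg_dir.mkdir()
        (pkg_dir / "opam").write_text(f'opam-version: "2.0"\nname: "pkg{i}"\n')


def read_tree(root: Path) -> tuple[int, float]:
    """Read every file under root; return (file count, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                f.read()
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # 30_000 matches the rough file count mentioned in this thread.
        make_tree(Path(tmp), 30_000)
        count, elapsed = read_tree(Path(tmp))
        print(f"read {count} files in {elapsed:.2f}s")
```

Note that files created a moment earlier are likely still in the OS cache, so this probably understates the cost of a cold read.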

@jonahbeckford
Contributor Author

I personally don't know of a way to read a lot of files quickly; I am a Windows user who is not very familiar with the nitty-gritty of the (modern) Windows APIs.

However, given that in a former life one of the things I had to do was design distributed databases, I would not rely on any file system to read 30,000 files in quick succession within a user-facing experience. I'd:

  • have the caching be a background process
  • have the data pre-indexed or at least have a snapshot + delta where the snapshot was pre-indexed (ex. apt package manager https://wiki.debian.org/DebianRepository/Format)
  • use a space-efficient on-disk data structure (ex. git packfiles)

Or we could avoid the entire filesystem-as-a-database problem and use a real database (ex. sqlite3).

I understand that none of those are easy solutions, which is probably why this is a long-standing issue.
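
To make the snapshot + delta and sqlite3 ideas concrete, here is a rough sketch, assuming a single table keyed by package name that holds the text of each opam file. It uses Python's standard `sqlite3` module only for brevity; the table name `opam_files` and the functions `build_index`, `apply_delta` and `load_all` are hypothetical and are not part of opam.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, packages: dict[str, str]) -> None:
    """Store a pre-indexed snapshot: one row per package instead of one file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS opam_files ("
        "package TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO opam_files (package, content) VALUES (?, ?)",
        packages.items(),
    )
    conn.commit()


def apply_delta(
    conn: sqlite3.Connection, changed: dict[str, str], removed: list[str]
) -> None:
    """Apply a delta on top of the snapshot without rebuilding everything."""
    build_index(conn, changed)
    conn.executemany(
        "DELETE FROM opam_files WHERE package = ?", [(p,) for p in removed]
    )
    conn.commit()


def load_all(conn: sqlite3.Connection) -> dict[str, str]:
    """Load every package with one query against one file on disk."""
    return dict(conn.execute("SELECT package, content FROM opam_files"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a real cache would use a file path
    build_index(conn, {"foo.1.0": 'opam-version: "2.0"'})
    apply_delta(conn, {"bar.2.0": 'opam-version: "2.0"'}, removed=["foo.1.0"])
    print(sorted(load_all(conn)))
```

The point of the design is that a full load opens one file and runs one query, rather than opening tens of thousands of small files.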

@Alizter

Alizter commented Dec 12, 2023

For the record, in Dune package management we are using git as a revision store for opam repositories and it seems to be working well on Windows. That is, it is reasonably performant.

@kit-ty-kate
Member

@Alizter does it load/read all the packages in opam-repository once fetched? opam needs to preload every package; I don't think this is the case for Dune, so I'm not surprised the issue is not encountered there.

@Alizter

Alizter commented Dec 12, 2023

@kit-ty-kate We don't load all the files. We have a revision store mechanism that 0install uses to fetch opam files during solving. This lets us use a single repository to store all opam repositories at all revisions as needed. You're correct that we have no need to fetch all the files in an opam repository at once, only the ones the solver requests.
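
As an illustration of the on-demand approach described above (not Dune's actual implementation), here is a small sketch in Python. It assumes files are fetched one at a time through `git show <rev>:<path>` and memoized, so only the paths a solver asks for are ever read. The names `git_show` and `LazyStore` are hypothetical.

```python
import subprocess
from typing import Callable


def git_show(repo: str, rev: str, path: str) -> str:
    """Read one file at one revision straight from the git object store."""
    result = subprocess.run(
        ["git", "-C", repo, "show", f"{rev}:{path}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


class LazyStore:
    """Fetch files only when requested, and remember what was fetched."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._cache: dict[str, str] = {}
        self.fetch_count = 0  # number of real fetches, for inspection

    def get(self, path: str) -> str:
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
            self.fetch_count += 1
        return self._cache[path]
```

A store for a given checkout could be built as `LazyStore(lambda p: git_show(repo, rev, p))`, where `repo` and `rev` are whatever repository path and revision the caller is solving against.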

@kit-ty-kate
Member

Update on this issue: I had initially hoped to have a fix before the release of opam 2.2.0, but the required change (#5648) is big enough that I am currently restarting on my 3rd fresh branch for the fix since January. #5824 would also help this issue but seems equally, if not more, hard to fix.

I'm still working on it in the background but it seems increasingly unlikely to make the cut, although it will most certainly be in the next release after.

@kit-ty-kate
Member

I tried one last experiment in #5966, which cut the time opam init/update takes on Windows in half (which helps a lot but still isn't as fast as Unix), but given the amount of untested change that would have to go through, I don't think it's very wise to merge at the moment.

@NicoNekoru

Not sure if this is a related issue or expected behavior, but all opam installs take around 20 minutes at best and up to an hour at worst for me on Windows on 2.3.0.
