If I do "opam install X1...Xn", opam starts by downloading sources for packages X1..Xn, or fetching them from the download cache. Then, each archive (Xi.tar.gz) is extracted into <opamroot>/<switch>/.opam-switch/sources/<pkg.ver>.
Even if all the archives are in the download cache, extracting all the archives takes a lot of time if there are many packages.
If one kills the "opam install" invocation (with ^C), either during the "archive unpacking phase" or during the build, then restarting it will not reuse the already-extracted sources, and will instead start by unpacking again all the archives from scratch.
To get more incrementality, one (brittle) solution would be to use whatever files are already present in a sources directory, if it exists, instead of unpacking the archive. However, if opam is killed while unpacking an archive, the corresponding sources directory will be left in a corrupted state (most likely missing some files), and will corrupt the next run of opam.
A more robust solution would be to memorize a mapping <directory where sources are extracted> -> <hash of the archive they come from>. A new entry would be added to this mapping after an archive has been successfully extracted (and only after it has been fully extracted). Then, after having downloaded an archive, and before extracting it, one would be able to check whether the source directory already corresponds to that archive -- in which case one could simply avoid extracting the archive again.
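The mapping idea can be sketched as follows. This is a hypothetical illustration, not opam's actual layout: the `work/` paths and the `pkg.1.0` package name are made up, and the mapping is stored here as a per-directory hash marker file written only after extraction completes.

```shell
#!/bin/sh
# Hypothetical sketch: after a successful extraction, record the archive's
# hash next to the sources directory; before extracting, compare hashes and
# skip the extraction if they match.
set -e
# Set up a dummy archive so the sketch is self-contained.
mkdir -p work/payload work/sources
echo 'let () = print_endline "hi"' > work/payload/main.ml
tar -czf work/pkg.1.0.tar.gz -C work/payload .

archive=work/pkg.1.0.tar.gz
srcdir=work/sources/pkg.1.0
hashfile=$srcdir.hash
hash=$(sha256sum "$archive" | cut -d' ' -f1)

if [ -f "$hashfile" ] && [ "$(cat "$hashfile")" = "$hash" ]; then
  echo "sources already match archive, skipping extraction"
else
  rm -rf "$srcdir"            # drop any partial extraction
  mkdir -p "$srcdir"
  tar -xzf "$archive" -C "$srcdir"
  echo "$hash" > "$hashfile"  # written only after the extraction completed
  echo "extracted"
fi
```

Because the marker file is written last, a killed extraction leaves no marker, and the next run falls back to re-extracting rather than trusting a corrupted directory.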
Does that sound like a reasonable idea in principle, and would this "more robust" solution work?
If yes, where would this mapping be stored?
This feels like a complicated solution to a (relatively) small problem. If I understand correctly, the core idea here is that if opam was aborted during an installation, then there may be a series of source directories which have been extracted but not yet built, and so could be reused by a subsequent install command.
How about this scheme:
1. The package archive is extracted to .opam-switch/sources/<archive-hash>.tmp
2. At the end of the extraction, .opam-switch/sources/<archive-hash>.tmp is renamed to .opam-switch/sources/<archive-hash>
3. Just before building, .opam-switch/sources/<archive-hash> is renamed to .opam-switch/sources/<pkg.ver>, as before
The idea is that step 2 is atomic, which means that, before extracting a tarball (whose hash opam knows), opam can check whether .opam-switch/sources/<archive-hash> exists and skip the extraction. .opam-switch/sources/<archive-hash>.tmp and .opam-switch/sources/<pkg.ver> will always be in an unknown state and so would be erased by any subsequent install invocation. In this scheme, opam clean -s would need updating to clean both <archive-hash> and <archive-hash>.tmp directories.
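The steps above can be sketched in shell. This is illustrative only: the `demo/` paths and `pkg.1.0` name are made up, and on POSIX filesystems the `mv` of a directory within the same filesystem is a single atomic rename(2).

```shell
#!/bin/sh
# Sketch of the rename-based scheme: extract into <hash>.tmp, then rename.
# A directory named by the hash is therefore guaranteed fully extracted.
set -e
# Set up a dummy archive so the sketch is self-contained.
mkdir -p demo/payload demo/sources
echo content > demo/payload/file
tar -czf demo/archive.tar.gz -C demo/payload .
hash=$(sha256sum demo/archive.tar.gz | cut -d' ' -f1)

if [ ! -d "demo/sources/$hash" ]; then
  rm -rf "demo/sources/$hash.tmp"        # erase any stale partial extraction
  mkdir -p "demo/sources/$hash.tmp"
  tar -xzf demo/archive.tar.gz -C "demo/sources/$hash.tmp"
  mv "demo/sources/$hash.tmp" "demo/sources/$hash"   # the atomic step 2
fi
# Step 3: just before building, rename to the package's usual directory.
mv "demo/sources/$hash" demo/sources/pkg.1.0
```

If the script is killed anywhere before the first `mv`, only a `.tmp` directory exists, and the next run erases and re-extracts it; if it is killed after, the hash-named directory can be reused as-is.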
I think this scheme achieves the same thing but with two benefits: it's simpler (no hashing, no stored configuration), and it's also always used (i.e. it skips the extraction if there's an existing extracted directory, rather than having to activate extra steps to check whether it can be used).