If I do "opam install X1...Xn", opam starts by downloading sources for packages X1..Xn, or fetching them from the download cache. Then, each archive (Xi.tar.gz) is extracted into <opamroot>/<switch>/.opam-switch/sources/<pkg.ver>.
Even if all the archives are in the download cache, extracting all the archives takes a lot of time if there are many packages.
If one kills the "opam install" invocation (with ^C), either during the "archive unpacking phase" or during the build, then restarting it will not reuse the already-extracted sources, and will instead start by unpacking again all the archives from scratch.
To get more incrementality, one (brittle) solution would be to use whatever files are already present in a sources directory, if it exists, instead of unpacking the archive. However, if opam is killed while unpacking an archive, the corresponding sources directory will be left in a corrupted state (most likely missing some files), and will corrupt the next run of opam.
A more robust solution would be to memorize a mapping <directory where sources are extracted> -> <hash of the archive they come from>. A new entry would be added to this mapping after an archive has been successfully extracted (and only after it has been fully extracted). Then, after having downloaded an archive, and before extracting it, one would be able to check whether the source directory already corresponds to that archive -- in which case one could simply avoid extracting the archive again.
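The mapping idea can be sketched as follows. This is a hypothetical illustration, not opam's actual layout: the `work/` paths and the `pkg.1.0` package name are made up, and the mapping is stored here as a per-directory hash marker file written only after extraction completes.

```shell
#!/bin/sh
# Hypothetical sketch: after a successful extraction, record the archive's
# hash next to the sources directory; before extracting, compare hashes and
# skip the extraction if they match.
set -e
# Set up a dummy archive so the sketch is self-contained.
mkdir -p work/payload work/sources
echo 'let () = print_endline "hi"' > work/payload/main.ml
tar -czf work/pkg.1.0.tar.gz -C work/payload .

archive=work/pkg.1.0.tar.gz
srcdir=work/sources/pkg.1.0
hashfile=$srcdir.hash
hash=$(sha256sum "$archive" | cut -d' ' -f1)

if [ -f "$hashfile" ] && [ "$(cat "$hashfile")" = "$hash" ]; then
  echo "sources already match archive, skipping extraction"
else
  rm -rf "$srcdir"            # drop any partial extraction
  mkdir -p "$srcdir"
  tar -xzf "$archive" -C "$srcdir"
  echo "$hash" > "$hashfile"  # written only after the extraction completed
  echo "extracted"
fi
```

Because the marker file is written last, a killed extraction leaves no marker, and the next run falls back to re-extracting rather than trusting a corrupted directory.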
Does that sound like a reasonable idea in principle, and would this "more robust" solution work?
If yes, where would this mapping be stored?
This feels like a complicated solution to a (relatively) small problem. If I understand correctly, the core idea here is that if opam was aborted during an installation, then there may be a series of source directories which have been extracted but not yet built, and so could be reused by a subsequent install command.
How about this scheme:
1. The package archive is extracted to .opam-switch/sources/<archive-hash>.tmp
2. At the end of the extraction, .opam-switch/sources/<archive-hash>.tmp is renamed to .opam-switch/sources/<archive-hash>
3. Just before building, .opam-switch/sources/<archive-hash> is renamed to .opam-switch/sources/<pkg.ver>, as before
The idea is that step 2 is atomic, which means that, before extracting a tarball (whose hash opam knows), opam can check whether .opam-switch/sources/<archive-hash> exists and skip the extraction. .opam-switch/sources/<archive-hash>.tmp and .opam-switch/sources/<pkg.ver> will always be in an unknown state and so would be erased by any subsequent install invocation. In this scheme, opam clean -s would need updating to clean both <archive-hash> and <archive-hash>.tmp directories.
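The steps above can be sketched in shell. This is illustrative only: the `demo/` paths and `pkg.1.0` name are made up, and on POSIX filesystems the `mv` of a directory within the same filesystem is a single atomic rename(2).

```shell
#!/bin/sh
# Sketch of the rename-based scheme: extract into <hash>.tmp, then rename.
# A directory named by the hash is therefore guaranteed fully extracted.
set -e
# Set up a dummy archive so the sketch is self-contained.
mkdir -p demo/payload demo/sources
echo content > demo/payload/file
tar -czf demo/archive.tar.gz -C demo/payload .
hash=$(sha256sum demo/archive.tar.gz | cut -d' ' -f1)

if [ ! -d "demo/sources/$hash" ]; then
  rm -rf "demo/sources/$hash.tmp"        # erase any stale partial extraction
  mkdir -p "demo/sources/$hash.tmp"
  tar -xzf demo/archive.tar.gz -C "demo/sources/$hash.tmp"
  mv "demo/sources/$hash.tmp" "demo/sources/$hash"   # the atomic step 2
fi
# Step 3: just before building, rename to the package's usual directory.
mv "demo/sources/$hash" demo/sources/pkg.1.0
```

If the script is killed anywhere before the first `mv`, only a `.tmp` directory exists, and the next run erases and re-extracts it; if it is killed after, the hash-named directory can be reused as-is.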
I think this scheme achieves the same thing but with two benefits: it's simpler (no hashing, no stored configuration), and it's also always used (i.e. it skips the extraction if there's an existing extracted directory, rather than having to activate extra steps to check whether it can be used).