
Cache: parallel tar and upload/unpack and download #357

Open
bertptrs opened this issue Aug 2, 2022 · 2 comments

Comments

@bertptrs

bertptrs commented Aug 2, 2022

Cache operations can take a long time. For our builds, they do in fact take up the majority of the time. Right now that works roughly as follows:

  • On restore
    • A compressed archive is downloaded to a location
    • After that is complete, the tarball is unpacked into position
  • On store
    • A compressed archive is created and written to a temporary location
    • That tempfile is then uploaded to the cache

For example, this is the code that actually does the restore:

func downloadAndUnpackKey(storage storage.Storage, metricsManager metrics.MetricsManager, key string) {
	downloadStart := time.Now()
	fmt.Printf("Downloading key '%s'...\n", key)

	compressed, err := storage.Restore(key)
	utils.Check(err)

	downloadDuration := time.Since(downloadStart)
	info, _ := os.Stat(compressed.Name())
	fmt.Printf("Download complete. Duration: %v. Size: %v bytes.\n", downloadDuration.String(), files.HumanReadableSize(info.Size()))
	publishMetrics(metricsManager, info, downloadDuration)

	unpackStart := time.Now()
	fmt.Printf("Unpacking '%s'...\n", compressed.Name())

	restorationPath, err := files.Unpack(metricsManager, compressed.Name())
	utils.Check(err)

	unpackDuration := time.Since(unpackStart)
	fmt.Printf("Unpack complete. Duration: %v.\n", unpackDuration)
	fmt.Printf("Restored: %s.\n", restorationPath)

	err = os.Remove(compressed.Name())
	if err != nil {
		fmt.Printf("Error removing %s: %v", compressed.Name(), err)
	}
}

The archives are tar files, which support streaming (de)compression. Thus, it should be possible to interleave the download and unpacking (as well as the packing and upload), resulting in lower overall latency.

In the bash version this would've been as simple as piping the sftp output to tar -x, and vice versa for uploads; in Go this might be slightly trickier, but it's generally possible and an easy win for faster builds.

@VeljkoMaksimovic
Contributor

Hi @bertptrs. Thanks for reaching out, and sorry for the late reply.
We are looking into possible solutions for parallel downloading and unpacking. We can't just execute Unix commands such as sftp and tar, because the cache CLI needs to work on Windows machines as well, but we are exploring viable alternatives within the Go language. As soon as this is implemented, I will ping you.

@lucaspin
Contributor

lucaspin commented Jan 5, 2023

@bertptrs before parallelizing the download/decompression, we have to remove the need for shelling out to host CLIs. This has been initially implemented in #379.

By default, the cache CLI will still shell out (we will slowly roll out this change to every organization). But you can use the SEMAPHORE_CACHE_ARCHIVE_METHOD environment variable to use the new archiving methods if it hasn't been automatically set for your organization yet.

Once that has been enabled for every organization without issues, we'll look into parallelizing the compression/decompression with the upload/download.
