[vcpkg] RFC: Binarycaching #11204
However, we notably do not currently track the compiler used. This is critical for all cross-machine scenarios, as the environment is likely to change incompatibly from machine to machine. We propose hashing the compiler that will be used by CMake. This can be accomplished either by reimplementing CMake's compiler-detection logic or by running a partial project and extracting the results. For performance reasons, we will prefer to first use heuristics that approximate the CMake logic, with accompanying documentation for users who fall outside those bounds.
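As a rough illustration of the heuristic approach described above, the compiler could be identified by hashing the compiler binary together with its reported version string. This is only a sketch under stated assumptions, not vcpkg's implementation: the function names `fingerprint` and `fingerprint_compiler`, the choice of SHA-256, and the idea of passing in a version string are all hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(compiler_bytes: bytes, version_text: str) -> str:
    """Combine the compiler binary and its reported version into one hash.

    Each input is hashed separately first, so the boundary between the two
    inputs is unambiguous in the combined digest.
    """
    digest = hashlib.sha256()  # assumption: any stable hash would do here
    digest.update(hashlib.sha256(compiler_bytes).digest())
    digest.update(hashlib.sha256(version_text.encode("utf-8")).digest())
    return digest.hexdigest()


def fingerprint_compiler(path: str, version_text: str) -> str:
    """Hypothetical helper: fingerprint the compiler executable at `path`."""
    return fingerprint(Path(path).read_bytes(), version_text)
```

Two machines would then share cached binaries only if both inputs match; a different compiler binary or a different version string yields a different fingerprint.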
Another aspect of the environment we don't currently track is the CRT version on Linux systems. We currently believe this will not cause many problems in practice (so tracking it is out of scope for an MVP), since the compiler will (generally) link against the system CRT and should sufficiently reflect any differences. This can also be easily worked around by the user with documentation: the toolchain file can simply have a comment such as "# this uses muslc", which will cause it to hash differently.
Note that stuff like the glibc version may differ on different distributions unrelated to the compiler.
To be sure, the hash is produced from the toolchain file as a plain file, without any structuring, correct? To reuse the same binaries you'd need to have exactly the same toolchain files?
For example, if you want to enable a different library feature - hashes won't clash, correct?
Yup, the whole file is hashed. If there's any difference you won't end up with the same hash for the package.
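To make the "whole file is hashed" point concrete, here is a minimal sketch showing that even a comment-only change to a toolchain file produces a different hash. The function name `toolchain_hash`, the sample toolchain contents, and the use of SHA-256 are assumptions for illustration; the thread does not say which hash vcpkg uses.

```python
import hashlib


def toolchain_hash(toolchain_text: str) -> str:
    """Hash the toolchain file as opaque text, with no parsing or normalization."""
    return hashlib.sha256(toolchain_text.encode("utf-8")).hexdigest()


# Hypothetical toolchain contents; only the trailing comment differs.
base = "set(CMAKE_SYSTEM_NAME Linux)\n"
tagged = base + "# this uses muslc\n"

print(toolchain_hash(base) == toolchain_hash(tagged))  # False
```

Because the file is treated as plain bytes, whitespace and comment edits also invalidate cached binaries, which is what makes the "# this uses muslc" workaround possible.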
Thanks a lot for working on this. This is probably out of scope for the initial MVP, but have you considered the possibility of optionally downloading a binary with a different hash than the one that would be built locally? I am thinking of cases like #10522, in which a port does not compile under a given Visual Studio version because of a Visual Studio regression, and a binary compiled with a different version could be downloaded instead.
One core concept of our binarycaching approach is that this is only intended as an accelerant for what could be done locally; it's not intended to enable developers to (for example) not have a compiler toolchain at all. In the case you mentioned, the workaround would be to have both toolchains available locally and use something like
This does have some problems (absolute paths to VS instances aren't portable), but it illustrates the overall direction where we would look for a solution.
After seeing what the space consumption for our CI looks like, we also probably need to think about an eviction policy, like 'delete items from the cache which are not current hashes and have not been accessed in over $timeframe'.
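One way such an eviction policy might look is sketched below. Everything here is an assumption for illustration: the `<hash>.zip` file layout, the function names `select_stale` and `evict`, and the reliance on file access times (which are not updated on filesystems mounted with `noatime`).

```python
import time
from pathlib import Path


def select_stale(entries, current_hashes, max_age_seconds, now):
    """Return hashes that are neither current nor recently accessed.

    `entries` maps a package hash to its last-access time in seconds
    since the epoch; `now` uses the same clock.
    """
    return sorted(
        h
        for h, last_access in entries.items()
        if h not in current_hashes and now - last_access > max_age_seconds
    )


def evict(archive_dir, current_hashes, max_age_seconds):
    """Delete stale archives, assuming a hypothetical <hash>.zip layout."""
    files = {p.stem: p for p in Path(archive_dir).rglob("*.zip")}
    entries = {h: p.stat().st_atime for h, p in files.items()}
    for h in select_stale(entries, current_hashes, max_age_seconds, time.time()):
        files[h].unlink()
```

Keeping the selection logic separate from the deletion makes the policy easy to dry-run: `select_stale` can be called on its own to list what would be removed before anything is deleted.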
Will there be a way to remove older versions of the packages from the binary archive? If I understand correctly, for now, if a port gets updated (or a new CMake version is used, etc.), a new binary archive will be built for it, but the older archives still remain in the archive directory. Am I right?
In this initial implementation, we are not building any specific cache invalidation functionality into vcpkg. Depending on what backend you use, there might be an option on the provider's side to perform access-time based garbage collection (for example, Azure DevOps Artifacts has that for NuGet feeds). We think that for most users, a once-in-a-while delete of
Note change to XDG directory structure. Replace 'upload' config keyword with more flexible read/write/readwrite keywords.
Changes to the binary caching spec made as comments over at microsoft#11204 (review)
--write-nuget-packages-config
We are currently working on enabling caching and reuse of binaries to accelerate CI and per-project vcpkg instances; this is our current working draft of the specification, ready for public commentary!
A limited form of this spec is already implemented and available in the tool today, via either --binarycaching or set VCPKG_FEATURE_FLAGS=binarycaching, which will default to using $vcpkg_root/archives.
We would love feedback about whether this feature is useful to you or any additional scenarios you'd like to see covered!