-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: file cache: max size #945
base: main
Are you sure you want to change the base?
Feature: file cache: max size #945
Conversation
# Conflicts: # include/vcpkg/base/messages.h # src/vcpkg/base/messages.cpp
# Conflicts: # src/vcpkg/binarycaching.cpp
# Conflicts: # src/vcpkg-test/metrics.cpp # src/vcpkg/binarycaching.cpp
# Conflicts: # include/vcpkg/base/files.h # src/vcpkg-test/files.cpp # src/vcpkg/base/files.cpp # src/vcpkg/binarycaching.cpp
This is not 100% ready for merge, but ready for review. When I get a design approval I will handle all edge case errors and do localization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor nits.
Co-authored-by: Thomas1664 <46387399+Thomas1664@users.noreply.github.com>
# Conflicts: # include/vcpkg/base/message-data.inc.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0. Every instance has a unique "sync" file in "<root>/sync/<random_number>".
1. Append a line to sync file of the format "<name_of_object>;<file_size>\n"
2. Read the new entries from other sync files
3. Check if files must be deleted to respect the limits
I do not believe this is correct. There is nothing to stop 2 instances from concurrently reading all the sync files, deciding that something needs to be deleted, writing their intent to delete to their own sync files, and now there is no way to resolve the ambiguity on who 'won' that race.
In particular, this is assuming that writes or appends within a file will be atomic, which is a feature most file systems do not provide and several network file systems extremely do not provide. The only operation which we can assume is atomic is the creation or removal of a file system entry; as in rename
.
I think you need something like a read/write oplock here, where instances trying to remove entries from the cache are writers, and instances trying to do anything else are readers. Only one instance should be trying to delete out of the cache at a time, and if any instance is doing so, we need to make sure we don't delete a cache entry that other instances are potentially touching.
src/vcpkg/binarycaching.cpp
Outdated
cache.own_sync_file = get_own_sync_file(cache.sync_root_dir); | ||
if (cache.folder_settings.delete_policy != FolderSettings::DeletePolicy::None) | ||
{ | ||
std::unordered_map<std::string, uint64_t> file_sizes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that in general this works correctly, due to concurrent insertions into the cache. Cache entries are not meaningfully part of the cache until they have been renamed into place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also a cache so that I don't have to call fs.file_size()
for every cache entry. If a file is not in the cache anymore, this cache size is not used.
src/vcpkg/binarycaching.cpp
Outdated
} | ||
} | ||
} | ||
|
||
size_t push_success(const BinaryPackageWriteInfo& request, MessageSink& msg_sink) override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think 'during push_success
' is the right time to be doing this. It probably should be a one time pass that looks at the cache(s) after all cache operations this particular vcpkg instance will do on this run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then the cache could be larger then its max size in the meantime?
The instances only write to the sync files what they want to add, not what they want to delete.
No this code does not assumes this. I explicitly handle this case ... I just realize that I wanted to implement this but haven't done this yet ... 🤦 🙈 Edit: Now implemented Start:
|
# Conflicts: # src/vcpkg/binarycaching.cpp
# Conflicts: # locales/messages.json
Has there any progress been made on this direction? it's becoming quite an effort to cleanup manually the vcpkg cache directory from time to time. Thank you in advance 🙏🏻 |
# Conflicts: # include/vcpkg/base/message-data.inc.h # locales/messages.json
Fixes microsoft/vcpkg#19452
If no settings are found a new settings file is created. You can set the following properties:
vcpkg-tool/docs/file-cache-settings.schema.json
Lines 8 to 32 in 5ab77d3
To ensure that the limits are respected when multiple instances are running the following is done:
0. Every instance has a unique "sync" file in "/sync/<random_number>".