Describe the bug
When cache_file_limit is set to a large value (e.g. 10,000), calls to StorageManager.get_local_copy become extremely slow, even when all the requested files are already in the cache.
Profiling suggests that this call to iterdir() is the main bottleneck. When the cache holds many small files and get_local_copy is called once for each of them, iterating over every file in the cache on every call is too slow.
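To illustrate why a per-call directory scan hurts, here is a minimal, self-contained sketch. It is not ClearML's code: `lookup_with_scan`, `lookup_direct` and `make_cache` are hypothetical names. The first lookup lists the whole cache folder on every call, as a size-limit check would; the second only checks the one file it needs. Calling the scanning lookup once per cached file costs O(N) per call, so O(N²) in total.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def lookup_with_scan(cache_dir: Path, name: str) -> Tuple[Optional[Path], int]:
    """Hypothetical lookup that lists the whole cache folder on every call
    and reports how many directory entries it visited."""
    entries = list(cache_dir.iterdir())  # O(N) work paid on every single call
    # A real cache would compare len(entries) with its file limit and evict here.
    target = cache_dir / name
    return (target if target.is_file() else None), len(entries)


def lookup_direct(cache_dir: Path, name: str) -> Optional[Path]:
    """Hypothetical lookup that only checks the one file it needs."""
    target = cache_dir / name
    return target if target.is_file() else None


def make_cache(n_files: int) -> Path:
    """Create a temporary folder holding n_files small files."""
    cache_dir = Path(tempfile.mkdtemp())
    for i in range(n_files):
        (cache_dir / f"file_{i}.txt").write_text("x")
    return cache_dir


if __name__ == "__main__":
    n = 200
    cache = make_cache(n)
    total = sum(lookup_with_scan(cache, f"file_{i}.txt")[1] for i in range(n))
    print(f"{n} lookups visited {total} directory entries")  # n * n
```

If each call lists the folder once, fetching each of 10,000 cached files means visiting 10,000 × 10,000 = 100 million directory entries, which is consistent with the minutes-long cached run described in this issue.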
To reproduce
1. Call StorageManager.set_cache_file_limit(10_000).
2. Download many files with StorageManager.get_local_copy to fill up the cache.
3. Run the same downloads again.
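The reproduction steps can be scripted as a small timing harness. This is a sketch under assumptions: `time_passes` is a hypothetical helper, not part of ClearML, and it takes the fetch function as an argument so it can wrap StorageManager.get_local_copy or any stand-in. The ClearML usage in the trailing comments uses placeholder URLs that you would replace with your own.

```python
import time
from typing import Callable, List


def time_passes(urls: List[str], fetch: Callable[[str], object], passes: int = 2) -> List[float]:
    """Fetch every URL once per pass and return the wall-clock seconds of each pass.

    Hypothetical helper: pass StorageManager.get_local_copy (or a stand-in) as fetch.
    """
    durations: List[float] = []
    for _ in range(passes):
        start = time.perf_counter()
        for url in urls:
            fetch(url)
        durations.append(time.perf_counter() - start)
    return durations


# Example with ClearML (needs the clearml package and real remote files):
# from clearml import StorageManager
# StorageManager.set_cache_file_limit(10_000)
# urls = ["s3://my-bucket/data/file_0.txt"]  # hypothetical placeholder URLs
# first, second = time_passes(urls, StorageManager.get_local_copy)
# print(f"first pass: {first:.1f}s, second (cached) pass: {second:.1f}s")
```

When lookups are cheap, the second duration should be a small fraction of the first; with the behaviour described here it stays large and grows with the number of cached files.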
Expected behaviour
If all the files are already in the cache, the second run should be almost immediate. Instead, it can take minutes.
Since iterating over the files appears to be needed only to delete old files when the cache is full, perhaps there could be a parameter to disable this logic, plus a separate method to trigger the cleanup manually.
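A minimal sketch of the proposed API, assuming a flat cache folder with eviction of the least recently modified files. `LocalFileCache`, `auto_cleanup` and `cleanup()` are hypothetical names chosen for illustration; they are not existing ClearML parameters or methods.

```python
from pathlib import Path
from typing import List, Optional


class LocalFileCache:
    """Hypothetical cache: lookups never scan the folder unless auto_cleanup
    is enabled, and cleanup() can be called on demand."""

    def __init__(self, cache_dir: Path, file_limit: int, auto_cleanup: bool = True):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.file_limit = file_limit
        self.auto_cleanup = auto_cleanup

    def get(self, name: str) -> Optional[Path]:
        """Return the cached file if present, marking it as recently used."""
        target = self.cache_dir / name
        if not target.is_file():
            return None
        target.touch()  # refresh mtime so eviction treats it as newest
        if self.auto_cleanup:
            self.cleanup(keep=target)  # O(N) scan only when explicitly enabled
        return target

    def cleanup(self, keep: Optional[Path] = None) -> int:
        """Delete the oldest files (by mtime) until at most file_limit remain.

        The file passed as `keep` is never deleted. Returns how many files were removed.
        """
        files = [p for p in self.cache_dir.iterdir() if p.is_file()]
        excess = len(files) - self.file_limit
        if excess <= 0:
            return 0
        candidates: List[Path] = sorted(
            (p for p in files if p != keep), key=lambda p: p.stat().st_mtime
        )
        removed = candidates[:excess]
        for p in removed:
            p.unlink()
        return len(removed)
```

With auto_cleanup=False, a cache hit costs a few filesystem calls regardless of cache size, and the O(N) scan in cleanup() runs only when the caller chooses, for example once after a batch of downloads.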
Environment
Server type: self hosted
ClearML SDK Version: 1.16.5
ClearML Server Version: 1.16.2-502
Python Version: 3.11
OS: Debian 10