Environment
Background threads write data asynchronously to an in-memory structure;
Client threads only read, and can issue hundreds of thousands of small reads
(TFRecords).
Discussion
After placement, and as long as no eviction takes place, there are no further insert/delete operations.
Knowing this, we can disable the locking mechanism after the first epoch: concurrency control is removed from the existing buffer, which becomes read-only.
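A minimal sketch of the idea, not Monarch's actual buffer: writers take a lock while placement is in progress, and once `freeze()` is called readers skip the lock entirely. The class name `EpochBuffer` and its methods are hypothetical, and the sketch assumes `freeze()` is only called after every background writer has finished.

```python
import threading


class EpochBuffer:
    """Hypothetical buffer: locked during placement, read-only afterwards."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()
        self._frozen = False

    def put(self, name, content):
        # Placement path (first epoch): writers still synchronize.
        with self._lock:
            if self._frozen:
                raise RuntimeError("buffer is read-only after placement")
            self._data[name] = content

    def freeze(self):
        # Assumption: called once, after all placement has completed.
        with self._lock:
            self._frozen = True

    def get(self, name):
        if self._frozen:
            # Read-only path: no inserts/deletes can happen, so no lock is taken.
            return self._data.get(name)
        with self._lock:
            return self._data.get(name)
```

After `freeze()`, every `get` goes through the lock-free branch, which is where the hundreds of thousands of small reads would land from the second epoch onwards.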
We know which files are going to be read (Metadata Container Service), but not the order (for now).
With this, we can build a map with static entries: instead of inserting entries at request time (prefetching for large files), the map starts with all the entries already inserted, and the values are simply filled in later on. There is no need for key-level locking, since reader threads only access the content mapped by a key after its placement.
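As a rough illustration of the static-entries map, the sketch below inserts every key at construction time and only ever overwrites values afterwards, so the key set never changes. `StaticEntryMap` and its methods are hypothetical names, and the `None` check stands in for whatever "already placed" signal a real implementation would use.

```python
class StaticEntryMap:
    """Hypothetical map whose key set is fixed at construction time."""

    def __init__(self, filenames):
        # All entries exist from the start; values are filled in later.
        self._entries = {name: None for name in filenames}

    def place(self, name, content):
        # Only known files can receive a value; no new keys are ever added.
        if name not in self._entries:
            raise KeyError(name)
        self._entries[name] = content

    def read(self, name):
        # None means "not placed yet": the caller falls back to backend storage.
        return self._entries[name]
```

For example, a client thread would call `read(name)` and go to the backend only when the result is `None`.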
The problem with static entries is that Monarch makes no assumption about the type of data being used (e.g., raw images or TFRecords), so we can end up with a map filled with keys that have no value because of the storage quota. This can be relevant when reading raw images rather than TFRecords. The solution is to predefine which files we are going to cache (an additional initialization step), instead of following the request order until the storage quota is reached.
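One possible shape for that initialization step is sketched below, assuming file sizes are available up front (e.g., from the metadata service). The function name, its parameters, and the greedy skip-and-continue policy are all assumptions made for illustration, not something Monarch defines.

```python
def select_files_to_cache(file_sizes, quota_bytes, priority=None):
    """Pick files to cache without exceeding quota_bytes.

    file_sizes: dict mapping file name -> size in bytes.
    priority:   optional key function over (name, size) pairs; lower sorts first.
                When omitted, files are considered in the dict's own order.
    """
    items = list(file_sizes.items())
    if priority is not None:
        items.sort(key=priority)

    selected, used = [], 0
    for name, size in items:
        # Greedy: skip files that do not fit and keep trying the rest.
        if used + size <= quota_bytes:
            selected.append(name)
            used += size
    return selected
```

Only the selected files would then get an entry, so no key is left permanently without a value.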
Instead of a map we can use a pre-allocated array.
Pros: Avoids an additional search by key and allows easier implementation of a sample-chunking mechanism (possible future work)
Cons: We have to store additional metadata for each file (offsets)
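The trade-off can be sketched as follows, assuming each file is identified by an integer index and its size is known before placement. `ArrayCache` and its methods are hypothetical; the per-file `(offset, size)` list is the additional metadata mentioned in the cons.

```python
class ArrayCache:
    """Hypothetical cache: one pre-allocated buffer plus per-file offsets."""

    def __init__(self, sizes):
        # sizes[i] is the size in bytes of file i, known before placement.
        self._meta = []  # (offset, size) per file: the extra metadata
        cursor = 0
        for size in sizes:
            self._meta.append((cursor, size))
            cursor += size
        self._buffer = bytearray(cursor)  # single allocation for all files

    def place(self, file_id, content):
        offset, size = self._meta[file_id]
        if len(content) != size:
            raise ValueError("content does not match the declared size")
        self._buffer[offset:offset + size] = content

    def read(self, file_id, start=0, length=None):
        # Direct indexing by file id: no key search is needed.
        offset, size = self._meta[file_id]
        if length is None:
            length = size - start
        end = min(start + length, size)  # never read into the next file
        return bytes(self._buffer[offset + start:offset + end])
```

Partial reads via `start` and `length` hint at how a sample-chunking mechanism could later address ranges inside a file.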
With this solution, pre-determining the samples that are going to be cached (e.g., using a heuristic) can be optimal, compared to filling the array in order of request arrival.