New Memory Buffer #2

Open
Dantas198 opened this issue Dec 5, 2022 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

Dantas198 commented Dec 5, 2022

Environment

  • Write-many and read-many;
  • Background threads write data asynchronously to an in-memory structure;
  • Client threads only read and can issue hundreds of thousands of small reads
    (TFRecords).

Discussion

  • After placement, and as long as eviction does not take place, there are no more insert/delete operations.

    Knowing this, we can disable the locking mechanism after the first epoch, removing the concurrency control from the existing buffer and making it read-only.
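A minimal sketch of this idea (names like `EpochBuffer`, `seal()` are hypothetical, not Monarch's API): the buffer takes a mutex while the first epoch populates it, then is "sealed" so that all later reads skip concurrency control entirely.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: lock during the first epoch, lock-free reads after.
class EpochBuffer {
public:
    // Called by background placement threads during the first epoch.
    void put(const std::string& key, std::vector<char> data) {
        std::lock_guard<std::mutex> lock(mutex_);  // locking only pre-seal
        entries_[key] = std::move(data);
    }

    // Called once the first epoch finishes: no more inserts/deletes.
    void seal() { sealed_.store(true, std::memory_order_release); }

    // Reader threads: after sealing, the map is immutable, so reads take
    // no lock at all.
    const std::vector<char>* get(const std::string& key) const {
        if (sealed_.load(std::memory_order_acquire)) {
            auto it = entries_.find(key);          // lock-free read path
            return it == entries_.end() ? nullptr : &it->second;
        }
        std::lock_guard<std::mutex> lock(mutex_);  // slow path, first epoch
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    mutable std::mutex mutex_;
    std::atomic<bool> sealed_{false};
    std::unordered_map<std::string, std::vector<char>> entries_;
};
```

The acquire/release pair on `sealed_` guarantees that a reader observing the sealed flag also observes every entry placed before `seal()` was called.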

  • We know which files are going to be read (Metadata Container Service), but not the order (for now).

    With this, we can build a map with static entries: instead of inserting entries at request time (prefetching for large files), the map starts with all entries already inserted, and values are simply filled in later. There is no need for key-level locking, since reader threads only access the content mapped by a key after its placement.
    The problem with static entries is that Monarch does not assume the type of data being used (e.g., raw images or TFRecords), so we can end up with a map full of keys that have no value due to the storage quota. This is relevant when reading raw images rather than TFRecords. The solution is to predefine which files we are going to cache (an additional initialization step), instead of following the request order until the storage quota is reached.
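A sketch of the static-entries map (hypothetical names, assuming the key set comes from the Metadata Container Service up front): all keys are inserted once at construction, so the map's structure never changes afterwards; placement threads only fill in values, and readers touch a key only after its placement completed.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: a map whose key set is fixed at construction time.
class StaticMap {
public:
    // All keys known in advance: insert them once with empty values.
    explicit StaticMap(const std::vector<std::string>& keys) {
        for (const auto& k : keys) entries_.emplace(k, std::nullopt);
    }

    // Placement thread fills a value; the entry already exists, so the
    // map's structure never changes and no key-level lock is needed.
    void place(const std::string& key, std::vector<char> data) {
        entries_.at(key) = std::move(data);
    }

    // Readers only query a key after its placement has completed.
    bool has_value(const std::string& key) const {
        auto it = entries_.find(key);
        return it != entries_.end() && it->second.has_value();
    }

private:
    std::unordered_map<std::string, std::optional<std::vector<char>>> entries_;
};
```

Because no insert ever happens after construction, there is no rehashing, so references into the map stay valid for the whole run.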

  • Instead of a map, we can use a pre-allocated array.

    Pros: avoids an additional search by key and allows easier implementation of a sample-chunking mechanism (possible future work).
    Cons: we have to store additional metadata for each file (offsets).
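A sketch of the pre-allocated array alternative (the layout is an assumption): one contiguous arena sized to the storage quota, plus a per-file (offset, length) metadata table indexed by integer file id, so a read is a direct index with no key hashing or search.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Per-file metadata: the "additional metadata" cost mentioned above.
struct FileSlot {
    std::size_t offset = 0;  // where the file's bytes start in the arena
    std::size_t length = 0;  // file size in bytes
    bool placed = false;
};

// Hypothetical sketch: pre-allocated arena indexed by file id.
class ArrayBuffer {
public:
    explicit ArrayBuffer(std::size_t quota) : arena_(quota) {}

    // Placement: copy the file's bytes at the next free offset and record
    // the (offset, length) metadata under its integer file id.
    bool place(std::size_t file_id, const std::vector<char>& data) {
        if (next_ + data.size() > arena_.size()) return false;  // quota hit
        if (file_id >= slots_.size()) slots_.resize(file_id + 1);
        std::memcpy(arena_.data() + next_, data.data(), data.size());
        slots_[file_id] = {next_, data.size(), true};
        next_ += data.size();
        return true;
    }

    // Read path: direct indexing, no search by key.
    std::vector<char> read(std::size_t file_id) const {
        const FileSlot& s = slots_[file_id];
        return {arena_.begin() + s.offset, arena_.begin() + s.offset + s.length};
    }

private:
    std::vector<char> arena_;
    std::vector<FileSlot> slots_;
    std::size_t next_ = 0;
};
```

The same (offset, length) metadata is also what a future sample-chunking mechanism would need: a chunk is just a sub-range of a slot.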

With these solutions, pre-determining the samples that are going to be cached (e.g., using a heuristic) can be optimal, instead of filling the array in order of request arrival.

@Dantas198 Dantas198 self-assigned this Dec 5, 2022
@Dantas198 Dantas198 added the enhancement New feature or request label Dec 5, 2022