New Memory Buffer #2

Open
Dantas198 opened this issue Dec 5, 2022 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

Dantas198 commented Dec 5, 2022

Environment

  • Write-many and read-many;
  • Background threads write data asynchronously to an in-memory structure;
  • Client threads only read and can issue hundreds of thousands of small reads
    (TFRecords).

Discussion

  • After placement, and as long as eviction does not take place, there are no more insert/delete operations.

    Knowing this, we can disable the locking mechanism after the first epoch, removing the concurrency control from the existing buffer and making it read-only.
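A minimal sketch of this idea (names like `EpochBuffer`, `seal()` are hypothetical, not Monarch's API): the buffer takes a mutex while the first epoch populates it, then is "sealed" so that all later reads skip concurrency control entirely.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: lock during the first epoch, lock-free reads after.
class EpochBuffer {
public:
    // Called by background placement threads during the first epoch.
    void put(const std::string& key, std::vector<char> data) {
        std::lock_guard<std::mutex> lock(mutex_);  // locking only pre-seal
        entries_[key] = std::move(data);
    }

    // Called once the first epoch finishes: no more inserts/deletes.
    void seal() { sealed_.store(true, std::memory_order_release); }

    // Reader threads: after sealing, the map is immutable, so reads take
    // no lock at all.
    const std::vector<char>* get(const std::string& key) const {
        if (sealed_.load(std::memory_order_acquire)) {
            auto it = entries_.find(key);          // lock-free read path
            return it == entries_.end() ? nullptr : &it->second;
        }
        std::lock_guard<std::mutex> lock(mutex_);  // slow path, first epoch
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    mutable std::mutex mutex_;
    std::atomic<bool> sealed_{false};
    std::unordered_map<std::string, std::vector<char>> entries_;
};
```

The acquire/release pair on `sealed_` guarantees that a reader observing the sealed flag also observes every entry placed before `seal()` was called.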

  • We know which files are going to be read (Metadata Container Service), but not the order (for now).

    With this, we can build a map with static entries: instead of inserting entries at request time (prefetching for large files), the map starts with all entries already inserted, and values are simply filled in later. There is no need for key-level locking, since reader threads only access the content mapped by a key after its placement.
    The problem with static entries is that Monarch does not assume the type of data being used (e.g., raw images or TFRecords), so we can end up with a map full of keys that have no value due to the storage quota. This is relevant when reading raw images rather than TFRecords. The solution is to predefine which files we are going to cache (an additional initialization step), instead of following the request order until the storage quota is reached.
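A sketch of the static-entries map (hypothetical names, assuming the key set comes from the Metadata Container Service up front): all keys are inserted once at construction, so the map's structure never changes afterwards; placement threads only fill in values, and readers touch a key only after its placement completed.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: a map whose key set is fixed at construction time.
class StaticMap {
public:
    // All keys known in advance: insert them once with empty values.
    explicit StaticMap(const std::vector<std::string>& keys) {
        for (const auto& k : keys) entries_.emplace(k, std::nullopt);
    }

    // Placement thread fills a value; the entry already exists, so the
    // map's structure never changes and no key-level lock is needed.
    void place(const std::string& key, std::vector<char> data) {
        entries_.at(key) = std::move(data);
    }

    // Readers only query a key after its placement has completed.
    bool has_value(const std::string& key) const {
        auto it = entries_.find(key);
        return it != entries_.end() && it->second.has_value();
    }

private:
    std::unordered_map<std::string, std::optional<std::vector<char>>> entries_;
};
```

Because no insert ever happens after construction, there is no rehashing, so references into the map stay valid for the whole run.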

  • Instead of a map, we can use a pre-allocated array.

    Pros: avoids an additional search by key and allows easier implementation of a sample-chunking mechanism (possible future work).
    Cons: we have to store additional metadata for each file (offsets).
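A sketch of the pre-allocated array alternative (the layout is an assumption): one contiguous arena sized to the storage quota, plus a per-file (offset, length) metadata table indexed by integer file id, so a read is a direct index with no key hashing or search.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Per-file metadata: the "additional metadata" cost mentioned above.
struct FileSlot {
    std::size_t offset = 0;  // where the file's bytes start in the arena
    std::size_t length = 0;  // file size in bytes
    bool placed = false;
};

// Hypothetical sketch: pre-allocated arena indexed by file id.
class ArrayBuffer {
public:
    explicit ArrayBuffer(std::size_t quota) : arena_(quota) {}

    // Placement: copy the file's bytes at the next free offset and record
    // the (offset, length) metadata under its integer file id.
    bool place(std::size_t file_id, const std::vector<char>& data) {
        if (next_ + data.size() > arena_.size()) return false;  // quota hit
        if (file_id >= slots_.size()) slots_.resize(file_id + 1);
        std::memcpy(arena_.data() + next_, data.data(), data.size());
        slots_[file_id] = {next_, data.size(), true};
        next_ += data.size();
        return true;
    }

    // Read path: direct indexing, no search by key.
    std::vector<char> read(std::size_t file_id) const {
        const FileSlot& s = slots_[file_id];
        return {arena_.begin() + s.offset, arena_.begin() + s.offset + s.length};
    }

private:
    std::vector<char> arena_;
    std::vector<FileSlot> slots_;
    std::size_t next_ = 0;
};
```

The same (offset, length) metadata is also what a future sample-chunking mechanism would need: a chunk is just a sub-range of a slot.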

With these solutions, pre-determining the samples that are going to be cached (e.g., using a heuristic) can be optimal, instead of filling the array in order of request arrival.

@Dantas198 Dantas198 self-assigned this Dec 5, 2022
@Dantas198 Dantas198 added the enhancement New feature or request label Dec 5, 2022