Skip to content

Conversation

guptask
Copy link
Contributor

@guptask guptask commented Jun 15, 2022

Added file-based SHM segment option in SHM module and CacheAllocator. This PR allows any memory-mapped file to be used as a memory tier in CacheLib. However, these changes only allow use of a single tier currently.
This is first half of the changes needed to enable multi-tiering in CacheLib. The multi-tier config APIs are in a separate PR (#138). These config APIs are needed to enable multi-tiering. Once that PR is merged, the second half of the multi-tier changes can be sent upstream via a separate PR.

igchor and others added 4 commits June 15, 2022 00:58
It's implementation is mostly based on PosixShmSegment.

Also, extend ShmManager and ShmSegmentOpts to support this new
segment type.
After introducing file segment type, nameToKey_ does not provide
enough information to recover/remove segments on restart.

This commit fixes that by replacing nameToKey_ with nameToOpts_.

Previously, the Key from nameToKey_ map was only used in a single
DCHECK().
It wrongly assumed that the only possible segment type is
PosixSysV segment.
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 15, 2022
@guptask guptask force-pushed the upstream_file_shm_segment branch from ede2aae to 1fab591 Compare June 15, 2022 08:32
@guptask guptask marked this pull request as ready for review June 15, 2022 09:36
@guptask
Copy link
Contributor Author

guptask commented Jul 12, 2022

Did anyone get a chance to review this PR ?

@haowu14
Copy link
Contributor

haowu14 commented Jul 13, 2022

I'm in the process of reviewing. I am curious that do we have to associate a path and a name with each file segment? Can we use only name to identify a file segment, like we do with Posix?
If yes, we can greatly simplify the change and avoid a cold restart.

@guptask
Copy link
Contributor Author

guptask commented Jul 18, 2022

I'm in the process of reviewing. I am curious that do we have to associate a path and a name with each file segment? Can we use only name to identify a file segment, like we do with Posix? If yes, we can greatly simplify the change and avoid a cold restart.

Are you referring to the addition of PosixSysVSegmentOpts argument in removeShm ? If not, can you please add more details in your query ?

@haowu14
Copy link
Contributor

haowu14 commented Jul 18, 2022

It's this and why we need to have each FileSegment have a path instead of just a name and figure out the path from the name.

Just like Posix always writes segments to /dev/shm, we can have FileSegments always writes to some base path and FileSegment can be located at base path + function(name). This way, we just need to change usePosix_ into an enum and the majority of the code structure can be unchanged.

@igchor
Copy link
Contributor

igchor commented Jul 19, 2022

@haowu14 the idea is that each FileSegment can represent a different memory type, and so each will have a different base path. For example, our most common use case is to have a segment on /mnt/pmem/tier1 and a second one on /dev/shm/tier0

@haowu14
Copy link
Contributor

haowu14 commented Jul 19, 2022

Thanks for clarifying. Is it correct that in the future it will be possible for the same shmManager to mount different types of segments (tier 1 is file, tier 0 is posix)?

@guptask
Copy link
Contributor Author

guptask commented Jul 19, 2022

Thanks for clarifying. Is it correct that in the future it will be possible for the same shmManager to mount different types of segments (tier 1 is file, tier 0 is posix)?

Yes, the changes were done with this flexibility in mind. The Shm segments can be Posix, SysV or File and can be tiered in any order.

@haowu14
Copy link
Contributor

haowu14 commented Jul 20, 2022

Hi! Thank you for making the change. We discussed internally on how we can bring FileShmSegment into cachelib and allow compatibility. We'd like to suggest the following change (your PR already have most of them, I just listed them down here so that we have a full picture):

Breaking this change into two PRs:

The first PR

  • Create a class containing Id and Type of a segment called ShmTypeOpts. The type is an enum. The ID is a string. Put ShmTypeOpts into ShmSegmentOpts.
  • ShmBase is initialized solely with ShmSegmentOpts since name_ can now be initialized with the ID in ShmSegmentOpts.ShmTypeOpts.
  • Posix/SysVShmSegment class (subclasses of ShmBase) will be new-ed/attached without an extra name input.
  • ShmSegment class (the class that wraps a ShmBase) will be new-ed/attached without name or usePosix because that can be retrieved with ShmSegmentOpts.ShmTypeOpts).
  • In ShmManager, we keep a map from name to ShmTypeOpts called nameToType_ (replacing nameToKey_, but we will need to keep nameToKey_ for compatibility reason).
  • When creating a new segment via ShmManager, we expect a name and ShmTypeOpts to be provided. If the type indicates SysV or Posix, we ignore the id field in ShmTypeOpts and fill the id field with uniqueIdForName. We use this ShmTypeOpts to create the segment then save the segment into segments_ and save the ShmTypeOpts to nameToType_.
  • When attaching/removing a segment via ShmManager, we expect only the name provided. ShmManager looks up the segment and the ShmTypeOpts from nameToType_ and try to attach/remove with the ShmTypeOpts retrieved.
  • When removing a segment via the static function of ShmManager, we expect ShmTypeOpts to be provided directly.

After the first diff, the usePosix_ field inside ShmManager will be noop since each segment know what type they are from nameToType_.

The second PR

Introduce FileShmSegment. Most of the logic wouldn't require change except that we won't overwrite to the id field in ShmTypeOpts if the type indicates file. FileShmSegments will interpret the id in ShmTypeOpts as the path of the file to access (just like PosixShmSegment will take the id and write to /dev/shm/id).

Does the above change satisfy the need to introduce FileShmSegments? This is mostly aligned with what you have, other than making ShmTypeOpts something uniform for all the types.

If it looks good, there's some additional comments regarding compatibility. To avoid dropping the cache when this new code is deployed, we suggest the following:
In the first PR, increment the kCacheLibVersion in cachelib/allocator/CacheVersion.h. Still keep nameToOps_ map and update it together when we update nameToType_. Still keep the usePosix_ field. When updating the thrift file, add one more field instead of modify the existing nameToKey_.

After that lands, CacheLib team will make a clean up diff to remove the duplicated logic for compatibility.

Again thank you for making this change.

@guptask
Copy link
Contributor Author

guptask commented Jul 26, 2022

Hi! Thank you for making the change. We discussed internally on how we can bring FileShmSegment into cachelib and allow compatibility. We'd like to suggest the following change (your PR already have most of them, I just listed them down here so that we have a full picture):

Breaking this change into two PRs:

The first PR

  • Create a class containing Id and Type of a segment called ShmTypeOpts. The type is an enum. The ID is a string. Put ShmTypeOpts into ShmSegmentOpts.
  • ShmBase is initialized solely with ShmSegmentOpts since name_ can now be initialized with the ID in ShmSegmentOpts.ShmTypeOpts.
  • Posix/SysVShmSegment class (subclasses of ShmBase) will be new-ed/attached without an extra name input.
  • ShmSegment class (the class that wraps a ShmBase) will be new-ed/attached without name or usePosix because that can be retrieved with ShmSegmentOpts.ShmTypeOpts).
  • In ShmManager, we keep a map from name to ShmTypeOpts called nameToType_ (replacing nameToKey_, but we will need to keep nameToKey_ for compatibility reason).
  • When creating a new segment via ShmManager, we expect a name and ShmTypeOpts to be provided. If the type indicates SysV or Posix, we ignore the id field in ShmTypeOpts and fill the id field with uniqueIdForName. We use this ShmTypeOpts to create the segment then save the segment into segments_ and save the ShmTypeOpts to nameToType_.
  • When attaching/removing a segment via ShmManager, we expect only the name provided. ShmManager looks up the segment and the ShmTypeOpts from nameToType_ and try to attach/remove with the ShmTypeOpts retrieved.
  • When removing a segment via the static function of ShmManager, we expect ShmTypeOpts to be provided directly.

After the first diff, the usePosix_ field inside ShmManager will be noop since each segment know what type they are from nameToType_.

The second PR

Introduce FileShmSegment. Most of the logic wouldn't require change except that we won't overwrite to the id field in ShmTypeOpts if the type indicates file. FileShmSegments will interpret the id in ShmTypeOpts as the path of the file to access (just like PosixShmSegment will take the id and write to /dev/shm/id).

Does the above change satisfy the need to introduce FileShmSegments? This is mostly aligned with what you have, other than making ShmTypeOpts something uniform for all the types.

If it looks good, there's some additional comments regarding compatibility. To avoid dropping the cache when this new code is deployed, we suggest the following: In the first PR, increment the kCacheLibVersion in cachelib/allocator/CacheVersion.h. Still keep nameToOps_ map and update it together when we update nameToType_. Still keep the usePosix_ field. When updating the thrift file, add one more field instead of modify the existing nameToKey_.

After that lands, CacheLib team will make a clean up diff to remove the duplicated logic for compatibility.

Again thank you for making this change.

Thanks for the detailed breakdown. I'm working on refactoring the existing PR into two separate PRs based on the list you have chalked out.

@guptask guptask closed this Sep 26, 2022
@guptask
Copy link
Contributor Author

guptask commented Sep 26, 2022

Closing this PR due to priority re-alignment. NUMA support for CXL is now the primary focus in memory-tiering.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants