Define monitor unsubscribe_on_delete #10687

Open
wants to merge 2 commits into
base: main

iziemba
Contributor

@iziemba iziemba commented Jan 8, 2025

Not all memory monitors have a 1-N relationship between subscribe and the MR cache entries. Some memory monitors, such as kdreg2, have a 1-1 relationship and require unsubscribe to be called when the corresponding MR cache entry is deleted.

To meet this requirement, unsubscribe_on_delete is defined. When true for a memory monitor, ofi_monitor_unsubscribe() is called when an MR cache entry is freed.
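
The behavior described above can be pictured with a minimal, self-contained sketch. It does not use the real libfabric types: `demo_monitor`, `demo_mr_entry`, the `demo_*` functions, and the `active_subscriptions` counter are hypothetical stand-ins, and only the `unsubscribe_on_delete` name comes from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real libfabric structures. */
struct demo_monitor {
	bool unsubscribe_on_delete; /* flag name taken from this PR */
	int active_subscriptions;   /* bookkeeping for this sketch only */
};

struct demo_mr_entry {
	const void *addr;
	size_t len;
};

/* Stand-in for ofi_monitor_subscribe(): records one subscription. */
void demo_subscribe(struct demo_monitor *m, const void *addr, size_t len)
{
	(void) addr;
	(void) len;
	m->active_subscriptions++;
}

/* Stand-in for ofi_monitor_unsubscribe(): drops one subscription. */
void demo_unsubscribe(struct demo_monitor *m, const void *addr, size_t len)
{
	(void) addr;
	(void) len;
	m->active_subscriptions--;
}

/* Entry free path: unsubscribe only when the monitor asks for it. */
void demo_entry_free(struct demo_monitor *m, struct demo_mr_entry *e)
{
	if (m->unsubscribe_on_delete)
		demo_unsubscribe(m, e->addr, e->len);

	e->addr = NULL;
	e->len = 0;
}
```

With the flag set (the 1-1 case), a subscribe followed by freeing the entry leaves no subscription behind. With the flag clear (the 1-N case), the subscription outlives the entry, since other entries may still depend on it.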

@iziemba iziemba requested a review from j-xiong January 8, 2025 18:15
@iziemba iziemba force-pushed the mem_monitor_updates branch from 4ff05b7 to 20a72a4 Compare January 8, 2025 18:16
@j-xiong
Contributor

j-xiong commented Jan 8, 2025

Please add the new files to the Windows build file list.

ofi_monitor_unsubscribe(monitor, entry->info.iov.iov_base,
                        entry->info.iov.iov_len,
                        &entry->hmem_info);

Member

Can this be done within cache->delete_region()?

Contributor Author

It shouldn't. kdreg2 is designed to work with any provider. Putting this in delete_region() would result in ofi_monitor_unsubscribe() being called only from a specific provider's delete_region() implementation.

In addition, ofi_monitor_subscribe() is called outside of cache->add_region(). Because of this, ofi_monitor_unsubscribe() should be called outside of cache->delete_region().
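
The symmetry argument can be sketched as follows. This is an illustration only: `gen_cache`, `gen_entry`, and the `gen_*`/`prov_*` functions are hypothetical, the counters stand in for monitor and provider state, and the ordering of the calls is an assumption rather than the actual util MR cache flow. Only the `add_region`/`delete_region` callback names come from the discussion.

```c
#include <assert.h>
#include <stddef.h>

struct gen_entry {
	const void *addr;
	size_t len;
};

/* Hypothetical cache: provider callbacks plus counters for the sketch. */
struct gen_cache {
	int (*add_region)(struct gen_cache *cache, struct gen_entry *entry);
	void (*delete_region)(struct gen_cache *cache, struct gen_entry *entry);
	int subscriptions;    /* stand-in for monitor state */
	int provider_regions; /* stand-in for provider state */
};

/* Generic code: subscribe happens here, outside add_region(). */
int gen_entry_create(struct gen_cache *cache, struct gen_entry *entry)
{
	int ret = cache->add_region(cache, entry);
	if (ret)
		return ret;

	cache->subscriptions++; /* stands in for ofi_monitor_subscribe() */
	return 0;
}

/* Generic code: unsubscribe mirrors it, outside delete_region(). */
void gen_entry_free(struct gen_cache *cache, struct gen_entry *entry)
{
	cache->subscriptions--; /* stands in for ofi_monitor_unsubscribe() */
	cache->delete_region(cache, entry);
}

/* Provider A tracks its regions. */
int prov_a_add(struct gen_cache *cache, struct gen_entry *entry)
{
	(void) entry;
	cache->provider_regions++;
	return 0;
}

void prov_a_delete(struct gen_cache *cache, struct gen_entry *entry)
{
	(void) entry;
	cache->provider_regions--;
}

/* Provider B keeps no bookkeeping and knows nothing about monitors. */
int prov_b_add(struct gen_cache *cache, struct gen_entry *entry)
{
	(void) cache;
	(void) entry;
	return 0;
}

void prov_b_delete(struct gen_cache *cache, struct gen_entry *entry)
{
	(void) cache;
	(void) entry;
}
```

Because the unsubscribe lives in the generic free path, both hypothetical providers end up with balanced subscriptions without either `delete_region()` implementation having to know about the monitor.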

Member

That makes sense.

But the general architecture we have for the cache and monitors is to use callbacks, some of which may be set to a no-op. This change instead uses a variable to adjust the flow. Maybe that's okay...

But in other places, ofi_monitor_subscribe/unsubscribe are called holding the mm_lock. Here, there are no locks being held.
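
For comparison, the callback style and the locking concern raised here could look roughly like the sketch below. Everything in it is hypothetical: `cb_monitor`, the `on_entry_delete` callback, and `cb_mm_lock` (a plain pthread mutex standing in for the mm_lock mentioned above) are illustration names, not libfabric definitions, and this is not what the PR implements.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct cb_monitor;

typedef void (*cb_delete_fn)(struct cb_monitor *m, const void *addr,
			     size_t len);

/* Hypothetical monitor that always carries a delete callback. */
struct cb_monitor {
	cb_delete_fn on_entry_delete; /* never NULL; may be the no-op */
	int unsubscribe_calls;        /* bookkeeping for this sketch only */
};

/* Stand-in for the mm_lock referred to in the review comment. */
pthread_mutex_t cb_mm_lock = PTHREAD_MUTEX_INITIALIZER;

/* Monitors with a 1-N relationship would install the no-op. */
void cb_delete_noop(struct cb_monitor *m, const void *addr, size_t len)
{
	(void) m;
	(void) addr;
	(void) len;
}

/* Monitors with a 1-1 relationship would unsubscribe here. */
void cb_delete_unsubscribe(struct cb_monitor *m, const void *addr, size_t len)
{
	(void) addr;
	(void) len;
	m->unsubscribe_calls++; /* stands in for ofi_monitor_unsubscribe() */
}

/* Free path: no flag check; the callback runs with the lock held. */
void cb_entry_free(struct cb_monitor *m, const void *addr, size_t len)
{
	pthread_mutex_lock(&cb_mm_lock);
	m->on_entry_delete(m, addr, len);
	pthread_mutex_unlock(&cb_mm_lock);
}
```

The trade-off is the one described in the comment: a callback keeps the free path free of per-monitor branching, while a boolean is a smaller change. In both cases the open question of which lock must be held around the unsubscribe remains.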

Other memory monitors, such as CUDA, ROCR, and ZE, have a .c file for
the implementation. This change cleans up the util_mem_monitor.c code by
defining separate uffd and import .c files, thus aligning them with the
other memory monitor implementations.

Signed-off-by: Mike Uttormark <mike.uttormark@hpe.com>
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Not all memory monitors have a 1-N relationship between subscribe and
the MR cache entries. Some memory monitors, such as kdreg2, have a 1-1
relationship and require unsubscribe to be called when the corresponding
MR cache entry is deleted.

To meet this requirement, unsubscribe_on_delete is defined. When true
for a memory monitor, ofi_monitor_unsubscribe() is called when an MR
cache entry is freed.

Signed-off-by: Mike Uttormark <mike.uttormark@hpe.com>
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
@iziemba iziemba force-pushed the mem_monitor_updates branch from 20a72a4 to 8a4a4f7 Compare January 10, 2025 04:02
@iziemba iziemba requested a review from shefty January 10, 2025 04:02
5 participants