Fix issue #1146: mio leaks memory when poll does not retrieve any event #1147
Conversation
Where are we increasing the ref count without decreasing it? Can we change the code in such a way that it doesn't need to call consume? Perhaps by implementing this in the
Before calling afd.poll, the SockState Arc is wrapped in an overlapped struct and sent as user data to the ioctl.
Because the Arc of SockState contains self_wrapped, which points back to itself, you'll always have an Arc ref count of two, so it won't drop.
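To illustrate the cycle being described, here is a minimal, hypothetical sketch (SockStateLike is a stand-in, not mio's actual SockState):

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for mio's SockState, reduced to the part that
// matters here: it holds an Arc pointing back to itself.
struct SockStateLike {
    self_wrapped: Option<Arc<Mutex<SockStateLike>>>,
}

impl Drop for SockStateLike {
    fn drop(&mut self) {
        println!("dropped"); // never printed once the cycle exists
    }
}

fn main() {
    let state = Arc::new(Mutex::new(SockStateLike { self_wrapped: None }));
    // Store a clone of the Arc inside the value itself, as happens when
    // the sock state is wrapped into the overlapped user data.
    state.lock().unwrap().self_wrapped = Some(state.clone());
    println!("strong count: {}", Arc::strong_count(&state)); // prints 2
    drop(state); // count goes 2 -> 1; the cycle keeps the allocation alive
}
```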
Looking into this, I'm not fully satisfied with this fix myself; that's why it's a draft. I posted this PR as a draft because I wanted to see how this change does on the CI tests.
Don't think it's possible to fix this with
After seeing the CI failures on Windows, I ran the tests under Dr. Memory and found the source of the crash:
The crashing test does the following: poll, register, deregister, ..., another poll. The problem is that, after the leak fix, you might get a crash during another poll after deregister, because an event is delivered for a sock state which was already deregistered. This should not happen: deregister should wait for the afd cancel to complete and consume all events.
src/sys/windows/selector.rs
@@ -407,13 +407,50 @@ impl Selector {
pub struct SelectorInner {
    cp: Arc<CompletionPort>,
    update_queue: Mutex<VecDeque<Arc<Mutex<SockState>>>>,
    pending_queue: Mutex<HashMap<u64, Arc<Mutex<SockState>>>>,
I'm concerned about using user_data as the key into the HashMap. If the user token is re-used between deregister() and register() of a different socket, the internal state gets messed up.
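A toy illustration of that concern (hypothetical values, not mio's code): the second register silently replaces the first entry, so any stale completion for the old socket resolves to the new one:

```rust
use std::collections::HashMap;

fn main() {
    // The map key is the user token (hypothetical values, not mio's code).
    let mut pending: HashMap<u64, &str> = HashMap::new();

    pending.insert(1, "socket A"); // register(socket A, token 1)
    pending.remove(&1);            // deregister(socket A)
    pending.insert(1, "socket B"); // register(socket B, token 1) re-uses the token

    // Any completion still in flight for socket A now resolves to socket B.
    assert_eq!(pending.get(&1), Some(&"socket B"));
}
```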
Ideally pending_queue and update_queue would be behind the same Mutex.
I didn't get around to doing that in #1149 but it should be done eventually nonetheless.
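A rough sketch of that suggestion with hypothetical field names (not mio's actual layout), showing both collections behind a single Mutex:

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

struct SockState; // stand-in for the real per-socket state

// Both collections live in one struct...
struct Queues {
    update_queue: VecDeque<Arc<Mutex<SockState>>>,
    all_sock_states: HashMap<u64, Arc<Mutex<SockState>>>,
}

// ...guarded by a single Mutex, so they can only be observed and
// updated together.
struct SelectorInner {
    queues: Mutex<Queues>,
}

fn main() {
    let inner = SelectorInner {
        queues: Mutex::new(Queues {
            update_queue: VecDeque::new(),
            all_sock_states: HashMap::new(),
        }),
    };
    let state = Arc::new(Mutex::new(SockState));
    let mut queues = inner.queues.lock().unwrap(); // one lock acquisition
    queues.all_sock_states.insert(1, state.clone());
    queues.update_queue.push_back(state);
}
```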
"pending_queue" is a bit of a misnomer here. This is not a queue of pending sockets, but rather one of all sockets (regardless of whether they're pending/idle/cancelled/deleted).
Other than that, keeping a table of all sockets does seem like a pretty good solution.
It looks like explicitly deregistered sockets are removed from this queue, but implicitly deregistered ones aren't. Maybe make that more consistent?
I'm concerned about using user_data as the key into the HashMap. If the user token is re-used between deregister() and register() of a different socket, the internal state gets messed up.
I agree, not a good idea to use token as key.
Ideally pending_queue and update_queue would be behind the same Mutex.
Ok.
"pending_queue" is a bit of a misnomer here.
I agree, need a better name here.
It looks like explicitly deregistered sockets are removed from this queue, but implicitly deregistered ones aren't. Maybe make that more consistent?
What do you mean by "implicitly deregistered sockets"? Could you give an example of such a case? Thanks
src/sys/windows/selector.rs
);

match result {
    Ok(i) if i == 0 || i < 1024 => {
i == 0 || i < 1024 is tautological.
@piscisaureus I'm thinking of reimplementing this. I've summarized the idea below. Please let me know what you think, or whether you want to implement a fix for the leak yourself after merging #1154; I'm trying to avoid doing duplicate work. For each registered SockState, generate a unique id and keep an internal map of (unique id, SockState) pairs. This map will be updated when doing reregister/deregister, so this unique id will be the key instead of the token. The unique SockState id can be used as overlapped data instead of overlapped_wrapper or transmute_copy(Pin<Arc<..SockState>>>).
To fix the leaked SockStates, just iterate this map and cancel all SockStates in it during Selector drop.
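A rough sketch of that bookkeeping, under assumed names (the SockState fields and register/deregister signatures are hypothetical, and a plain HashMap stands in for whatever map or slab is eventually used):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for mio's per-socket state.
struct SockState {
    key: Option<u64>, // unique id, also carried as overlapped user data
}

struct SelectorInner {
    sock_states: Mutex<HashMap<u64, Arc<Mutex<SockState>>>>,
    next_key: Mutex<u64>,
}

impl SelectorInner {
    // register: hand out a fresh id and remember the state under it.
    fn register(&self, state: Arc<Mutex<SockState>>) -> u64 {
        let mut next = self.next_key.lock().unwrap();
        let key = *next;
        *next += 1;
        state.lock().unwrap().key = Some(key);
        self.sock_states.lock().unwrap().insert(key, state);
        key // this id, not a raw Arc pointer, would be the overlapped data
    }

    // deregister: forget the state; completions carrying `key` are now stale.
    fn deregister(&self, key: u64) {
        self.sock_states.lock().unwrap().remove(&key);
    }
}

impl Drop for SelectorInner {
    fn drop(&mut self) {
        // Cancel and drop everything still registered so no SockState leaks.
        for (_key, _state) in self.sock_states.lock().unwrap().drain() {
            // a real implementation would issue the afd cancel here
        }
    }
}

fn main() {
    let inner = SelectorInner {
        sock_states: Mutex::new(HashMap::new()),
        next_key: Mutex::new(0),
    };
    let key = inner.register(Arc::new(Mutex::new(SockState { key: None })));
    inner.deregister(key);
} // Drop runs here; any remaining states would be cancelled and freed
```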
Changes to fix leaked sock states, issue #1146, and a couple of improvements. All registered sock_states are saved in a slab data structure. Sock_states are removed from the slab during deregister. A reregister expects to find the sock_state already saved in the slab. During Selector drop, all sock_states from the slab are cancelled and dropped. The slab saves sock_states as pairs of (key, sock_state). The key is also saved in the sock_state and used as overlapped data. Later, when receiving the completion event, the key is used to retrieve the sock_state. The slab will never fill up with unusable sock_states which have been marked for deletion because, once a sock_state has been marked for deletion, it is also removed from the slab. Specific cases which should trigger the removal are: 1. a sock_state for which completion returned an error, i.e. ERROR_INVALID_HANDLE; 2. a sock_state for which a LOCAL_CLOSE event was retrieved.
Signed-off-by: Daniel Tacalau <dst4096@gmail.com>
I'm not sure if the general idea is a good fit; I'll leave that up to someone with more experience with Windows. One idea I had: since we already allocate the state data, can you use the pointer as the unique id? Then we don't have to keep a map. I don't know if this is possible with regards to memory safety though.
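For what it's worth, a minimal sketch of the pointer-as-id idea using Arc::into_raw/Arc::from_raw (hypothetical SockState type, not mio's actual code); the memory-safety requirement is that each pointer produced by into_raw is turned back into an Arc exactly once:

```rust
use std::sync::{Arc, Mutex};

struct SockState; // hypothetical stand-in

fn main() {
    let state = Arc::new(Mutex::new(SockState));

    // Hand one reference over as a raw pointer; this address could be
    // carried as the completion's user data instead of a separate id.
    let user_data = Arc::into_raw(Arc::clone(&state)) as usize;

    // When the completion arrives, turn the pointer back into an Arc.
    // Safety: the value came from Arc::into_raw and is consumed exactly once.
    let restored: Arc<Mutex<SockState>> =
        unsafe { Arc::from_raw(user_data as *const Mutex<SockState>) };

    assert_eq!(Arc::strong_count(&restored), 2); // `state` + `restored`
}
```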
This PR should not be needed after dtacalau@4ba341e gets merged.
No longer needed; another leak fix is under review here: #1154