
fix issue #1146 mio leaks memory when poll does not retrieve any event #1147

Closed
wants to merge 2 commits into from

Conversation

dtacalau
Contributor

@dtacalau dtacalau commented Nov 12, 2019

Changes to fix leaked sock states (issue #1146), plus a couple of improvements. All registered sock states are saved in a slab data structure and removed from the slab during deregister; a reregister expects to find the sock state already saved in the slab. During Selector drop, all sock states in the slab are cancelled and dropped. The slab stores sock states as (key, sock_state) pairs. The key is also saved in the sock state and used as the overlapped data, so when the completion event arrives the key can be used to retrieve the sock state. The slab can never fill up with unusable sock states that have been marked for deletion, because a sock state marked for deletion is also removed from the slab. Specific cases that trigger the removal:

1. a sock state whose completion returned an error, e.g. ERROR_INVALID_HANDLE;
2. a sock state for which a LOCAL_CLOSE event was retrieved.

Signed-off-by: Daniel Tacalau dst4096@gmail.com
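The registration scheme described above can be sketched with std types standing in for the slab (all names here are illustrative, not mio's actual API; a HashMap plus a counter models the slab's key allocation):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Simplified stand-in for mio's internal SockState.
struct SockState {
    // Key back into the registry; passed as overlapped data to afd.poll.
    key: usize,
    marked_for_deletion: bool,
}

// Stand-in for the slab: maps a stable key to each registered sock state.
#[derive(Default)]
struct Registry {
    next_key: usize,
    states: HashMap<usize, Arc<Mutex<SockState>>>,
}

impl Registry {
    // register: allocate a key, store the state, and record the key inside it.
    fn register(&mut self) -> Arc<Mutex<SockState>> {
        let key = self.next_key;
        self.next_key += 1;
        let state = Arc::new(Mutex::new(SockState { key, marked_for_deletion: false }));
        self.states.insert(key, Arc::clone(&state));
        state
    }

    // On completion, the key recovered from the overlapped data retrieves the state.
    fn lookup(&self, key: usize) -> Option<Arc<Mutex<SockState>>> {
        self.states.get(&key).cloned()
    }

    // Once marked for deletion (LOCAL_CLOSE, ERROR_INVALID_HANDLE), the entry
    // is also removed, so the registry never fills with unusable states.
    fn mark_for_deletion(&mut self, key: usize) {
        if let Some(state) = self.states.remove(&key) {
            state.lock().unwrap().marked_for_deletion = true;
        }
    }
}

fn main() {
    let mut reg = Registry::default();
    let state = reg.register();
    let key = state.lock().unwrap().key;
    assert!(reg.lookup(key).is_some());
    reg.mark_for_deletion(key);
    assert!(reg.lookup(key).is_none());
}
```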

@Thomasdezeeuw
Collaborator

Where are we increasing the ref count without decreasing it? Can we change the code in such a way it doesn't need to call consume? Perhaps by implementing this in the Drop impl?

@dtacalau
Contributor Author

@Thomasdezeeuw

Where are we increasing the ref count without decreasing it?

Before calling afd.poll, the SockState Arc is wrapped in an overlapped wrapper and sent as user data to the ioctl:

            let wrapped_overlapped = OverlappedArcWrapper::new(self_arc);
            let overlapped = wrapped_overlapped.get_ptr() as *const _ as PVOID;
            let result = unsafe {
                self.afd
                    .poll(&mut self.poll_info, (*self.iosb).as_mut_ptr(), overlapped)
            };

Because the Arc of SockState contains self_wrapped, which points back to itself, the Arc ref count is always at least two, so it never drops.
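The effect can be reproduced with a minimal self-referential Arc (an illustrative stand-in, not mio's actual types):

```rust
use std::sync::{Arc, Mutex};

// A value that stores an Arc back to itself, like SockState's self_wrapped.
struct SelfRef {
    this: Mutex<Option<Arc<SelfRef>>>,
}

fn main() {
    let state = Arc::new(SelfRef { this: Mutex::new(None) });
    // Store a clone of the Arc inside the value it points to.
    *state.this.lock().unwrap() = Some(Arc::clone(&state));
    // The count is now 2: the outer handle plus the self-reference.
    assert_eq!(Arc::strong_count(&state), 2);

    let weak = Arc::downgrade(&state);
    drop(state);
    // Dropping the outer handle leaves a count of 1, so the value is
    // never freed: the self-reference keeps it alive. This is the leak.
    assert!(weak.upgrade().is_some());
}
```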

Can we change the code in such a way it doesn't need to call consume?

Looking into this. I'm not fully satisfied with this fix myself, which is why I posted this PR as a draft: I wanted to see how the change does on the CI tests.

Perhaps by implementing this in the Drop impl?

I don't think it's possible to fix this with Drop; have a look at the doc comment of OverlappedArcWrapper:

/// This is the deallocation wrapper for the overlapped pointer.
/// If an error occurs or the status changes before the overlapped pointer is actually used (or it is never used),
/// this wrapper will decrease the reference count of the Arc when dropped.
/// Remember to call `forget` if you have used the Arc, or you could decrease the reference count by two, causing a double free.
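A minimal sketch of how such a wrapper can work (hypothetical names; the real OverlappedArcWrapper differs in detail). Arc::into_raw leaks one strong reference into a raw pointer suitable for overlapped user data, and Drop reclaims it unless `forget` was called:

```rust
use std::sync::Arc;

// Carries a raw pointer obtained from Arc::into_raw so it can travel
// through the overlapped pointer of an ioctl as plain user data.
struct ArcWrapper<T> {
    ptr: *const T,
    consumed: bool,
}

impl<T> ArcWrapper<T> {
    fn new(arc: Arc<T>) -> Self {
        // into_raw leaks one strong reference into the raw pointer.
        ArcWrapper { ptr: Arc::into_raw(arc), consumed: false }
    }

    fn get_ptr(&self) -> *const T {
        self.ptr
    }

    // Call once the Arc has been reconstructed from the pointer elsewhere;
    // this disarms Drop so the count is not decremented twice.
    fn forget(&mut self) {
        self.consumed = true;
    }
}

impl<T> Drop for ArcWrapper<T> {
    fn drop(&mut self) {
        if !self.consumed {
            // Reclaim the leaked reference so the count goes back down.
            unsafe { drop(Arc::from_raw(self.ptr)) };
        }
    }
}

fn main() {
    let value = Arc::new(42u32);
    assert_eq!(Arc::strong_count(&value), 1);
    {
        // Count becomes 2: one handle here, one hidden in the raw pointer.
        let wrapper = ArcWrapper::new(Arc::clone(&value));
        assert!(!wrapper.get_ptr().is_null());
        assert_eq!(Arc::strong_count(&value), 2);
    } // Wrapper dropped without forget: Drop reclaims the reference.
    assert_eq!(Arc::strong_count(&value), 1);
}
```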

@dtacalau
Contributor Author

After seeing the CI failures on Windows, I ran the tests under Dr. Memory and found the source of the crash:

Error #18: UNADDRESSABLE ACCESS of freed memory: reading 0x000000c3647a5e60-0x000000c3647a5e68 8 byte(s)
# 0 std::sync::mutex::Mutex<>::lock<>                                        [/rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54\src\libstd\sync\mutex.rs:220]
# 1 mio::sys::windows::selector::SelectorInner::feed_events                  [C:\work\git\mio\src\sys\windows\selector.rs:646]
# 2 mio::sys::windows::selector::SelectorInner::select2                      [C:\work\git\mio\src\sys\windows\selector.rs:507]
# 3 mio::sys::windows::selector::SelectorInner::select                       [C:\work\git\mio\src\sys\windows\selector.rs:475]
# 4 mio::sys::windows::selector::Selector::select                            [C:\work\git\mio\src\sys\windows\selector.rs:383]
# 5 mio::poll::Poll::poll                                                    [C:\work\git\mio\src\poll.rs:366]
# 6 registering::register_deregister                                         [C:\work\git\mio\tests\registering.rs:109]
# 7 registering::register_deregister::{{closure}}                            [C:\work\git\mio\tests\registering.rs:70]

The crashing test does the following: poll, register, deregister, ..., another poll. The problem is that, after the leak fix, the later poll can crash by retrieving an event for a sock state that was already deregistered.

This should not happen: deregister should wait for the afd cancel to complete and consume all of its events.

@dtacalau dtacalau marked this pull request as ready for review November 15, 2019 10:25
@dtacalau
Contributor Author

@Thomasdezeeuw

Can we change the code in such a way it doesn't need to call consume? Perhaps by implementing this in the Drop impl?

I've removed consume after calling cancel for all pending SockStates when dropping SelectorInner. This fixes all leaks reported in issues #1146 and #1150.

@@ -407,13 +407,50 @@ impl Selector {
pub struct SelectorInner {
cp: Arc<CompletionPort>,
update_queue: Mutex<VecDeque<Arc<Mutex<SockState>>>>,
pending_queue: Mutex<HashMap<u64, Arc<Mutex<SockState>>>>,
Contributor
I'm concerned about using user_data as the key into the HashMap. If the user token is reused between deregister() and register() of a different socket, the internal state gets mixed up.
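The hazard can be illustrated with a plain map keyed by the user token (an illustrative sketch, not mio's code): a completion still in flight for the old socket will resolve against whatever socket now owns that token.

```rust
use std::collections::HashMap;

fn main() {
    // Pending map keyed by the user token (user_data).
    let mut pending: HashMap<u64, &str> = HashMap::new();

    let token = 7u64;
    pending.insert(token, "socket A");

    // A completion for socket A is still in flight when the user
    // deregisters A and registers socket B under the same token.
    pending.remove(&token);
    pending.insert(token, "socket B");

    // The stale completion for A now resolves to B's state.
    assert_eq!(pending.get(&token), Some(&"socket B"));
}
```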

Contributor

Ideally pending_queue and update_queue would be behind the same Mutex.
I didn't get around to doing that in #1149 but it should be done eventually nonetheless.

Contributor

@piscisaureus piscisaureus Nov 16, 2019

"pending_queue" is a bit of a misnomer here. This is not a queue of pending sockets, but rather one of all sockets (regardless of whether they're pending, idle, cancelled, or deleted).
Other than that, keeping a table of all sockets does seem like a pretty good solution.

Contributor

It looks like explicitly deregistered sockets are removed from this queue, but implicitly deregistered ones are not. Maybe make that more consistent?

Contributor Author

@dtacalau dtacalau Nov 19, 2019

@piscisaureus

I'm concerned about using user_data as the key into the HashMap. If the user token is re-used between deregister() and register() of a different socket, the internal state gets messed up.

I agree, it's not a good idea to use the token as the key.

Ideally pending_queue and update_queue would be behind the same Mutex.

Ok.

"pending_queue" is a bit of a misnomer here.

I agree, need a better name here.

It looks like explicitly deregistered sockets are removed from this queue, but implicitly deregistered ones are. Maybe make that more consistent?

What do you mean by "implicitly deregistered sockets"? Could you give an example of such a case? Thanks

);

match result {
Ok(i) if i == 0 || i < 1024 => {
Contributor

`i == 0 || i < 1024` is tautological: the first clause is already implied by the second.

@dtacalau
Copy link
Contributor Author

@piscisaureus I'm thinking of reimplementing this. I've summarized the idea below. Please let me know what you think, or whether you'd rather implement a fix for the leak yourself after merging #1154; I'm trying to avoid duplicate work.

For each registered SockState, generate a unique id and keep an internal map of (unique id, SockState) pairs. This map is updated on reregister/deregister, so the unique id becomes the key instead of the token.

The unique SockState id can then be used as the overlapped data instead of overlapped_wrapper or transmute_copy(Pin<Arc<..SockState>>):

  • when doing afd.poll, just set the unique id as the overlapped data;
  • when polling, in cp.get_many, retrieve the SockState using the unique id from the overlapped data.

To fix the leaked SockStates, just iterate this map and cancel all SockStates in it during Selector drop.
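The unique-id scheme above can be sketched as follows (assumed names, not mio's actual code): an atomic counter mints ids that are never reused, so a stale completion can never alias a newly registered socket the way a reused token can.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Ids are minted from a monotonically increasing counter and never reused,
// unlike user tokens, which callers may reuse across register/deregister.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

struct SockState {
    id: usize,
}

fn main() {
    let mut by_id: HashMap<usize, Arc<Mutex<SockState>>> = HashMap::new();

    // register: mint a unique id and store the state under it.
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    by_id.insert(id, Arc::new(Mutex::new(SockState { id })));

    // afd.poll would pass `id` as the overlapped data; on completion,
    // cp.get_many hands it back and the map resolves the state.
    let state = by_id.get(&id).expect("registered state");
    assert_eq!(state.lock().unwrap().id, id);

    // deregister (or Selector drop): cancel and remove every entry.
    by_id.remove(&id);
    assert!(by_id.get(&id).is_none());
}
```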

@Thomasdezeeuw
Collaborator

I'm not sure if the general idea is a good fit; I'll leave that up to someone with more experience with Windows.

One idea I had: since we already allocate the state data, can you use the pointer as the unique id? Then we wouldn't have to keep a map. I don't know if this is possible with regard to memory safety, though.
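The pointer-as-id idea could look roughly like this (a sketch only; the memory-safety caveat above is real, since the raw pointer outlives normal borrow checking while the ioctl is in flight, and Arc::from_raw must run exactly once per Arc::into_raw):

```rust
use std::sync::{Arc, Mutex};

struct SockState {
    readable: bool,
}

fn main() {
    let state = Arc::new(Mutex::new(SockState { readable: false }));

    // into_raw leaks one strong reference; the raw pointer itself acts
    // as the unique id handed to the kernel as overlapped data, so no
    // separate map is needed.
    let id: *const Mutex<SockState> = Arc::into_raw(Arc::clone(&state));

    // On completion, the pointer is turned back into an Arc. This must
    // happen exactly once per into_raw, or the count is corrupted.
    let recovered = unsafe { Arc::from_raw(id) };
    recovered.lock().unwrap().readable = true;

    // Both handles see the same state; the count reflects both of them.
    assert!(state.lock().unwrap().readable);
    assert_eq!(Arc::strong_count(&state), 2);
}
```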

@dtacalau
Contributor Author

This PR should not be needed after dtacalau@4ba341e gets merged.

@dtacalau
Contributor Author

No longer needed; another leak fix is under review in #1154.

@dtacalau dtacalau closed this Nov 26, 2019