Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] vsock: Mem leak caused by circular referencing #438

Closed
cutelizebin opened this issue Sep 7, 2023 · 5 comments · Fixed by #499
Closed

[BUG] vsock: Mem leak caused by circular referencing #438

cutelizebin opened this issue Sep 7, 2023 · 5 comments · Fixed by #499
Assignees
Labels
bug Something isn't working

Comments

@cutelizebin
Copy link
Contributor

cutelizebin commented Sep 7, 2023

There might be a memory leak bug caused by circular referencing.

VhostUserBackend ==> VhostUserVsockThread ==> VringEpollHandler
^ V
|| ||
==================================

There three struct refers to each other using Arc , but no single line is using Weak, which make it a circle.

And all these three struct cannot be dropped properly.

Reproduction Process:

  1. Launch vhost-device-vsock
  2. Launch and kill QEMU or other VMM
  3. Use top -Hp $(pgrep vhost-device-vsock) to watch the threads details

Advice

Maybe we can use Weak<VhostUserBackend> in VringEpollHandler.

@stefano-garzarella
Copy link
Member

Maybe we can use Weak<VhostUserBackend> in VringEpollHandler.

Or what about in VhostUserVsockThread where we're already using Option<Arc<VringEpollHandler<ArcVhostBknd, VringRwLock, ()>> ?

@cutelizebin do you want to work on it?

@cutelizebin
Copy link
Contributor Author

yes,I'll send a fix later on.

@stefano-garzarella stefano-garzarella changed the title [BUG]: Mem leak caused by circular referencing [BUG] vsock: Mem leak caused by circular referencing Sep 27, 2023
@stefano-garzarella stefano-garzarella added the bug Something isn't working label Sep 27, 2023
@stefano-garzarella
Copy link
Member

This is also causing an issue after we merged commit 38caab2

When the VM reboot or shutdown, we have this error, instead of looping waiting for a new VM

ERROR vhost_device_vsock] Could not create backend: CID already in use by another vsock device

This happens because we have these cyclic references and VhostUserVsockThread::drop() is never invoked. So we don't remove the cid from the map.

@stefano-garzarella
Copy link
Member

@cutelizebin do you have time to work on it? Otherwise, I can take a look ;-)

stefano-garzarella added a commit to stefano-garzarella/vhost-device that referenced this issue Oct 24, 2023
We have the following circular references found by Li Zebin:
    VhostUserBackend ==> VhostUserVsockThread ==> VringEpollHandler

In addition to causing a resource leak, this causes also an error
after we merged commit 38caab2 ("vsock: Don't allow duplicate CIDs").
When the VM reboot or shutdown, the application exits with the
following error:

    [ERROR vhost_device_vsock] Could not create backend:
        CID already in use by another vsock device

This happened because we have these circular references and
VhostUserVsockThread::drop() is never invoked. So, we don't remove
the cid from the map.

Let's fix this problem by simply removing the reference to
VringEpollHandler from VhostUserVsockThread. In fact, we do not
need to keep the reference for the lifetime of VhostUserVsockThread,
as we only need to add the handlers once.

Let's also rename the fields to follow the current VhostUserDaemon
API.

Closes rust-vmm#438

Reported-by: Li Zebin <cutelizebin@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
@stefano-garzarella
Copy link
Member

@cutelizebin was easier than expected, we don't need to keep the VringEpollHandler reference in VhostUserVsockThread. I just sent #499 .

Thanks for reporting this!

stefano-garzarella added a commit to stefano-garzarella/vhost-device that referenced this issue Oct 25, 2023
We have the following circular references found by Li Zebin:
    VhostUserBackend ==> VhostUserVsockThread ==> VringEpollHandler

In addition to causing a resource leak, this causes also an error
after we merged commit 38caab2 ("vsock: Don't allow duplicate CIDs").
When the VM reboot or shutdown, the application exits with the
following error:

    [ERROR vhost_device_vsock] Could not create backend:
        CID already in use by another vsock device

This happened because we have these circular references and
VhostUserVsockThread::drop() is never invoked. So, we don't remove
the cid from the map.

Let's fix this problem by simply removing the reference to
VringEpollHandler from VhostUserVsockThread. In fact, we do not
need to keep the reference for the lifetime of VhostUserVsockThread,
as we only need to add the handlers once.

Let's also rename the fields to follow the current VhostUserDaemon
API.

Closes rust-vmm#438

Reported-by: Li Zebin <cutelizebin@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
vireshk pushed a commit that referenced this issue Oct 27, 2023
We have the following circular references found by Li Zebin:
    VhostUserBackend ==> VhostUserVsockThread ==> VringEpollHandler

In addition to causing a resource leak, this causes also an error
after we merged commit 38caab2 ("vsock: Don't allow duplicate CIDs").
When the VM reboot or shutdown, the application exits with the
following error:

    [ERROR vhost_device_vsock] Could not create backend:
        CID already in use by another vsock device

This happened because we have these circular references and
VhostUserVsockThread::drop() is never invoked. So, we don't remove
the cid from the map.

Let's fix this problem by simply removing the reference to
VringEpollHandler from VhostUserVsockThread. In fact, we do not
need to keep the reference for the lifetime of VhostUserVsockThread,
as we only need to add the handlers once.

Let's also rename the fields to follow the current VhostUserDaemon
API.

Closes #438

Reported-by: Li Zebin <cutelizebin@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants