
Fix goroutine leak in SubOutlet #7820

Merged: 1 commit into elastic:master on Aug 2, 2018
Conversation

adriansr
Contributor

This fixes a goroutine leak that occurred every time a harvester was closed.

// This mutex prevents the event channel to be closed if OnEvent is
// still running.
o.mutex.Lock()
defer o.mutex.Unlock()
o.closeEventChannel()
Contributor

I am a bit confused: we have a call to close(o.done); this will trigger the select below and call o.closeEventChannel.

@ph (Contributor), Jul 31, 2018

The channel can still be closed.

@ph (Contributor), Jul 31, 2018

OK, I got it:

OnEvent -> acquires the mutex.
Out-of-band close -> blocks on the mutex if OnEvent is currently running.
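A minimal sketch of that interaction, with hypothetical names (this is not the actual SubOutlet code): Close acquires the same mutex as OnEvent, so it cannot close the channel while a send is in flight.

```go
package main

import (
	"fmt"
	"sync"
)

// subOutlet is a simplified stand-in for the real type. The shared mutex
// ensures Close cannot close the event channel while OnEvent is mid-send.
type subOutlet struct {
	mutex sync.Mutex
	ch    chan string
}

// OnEvent holds the mutex for the duration of the send.
func (o *subOutlet) OnEvent(evt string) {
	o.mutex.Lock()
	defer o.mutex.Unlock()
	o.ch <- evt
}

// Close blocks on the mutex until any in-flight OnEvent has returned,
// so it never closes ch underneath a concurrent send.
func (o *subOutlet) Close() {
	o.mutex.Lock()
	defer o.mutex.Unlock()
	close(o.ch)
}

func main() {
	o := &subOutlet{ch: make(chan string, 1)} // buffered so OnEvent does not block
	o.OnEvent("event-1")
	o.Close()
	fmt.Println("received:", <-o.ch) // the buffered event survives the close
	_, ok := <-o.ch
	fmt.Println("channel still open:", ok) // false: channel is closed
}
```

With an unbuffered channel the same pattern applies, but a concurrent receiver is needed for the send to complete before Close can take the lock.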

@ph (Contributor), Jul 31, 2018

@adriansr I am a bit worried that we add one more synchronization mechanism to this object: we already have isOpen (an atomic bool), chClosed carrying some execution state, the mutex, and the channels.

I think I am missing some context from the environment and I am just trying to understand what we want to prevent: is it multiple calls to `close(o.ch)`?

Contributor

// This mutex prevents the event channel to be closed if OnEvent is
// still running.

From your comment this is what we want to stop? So we could drop the atomic bool and use isOpen instead of chClosed

Contributor

by "drop" I mean using a single mutex.

urso previously approved these changes Jul 31, 2018
adriansr dismissed urso's stale review July 31, 2018 18:35

Changed after review

adriansr (Contributor, Author)

I've submitted a simpler version addressing @ph's comments.

To my understanding, the atomic bool is still necessary to prevent closing done twice.

andrewkroh (Member)

the atomic bool is still necessary to prevent closing done twice.

I didn't examine the code, but based on the sound of it this could be a good use case for sync.Once to prevent closing the channel more than once.
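A minimal sketch of the pattern @andrewkroh suggests, with hypothetical names (not the actual SubOutlet code): sync.Once guarantees the channel is closed exactly once, even under concurrent calls, which removes the need for an atomic bool guarding the close.

```go
package main

import (
	"fmt"
	"sync"
)

// outlet is a hypothetical stand-in for SubOutlet: closeOnce guarantees
// that the done channel is closed exactly once, even if Close is called
// concurrently from several goroutines.
type outlet struct {
	done      chan struct{}
	closeOnce sync.Once
}

func newOutlet() *outlet {
	return &outlet{done: make(chan struct{})}
}

// Close is safe to call any number of times; only the first call
// actually closes the channel, so no "close of closed channel" panic.
func (o *outlet) Close() {
	o.closeOnce.Do(func() { close(o.done) })
}

func main() {
	o := newOutlet()
	o.Close()
	o.Close() // no panic on the second call
	select {
	case <-o.done:
		fmt.Println("done channel is closed")
	default:
		fmt.Println("done channel is still open")
	}
}
```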

// This mutex prevents the event channel to be closed if OnEvent is
// still running.
o.mutex.Lock()
defer o.mutex.Unlock()
close(o.ch)
Contributor

As @andrewkroh noted, sync.Once could replace the semaphore isOpen.


The open flag is also used in OnEvent. Once the outlet is closed, we MUST NOT run into the select block in OnEvent. A select is non-deterministic: even if o.done is closed, we might end up sending on o.ch. Depending on the current state, this can generate either a panic or some inconsistent state.
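A small illustration of the non-determinism described above (hypothetical names, not the SubOutlet code): when several cases of a select are ready at once, Go picks one at random, so a closed done channel does not guarantee the send branch is skipped.

```go
package main

import "fmt"

// countSends runs n select iterations with done already closed and
// returns how many times the send case was chosen anyway.
func countSends(n int) int {
	done := make(chan struct{})
	ch := make(chan int, 1) // buffered, so the send case is always ready
	close(done)             // the outlet is "closed"

	sends := 0
	for i := 0; i < n; i++ {
		select {
		case <-done:
			// receiving from a closed channel is always ready...
		case ch <- i:
			// ...but the send case can still be chosen at random
			sends++
			<-ch // drain so the buffer is ready for the next iteration
		}
	}
	return sends
}

func main() {
	fmt.Printf("send case chosen %d of 1000 times despite done being closed\n",
		countSends(1000))
}
```

Since both cases are ready on every iteration, the send branch fires roughly half the time, which is exactly why OnEvent needs a separate check before entering the select.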

adriansr added the needs_backport label (PR is waiting to be backported to other branches) on Jul 31, 2018
This fixes a goroutine leaking every time that a harvester is closed.
ph (Contributor) commented Aug 2, 2018

@adriansr I've looked at the failure on the CI; I think it's one of our flaky tests.

======================================================================
ERROR: Checks that states are properly removed after clean_inactive
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/gopath/src/github.com/elastic/beats/filebeat/tests/system/test_registrar.py", line 805, in test_clean_inactive
    max_timeout=10)
  File "/Users/travis/gopath/src/github.com/elastic/beats/filebeat/tests/system/../../../libbeat/tests/system/beat/beat.py", line 325, in wait_until
    "Waited {} seconds.".format(max_timeout))
TimeoutError: Timeout waiting for 'cond' to be true. Waited 10 seconds.

@urso can we merge as is? WDYT?

adriansr (Contributor, Author) commented Aug 2, 2018

@ph just in case, I relaunched the failing job. Now it passed.

ph merged commit b84e083 into elastic:master on Aug 2, 2018
ph (Contributor) commented Aug 2, 2018

@adriansr I've merged it. I will take a look as soon as possible to find out why this test is not reliable.

adriansr added a commit to adriansr/beats that referenced this pull request Aug 6, 2018
This fixes a goroutine leaking every time that a harvester is closed.

(cherry picked from commit b84e083)
adriansr added the v6.4.0 label and removed the needs_backport label on Aug 6, 2018
urso pushed a commit that referenced this pull request Aug 6, 2018
Cherry-pick of PR #7820 to 6.4 branch. Original message: 

This fixes a goroutine leaking every time that a harvester is closed.
ph pushed a commit to ph/beats that referenced this pull request Dec 17, 2018
This fixes a goroutine leaking every time that a harvester is closed.

(cherry picked from commit b84e083)
ph added the v6.5.4 and v6.6.0 labels on Dec 17, 2018
ph pushed a commit to ph/beats that referenced this pull request Dec 17, 2018
This fixes a goroutine leaking every time that a harvester is closed.

(cherry picked from commit b84e083)
ph added a commit that referenced this pull request Dec 17, 2018
Cherry-pick of PR #7820 to 6.5 branch. Original message: 

This fixes a goroutine leaking every time that a harvester is closed.
ph added a commit that referenced this pull request Dec 17, 2018
Cherry-pick of PR #7820 to 6.x branch. Original message: 

This fixes a goroutine leaking every time that a harvester is closed.
lucksuper pushed a commit to lucksuper/beats that referenced this pull request Dec 23, 2018
…stic#9592)

Cherry-pick of PR elastic#7820 to 6.5 branch. Original message:

This fixes a goroutine leaking every time that a harvester is closed.
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…stic#7875)

Cherry-pick of PR elastic#7820 to 6.4 branch. Original message: 

This fixes a goroutine leaking every time that a harvester is closed.
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…stic#9592)

Cherry-pick of PR elastic#7820 to 6.5 branch. Original message: 

This fixes a goroutine leaking every time that a harvester is closed.