improve monitor performance #10368
… doneCh is closed and never recover.
…he log channel before getting a chance of reading from it
Thanks! I think this is going to be a good improvement. Left some suggestions below.
I also had a question about 1eeddf7. Generally it is expected that once something is closed or Stopped it can't be started again. Was that behaviour causing a problem? Maybe we could document that a monitor is only safe for one use, instead of changing that behaviour?
Yes, the issue was that if we stop the monitor we can't start it again. I came across that issue while testing.
Why do we not expect the monitor to be started multiple times? This is on the server side, so it's a plausible scenario to start and stop monitoring multiple times to debug an agent. Am I missing something?
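For readers following the thread, the behaviour in question comes from how Go channels work: a closed channel stays closed, so a stop signal built on close(doneCh) can only fire once. The sketch below is a minimal illustration with a hypothetical oneShot type; it is not the monitor code from this PR, and the sync.Once guard is an assumption added so that a second Stop does not panic.

```go
package main

import (
	"fmt"
	"sync"
)

// oneShot is a hypothetical stand-in for a component that signals
// shutdown by closing a channel. It is not Consul's monitor type.
type oneShot struct {
	doneCh chan struct{}
	once   sync.Once
}

func newOneShot() *oneShot {
	return &oneShot{doneCh: make(chan struct{})}
}

// Stop closes doneCh. The sync.Once guard (an assumption for this
// sketch) makes repeated calls safe, since closing twice would panic.
func (m *oneShot) Stop() {
	m.once.Do(func() { close(m.doneCh) })
}

// stopped reports whether Stop has been called. A receive on a closed
// channel never blocks, so once closed this is true forever.
func (m *oneShot) stopped() bool {
	select {
	case <-m.doneCh:
		return true
	default:
		return false
	}
}

func main() {
	m := newOneShot()
	fmt.Println("before Stop:", m.stopped())
	m.Stop()
	fmt.Println("after Stop:", m.stopped())
	m.Stop() // second call is a no-op
	fmt.Println("after second Stop:", m.stopped())
}
```

Any goroutine that selects on doneCh would exit immediately on a second start, which is why such a value is either documented as single-use or given a fresh channel on each start.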
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
…mes, the doneCh is closed and never recover." This reverts commit 1eeddf7
LGTM! One idea for a small improvement, but not blocking
@dnephin I added the suggested fix but went with a
That works. I think it could potentially be confusing to the reader, since wait groups are generally used to wait for goroutines to end (not start), but functionally I think it's going to do the right thing.
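As a rough illustration of the pattern being discussed, the sketch below uses a sync.WaitGroup as a "started" signal: the caller blocks until the consumer goroutine is running, and only then registers the sink. The names startConsumer and register are hypothetical and do not come from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// startConsumer launches a goroutine that drains logCh, waits until
// that goroutine has begun running, and only then calls register
// (a hypothetical callback standing in for "add the sink").
// The collected lines are delivered on the returned channel once
// logCh is closed.
func startConsumer(logCh <-chan string, register func()) <-chan []string {
	out := make(chan []string, 1)

	var started sync.WaitGroup
	started.Add(1)

	go func() {
		started.Done() // signal that the consumer goroutine is running
		var got []string
		for line := range logCh {
			got = append(got, line)
		}
		out <- got
	}()

	started.Wait() // block until the goroutine has started
	register()     // safe to let producers write from here on
	return out
}

func main() {
	logCh := make(chan string, 4)
	result := startConsumer(logCh, func() { fmt.Println("sink registered") })

	logCh <- "a"
	logCh <- "b"
	close(logCh)

	fmt.Println(<-result)
}
```

A channel that the goroutine closes on entry would express the same "started" signal and may read more conventionally, which is the readability concern raised above.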
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/386958.
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/386967.
🍒✅ Cherry pick of commit c8ba2d4 onto
* remove flush for each write to http response in the agent monitor endpoint
* fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover.
* start log reading goroutine before adding the sink to avoid filling the log channel before getting a chance of reading from it
* flush every 500ms to optimize log writing in the http server side.
* add changelog file
* add issue url to changelog
* fix changelog url
* Update changelog
  Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use ticker to flush and avoid race condition when flushing in a different goroutine
* stop the ticker when done
  Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Revert "fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover." This reverts commit 1eeddf7
* wait for log consumer loop to start before registering the sink
  Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Fixes #10347
Improve monitor performance to avoid filling the log channel and losing logs when running monitor or debug.
An issue was reported that we lose some logs when running the consul monitor or consul debug commands. The current underlying monitor implementation avoids blocking when writing logs, so that logging does not impact performance, and accepts losing some logs when the log channel overflows (buffer size: 512 log lines). This PR improves the performance of reading from the log channel so we can avoid losing logs.
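The non-blocking write described above can be sketched with a buffered channel and a select with a default case. This is a minimal illustration using a hypothetical logBuffer type, not the monitor implementation itself; the dropped counter is an assumption added for demonstration and is not safe for concurrent writers.

```go
package main

import "fmt"

// logBuffer is a hypothetical stand-in for the monitor's log channel.
type logBuffer struct {
	logCh   chan []byte
	dropped int // count of lines discarded because the channel was full
}

func newLogBuffer(size int) *logBuffer {
	return &logBuffer{logCh: make(chan []byte, size)}
}

// write never blocks the caller. If the channel is full, the line is
// dropped and counted; it reports whether the line was queued.
func (b *logBuffer) write(p []byte) bool {
	// Copy the bytes, since the caller may reuse p after write returns.
	line := make([]byte, len(p))
	copy(line, p)

	select {
	case b.logCh <- line:
		return true
	default:
		b.dropped++
		return false
	}
}

func main() {
	b := newLogBuffer(512)
	// With no reader draining the channel, everything past the
	// buffer size is dropped.
	for i := 0; i < 600; i++ {
		b.write([]byte(fmt.Sprintf("line %d\n", i)))
	}
	fmt.Printf("buffered=%d dropped=%d\n", len(b.logCh), b.dropped)
	// prints "buffered=512 dropped=88"
}
```

Because the writer never waits, the only way to lose fewer lines is for the reader to keep up, which is the direction this PR takes.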