improve monitor performance #10368

dhiaayachi · 2021-06-09T02:20:58Z

Improve monitor performance to avoid filling the log channel and losing logs when running monitor or debug.

An issue was reported that we lose some logs when running the consul monitor or consul debug commands.

The current underlying monitor implementation avoids blocking when writing logs so that it does not impact performance, and it allows some logs to be lost when the log channel (size: 512 log lines) overflows. This PR improves the performance of reading from the log channel so we can avoid losing logs.
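For readers who have not seen this pattern, here is a minimal sketch of a non-blocking write that drops lines once a buffered channel is full. The `sink` type, its fields, and `newSink` are hypothetical names chosen for illustration; this is not the code in Consul's monitor package, only the general technique the description refers to.

```go
package main

import "fmt"

// sink is a hypothetical stand-in for the monitor's log sink; the names
// here are illustrative and not taken from Consul's monitor package.
type sink struct {
	logCh   chan []byte // buffered channel holding pending log lines
	dropped int         // lines discarded because logCh was full
}

func newSink(size int) *sink {
	return &sink{logCh: make(chan []byte, size)}
}

// Write never blocks: if the buffer is full, the line is dropped and counted.
func (s *sink) Write(p []byte) (int, error) {
	line := make([]byte, len(p))
	copy(line, p) // the caller may reuse p, so keep a private copy

	select {
	case s.logCh <- line:
	default:
		s.dropped++
	}
	return len(p), nil
}

func main() {
	s := newSink(512)
	// With no reader draining logCh, every line past the 512th is lost.
	for i := 0; i < 600; i++ {
		s.Write([]byte(fmt.Sprintf("line %d", i)))
	}
	fmt.Println("buffered:", len(s.logCh), "dropped:", s.dropped)
}
```

The faster the reader drains `logCh`, the less often the `default` branch is taken, which is why this PR focuses on the read side.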

dnephin

Thanks! I think this is going to be a good improvement. Left some suggestions below.

I also had a question about 1eeddf7. Generally it is expected that once something is closed or Stopped it can't be started again. Was that behaviour causing a problem? Maybe we could document that a monitor is only safe for one use, instead of changing that behaviour?

dhiaayachi

Yes, the issue was that if we stop the monitor we can't start it again. I was testing and I came across that issue.
Why do we not expect the monitor to be started multiple times? This is on the server side, so it's a plausible scenario to start and stop monitoring multiple times to debug an agent. Am I missing something?
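As background for this exchange, the sketch below shows the Go channel behaviour behind "the doneCh is closed and never recover". The `monitor` type here is a hypothetical, stripped-down stand-in, not Consul's implementation. Once a done channel is closed, every later receive on it succeeds immediately, so any loop that selects on it would exit as soon as it is started again.

```go
package main

import "fmt"

// monitor is a hypothetical, simplified type used only for illustration.
type monitor struct {
	doneCh chan struct{}
}

func newMonitor() *monitor {
	return &monitor{doneCh: make(chan struct{})}
}

// Stop closes doneCh. A closed channel cannot be reopened, and closing it
// a second time would panic.
func (m *monitor) Stop() { close(m.doneCh) }

// stopped reports whether doneCh has been closed, without blocking.
func (m *monitor) stopped() bool {
	select {
	case <-m.doneCh:
		return true
	default:
		return false
	}
}

func main() {
	m := newMonitor()
	fmt.Println(m.stopped()) // false
	m.Stop()
	fmt.Println(m.stopped()) // true, and it stays true from now on
}
```

Supporting restarts would mean allocating a fresh channel on each start; treating the monitor as single-use, as suggested in the review, avoids that extra state.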

dnephin

LGTM! One idea for a small improvement, but not blocking

dhiaayachi · 2021-06-15T00:49:56Z

@dnephin I added the suggested fix but went with a WaitGroup as it seems appropriate for this use case. Any thoughts?

dnephin · 2021-06-15T16:08:36Z

That works. I think it could potentially be confusing to the reader, since wait groups are generally used to wait for goroutines to end (not start), but functionally I think it's going to do the right thing.
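A minimal sketch of the idea discussed here: use a `sync.WaitGroup` to wait until the consumer goroutine has started before registering the sink. The function name `startConsumer` and the `register` callback are assumptions made for illustration and do not mirror the merged code.

```go
package main

import (
	"fmt"
	"sync"
)

// startConsumer launches a goroutine that drains logCh into out and calls
// register only after that goroutine is running. register is a hypothetical
// callback standing in for "add the sink to the logger".
func startConsumer(logCh <-chan string, out chan<- string, register func()) {
	var started sync.WaitGroup
	started.Add(1)

	go func() {
		started.Done() // signal that the consumer goroutine has started
		for line := range logCh {
			out <- line
		}
		close(out)
	}()

	started.Wait() // block until the consumer is running
	register()     // only now can writers begin filling logCh
}

func main() {
	logCh := make(chan string, 4)
	out := make(chan string, 4)

	startConsumer(logCh, out, func() {
		logCh <- "first log line"
		close(logCh)
	})

	for line := range out {
		fmt.Println(line)
	}
}
```

A `chan struct{}` closed by the goroutine would express the same start signal and may read more conventionally, which is the readability concern raised above.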

hc-github-team-consul-core · 2021-06-15T16:08:43Z

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/386958.

hc-github-team-consul-core · 2021-06-15T16:23:19Z

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/386967.

hc-github-team-consul-core · 2021-06-15T16:23:23Z

🍒✅ Cherry pick of commit c8ba2d4 onto release/1.10.x succeeded!

* remove flush for each write to http response in the agent monitor endpoint
* fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover.
* start log reading goroutine before adding the sink to avoid filling the log channel before getting a chance of reading from it
* flush every 500ms to optimize log writing in the http server side.
* add changelog file
* add issue url to changelog
* fix changelog url
* Update changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use ticker to flush and avoid race condition when flushing in a different goroutine
* stop the ticker when done

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* Revert "fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover."

This reverts commit 1eeddf7

* wait for log consumer loop to start before registering the sink

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

hc-github-team-consul-core · 2021-06-15T16:23:27Z

🍒✅ Cherry pick of commit c8ba2d4 onto release/1.9.x succeeded!

hc-github-team-consul-core · 2021-06-15T16:23:32Z

🍒✅ Cherry pick of commit c8ba2d4 onto release/1.8.x succeeded!

dhiaayachi added 3 commits June 8, 2021 22:12

remove flush for each write to http response in the agent monitor endpoint

6c39a98

fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover.

1eeddf7

start log reading goroutine before adding the sink to avoid filling the log channel before getting a chance of reading from it

3c9d34d

dhiaayachi requested a review from dnephin June 9, 2021 02:21

github-actions bot added the theme/telemetry label Jun 9, 2021

dhiaayachi added the theme/cli label and removed the theme/telemetry label Jun 9, 2021

flush every 500ms to optimize log writing in the http server side.

8c04856

add changelog file

f04b371

add issue url to changelog

ae3f858

fix changelog url

a524e54

dnephin reviewed Jun 10, 2021

.changelog/10368.txt (outdated)

agent/agent_endpoint.go (outdated)

dhiaayachi commented Jun 10, 2021

logging/monitor/monitor.go

.changelog/10368.txt (outdated)

agent/agent_endpoint.go (outdated)

Update changelog

762eda1

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

use ticker to flush and avoid race condition when flushing in a different goroutine

7428230

dnephin reviewed Jun 11, 2021

logging/monitor/monitor.go (outdated)

dnephin reviewed Jun 11, 2021

agent/agent_endpoint.go

dnephin reviewed Jun 11, 2021

agent/agent_endpoint.go

stop the ticker when done

d80cd84

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

Revert "fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover." This reverts commit 1eeddf7

74419c7

dnephin approved these changes Jun 14, 2021

logging/monitor/monitor.go

wait for log consumer loop to start before registering the sink

85d00cb

dhiaayachi merged commit c8ba2d4 into master Jun 15, 2021

dhiaayachi deleted the dhia/monitor-performance-improvement branch June 15, 2021 16:05

dhiaayachi added backport/1.10 labels Jun 15, 2021
