Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve monitor performance #10368

Merged
merged 12 commits into from
Jun 15, 2021
Merged

Conversation

dhiaayachi
Copy link
Contributor

@dhiaayachi dhiaayachi commented Jun 9, 2021

Fixes #10347

improve monitor performance to avoid filling the log channel and loose logs when running monitor or debug.

An issue was reported that we loose some logs when are running consul monitor or consul debug commands.

the current underlying monitor implementation avoid blocking when writing logs to not impact performance and allow losing some logs when we overflow the log channel (size 512 lines of code). This PR is to enhance the performance of the log channel read so we can avoid loosing logs.

@dhiaayachi dhiaayachi requested a review from dnephin June 9, 2021 02:21
@github-actions github-actions bot added the theme/telemetry Anything related to telemetry or observability label Jun 9, 2021
@dhiaayachi dhiaayachi added theme/cli Flags and documentation for the CLI interface and removed theme/telemetry Anything related to telemetry or observability labels Jun 9, 2021
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 9, 2021 23:05 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 9, 2021 23:05 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 10, 2021 03:04 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 10, 2021 03:04 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 10, 2021 13:35 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 10, 2021 13:35 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 10, 2021 13:40 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 10, 2021 13:40 Inactive
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this is going to be a good improvement. Left some suggestions below.

I also had a question about 1eeddf7. Generally it is expected that once something is closed or Stopped it can't be started again. Was that behaviour causing a problem? Maybe we could document that a monitor is only safe for one use, instead of changing that behaviour?

.changelog/10368.txt Outdated Show resolved Hide resolved
agent/agent_endpoint.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@dhiaayachi dhiaayachi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this is going to be a good improvement. Left some suggestions below.

I also had a question about 1eeddf7. Generally it is expected that once something is closed or Stopped it can't be started again. Was that behaviour causing a problem? Maybe we could document that a monitor is only safe for one use, instead of changing that behaviour?

Yes the issue was that if we stop the monitor we can't start it again. I was testing and I came cross that issue.
Why do we not expect the monitor to be started multiple times. This is in the server side so it's a plausible scenario to start and stop monitoring multiple times to debug an agent. am I missing something?

logging/monitor/monitor.go Show resolved Hide resolved
.changelog/10368.txt Outdated Show resolved Hide resolved
agent/agent_endpoint.go Outdated Show resolved Hide resolved
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
@vercel vercel bot temporarily deployed to Preview – consul June 10, 2021 19:44 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 10, 2021 19:44 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 11, 2021 15:58 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 11, 2021 15:58 Inactive
logging/monitor/monitor.go Outdated Show resolved Hide resolved
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
@vercel vercel bot temporarily deployed to Preview – consul June 11, 2021 17:49 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 11, 2021 17:49 Inactive
…mes, the doneCh is closed and never recover."

This reverts commit 1eeddf7
@vercel vercel bot temporarily deployed to Preview – consul June 14, 2021 14:08 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 14, 2021 14:08 Inactive
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One idea for a small improvement, but not blocking

logging/monitor/monitor.go Show resolved Hide resolved
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 15, 2021 00:47 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 15, 2021 00:47 Inactive
@dhiaayachi
Copy link
Contributor Author

@dnephin I added the suggested fix but went with a WaitGroup as it seems appropriate for this use case, any thought?

@dhiaayachi dhiaayachi merged commit c8ba2d4 into master Jun 15, 2021
@dhiaayachi dhiaayachi deleted the dhia/monitor-performance-improvement branch June 15, 2021 16:05
@dnephin
Copy link
Contributor

dnephin commented Jun 15, 2021

That works. I think it could potentially be confusing the the reader since generally wait groups are used for goroutines to end (not start), but functionally I think it's going to do the right thing.

@hc-github-team-consul-core
Copy link
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/386958.

@hc-github-team-consul-core
Copy link
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/386967.

@hc-github-team-consul-core
Copy link
Contributor

🍒✅ Cherry pick of commit c8ba2d4 onto release/1.10.x succeeded!

hc-github-team-consul-core pushed a commit that referenced this pull request Jun 15, 2021
* remove flush for each write to http response in the agent monitor endpoint

* fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover.

* start log reading goroutine before adding the sink to avoid filling the log channel before getting a chance of reading from it

* flush every 500ms to optimize log writing in the http server side.

* add changelog file

* add issue url to changelog

* fix changelog url

* Update changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use ticker to flush and avoid race condition when flushing in a different goroutine

* stop the ticker when done

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* Revert "fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover."

This reverts commit 1eeddf7

* wait for log consumer loop to start before registering the sink

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
@hc-github-team-consul-core
Copy link
Contributor

🍒✅ Cherry pick of commit c8ba2d4 onto release/1.9.x succeeded!

hc-github-team-consul-core pushed a commit that referenced this pull request Jun 15, 2021
* remove flush for each write to http response in the agent monitor endpoint

* fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover.

* start log reading goroutine before adding the sink to avoid filling the log channel before getting a chance of reading from it

* flush every 500ms to optimize log writing in the http server side.

* add changelog file

* add issue url to changelog

* fix changelog url

* Update changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use ticker to flush and avoid race condition when flushing in a different goroutine

* stop the ticker when done

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* Revert "fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover."

This reverts commit 1eeddf7

* wait for log consumer loop to start before registering the sink

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
@hc-github-team-consul-core
Copy link
Contributor

🍒✅ Cherry pick of commit c8ba2d4 onto release/1.8.x succeeded!

hc-github-team-consul-core pushed a commit that referenced this pull request Jun 15, 2021
* remove flush for each write to http response in the agent monitor endpoint

* fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover.

* start log reading goroutine before adding the sink to avoid filling the log channel before getting a chance of reading from it

* flush every 500ms to optimize log writing in the http server side.

* add changelog file

* add issue url to changelog

* fix changelog url

* Update changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use ticker to flush and avoid race condition when flushing in a different goroutine

* stop the ticker when done

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* Revert "fix race condition when we stop and start monitor multiple times, the doneCh is closed and never recover."

This reverts commit 1eeddf7

* wait for log consumer loop to start before registering the sink

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/cli Flags and documentation for the CLI interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

monitor: consul monitor and consul debug are missing many logs
3 participants