Start node-problem-detector on deployed instances to collect memory stats #1523

Closed
wants to merge 5 commits into from

Conversation

dconnolly
Contributor

@dconnolly dconnolly commented Dec 15, 2020

Motivation

The default metrics that gcloud exposes for our zebrad nodes deployed on VMs don't include memory usage.

Solution

Add google-monitoring-enabled=true metadata to deployed instances

This enables the Node Problem Detector on Container-Optimized OS, which collects metrics on memory usage, open TCP connections, processes, CPU steal, and swap usage, on top of the existing host-collected metrics.
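
As a rough illustration of the change, here is a minimal sketch of how the metadata flag would be attached to an instance creation command. The helper name `build_create_command`, the instance name `zebrad-test`, and the container image placeholder are hypothetical and not taken from this PR. Only the `google-monitoring-enabled=true` key and the two gcloud subcommands come from the discussion. Image, zone, and machine-type flags are deliberately left out.

```python
import shlex


def build_create_command(instance_name, with_container=False, container_image=None):
    """Build the gcloud argv that sets google-monitoring-enabled=true.

    Hypothetical helper for illustration only; it is not part of the
    zebra deployment scripts.
    """
    # Pick between the plain and the container-deploying subcommand.
    subcommand = "create-with-container" if with_container else "create"
    argv = [
        "gcloud", "compute", "instances", subcommand, instance_name,
        # Metadata key from this PR; asks Container-Optimized OS to start
        # the Node Problem Detector.
        "--metadata", "google-monitoring-enabled=true",
    ]
    if with_container:
        if container_image is None:
            raise ValueError("container_image is required with create-with-container")
        argv += ["--container-image", container_image]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(shlex.quote(arg) for arg in build_create_command("zebrad-test")))
```

Printing the command rather than executing it keeps the sketch safe to run without a configured gcloud project.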

Review

Not urgent.

@dconnolly dconnolly added A-infrastructure Area: Infrastructure changes A-devops Area: Pipelines, CI/CD and Dockerfiles labels Dec 15, 2020
This was referenced Dec 15, 2020
teor2345
teor2345 previously approved these changes Dec 15, 2020
Contributor

@teor2345 teor2345 left a comment

Looks great, and it should help us monitor memory issues like #1486 and #1487

@dconnolly
Contributor Author

Unfortunately, this metadata flag doesn't seem to get picked up when using any of the create-with-container instance deployment variants, though it works fine for plain ol' gcloud compute instances create. I'll play around with options like startup scripts / cloud-init.

@teor2345
Contributor

These changes look good, but I'm not sure whether they actually work, or whether they need another review.

Also I think there is a conflict with #1529.

@dconnolly dconnolly marked this pull request as draft December 17, 2020 22:25
@dconnolly
Contributor Author

Moving to draft while I keep trying options that work with the create-with-container command variants: #1523 (comment)

@dconnolly dconnolly changed the title Add google-monitoring-enabled=true metadata to deployed instances Start node-problem-detector on deployed instances to collect memory stats Dec 30, 2020
@dconnolly dconnolly force-pushed the enable-node-problem-detector branch from 11b53f2 to faff2de Compare December 30, 2020 18:53
@dconnolly
Contributor Author

When deploying containers to containers I just cannot get these memory metrics out. Closing. :/

@dconnolly dconnolly closed this Dec 30, 2020
@dconnolly dconnolly deleted the enable-node-problem-detector branch December 30, 2020 20:13
Labels
A-devops Area: Pipelines, CI/CD and Dockerfiles A-infrastructure Area: Infrastructure changes
2 participants