Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cadvisor for container insight receiver #3414

Closed
wants to merge 1 commit into from
Closed

Add cadvisor for container insight receiver #3414

wants to merge 1 commit into from

Conversation

pxaws
Copy link
Contributor

@pxaws pxaws commented May 17, 2021

Description:
Add cadvisor for container insight receiver

  • embed cadvisor lib inside and use it collect container stats
  • generate metrics based on container stats

Link to tracking Issue:
#2307

Testing:
Unit tests.

@pxaws pxaws requested a review from a team May 17, 2021 22:19
@pxaws
Copy link
Contributor Author

pxaws commented May 17, 2021

@alolita @mxiamxia

// override other defaults (in future)
}
for name, defaultValue := range flagOverrides {
if f := flag.Lookup(name); f != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this flag supposed to be configured? I'm not aware of any mechanism that the collector exposes for individual components to register flags.

If it is a flag that is configured elsewhere, is it really okay to be changing its values here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a workaround to change the default setting for cadvisor since it doesn't provide an api to change the setting. The Kubelet code is doing the same thing:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cadvisor/cadvisor_linux.go#L64-L78

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu @tigrannajaryan can you give your thoughts on this? It feels weird to me to be changing flag configuration in a library like this.

In any event, to the extent that this isn't in an init() function, as is done by the kubelet, I'm not sure this can even be effective. I'd expect the flags to already have been parsed by the time this is invoked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The flag overriding should be done in init(). I have made the change. As for whether we should changing flag in the current way (like what kubelet did), I will do more research and put info here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the flag housekeeping_interval that we are overriding is only used in one place: https://github.com/google/cadvisor/search?q=%22housekeeping_interval%22&type=code. The value is taken directly from commandline flag: https://github.com/google/cadvisor/blob/ef7e64f9efab1257e297d7af339e94bb016cf221/manager/container.go#L49
I see the value is used in a couple of places:

I don't think cadvisor currently provide a way to customize the flag value through any api. So overriding the flag value seems to be the only way to change the default. Maybe that's the reason that kubelet code is doing the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu @tigrannajaryan Please take a look and let us know if it is OK for us to merge the code like this. Thank you very much!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest not overriding cAdvisor flags, and instead opening a PR against cAdvisor to make it explicitly configurable. I'm happy to approve such PRs. The kubelet probably shouldn't be doing that either...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since that will take a while, it is probably acceptable to do this for now if they need to have this configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know anything about cadvisor, so take what I write below with a big grain of salt:
The flags are a global shared state and components should not touch anything globally shared. Doing this sounds like a very bad idea to me.
I haven't had a chance to look at the source code of this PR and don't know what is the problem that we are solving but again, we should avoid touching flags in the component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the code to override the variable HousekeepingInterval from cadvisor manager package. This is a little hacky but is much better than overriding the global flags. I will need to use the proper way once the cadvisor issue is resolved: google/cadvisor#2886

@pxaws pxaws requested a review from Aneurysm9 May 26, 2021 21:02
@pxaws
Copy link
Contributor Author

pxaws commented Jun 2, 2021

The checks failed at loadtest which shouldn't be caused by current code changes. I will git push -f to re-trigger the checks.

// override other defaults (in future)
}
for name, defaultValue := range flagOverrides {
if f := flag.Lookup(name); f != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu @tigrannajaryan can you give your thoughts on this? It feels weird to me to be changing flag configuration in a library like this.

In any event, to the extent that this isn't in an init() function, as is done by the kubelet, I'm not sure this can even be effective. I'd expect the flags to already have been parsed by the time this is invoked.

@pxaws pxaws changed the title add cadvisor for container insight receiver Add cadvisor for container insight receiver Jun 4, 2021
@alolita alolita added ready to merge Code review completed; ready to merge by maintainers release:allowed-for-ga labels Jun 9, 2021
@bogdandrutu
Copy link
Member

@alolita @anuraaga @Aneurysm9 it is very hard (impossible) to review 7K line PRs especially when I don't know if some of the code is copied, etc. I don't know how you did this, but I have 0 confidence about merging this change. Let me know how you did this, otherwise I think we should split the PR into a PR that copies the code (if that is the case) and second PR that uses the code.

@bogdandrutu bogdandrutu removed the ready to merge Code review completed; ready to merge by maintainers label Jun 11, 2021
@bogdandrutu
Copy link
Member

Removed ready-to-merge since this is way above the size of a reviewable PR

@pxaws
Copy link
Contributor Author

pxaws commented Jun 11, 2021

@alolita @anuraaga @Aneurysm9 it is very hard (impossible) to review 7K line PRs especially when I don't know if some of the code is copied, etc. I don't know how you did this, but I have 0 confidence about merging this change. Let me know how you did this, otherwise I think we should split the PR into a PR that copies the code (if that is the case) and second PR that uses the code.

@bogdandrutu The test json files take about ~5 K lines. I have separated them out into a different PR: #3756. Could you approve it? Thanks!

@bogdandrutu
Copy link
Member

Still ~2.5K PR, needs to be split

@pxaws
Copy link
Contributor Author

pxaws commented Jun 14, 2021

Still ~2.5K PR, needs to be split

I am going to split the current PR further, but they will need to base on the PR for adding the test json files: #3756. Could you merge that PR first? Thanks

@pxaws
Copy link
Contributor Author

pxaws commented Jun 15, 2021

@bogdandrutu I have separated some code into a different PR: #3790. Could you merge it? Thanks! (I will sent another PR after that PR is merged and then I will modify the current PR to reduce the size)

@tigrannajaryan
Copy link
Member

@bogdandrutu I have separated some code into a different PR: #3790. Could you merge it? Thanks! (I will sent another PR after that PR is merged and then I will modify the current PR to reduce the size)

@pxaws since you are submitting different PRs should we close this PR?

@pxaws
Copy link
Contributor Author

pxaws commented Jun 16, 2021

@bogdandrutu I have separated some code into a different PR: #3790. Could you merge it? Thanks! (I will sent another PR after that PR is merged and then I will modify the current PR to reduce the size)

@pxaws since you are submitting different PRs should we close this PR?

Sure. You can close this PR.

@pxaws pxaws closed this Jun 17, 2021
alexperez52 referenced this pull request in open-o11y/opentelemetry-collector-contrib Aug 18, 2021
Implements a simple store that will be used to record stale metrics;
those that are NOT present in the current scrape, but were in the prior
scrape. This logic will then be attached to our receiver logic
to then complete the mechanism and then issue stale markers.

Updates #3413
Updates #2961
bogdandrutu pushed a commit that referenced this pull request Sep 3, 2021
… own (#5062)

* Revert "receiver/prometheus: glue and complete staleness marking for disappearing metrics (#3423)"

This reverts commit 8b79380.

* Revert "receiver/prometheus: add store to track stale metrics (#3414)"

This reverts commit cdc1634.

* stop dropping staleness markers from prometheus, and fix tests

* add staleness end to end test from #3423
bogdandrutu pushed a commit that referenced this pull request Sep 9, 2021
… own (#5062) (#5170)

* Revert "receiver/prometheus: glue and complete staleness marking for disappearing metrics (#3423)"

This reverts commit 8b79380.

* Revert "receiver/prometheus: add store to track stale metrics (#3414)"

This reverts commit cdc1634.

* stop dropping staleness markers from prometheus, and fix tests

* add staleness end to end test from #3423
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants