
Target Allocator does not operate outside of Kubernetes but is essential for scaling OTEL (Prometheus "receivers" are essential) #3317

Open
vape-spryker opened this issue Oct 1, 2024 · 13 comments · May be fixed by #3532
Labels
area:target-allocator Issues for target-allocator enhancement New feature or request

Comments

@vape-spryker

vape-spryker commented Oct 1, 2024

Component(s)

target allocator

Is your feature request related to a problem? Please describe.

I am an ECS user, but what I describe applies to most use cases outside Kubernetes, such as ours or bare instances.

We rely heavily on the OTEL Prometheus "receiver" (I put it in quotes because it is really a scraper :) ). These days most of the cloud-native stack exposes a Prometheus-compatible metrics endpoint, which makes this OTEL component critical for metrics collection. We now need to scale our collectors for HA and potentially for capacity, and that is where the Prometheus "receiver" becomes a pain. Currently the only elegant solution is the Target Allocator; any alternative, such as randomly spreading the config and feeding it through separate configuration management, creates complicated dynamics.

But here comes the problem: the TA is written and designed for Kubernetes and is currently tightly coupled to the otel-operator codebase, even though it solves a class of problems that goes beyond the orchestrator.
The current implementation lets me feed a static list of scrape configurations, which is great, but it is not flexible enough for my use case. Collector discovery is still hardcoded to K8s. Technically I could add AWS Cloud Map discovery for ECS, and that may be enough to make it work, but I am not sure such a contribution would be accepted in this project.

The TA's use case lies outside the domain of the OTEL operator (per the Single Responsibility Principle), and it would be great if every OTEL user, not only K8s users, had access to it :)

Describe the solution you'd like

Extend collector discovery with AWS Cloud Map, based on tags and names, so that collectors can be discovered.
Add service discovery using AWS Cloud Map so endpoints can be created automatically. This is not a big issue, as I can provide a static scrape config.
AWS has https://github.com/aws-samples/prometheus-for-ecs/ (see https://github.com/aws-samples/prometheus-for-ecs/blob/main/pkg/aws/cloudmap.go), and I aim to use a similar approach to augment the TA.
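For a sense of what Cloud Map-based collector discovery involves, here is a minimal sketch using boto3's `servicediscovery` client and its `discover_instances` call. It is not the code from `cloudmap.go`. Assumptions: collectors register in Cloud Map with the standard `AWS_INSTANCE_IPV4` / `AWS_INSTANCE_PORT` attributes, and custom instance attributes passed as `QueryParameters` stand in for the "tags" mentioned above. The helper names are hypothetical.

```python
from typing import Any


def collectors_from_cloudmap(response: dict[str, Any]) -> dict[str, str]:
    """Turn a Cloud Map DiscoverInstances response into {instance ID: "ip[:port]"}."""
    collectors: dict[str, str] = {}
    for inst in response.get("Instances", []):
        attrs = inst.get("Attributes", {})
        ip = attrs.get("AWS_INSTANCE_IPV4")
        if not ip:
            continue  # skip instances registered without an IPv4 address
        port = attrs.get("AWS_INSTANCE_PORT")
        collectors[inst["InstanceId"]] = f"{ip}:{port}" if port else ip
    return collectors


def discover_collectors(namespace: str, service: str, attrs: dict[str, str]) -> dict[str, str]:
    """Query Cloud Map for healthy collector instances whose custom attributes match `attrs`."""
    import boto3  # imported lazily so the parsing helper works without the AWS SDK

    client = boto3.client("servicediscovery")
    response = client.discover_instances(
        NamespaceName=namespace,
        ServiceName=service,
        QueryParameters=attrs,  # assumption: custom attributes used in place of tags
        HealthStatus="HEALTHY",
    )
    return collectors_from_cloudmap(response)
```

Keeping the response parsing separate from the API call makes it testable without AWS credentials.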

At the very least, it should be possible to provide a static config of the collectors and a scrape config to be chunked and distributed, which would democratize the TA so it works in any environment.
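To make "chunked and distributed" concrete: given a static list of collector IDs and a list of scrape targets, the allocator splits the targets across collectors. Below is a minimal sketch assuming a consistent-hash ring. The TA offers a consistent-hashing allocation strategy, but this is not its code. `assign`, `_hash`, and the virtual-node count are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash, so every allocator replica builds the same ring.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def assign(targets: list[str], collectors: list[str], vnodes: int = 100) -> dict[str, list[str]]:
    """Assign each target to the first collector point clockwise from the target's hash."""
    if not collectors:
        raise ValueError("at least one collector is required")
    # Place every collector on the ring `vnodes` times to even out the spread.
    ring = sorted((_hash(f"{c}#{i}"), c) for c in collectors for i in range(vnodes))
    points = [point for point, _ in ring]
    result: dict[str, list[str]] = {c: [] for c in collectors}
    for target in targets:
        # Wrap around to the start of the ring past the last point.
        idx = bisect.bisect(points, _hash(target)) % len(ring)
        result[ring[idx][1]].append(target)
    return result
```

For example, `assign([f"10.0.0.{i}:9100" for i in range(1, 10)], ["collector-0", "collector-1", "collector-2"])` returns a collector-to-targets mapping. Removing a collector only moves that collector's targets, which suits a collector set that changes as ECS tasks come and go.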

Describe alternatives you've considered

Manually randomize the configuration, write it to AWS SSM, and potentially let init containers for the collector Service in ECS determine which config belongs to which collector at startup. This is a flawed approach, but apart from the TA there is no other option.

Additional context

https://github.com/aws-samples/prometheus-for-ecs/

@vape-spryker vape-spryker added enhancement New feature or request needs triage labels Oct 1, 2024
@jaronoff97 jaronoff97 added area:target-allocator Issues for target-allocator and removed needs triage labels Oct 1, 2024
@nicolastakashi
Contributor

I think it's doable, especially because the allocator uses Prometheus libraries, so all the service discovery capabilities should be easy to use.

The part that I think requires the most changes is collector discovery, which is fully tied to the Kubernetes API.
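One way to picture decoupling collector discovery from the Kubernetes API is a small interface that the allocator depends on, with K8s as just one implementation. This is a hypothetical sketch. `CollectorDiscovery` and `StaticCollectorDiscovery` are illustrative names and are not part of the operator codebase.

```python
from typing import Callable, Protocol


class CollectorDiscovery(Protocol):
    """What the allocator needs from any discovery backend (hypothetical interface)."""

    def collectors(self) -> list[str]:
        """Return the current set of collector IDs."""
        ...

    def on_change(self, callback: Callable[[list[str]], None]) -> None:
        """Register a callback invoked with the new collector list after a change."""
        ...


class StaticCollectorDiscovery:
    """Environment-agnostic backend: the collector list is supplied directly."""

    def __init__(self, collector_ids: list[str]) -> None:
        self._ids = sorted(set(collector_ids))
        self._callbacks: list[Callable[[list[str]], None]] = []

    def collectors(self) -> list[str]:
        return list(self._ids)

    def on_change(self, callback: Callable[[list[str]], None]) -> None:
        self._callbacks.append(callback)

    def update(self, collector_ids: list[str]) -> None:
        """Replace the list and notify listeners only if it actually changed."""
        new_ids = sorted(set(collector_ids))
        if new_ids != self._ids:
            self._ids = new_ids
            for cb in self._callbacks:
                cb(list(new_ids))
```

A K8s-backed or Cloud Map-backed implementation would satisfy the same two methods, so the allocation logic would never need to know which environment it runs in.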

@swiatekm swiatekm added the discuss-at-sig This issue or PR should be discussed at the next SIG meeting label Dec 10, 2024
@swiatekm
Contributor

swiatekm commented Dec 10, 2024

I'll go even further and point out that the Target Allocator isn't necessarily OTEL-specific. You could easily adapt it to horizontally scale plain Prometheus, and there have been some efforts in that direction.

One of the reasons I'm bringing this up is that we're probably not going to accept any PRs adding non-K8s service discovery to the Target Allocator in the short term. The reason is simply that this is out of scope for the operator project: we have neither the domain expertise nor the ability to test a wide range of SD mechanisms.

With that said, I will happily review your PR @vape-spryker and discuss the changes you've made to introduce non-K8s SD. I've also added this topic to the agenda for our SIG meeting at 6 PM CET on 19.12.2024. If you'd like to join the discussion, we'd love to have you!

@nicolastakashi
Contributor

I'm wondering if we could use plain Prometheus config for non-Kubernetes service discovery.

@swiatekm
Contributor

Also tagging @open-telemetry/operator-approvers @open-telemetry/operator-maintainers for visibility.

@vape-spryker
Author

vape-spryker commented Dec 13, 2024

@swiatekm I've only introduced collector discovery outside of Kubernetes, based on AWS Cloud Map; it can be used in any AWS topology, not just ECS. Service discovery is untouched and entirely Prometheus-based. For ECS service discovery, instead of implementing it, I've just added a sidecar otel-collector with observer/ecs that exposes the targets as files, which the TA picks up without problems because it uses Prometheus's service discovery functionality.
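The sidecar approach relies on Prometheus file-based service discovery (`file_sd_configs`), which reads files containing a list of target groups, each with `targets` and optional `labels`. Here is a minimal sketch of a writer that a sidecar could use, assuming the discovered ECS endpoints are already available as `(address, labels)` pairs. The function name, file path, and label names are hypothetical. Writing to a temporary file and renaming it is a common precaution so the reader never sees a half-written file.

```python
import json
import os
import tempfile


def write_file_sd(path: str, endpoints: list[tuple[str, dict[str, str]]]) -> None:
    """Write endpoints as Prometheus file_sd target groups, replacing `path` atomically."""
    # Group targets that share an identical label set into one target group.
    groups: dict[tuple[tuple[str, str], ...], list[str]] = {}
    for address, labels in endpoints:
        key = tuple(sorted(labels.items()))
        groups.setdefault(key, []).append(address)

    payload = [
        {"targets": sorted(addrs), "labels": dict(key)}
        for key, addrs in sorted(groups.items())
    ]

    # Write next to the destination so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f, indent=2)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # do not leave stray temp files behind on failure
        raise
```

A scrape job would then point `file_sd_configs` at a pattern such as `*.json`. The `.tmp` suffix keeps in-progress files from matching that pattern.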

@nicolastakashi
Contributor

Why not just use static config and leverage Prometheus service discovery? @vape-spryker

@vape-spryker
Author

@nicolastakashi That's exactly what I am doing. What I add is discovery of the collectors, so that targets can be assigned to them effectively; Prometheus does not do this part. Apart from that, service discovery is entirely based on Prometheus. It is a static config, but the endpoints of the ECS cluster have to be discovered as they change, and Prometheus doesn't have native ECS discovery, which is why I use a sidecar collector with an observer. If you can think of a native option, I would gladly use it.

@swiatekm swiatekm removed the discuss-at-sig This issue or PR should be discussed at the next SIG meeting label Dec 19, 2024
@vape-spryker
Author

@swiatekm I couldn't join the SIG meeting, and we are also still sorting out the CLA internally at the company. Did you manage to discuss it?

@jaronoff97
Contributor

@vape-spryker Apologies, we discussed this at the SIG meeting and came to this conclusion (link to the meeting notes). We decided to accept the ability for the TA to discover collector targets outside of Kubernetes. However, to keep it as environment-agnostic as possible, we would like that interface to simply be a static file that the TA reads, combined with a reload endpoint that triggers a re-read. This would allow anyone (not just ECS users) to take advantage of the TA through a separate, environment-specific process.
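The agreed design (a static file the TA reads, plus a reload endpoint that triggers a re-read) could look roughly like this sketch. The file format (one collector ID per line, `#` comments allowed), the `POST /reload` path, and the class and function names are assumptions for illustration; the discussion does not specify them.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FileCollectorRegistry:
    """Holds the collector IDs read from a static file (hypothetical format)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._lock = threading.Lock()
        self._ids: list[str] = []
        self.reload()

    def reload(self) -> list[str]:
        """Re-read the file: one collector ID per line; blanks and '#' comments ignored."""
        with open(self.path) as f:
            stripped = (line.strip() for line in f)
            ids = sorted({s for s in stripped if s and not s.startswith("#")})
        with self._lock:
            self._ids = ids
        return ids

    def collectors(self) -> list[str]:
        with self._lock:
            return list(self._ids)


def make_handler(registry: FileCollectorRegistry) -> type[BaseHTTPRequestHandler]:
    class ReloadHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/reload":
                self.send_error(404)
                return
            try:
                ids = registry.reload()
            except OSError as exc:
                # Keep serving the previous list if the file is unreadable.
                self.send_error(500, str(exc))
                return
            body = f"reloaded {len(ids)} collectors\n".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ReloadHandler
```

Usage would be along the lines of `HTTPServer(("127.0.0.1", 8080), make_handler(FileCollectorRegistry("collectors.txt"))).serve_forever()`, with an ECS-specific process rewriting `collectors.txt` and then calling `POST /reload`.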

@nicolastakashi
Contributor

@jaronoff97 if we use the same file_sd that Prometheus offers, we can get it working, right?

@swiatekm
Contributor

@nicolastakashi can we? Prometheus' service discovery discovers scrape targets, whereas here we just need a list of collector IDs. Sounds a bit overkill to me.

@vape-spryker
Author

@swiatekm The two are not related. Introducing such a dependency could be dangerous: IP discovery can be a byproduct of file_sd, but if that mechanism or its format changes, collector discovery will break. Service discovery and collector discovery are unrelated. Let's try to uphold the Single Responsibility Principle.
@jaronoff97 We can use inotify to re-read the file, but a reload trigger is also a good backup just in case.
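Python's standard library has no inotify binding, so this sketch illustrates the "watch the file, re-read on change" idea with portable modification-time polling instead. It is a stand-in, not inotify. A real TA implementation would more likely use filesystem notifications (in Go, for example through a library such as fsnotify), with the reload endpoint as the backup. `FileWatcher` and `poll_once` are hypothetical names.

```python
import os
from typing import Callable, Optional


class FileWatcher:
    """Polling stand-in for inotify: calls `on_change` when the file's mtime or size changes."""

    def __init__(self, path: str, on_change: Callable[[str], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._last: Optional[tuple[int, int]] = None

    def poll_once(self) -> bool:
        """Check the file once; return True if a change was detected and handled."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return False  # keep the last known state until the file reappears
        sig = (st.st_mtime_ns, st.st_size)
        if sig == self._last:
            return False
        with open(self.path) as f:
            contents = f.read()
        self._last = sig
        self.on_change(contents)
        return True
```

A caller would run `poll_once()` in a loop with a sleep between checks. Because mtime resolution varies by filesystem, a rapid same-size rewrite can be missed, which is one more reason to keep an explicit reload trigger as a backup.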

@nicolastakashi
Contributor

> @nicolastakashi can we? Prometheus' service discovery discovers scrape targets, whereas here we just need a list of collector IDs. Sounds a bit overkill to me.

Sorry for the delay, I was on PTO and just came back this week ☀️
Well, in theory we can. We would get the inotify implementation for free, and if in the future we need more information, such as the instance IP, it would be available by design.

But I don't have a strong opinion.
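To illustrate the trade-off in this exchange: if collector discovery reused the Prometheus file_sd format, each target group's `targets` would list collector addresses, and extra data such as a zone or instance IP would come along for free. This is a hypothetical sketch. Treating each address as the collector ID is an assumption, not something agreed in this thread.

```python
import json


def collectors_from_file_sd(text: str) -> dict[str, dict[str, str]]:
    """Parse file_sd-style JSON and return {collector address: labels}.

    Assumption for illustration: each target address doubles as the collector ID.
    Group labels stay attached to every collector in the group.
    """
    collectors: dict[str, dict[str, str]] = {}
    for group in json.loads(text):
        labels = group.get("labels", {})
        for address in group["targets"]:
            collectors[address] = dict(labels)  # copy so collectors don't share one dict
    return collectors
```

The counter-argument about coupling still applies: a change to the file_sd format would directly affect collector discovery, whereas a plain list of collector IDs avoids that dependency.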


4 participants