
The elasticsearch role should NOT assume localhost for the elasticsearch service #29

Open
portante opened this issue Feb 5, 2022 · 3 comments

Comments

@portante
Contributor

portante commented Feb 5, 2022

We need to support the use case where PCP metrics are gathered from an Elasticsearch (or OpenSearch) instance that is not local to the PMDA, possibly over HTTPS with client certificates or bearer tokens to enable metrics gathering.
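A minimal sketch of what such a remote collection request could look like, assuming a Python collector that uses only the standard library. The function names, the base URL and the token are hypothetical placeholders, not options of the existing PMDA; `_cluster/health` is a standard Elasticsearch/OpenSearch REST endpoint used here only as an example path.

```python
import ssl
import urllib.request
from typing import Optional


def build_stats_request(base_url: str, path: str,
                        token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request against a (possibly remote) Elasticsearch/OpenSearch endpoint.

    base_url and token are hypothetical configuration values, not existing PMDA options.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Accept": "application/json"}
    if token:
        # Bearer-token authentication, one of the options mentioned in the issue.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(url, headers=headers, method="GET")


def make_tls_context(ca_file: Optional[str] = None,
                     cert_file: Optional[str] = None,
                     key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that verifies the server and optionally presents a client certificate."""
    ctx = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def fetch(req: urllib.request.Request, ctx: ssl.SSLContext, timeout: float = 5.0) -> bytes:
    """Send the request over HTTPS and return the raw JSON body."""
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return resp.read()
```

In use, `base_url` could point at a load balancer in front of the cluster's client nodes rather than at `localhost`, e.g. `fetch(build_stats_request("https://es-client.example.com:9200", "_cluster/health", token=my_token), make_tls_context(ca_file="/path/to/ca.pem"))`, where the host, token variable and CA path are all placeholders.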

@natoscott
Member

@portante what are the situations where you would not want to run the PMDA on the same machine? PCP should always be installed on all hosts forming part of any distributed system, as performance problems can originate anywhere. In that case, running the PMDA locally is always the best deployment option (more efficient, simpler installs), and it seems we should keep enforcing this (as we do now) so as not to add network load while sampling. It also keeps the roles simpler if we don't have to add more variables, certificate handling, and so on.

@portante
Contributor Author

portante commented Mar 1, 2022

There are a few reasons.

The Elasticsearch statistics returned cover the whole cluster, i.e. all nodes in the cluster. Some of the queries are very involved when you have lots of indices. So having every node in an Elasticsearch cluster gather the same metrics can actually cause problems for the cluster itself.

So if I have a 20-node Elasticsearch cluster, this would be a huge problem.

Typically, you'd have those metrics collected only from the "master" nodes, since most installations have only a small number of master nodes relative to the total number of nodes in the cluster. But even that is not great, because folks typically deploy 3 masters for availability, so the same cluster-wide metrics would still be collected three times.

But again, the Elasticsearch metrics PCP collects describe not the host they are collected from, but the Elasticsearch cluster itself.

The metrics are not really collected locally. While the Elasticsearch API in use might hit a "localhost" endpoint, Elasticsearch in turn sends out a flood of queries to all the hosts to return the requested information. So no network load is really being saved.
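To make the scaling argument concrete, here is a rough back-of-the-envelope model. It assumes (a simplification, not a measured property of any particular Elasticsearch version) that each cluster-wide stats request fans out to every node exactly once per sampling interval. The function name is hypothetical.

```python
def node_level_requests(nodes: int, collectors: int) -> int:
    """Node-level requests per sampling interval under a simplified fan-out model.

    Each collector issues one cluster-wide stats request, and the cluster
    fans that request out to all `nodes`.
    """
    if nodes < 1 or collectors < 0:
        raise ValueError("nodes must be >= 1 and collectors must be >= 0")
    return collectors * nodes


# A PMDA on every node of a 20-node cluster versus a single external collector.
per_node = node_level_requests(nodes=20, collectors=20)
only_masters = node_level_requests(nodes=20, collectors=3)
external = node_level_requests(nodes=20, collectors=1)
```

Under this model, a PMDA on every node of a 20-node cluster costs 20 × 20 = 400 node-level requests per interval, collecting only from 3 masters costs 60, and one external collector costs 20. Because the per-node cost grows with the square of the cluster size, the gap widens as the cluster grows.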

An Elasticsearch admin would want to monitor the health of the cluster from outside the cluster. That is, we would not want load on one member of the cluster to prevent metrics from being gathered. So while each node of an Elasticsearch cluster would have PMDAs for other subsystems, a "client" node would be set up to participate in the cluster and service the metrics requests, since it knows how to communicate properly with all cluster members.

But dedicating a whole client node to that is a bit of a waste, so having an external PMDA target one or more client nodes (often placed behind a load-balancing service like HAProxy or nginx) gives us the lowest API load for gathering metrics on the cluster.

If PCP had a way for a PMDA to associate metrics with something other than a host name, then ideally we'd have one archive per Elasticsearch cluster.

We run 3 Elasticsearch clusters in our environment: Elasticsearch V1, OpenSearch 1.2.4 (equivalent to Elasticsearch V7), and a second OpenSearch 1.2.4 cluster just for the logs from the infrastructure nodes providing the services.

So ideally, I'd have one PMDA which could be configured to gather all the metrics from each cluster.
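One way such a PMDA could be configured, sketched with hypothetical names: each entry describes one cluster (a label, a base URL that might be a load balancer in front of client nodes, and optional auth settings), and the collector polls each cluster once per interval, keying results by cluster label instead of by host name. None of these fields exist in the current PMDA; the labels and URLs are placeholders loosely modeled on the three clusters described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ClusterTarget:
    """Hypothetical per-cluster settings for a multi-cluster PMDA."""
    name: str                       # label used to key metrics instead of a host name
    base_url: str                   # e.g. a load balancer in front of client nodes
    token: Optional[str] = None     # bearer token, if the cluster requires one
    ca_file: Optional[str] = None   # CA bundle used to verify HTTPS endpoints


def plan_requests(targets: List[ClusterTarget],
                  path: str = "_cluster/health") -> Dict[str, str]:
    """Map each cluster label to the single URL polled for it each interval."""
    plan: Dict[str, str] = {}
    for t in targets:
        if t.name in plan:
            raise ValueError(f"duplicate cluster name: {t.name}")
        plan[t.name] = t.base_url.rstrip("/") + "/" + path.lstrip("/")
    return plan


# Placeholder targets; hosts are hypothetical.
targets = [
    ClusterTarget("es-v1", "http://es-v1-lb.example.com:9200"),
    ClusterTarget("opensearch-main", "https://os-main-lb.example.com:9200"),
    ClusterTarget("opensearch-infra-logs", "https://os-infra-lb.example.com:9200/"),
]
```

Keying the plan by cluster label rather than by host is the point of the design: each cluster is queried exactly once per interval no matter how many nodes it has, which is the property argued for above.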

@natoscott
Member

Makes sense, thanks @portante
