[elastic-agent] managed apm-server doesn't know its elasticsearch cluster_uuid #145

Closed
stuartnelson3 opened this issue Sep 28, 2021 · 20 comments
Labels: bug, Team:Elastic-Agent, Team:Elastic-Agent-Control-Plane

Comments

@stuartnelson3 (Contributor)

When apm-server is managed by elastic-agent, it doesn't know its elasticsearch cluster_uuid, which is required for it to be displayed properly in the Stack Monitoring UI.

  • Version: 7.15
  • Operating System: Ubuntu 21.06
  • Steps to Reproduce:
  1. Start elastic-agent with a config that spawns apm-server
  2. Check the /state endpoint on the apm-server:
    $ curl -s --unix-socket /usr/share/elastic-agent/state/data/tmp/default/apm-server/apm-server.sock http:/state | jq .outputs
    {
      "elasticsearch": {
        "cluster_uuid": ""
      }
    }
    

Compare that to a standalone apm-server:

# curl -s http://localhost:5066/state | jq '.outputs.elasticsearch'
{
  "cluster_uuid": "JYUVz_CMSkaQvxV6xl4jPg"
}
stuartnelson3 added the bug and Team:Elastic-Agent labels on Sep 28, 2021
@elasticmachine (Collaborator)

Pinging @elastic/agent (Team:Agent)

@ruflin (Member) commented Sep 29, 2021

I haven't tested this, but it could be a permissions issue: the API key provided by Elastic Agent may not have permission to fetch this information: https://github.com/elastic/beats/blob/master/libbeat/cmd/instance/beat.go#L897 Any chance you could try this out to see if this is the issue?
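
For context, the linked libbeat callback fetches the cluster_uuid from the Elasticsearch root endpoint (GET /). A quick way to check whether the API key Elastic Agent hands to apm-server is allowed to read it (a sketch, assuming Elasticsearch on localhost:9200 and the key's base64 "id:api_key" form in $AGENT_API_KEY):

$ curl -s -H "Authorization: ApiKey $AGENT_API_KEY" http://localhost:9200/ | jq .cluster_uuid
# with sufficient privileges this prints the UUID, e.g. "JYUVz_CMSkaQvxV6xl4jPg";
# without the monitor privilege, Elasticsearch returns a 403 security_exception
# naming the unauthorized action instead.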

@stuartnelson3 (Contributor, Author)

I checked the keys that exist in my local setup and they're all for user:admin, so I'm not sure if it's that.

I was looking at the code you linked to, and it appears that this is called in the client's Connect() method. If the callback fails (which is what I'm assuming would happen if there were a permissions error), the entire Connect() method returns an error.

Also, would these permission settings disallow viewing the elasticsearch root but still allow indexing documents? The API key is provided on each request to elasticsearch: https://github.com/elastic/beats/blob/4b1f69923b3f2abbbf1860295fe5dbff7db3d63c/libbeat/esleg/eslegclient/connection.go#L419-L421 I'm able to index documents with the managed apm-server, so I'm wondering if the issue is that the initial callback is never registered, because when the managed apm-server starts up it doesn't have an output? I'm not sure.

@ruflin (Member) commented Sep 30, 2021

If I remember correctly, the / endpoint requires monitoring privileges or similar. But as you wrote, I would also expect an error in the logs.

@blakerouse @michalpristas Could you comment on the above? I don't know the details around when exactly the output is loaded / pushed but this could be it.

@stuartnelson3 (Contributor, Author)

@simitt confirmed that cluster:monitor is required for the / endpoint.

@simitt commented Sep 30, 2021

@ruflin any concerns about adding this privilege to Elastic Agent and its subprocesses? Where would we need to make the changes? This is critical for enabling metrics collection for the managed apm-server.

@ruflin (Member) commented Sep 30, 2021

Yes, I have concerns about adding these. We recently had a similar conversation in the context of the license check, and we do not want to give every Elastic Agent cluster:monitor. The information that can be gathered through these permissions is far too broad for an Elastic Agent running on an untrusted machine.

We should work with the Elasticsearch team to figure out how to avoid requiring the full-blown monitoring permissions just to get the cluster id. (@colings86 I tried to find the issue / conversation related to the license check, but without success.)

There might be another workaround. It is based on the assumption that fleet-server and apm-server always run in a trusted environment: the apm integration could add these permissions to the permission list in the package. I think @axw recently added a feature to the spec to support this.

@axw (Member) commented Oct 4, 2021

Is it feasible for Fleet to inject the Elasticsearch cluster UUID into the agent policy?

I think @axw recently added a feature to the spec to support this.

The feature only covers index/data stream privileges currently. It would have to be extended to support cluster privileges.

@simitt commented Oct 4, 2021

@joshdover @jen-huang can you help figure this out? It is the last bit left to fix for supporting apm stack monitoring under elastic agent.

@jen-huang commented Oct 5, 2021

Is it feasible for Fleet to inject the Elasticsearch cluster UUID into the agent policy?

The feature only covers index/data stream privileges currently. It would have to be extended to support cluster privileges.

Fleet might be able to inject it directly but that feels like a hack. (Edit: thinking on this more, I don't feel like this is a viable long-term solution as the user's configured ES host may not be one that Fleet/Kibana can reach.) Out of these two options, extending support for cluster privileges in package spec and Fleet seems like the better option.

We should work with the Elasticsearch team to figure out how to avoid requiring the full-blown monitoring permissions just to get the cluster id. (@colings86 I tried to find the issue / conversation related to the license check, but without success.)

But changing it on the ES side sounds even better. I wonder what the rationale is behind requiring cluster:monitor to read /?

@jen-huang commented Oct 5, 2021

@joegallo pointed me to setting the cluster:monitor/main privilege instead of cluster:monitor on the API key; this should reduce the permissions we grant just to retrieve the UUID. @elastic/es-security can someone on the team confirm that cluster:monitor/main only grants permission to /? Thanks in advance :D

@here Assuming cluster:monitor/main is acceptable, where does this change need to be made? In the agent policy that Fleet sends, Fleet Server, Elastic Agent, or in Beats?
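
For illustration, here is roughly how narrowly an API key can be scoped to just that action plus write access to the APM data streams. This is a sketch only: the credentials, key name, and index privileges shown are assumptions, not what Fleet actually grants.

$ curl -s -u elastic:changeme -H 'Content-Type: application/json' \
    -X POST http://localhost:9200/_security/api_key -d '
  {
    "name": "apm-output-example",
    "role_descriptors": {
      "apm_output": {
        "cluster": ["cluster:monitor/main"],
        "index": [
          { "names": ["traces-apm*", "metrics-apm*", "logs-apm*"],
            "privileges": ["auto_configure", "create_doc"] }
        ]
      }
    }
  }'
$ # base64-encode "id:api_key" from the response (newer releases return it pre-encoded as "encoded"):
$ curl -s -H "Authorization: ApiKey <encoded>" http://localhost:9200/ | jq .cluster_uuid

A key like this can read / (and therefore the cluster UUID) and index into the APM data streams, but nothing else.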

jlind23 added the Team:Elastic-Agent-Control-Plane label on Oct 6, 2021
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@tvernum commented Oct 6, 2021

can someone on the team confirm that cluster:monitor/main only grants permissions to /?

At this point in time, that is true. It is likely to always be true, but we don't provide any commitments about the mapping from REST APIs to specific action names, so it's possible something could change without warning in the future.

The difficulty is that this approach exposes an internal ES implementation detail (the name of the action that handles /) and embeds it into Fleet API keys. What happens if the ES implementation needs to change? Is there some way we can regenerate the API keys to use a different privilege?

@ruflin (Member) commented Oct 11, 2021

@scunningham Pulling you in here too as it is about increasing privileges.
@tvernum The API keys are regenerated when the policy changes. It would be nice if we had a feature where policies could also be regenerated on upgrade if the permissions change. This would also be helpful in other cases.

@jen-huang Regarding "as the user's configured ES host may not be one that Fleet/Kibana can reach": why is this the case? I would assume either Fleet or Fleet-server needs credentials to access the monitoring host to create the relevant API keys, so it should be reachable.

I understand that for historical reasons we added the cluster uuid. But it seems odd that something monitoring itself outside Elasticsearch needs to add this to the monitoring doc and therefore make a request to Elasticsearch first. And to work around this, we increase permissions for all Elastic Agents, which we shouldn't. What are the alternatives here?

  • Add it during ingestion, for example through an ingest pipeline
  • Don't require it anymore in the first place in the UI
  • Add it through the policy

Short term, from my perspective, the option would be to add it through the policy and figure out the long-term plan in parallel. I assume kibana_system has the permissions to get the cluster uuid, as it needs it for its own usage. Increasing Elastic Agent permissions seems to go in the wrong direction.

@stuartnelson3 (Contributor, Author)

@ruflin we're looking at implementing a solution similar to the one introduced by @axw, but extending it to include cluster support. Stack monitoring is essential for going GA, and I'm concerned that continuing to discuss alternative solutions is going to jeopardize that.

@jen-huang commented Oct 11, 2021

@jen-huang Regarding "as the user's configured ES host may not be one that Fleet/Kibana can reach": why is this the case? I would assume either Fleet or Fleet-server needs credentials to access the monitoring host to create the relevant API keys, so it should be reachable.

I was talking specifically about Fleet/Kibana, not Fleet Server. Kibana makes no attempt to reach any output host, so it cannot inject the UUID into the policy directly.

@ruflin we're looking at implementing a solution similar to the one introduced by @axw, but extending it to include cluster support.

@ruflin Any objections to proceeding with this proposal?

@tvernum commented Oct 12, 2021

@tvernum The API keys are regenerated when the policy changes. It would be nice if we had a feature where policies could also be regenerated on upgrade if the permissions change. This would also be helpful in other cases.

I think that's enough. It means if we ever find ourselves needing to change something about that action (which isn't likely) we know we can put together a plan that would include making sure all the API keys were regenerated along with a compatibility period where the old action name continued to be supported.

Assuming there's testing on the Observability side to ensure that these API keys keep working with new ES releases (which I'm sure there will be), I think it's safe enough for you to add cluster:monitor/main to APM Server's API key (by adding it to all API keys, or on a case-by-case basis; the latter is better). If it ever looks like we broke it, then you'll need to shout at us and we can fix it.

@ruflin (Member) commented Oct 12, 2021

Going forward with the package-spec proposal works for me. It means the permissions are only given to Elastic Agents that run apm-server, and in general I would argue apm-server can be assumed to run in a trusted environment. It also means that if we come up with a better solution at any time, all we need to do is update the apm package.
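
For illustration, with the package-spec extended to cluster privileges, the apm package manifest could carry something along these lines (a sketch only; the exact key names depend on how the spec feature lands):

  elasticsearch:
    privileges:
      cluster:
        - cluster:monitor/main

The idea is that Fleet then grants this privilege only to agents whose policy contains the apm integration, rather than to every Elastic Agent.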

Going with this short-term solution should not hold us back from finding a proper long-term solution where the cluster uuid is no longer used in this context, or is retrieved differently. @jasonrhodes this likely falls into your area?

jlind23 transferred this issue from elastic/beats on Mar 7, 2022
@simitt commented Mar 7, 2022

Stack monitoring on managed APM Server is working with the changes mentioned above.

I believe this issue should now be scoped to the last bit that @ruflin mentioned in #145 (comment); or we close this issue and create a new one for the open part.

@ruflin (Member) commented Mar 8, 2022

Let's close this issue. @jasonrhodes can you take the lead on opening a new issue around the usage of cluster_uuid?

ruflin closed this as completed on Mar 8, 2022