[elastic-agent] managed apm-server doesn't know its elasticsearch cluster_uuid #145

Closed
stuartnelson3 opened this issue Sep 28, 2021 · 20 comments
Labels: bug, Team:Elastic-Agent, Team:Elastic-Agent-Control-Plane

Comments

@stuartnelson3 (Contributor)

When apm-server is managed by elastic-agent, it doesn't know its elasticsearch cluster_uuid, which is required for it to be displayed properly in the Stack Monitoring UI.

  • Version: 7.15
  • Operating System: Ubuntu 21.06
  • Steps to Reproduce:
  1. Start elastic-agent with a config that spawns apm-server
  2. Check the /state endpoint on the apm-server:
    $ curl -s --unix-socket /usr/share/elastic-agent/state/data/tmp/default/apm-server/apm-server.sock http:/state | jq .outputs
    {
      "elasticsearch": {
        "cluster_uuid": ""
      }
    }
    

Compare that to a standalone apm-server:

# curl -s http://localhost:5066/state | jq '.outputs.elasticsearch'
{
  "cluster_uuid": "JYUVz_CMSkaQvxV6xl4jPg"
}
stuartnelson3 added the bug and Team:Elastic-Agent labels on Sep 28, 2021
@elasticmachine (Collaborator)

Pinging @elastic/agent (Team:Agent)

@ruflin (Member) commented Sep 29, 2021

I haven't tested this, but it could be a permissions issue: the API key provided by Elastic Agent may not have permission to fetch this information: https://github.com/elastic/beats/blob/master/libbeat/cmd/instance/beat.go#L897 Any chance you could try this out to see if this is the issue?
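
For context, the linked libbeat callback fetches the cluster_uuid from the Elasticsearch root endpoint (GET /). A quick way to check whether the API key Elastic Agent hands to apm-server is allowed to read it (a sketch, assuming Elasticsearch on localhost:9200 and the key's base64 "id:api_key" form in $AGENT_API_KEY):

$ curl -s -H "Authorization: ApiKey $AGENT_API_KEY" http://localhost:9200/ | jq .cluster_uuid
# with sufficient privileges this prints the UUID, e.g. "JYUVz_CMSkaQvxV6xl4jPg";
# without the monitor privilege, Elasticsearch returns a 403 security_exception
# naming the unauthorized action instead.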

@stuartnelson3 (Contributor, Author)

I checked the keys that exist in my local setup and they're all for user:admin, so I'm not sure if it's that.

I was looking at the code you linked to, and it appears that this is called in the client's Connect() method. If the callback fails (which is what I'm assuming would happen if there were a permissions error), the entire Connect() method returns an error.

Also, would these permission settings disallow viewing the elasticsearch root but still allow indexing documents? The API key is provided on each request to elasticsearch: https://github.com/elastic/beats/blob/4b1f69923b3f2abbbf1860295fe5dbff7db3d63c/libbeat/esleg/eslegclient/connection.go#L419-L421 I'm able to index documents with the managed apm-server, so I'm wondering if the issue is that the initial callback is never registered, because when the managed apm-server starts up it doesn't have an output? I'm not sure.

@ruflin (Member) commented Sep 30, 2021

If I remember correctly, the / endpoint requires monitoring privileges or similar. But as you wrote, I would also expect an error in the logs.

@blakerouse @michalpristas Could you comment on the above? I don't know the details around when exactly the output is loaded / pushed but this could be it.

@stuartnelson3 (Contributor, Author)

@simitt confirmed that cluster:monitor is required for the / endpoint.

@simitt commented Sep 30, 2021

@ruflin any concerns about adding this privilege to Elastic Agent and its subprocesses? Where would we need to make the changes? This is critical for enabling metrics collection for the managed apm-server.

@ruflin (Member) commented Sep 30, 2021

Yes, I have concerns about adding these. We recently had a similar conversation in the context of the license check, and we do not want to give every Elastic Agent cluster:monitor. The information that can be gathered through these permissions is far too broad for an Elastic Agent running on an untrusted machine.

We should work with the Elasticsearch team to figure out how to avoid requiring the full-blown monitoring permissions just to get the cluster id. (@colings86 I tried to find the issue / conversation related to the license check, but without success.)

There might be another workaround. It is based on the assumption that fleet-server and apm-server always run in a trusted environment: the apm integration could add these permissions to the permission list in the package. I think @axw recently added a feature to the spec to support this.

@axw (Member) commented Oct 4, 2021

Is it feasible for Fleet to inject the Elasticsearch cluster UUID into the agent policy?

I think @axw recently added a feature to the spec to support this.

The feature only covers index/data stream privileges currently. It would have to be extended to support cluster privileges.

@simitt commented Oct 4, 2021

@joshdover @jen-huang can you help figure this out? It is the last bit left to fix for supporting apm stack monitoring under elastic agent.

@jen-huang commented Oct 5, 2021

Is it feasible for Fleet to inject the Elasticsearch cluster UUID into the agent policy?

The feature only covers index/data stream privileges currently. It would have to be extended to support cluster privileges.

Fleet might be able to inject it directly but that feels like a hack. (Edit: thinking on this more, I don't feel like this is a viable long-term solution as the user's configured ES host may not be one that Fleet/Kibana can reach.) Out of these two options, extending support for cluster privileges in package spec and Fleet seems like the better option.

We should work with the Elasticsearch team to figure out how to avoid requiring the full-blown monitoring permissions just to get the cluster id. (@colings86 I tried to find the issue / conversation related to the license check, but without success.)

But changing it on the ES side sounds even better. I wonder what the rationale is behind requiring cluster:monitor to read /?

@jen-huang commented Oct 5, 2021

@joegallo pointed me to setting the cluster:monitor/main privilege instead of cluster:monitor on the API key; this should reduce the permissions we grant just to retrieve the UUID. @elastic/es-security can someone on the team confirm that cluster:monitor/main only grants permission to /? Thanks in advance :D

@here Assuming cluster:monitor/main is acceptable, where does this change need to be made? In the agent policy that Fleet sends, Fleet Server, Elastic Agent, or in Beats?
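
For illustration, here is roughly how narrowly an API key can be scoped to just that action plus write access to the APM data streams. This is a sketch only: the credentials, key name, and index privileges shown are assumptions, not what Fleet actually grants.

$ curl -s -u elastic:changeme -H 'Content-Type: application/json' \
    -X POST http://localhost:9200/_security/api_key -d '
  {
    "name": "apm-output-example",
    "role_descriptors": {
      "apm_output": {
        "cluster": ["cluster:monitor/main"],
        "index": [
          { "names": ["traces-apm*", "metrics-apm*", "logs-apm*"],
            "privileges": ["auto_configure", "create_doc"] }
        ]
      }
    }
  }'
$ # base64-encode "id:api_key" from the response (newer releases return it pre-encoded as "encoded"):
$ curl -s -H "Authorization: ApiKey <encoded>" http://localhost:9200/ | jq .cluster_uuid

A key like this can read / (and therefore the cluster UUID) and index into the APM data streams, but nothing else.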

jlind23 added the Team:Elastic-Agent-Control-Plane label on Oct 6, 2021
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@tvernum commented Oct 6, 2021

can someone on the team confirm that cluster:monitor/main only grants permissions to /?

At this point in time, that is true. It is likely to always be true, but we don't provide any commitments about the mapping from REST APIs to specific action names, so it's possible something could change without warning in the future.

The difficulty is that this approach exposes an internal ES implementation detail (the name of the action that handles /) and embeds it into Fleet API keys. What happens if the ES implementation needs to change? Is there some way we can regenerate the API keys to use a different privilege?

@ruflin (Member) commented Oct 11, 2021

@scunningham Pulling you in here too as it is about increasing privileges.
@tvernum The API keys are regenerated when the policy changes. It would be nice if we had a feature where policies could also be regenerated on upgrade if the permissions change. This would also be helpful in other cases.

@jen-huang Regarding "as the user's configured ES host may not be one that Fleet/Kibana can reach": why is this the case? I would assume either Fleet or Fleet-server needs credentials to access the monitoring host to create the relevant API keys, so it should be reachable.

I understand that for historical reasons we added the cluster uuid. But it seems odd that something monitoring itself outside Elasticsearch needs to add this to the monitoring doc and therefore make a request to Elasticsearch first. And to work around this, we increase permissions for all Elastic Agents, which we shouldn't. What are the alternatives here?

  • Add it during ingestion, for example through an ingest pipeline
  • Don't require it anymore in the first place in the UI
  • Add it through the policy

Short term, from my perspective, the option would be to add it through the policy and figure out the long-term plan in parallel. I assume kibana_system has the permissions to get the cluster uuid, as it needs it for its own usage. Increasing Elastic Agent permissions seems to go in the wrong direction.

@stuartnelson3 (Contributor, Author)

@ruflin we're looking at implementing a solution similar to the one introduced by @axw, but extending it to include cluster support. Stack monitoring is essential for going GA, and I'm concerned that continuing to discuss alternative solutions is going to jeopardize that.

@jen-huang commented Oct 11, 2021

@jen-huang Regarding "as the user's configured ES host may not be one that Fleet/Kibana can reach": why is this the case? I would assume either Fleet or Fleet-server needs credentials to access the monitoring host to create the relevant API keys, so it should be reachable.

I was talking specifically about Fleet/Kibana, not Fleet Server. Kibana makes no attempt to reach any output host, so it cannot inject the UUID into the policy directly.

@ruflin we're looking at implementing a solution similar to the one introduced by @axw, but extending it to include cluster support.

@ruflin Any objections to proceeding with this proposal?

@tvernum commented Oct 12, 2021

@tvernum The API keys are regenerated when the policy changes. It would be nice if we had a feature where policies could also be regenerated on upgrade if the permissions change. This would also be helpful in other cases.

I think that's enough. It means if we ever find ourselves needing to change something about that action (which isn't likely) we know we can put together a plan that would include making sure all the API keys were regenerated along with a compatibility period where the old action name continued to be supported.

Assuming there's testing on the Observability side to ensure that these API keys keep working with new ES releases (which I'm sure there will be), I think it's safe enough for you to add cluster:monitor/main to APM Server's API key (by adding it to all API keys, or on a case-by-case basis; the latter is better). If it ever looks like we broke it, then you'll need to shout at us and we can fix it.

@ruflin (Member) commented Oct 12, 2021

Going forward with the package-spec proposal works for me. It means the permissions are only given to Elastic Agents that run apm-server, and in general I would argue apm-server can be assumed to run in a trusted environment. It also means that if we come up with a better solution at any time, all we need to do is update the apm package.
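
For illustration, with the package-spec extended to cluster privileges, the apm package manifest could carry something along these lines (a sketch only; the exact key names depend on how the spec feature lands):

  elasticsearch:
    privileges:
      cluster:
        - cluster:monitor/main

The idea is that Fleet then grants this privilege only to agents whose policy contains the apm integration, rather than to every Elastic Agent.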

Going with this short-term solution should not hold us back from finding a proper long-term solution where the cluster uuid is no longer used in this context, or is retrieved differently. @jasonrhodes this likely falls into your area?

jlind23 transferred this issue from elastic/beats on Mar 7, 2022
@simitt commented Mar 7, 2022

Stack monitoring on managed APM Server is working with the changes mentioned above.

I believe this issue should now be scoped to the last bit that @ruflin mentioned in #145 (comment); or we close this issue and create a new one for the open part.

@ruflin (Member) commented Mar 8, 2022

Let's close this issue. @jasonrhodes can you take the lead on opening a new issue around the usage of cluster_uuid?

ruflin closed this as completed on Mar 8, 2022