[Fleet] Newly created Enrollment Tokens are not working to fully deploy an Agent (no Beats come on line) #81214

Closed
EricDavisX opened this issue Oct 20, 2020 · 8 comments · Fixed by #81236
Labels: bug, failed-test, impact:high, regression, Team:Fleet

Comments

@EricDavisX
Contributor

EricDavisX commented Oct 20, 2020

Kibana version:
7.10 BC2 as deployed via Cloud on cloud-staging env

Browser version:
macOS Chrome

Describe the bug:
With the default policy, new enrollment tokens work. With a newly created policy, however, a new enrollment token does not fully deploy the Agent: the Agent does come 'online' and looks to be doing something, but none of the expected Beats start. The Agent shows a 'policy config change' item in the Activity Log, and then nothing else at all, ever.

Pre-reqs:

  1. Install 7.10 Kibana / ES, via Cloud, Docker, or otherwise.
  2. Set up the Fleet user, etc., by clicking the button on the 'Agents' tab.

Steps to reproduce:

  1. Create a new Agent policy on the Agent Policies tab; the defaults will do fine.
  2. Back on the Agents tab, on the 'Enrollment Tokens' sub-tab, create a new enrollment token associated with that newly created Agent policy.
  3. Install a 7.10 Linux or macOS Agent (tested with both) using the install command exactly as shown in the UI when you select that policy (note that the new enrollment token is used by default).
  4. See the Agent come online, but none of the Beats.

NOTE: our e2e automated test catches this, and it uses Linux Agents on various versions.

Expected behavior:
The new enrollment token should work just like the others and successfully start up Beats when the Agent is deployed.

Screenshots (if relevant):
Screen Shot 2020-10-20 at 1 14 20 PM

Errors in browser console (if relevant):
none in browser console or on UI

Provide logs and/or server output (if relevant):

Any additional context:
Somewhat related to #81041.

We have tests for this on the Kibana side:
https://github.com/elastic/kibana/blob/4a160bff8013146280c22a022ce6d8e2a4aea842/x-pack/test/ingest_manager_api_integration/apis/fleet/enrollment_api_keys/crud.ts

  • Not sure what's lacking in the test, or whether the problem is on the Beats side?

This is also being reported in the e2e-testing repository in a number of failed cases, hence the 'failed-test' label.

I can do some research to see whether it worked in 7.9, to help narrow down when it began, and I'll post the API calls to confirm usage, as well as the Agent logs.

@EricDavisX added the bug, failed-test, impact:high, and Team:Fleet labels on Oct 20, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-test-triage (failed-test)

@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@EricDavisX
Contributor Author

@mdelapenya thank you for looking at this with @michalpristas and raising it.
I reproduced this manually, as you did using the e2e-testing framework, but I did it against a 7.10 BC2 build, so it's unrelated to the framework. I don't know whether this is an ACL-type problem with the new enrollment tokens, either in how they are stored in the stack or in how they are accessed by Agent or Beats. @blakerouse any thoughts? @ph FYI.

@EricDavisX
Contributor Author

A 'ps ax' call on the test Mac shows nothing running except the core Agent process:
Screen Shot 2020-10-20 at 1 36 56 PM

Agent logs
darwin-new-enroll-token-agent-logs.log
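
For anyone who wants to script the same check instead of eyeballing 'ps ax', here is a minimal sketch. It assumes the Beats of interest are filebeat and metricbeat (my assumption, not stated above), and the sample output and paths in it are made up for illustration; they are not taken from the attached screenshot or logs.

```python
# Assumption: these are the Beats we expect the Agent to start.
EXPECTED_BEATS = ("filebeat", "metricbeat")

# Hypothetical 'ps ax' output; the paths are illustrative only.
SAMPLE_PS_OUTPUT = """\
  PID   TT  STAT      TIME COMMAND
  501   ??  Ss     0:01.23 /Library/Elastic/Agent/elastic-agent
  777   ??  S      0:00.10 /usr/sbin/syslogd
"""


def running_beats(ps_output: str) -> set:
    """Return the names of expected Beats found in 'ps ax' style output."""
    found = set()
    for line in ps_output.splitlines():
        # Columns: PID, TT, STAT, TIME, then the full command line.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        # Take the executable's base name (breaks on paths with spaces).
        executable = fields[4].split()[0].rsplit("/", 1)[-1]
        if executable in EXPECTED_BEATS:
            found.add(executable)
    return found


if __name__ == "__main__":
    missing = set(EXPECTED_BEATS) - running_beats(SAMPLE_PS_OUTPUT)
    print("Beats not running:", sorted(missing))
```

On a real host the input could come from `subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout` in place of the sample string.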

@EricDavisX
Contributor Author

API call used to create the new policy: POST to /api/fleet/agent_policies?sys_monitoring=true
with body:
{"name":"test2","description":"tst","namespace":"default","monitoring_enabled":["logs","metrics"]}

The response includes the newly created policy ID:
id: "43ca9710-12f5-11eb-beb1-ff06f359ee75"

API call used to create the new token: POST to /api/fleet/enrollment-api-keys
with body:
{"name":"test2","policy_id":"43ca9710-12f5-11eb-beb1-ff06f359ee75"}
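
The two calls above could be scripted roughly as follows. This is only a sketch: the base URL is a placeholder, authentication is left out, and the helper names are hypothetical. The paths and bodies are the ones quoted above; the `kbn-xsrf` header is what Kibana expects on state-changing API requests. The functions only build the requests and do not send them.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute your own Kibana base URL.
KIBANA_URL = "https://kibana.example.invalid"


def build_post(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the Kibana API."""
    return urllib.request.Request(
        url=KIBANA_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Kibana rejects state-changing API calls without this header.
            "kbn-xsrf": "true",
        },
        method="POST",
    )


def build_policy_request(name: str, description: str) -> urllib.request.Request:
    """Request that creates an Agent policy, mirroring the body above."""
    body = {
        "name": name,
        "description": description,
        "namespace": "default",
        "monitoring_enabled": ["logs", "metrics"],
    }
    return build_post("/api/fleet/agent_policies?sys_monitoring=true", body)


def build_token_request(name: str, policy_id: str) -> urllib.request.Request:
    """Request that creates an enrollment token tied to a policy ID."""
    return build_post(
        "/api/fleet/enrollment-api-keys",
        {"name": name, "policy_id": policy_id},
    )
```

To actually send one, add credentials (for example an Authorization header) and pass the request to `urllib.request.urlopen`; the policy ID from the first response is what goes into the second call.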

command line used for install:
sudo ./elastic-agent install -f --kibana-url=https://020651ce4a43485f9d6830b3538f9a5f.europe-west1.gcp.cloud.es.io:443 --enrollment-token=VWFieVJuVUJzYjA2SThDUWN1Rk46aFZIMThXcHJScWk2QS04ZDVLabc123==
NOTE: the token above is modified to avoid posting specifics, but I verified that the token used is the same as the secret shown in the Kibana enrollment tokens UI.

I don't know how to check for ACL or permissions problems on the Agent side when it's using the token and starting up Beats. Can anyone help confirm that? Because new enrollment tokens created against the default policy DO seem to work, this feels ACL-related to me.

@EricDavisX
Contributor Author

It looks like this was working with the old 'enroll' command in 7.9 (screenshot):
Screen Shot 2020-10-20 at 2 09 59 PM

The same failure shows in 7.10 when I use the old 'enroll' command, so it would seem to be unrelated to the new 'install' subcommand. My research here is done unless the team needs more to help track it down.

@nchaulet
Member

There is a bug on the Kibana side: the AgentAction policy change (that is, the config built for the agent) is generated before the system package is added to the config. I am working on a fix.
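
To illustrate the ordering problem described in this comment, here is a toy model. It is not Kibana's code; every class and function name is hypothetical, and the "fixed" variant only shows the general idea of building the config after the package is attached, not the actual change in the linked fix.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    """Toy stand-in for an Agent policy (hypothetical, not a Kibana type)."""
    name: str
    packages: List[str] = field(default_factory=list)


def build_policy_change_action(policy: Policy) -> Dict:
    """Snapshot the config an agent would receive for this policy."""
    return {"policy": policy.name, "inputs": list(policy.packages)}


def create_policy_buggy(name: str) -> Dict:
    policy = Policy(name)
    # The action is generated first, so it captures an empty config...
    action = build_policy_change_action(policy)
    # ...and the system package arrives too late to be included.
    policy.packages.append("system")
    return action


def create_policy_fixed(name: str) -> Dict:
    policy = Policy(name)
    # Attach the package before snapshotting the config.
    policy.packages.append("system")
    return build_policy_change_action(policy)


if __name__ == "__main__":
    print(create_policy_buggy("test2"))
    print(create_policy_fixed("test2"))
```

In the buggy ordering the agent gets a valid policy change with no inputs, which would match the symptom reported here: the Agent comes online and logs a policy change, but has nothing telling it to start any Beats.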

@EricDavisX
Contributor Author

We can re-test this in the e2e-testing framework, as well as manually, when the next builds come out (the fix may not be in 7.10 BC3; we may need to wait for BC4, TBD).


3 participants