[Fleet] Create task that periodically unenrolls inactive agents #189861

Merged on Aug 19, 2024 (23 commits)

Conversation

criamico
Contributor

@criamico criamico commented Aug 5, 2024

Closes #179399

Summary

Create a new periodic task that unenrolls inactive agents based on the `unenroll_timeout` set on their agent policies.

In the agent policy settings there is now a new section:

Screenshot 2024-08-06 at 12 31 37

Testing

  • Create a policy with unenroll_timeout set to any value
  • Enroll many agents in a policy and make them inactive. You can use Horde or the script in `fleet/scripts/create_agents`, which can directly create inactive agents
  • Leave the local env running for at least 10 minutes
  • You should see logs indicating that the task ran successfully and removed the inactive agents
    Screenshot 2024-08-06 at 12 14 13
    Note that the executed unenroll action is also visible in the UI:
    Screenshot 2024-08-06 at 12 19 52
  • If there are no agent policies with unenroll_timeout set or there are no inactive agents on those policies, you should see logs like these:
    Screenshot 2024-08-06 at 12 13 49

Checklist

@criamico criamico self-assigned this Aug 5, 2024
@obltmachine

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@criamico criamico changed the title from "179399 inactive unenrollment timeout" to "[Fleet] Create task that periodically unenrolls inactive agents" Aug 5, 2024
@criamico criamico added the Team:Fleet (Team label for Observability Data Collection Fleet team), release_note:enhancement, and v8.16.0 labels Aug 5, 2024
@criamico
Contributor Author

criamico commented Aug 5, 2024

/ci

@criamico
Contributor Author

criamico commented Aug 5, 2024

The PR is not 100% complete but I'm opening it for early review to get some feedback on the approach I used.

To avoid scale issues, each run of the task fetches at most 500 agent policies and at most 1000 agents; for this reason the task runs every 10 minutes. If there are no policies with unenroll_timeout set, or there are no inactive agents on the policies that have it set, the task exits early.
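
The batching and early-exit behavior described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `TaskDeps` interface, its method names, the type shapes, and the log strings are all hypothetical. Only the 500/1000 limits and the two early-exit conditions come from the comment.

```typescript
// Hypothetical shapes for illustration; the real Fleet types are richer.
interface AgentPolicy {
  id: string;
  unenroll_timeout?: number;
}

interface Agent {
  id: string;
  policy_id: string;
}

// Per-run limits mentioned in the comment.
const POLICIES_BATCH_SIZE = 500;
const AGENTS_BATCH_SIZE = 1000;

// Hypothetical dependencies, injected so the sketch stays self-contained.
interface TaskDeps {
  fetchPoliciesWithTimeout(limit: number): Promise<AgentPolicy[]>;
  fetchInactiveAgents(policyIds: string[], limit: number): Promise<Agent[]>;
  unenrollAgents(agentIds: string[]): Promise<void>;
  log(message: string): void;
}

// Runs one iteration of the task and returns the number of agents unenrolled.
async function runUnenrollInactiveAgents(deps: TaskDeps): Promise<number> {
  const policies = await deps.fetchPoliciesWithTimeout(POLICIES_BATCH_SIZE);
  if (policies.length === 0) {
    deps.log("No policies with unenroll_timeout set, exiting"); // illustrative message
    return 0;
  }

  const agents = await deps.fetchInactiveAgents(
    policies.map((policy) => policy.id),
    AGENTS_BATCH_SIZE
  );
  if (agents.length === 0) {
    deps.log("No inactive agents to unenroll, exiting"); // illustrative message
    return 0;
  }

  await deps.unenrollAgents(agents.map((agent) => agent.id));
  deps.log(`Unenrolled ${agents.length} inactive agents`); // illustrative message
  return agents.length;
}
```

Because each run handles at most one batch, a larger backlog of inactive agents is worked off over several consecutive runs rather than in a single pass.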

@criamico criamico marked this pull request as ready for review August 5, 2024 15:22
@criamico criamico requested a review from a team as a code owner August 5, 2024 15:22
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@nchaulet nchaulet requested review from nchaulet and removed request for nchaulet August 5, 2024 18:11
@criamico criamico requested a review from a team as a code owner August 6, 2024 10:39
@criamico criamico requested a review from juliaElastic August 8, 2024 12:15
@kpollich
Member

kpollich commented Aug 8, 2024

I tested locally and left the env running for a while. From time to time I reassigned inactive agents to a policy with an unenrollment timeout set, and I could see the agents being unenrolled every 10 minutes: Screenshot 2024-08-08 at 11 30 29

Could we somehow flag these actions as coming from the task so the user knows the "origin" of the unenrollment? It'd be great if we could say "17 inactive agents were automatically unenrolled" or something along those lines.

@criamico
Contributor Author

criamico commented Aug 8, 2024

Could we somehow flag these actions as coming from the task somehow so the user knows the "origin" of the unenrollment?

@kpollich I'll see if it's possible to distinguish those actions from the ones performed by the user. I don't know if there's a way to mark the actions differently.

@criamico
Contributor Author

criamico commented Aug 8, 2024

@kpollich in 6758e8c I added a prefix to the actionId so those unenrollment actions are marked differently.
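
As a rough illustration of the prefix idea, the sketch below tags task-generated action ids and checks for the tag later. The prefix string is a hypothetical placeholder, since the actual value added in 6758e8c is not shown in this conversation, and the helper names are illustrative as well.

```typescript
// Hypothetical prefix; the real value used in the commit is not shown here.
const AUTO_UNENROLL_ACTION_PREFIX = "auto-unenroll-inactive-";

// Builds an action id that marks the unenrollment as task-generated.
function buildTaskActionId(uniqueSuffix: string): string {
  return `${AUTO_UNENROLL_ACTION_PREFIX}${uniqueSuffix}`;
}

// Lets a consumer (for example the UI) tell task-generated actions from user-initiated ones.
function isTaskGeneratedAction(actionId: string): boolean {
  return actionId.startsWith(AUTO_UNENROLL_ACTION_PREFIX);
}

const exampleActionId = buildTaskActionId("1234");
console.log(exampleActionId, isTaskGeneratedAction(exampleActionId));
```

Encoding the origin in the id keeps the action document shape unchanged, at the cost of relying on a string convention.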

This is how it appears in the UI:
Screenshot 2024-08-08 at 15 25 31

Contributor

@juliaElastic juliaElastic left a comment

LGTM. We should probably create an ingest-docs request to document this feature, stating that inactive agents are unenrolled at a rate of up to 1,000 agents every 10 minutes.
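
To make that rate concrete, here is a back-of-the-envelope helper. It assumes every run unenrolls a full batch of 1,000 agents and that runs happen exactly 10 minutes apart, so the result is an estimate rather than a guarantee; the function name is illustrative.

```typescript
const AGENTS_PER_RUN = 1000; // batch limit mentioned in this PR
const RUN_INTERVAL_MINUTES = 10; // task interval mentioned in this PR

// Estimates how many minutes of task runs are needed to unenroll a backlog of inactive agents.
function estimateDrainMinutes(inactiveAgents: number): number {
  const runsNeeded = Math.ceil(inactiveAgents / AGENTS_PER_RUN);
  return runsNeeded * RUN_INTERVAL_MINUTES;
}

console.log(estimateDrainMinutes(25000)); // 25 runs * 10 minutes = 250
```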

@kpollich
Member

kpollich commented Aug 8, 2024

Thanks Cristina, that UI change looks great!

@criamico
Contributor Author

@elasticmachine merge upstream

Contributor

@ersin-erdal ersin-erdal left a comment

ResponseOps changes LGTM

@criamico
Contributor Author

@elasticmachine merge upstream

@criamico criamico enabled auto-merge (squash) August 19, 2024 10:52
@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id     before  after   diff
fleet  1.8MB   1.8MB   +14.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @criamico

@criamico criamico merged commit 1565753 into elastic:main Aug 19, 2024
20 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Aug 19, 2024
@criamico criamico deleted the 179399_inactive_unenrollment_timeout branch August 19, 2024 13:14
@criamico criamico added the QA:Needs Validation Issue needs to be validated by QA label Aug 21, 2024
juliaElastic pushed a commit that referenced this pull request Aug 29, 2024
Labels
backport:skip (This commit does not require backporting), QA:Needs Validation (Issue needs to be validated by QA), release_note:enhancement, Team:Fleet (Team label for Observability Data Collection Fleet team), v8.16.0
Development

Successfully merging this pull request may close these issues.

[Fleet] Agent Policy should have an option to automatically unenroll INACTIVE agents
9 participants