[Fleet] Investigate side-effect of a space deletion #184864

Closed
nchaulet opened this issue Jun 5, 2024 · 22 comments
Labels
Team:Elastic-Agent-Control-Plane Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@nchaulet
Member

nchaulet commented Jun 5, 2024

Description

As Fleet moves to be space-aware, some entities (agents, agent policies, uninstall tokens) will become space-aware. We should investigate the effect of a space deletion and the potential ways to recover from it if there is an issue.

@elastic/kibana-security is there any hook available to react to a space deletion? Maybe to clean things up, or to prevent the deletion if there are enrolled Fleet agents?

@nchaulet nchaulet added Team:Fleet Team label for Observability Data Collection Fleet team Team:Elastic-Agent-Control-Plane labels Jun 5, 2024
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@legrego
Member

legrego commented Jun 5, 2024

@elastic/kibana-security is there any hook available to react to a space deletion? Maybe to clean things up, or to prevent the deletion if there are enrolled Fleet agents?

We do not expose a hook today, but we can explore adding one (or something like it) if you can provide a set of detailed requirements.

For reference, the logic to delete a space is defined here:

public async delete(id: string) {
  const existingSavedObject = await this.repository.get('space', id);
  if (isReservedSpace(this.transformSavedObjectToSpace(existingSavedObject))) {
    throw Boom.badRequest(`The ${id} space cannot be deleted because it is reserved.`);
  }
  await this.repository.deleteByNamespace(id);
  await this.repository.delete('space', id);
}

This delegates to the deleteByNamespace function of the Saved Objects repository, which deletes saved objects belonging to the space, or "unshares" objects from the space if an object exists in multiple spaces:
https://github.com/elastic/kibana/blob/c89ee65c7034ba26006e2d426156a6de11b3505f/packages/core/saved-objects/core-saved-objects-api-server-internal/src/lib/apis/delete_by_namespace.ts#L25-L83
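
The delete-or-unshare behavior described above can be illustrated with a small in-memory model. This is only a sketch of the semantics, not the Kibana implementation (which operates on Elasticsearch documents); `SavedObjectLike` and `deleteByNamespaceModel` are hypothetical names introduced for illustration.

```typescript
// Simplified stand-in for a saved object; only the fields needed here.
interface SavedObjectLike {
  id: string;
  type: string;
  namespaces: string[]; // spaces the object currently belongs to
}

// Returns the objects that survive the deletion of `spaceId`:
// - objects that exist only in `spaceId` are dropped (deleted)
// - objects shared with other spaces are kept, minus `spaceId` ("unshared")
// - objects that never belonged to `spaceId` are returned unchanged
function deleteByNamespaceModel(
  objects: SavedObjectLike[],
  spaceId: string
): SavedObjectLike[] {
  const survivors: SavedObjectLike[] = [];
  for (const obj of objects) {
    if (obj.namespaces.indexOf(spaceId) === -1) {
      survivors.push(obj);
      continue;
    }
    const remaining = obj.namespaces.filter((ns) => ns !== spaceId);
    if (remaining.length > 0) {
      survivors.push({ ...obj, namespaces: remaining });
    }
  }
  return survivors;
}
```

Note that anything stored outside the saved objects repository never passes through this path, which is the gap the rest of this thread is about.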

@nchaulet
Member Author

nchaulet commented Jun 6, 2024

Thanks @legrego. The issue for us is that we are introducing spaces to non-saved-object documents, and those documents will become orphans if the space is deleted.

@nimarezainia what would be the ideal behavior here? A way to block space deletion when we have active agents in that space? Some migration to the default space?

@nimarezainia
Contributor

@legrego would you know what happens to other Kibana assets in a space when that space is deleted? Is there a warning of any sort?

@nchaulet I don't know if we should make a decision on the user's behalf in this regard (as in moving everything to the default space). Ideally we can detect that an agent policy is associated with the space being deleted and block the space deletion until all agent policies are moved out of the space or deleted. I think the admin who has the right access to delete the space then could make a decision on what should happen to the agent policies. Presumably this persona has a higher level of access.

@legrego
Member

legrego commented Jun 7, 2024

would you know what happens to other Kibana assets in a space when that space is deleted? Is there a warning of any sort?

All saved objects within the space are deleted, or removed from the space. Any other assets are left untouched. We show a warning when deleting a space that all saved objects will be removed.

@cmacknz
Member

cmacknz commented Jun 7, 2024

Ideally we can detect that an agent policy is associated with the space being deleted and block the space deletion until all agent policies are moved out of the space or deleted. I think the admin who has the right access to delete the space then could make a decision on what should happen to the agent policies.

+1 this seems like the best way to deal with this, but reading the prior discussion I don't think there is a way to implement this today.

The core problem is that there are Elastic Agents that continue to exist outside of a deleted space and become unmanageable or, in the case of Defend, potentially impossible to uninstall if the uninstall token was deleted along with the space (CC @ferullo).

@nimarezainia
Contributor

@legrego Looks like ideally we would need a hook in that space deletion path. Perhaps a way for other users (such as Fleet) to register their dependency on Spaces. Also the deletion to be halted if any of the registered functions indicate it shouldn't be deleted. What would you need from us on this to move forward? I'd imagine this affects almost everyone who has Space dependency.

@kpollich @nchaulet this is probably a blocker for our project. What do you think?
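
To make the registration idea above concrete, here is a minimal sketch of what such a hook could look like. Kibana exposes no such hook today (see the earlier reply), so every name below (`SpaceDeletionVerdict`, `SpaceDeletionCheck`, `SpaceDeletionChecks`) is hypothetical. The checks are synchronous for brevity; a real version would presumably be asynchronous.

```typescript
// Hypothetical answer a plugin gives when asked about a pending space deletion.
type SpaceDeletionVerdict =
  | { kind: 'ok' }
  | { kind: 'warn'; message: string }
  | { kind: 'block'; message: string };

// Hypothetical callback a dependent plugin (e.g. Fleet) would register.
type SpaceDeletionCheck = (spaceId: string) => SpaceDeletionVerdict;

class SpaceDeletionChecks {
  private checks: { owner: string; check: SpaceDeletionCheck }[] = [];

  register(owner: string, check: SpaceDeletionCheck): void {
    this.checks.push({ owner, check });
  }

  // Runs every registered check and collects warnings and blockers.
  // Deletion is allowed only if no check returned a "block" verdict.
  evaluate(spaceId: string): {
    canDelete: boolean;
    warnings: string[];
    blockers: string[];
  } {
    const warnings: string[] = [];
    const blockers: string[] = [];
    for (const { owner, check } of this.checks) {
      const verdict = check(spaceId);
      if (verdict.kind === 'warn') warnings.push(`${owner}: ${verdict.message}`);
      if (verdict.kind === 'block') blockers.push(`${owner}: ${verdict.message}`);
    }
    return { canDelete: blockers.length === 0, warnings, blockers };
  }
}
```

Separating `warn` from `block` keeps the choice between the two policies with the caller: the same registry works whether the Spaces plugin only surfaces messages or refuses to delete.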

@legrego
Member

legrego commented Jun 11, 2024

Looks like ideally we would need a hook in that space deletion path. Perhaps a way for other users (such as Fleet) to register their dependency on Spaces.

Is this solely in support of the "Also the deletion..." clause below, or is there other functionality that you need this registration to support?

Also the deletion to be halted if any of the registered functions indicate it shouldn't be deleted.

Preventing space deletion is an aggressive measure and isn't something I can agree to without broader consideration (cc @rayafratkina @mwtyang @azasypkin @lukeelmers). I see benefit to warning users if Fleet indicates that other assets are impacted/degraded by the operation, but I'm not yet sold on preventing deletion.

we are introducing spaces to non-saved-object documents, and those documents will become orphans if the space is deleted.

Is there a list of these non-SO assets that we can see to help guide our decision making? It would be helpful to understand:

  1. How these assets are created
  2. Who/what creates these assets
  3. What privileges are required to CRUD these assets
  4. Where these assets reside (e.g., if something is stored in a Fleet system index, Kibana system index, or is an implementation detail of ES, etc.)

@nchaulet
Member Author

Is there a list of these non-SO assets that we can see to help guide our decision making? It would be helpful to understand:

Sure, I can provide this:

  • .fleet-enrollment-tokens: the enrollment token for an agent policy; created by a user from Kibana; requires Fleet:Agents:All privileges to access
  • .fleet-policies: created by a user from Kibana; not readable from Kibana
  • .fleet-agents: the record for an agent policy; created by fleet-server; readable from the UI with Fleet:Agents:Read privileges
  • .fleet-actions, .fleet-actions-results: created by a user from Kibana and by fleet-server; readable from the UI with Fleet:Agents:Read privileges
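
As a rough illustration of how documents in these indices could end up orphaned, the sketch below flags documents whose spaces no longer exist. It works on plain in-memory objects. The `namespaces` field name, and the rule that a document without it belongs to the `default` space, are assumptions made for this example and are not confirmed in this thread.

```typescript
// Hypothetical shape of a non-saved-object Fleet document.
interface FleetDocLike {
  id: string;
  index: string; // e.g. ".fleet-agents"
  namespaces?: string[]; // assumed field name for the owning spaces
}

// A document is treated as orphaned when none of its spaces still exists.
// Assumption: documents without `namespaces` belong to the "default" space.
function findOrphans(
  docs: FleetDocLike[],
  existingSpaces: string[]
): FleetDocLike[] {
  return docs.filter((doc) => {
    const spaces =
      doc.namespaces && doc.namespaces.length > 0 ? doc.namespaces : ['default'];
    return !spaces.some((space) => existingSpaces.indexOf(space) !== -1);
  });
}
```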

@cmacknz
Member

cmacknz commented Jun 18, 2024

Fleet is a remote management UI. The biggest non-shared objects I am concerned about are Elastic Agents, which live completely outside of the stack.

Deleting a space and deleting the internal state without going through the intended UX for un-managing or uninstalling an agent will not work well and users are unlikely to understand the consequences of it.

We would not intentionally build a button into Fleet's UI that mass deletes Fleet's internal state with no warning or protection for the user and we are worried with space awareness we have unintentionally created that via deleting a space and want to eliminate it.

@nchaulet
Member Author

nchaulet commented Jul 4, 2024

@legrego is blocking space deletion when a user has enrolled agents in that space something we can envisage? It would really solve our use case and avoid users ending up in an unsolvable situation.

@cmacknz @kpollich Thinking out loud here: otherwise we could probably come up with a hacky solution. As the problematic saved object here is the uninstall token, we could make that SO space-agnostic and do the filtering based on space manually (not using the built-in saved object space mechanism, but using our own fields for that). This way, if a space is deleted and recreated, the user will have access to their active agents and uninstall tokens (they will lose their policies).

@legrego
Member

legrego commented Jul 15, 2024

is blocking space deletion when a user has enrolled agents in that space something we can envisage? It would really solve our use case and avoid users ending up in an unsolvable situation.

Sorry for the delay. Based on what you've shared, this doesn't feel outside the realm of possibility. Let me discuss with the folks I pinged above, and we'll get back to you.

@legrego
Member

legrego commented Jul 22, 2024

I discussed with @lukeelmers, @bitzandeb, and @rayafratkina today. We propose taking a progressive approach, rather than immediately moving forward with blocking space deletion.

Could we instead start by showing a warning, which lists the enrolled agents that are impacted by this operation, and explain the consequences of deleting a space with enrolled agents? If we wanted to get fancy, we could also allow sufficiently authorized administrators to perform the unenrollment from this warning step.

If we learn that this warning is not sufficient for our users, then we could discuss other measures, such as blocking deletion.

@nimarezainia
Contributor

Thanks @legrego. We can't really list all the agents in this manner (can be in the 10s of thousands) but could certainly give a summary snapshot (like total of X agents in Y many policies).

The user would have a choice to Proceed or Cancel with the warning given, correct? (To me this is pretty much blocking the deletion, albeit by the user and not us.) I think we can pursue this as a first option. We should strongly urge the user to either delete the agents or move them to another policy before doing this.
The concern however is that an unsuspecting user may just click "continue" (as we are all accustomed to do) and cause a lot of pain.

@cmacknz @kpollich @nchaulet WDYT?
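
The "total of X agents in Y policies" snapshot suggested above could be computed along these lines. This is a sketch over an in-memory list; `EnrolledAgentLike`, `summarizeAgents`, and `deletionWarning` are hypothetical names, and the warning wording is only a placeholder.

```typescript
// Hypothetical minimal view of an enrolled agent.
interface EnrolledAgentLike {
  id: string;
  policyId: string;
}

// Counts agents and the distinct policies they are enrolled in.
function summarizeAgents(agents: EnrolledAgentLike[]): {
  agentCount: number;
  policyCount: number;
} {
  const seenPolicies: Record<string, true> = {};
  for (const agent of agents) {
    seenPolicies[agent.policyId] = true;
  }
  return {
    agentCount: agents.length,
    policyCount: Object.keys(seenPolicies).length,
  };
}

// Returns a warning for the delete-space dialog, or undefined when the
// space has no enrolled agents and no warning is needed.
function deletionWarning(
  spaceId: string,
  agents: EnrolledAgentLike[]
): string | undefined {
  const { agentCount, policyCount } = summarizeAgents(agents);
  if (agentCount === 0) return undefined;
  return (
    `Space "${spaceId}" contains ${agentCount} enrolled agents in ` +
    `${policyCount} agent policies. Deleting it leaves these agents unmanaged.`
  );
}
```

With tens of thousands of agents, a real implementation would likely compute these counts with a server-side aggregation rather than loading every agent into memory.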

@nchaulet
Member Author

The concern however is that an unsuspecting user may just click "continue" (as we are all accustomed to do) and cause a lot of pain.

I think if we go this way, it may be interesting to make the unenrollment token non-space-aware, so we have a recovery scenario for SDHs.

@cmacknz
Member

cmacknz commented Jul 23, 2024

What consequences does keeping the unenrollment tokens global have? Doing that would eliminate the worst case scenario of users being unable to uninstall agents that can't be managed because the space was deleted.

@nchaulet if someone deletes a space, are the agent API keys still valid? If the agents keep checking in perhaps we can have some way to reassign them into a space that still exists, even if this is via an API call as a recovery mechanism for support in case this happens accidentally.

@nchaulet
Member Author

@nchaulet if someone deletes a space, are the agent API keys still valid? If the agents keep checking in perhaps we can have some way to reassign them into a space that still exists, even if this is via an API call as a recovery mechanism for support in case this happens accidentally.

Yes, the API keys will still be valid, and the agents will be visible in the UI if the user creates the space again. The policy will not be visible again, as it's stored in a saved object and will be deleted during the space deletion.

@nchaulet
Member Author

nchaulet commented Jul 23, 2024

What consequences does keeping the unenrollment tokens global have? Doing that would eliminate the worst case scenario of users being unable to uninstall agents that can't be managed because the space was deleted.

We will not be able to use the built-in saved object mechanism to filter per space and will have to build our own (that should be doable, as that saved object is used in only a few places). The saved object will not be deleted during space deletion, so we would have a recovery scenario: recreate the space to access the uninstall token.
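
A minimal sketch of what that application-level filtering could look like, assuming the token carries its own space field instead of relying on the saved object namespace. `UninstallTokenLike`, its `namespaces` field, and `tokensVisibleInSpace` are hypothetical names, not the actual Fleet implementation.

```typescript
// Hypothetical shape of a space-agnostic uninstall token. The spaces are
// tracked in an application-owned field, not by the saved objects framework.
interface UninstallTokenLike {
  id: string;
  policyId: string;
  token: string;
  namespaces: string[]; // assumed application-owned field
}

// Application-level replacement for the built-in per-space filtering:
// only tokens tagged with the current space are returned.
function tokensVisibleInSpace(
  tokens: UninstallTokenLike[],
  currentSpaceId: string
): UninstallTokenLike[] {
  return tokens.filter(
    (token) => token.namespaces.indexOf(currentSpaceId) !== -1
  );
}
```

Because the tokens are not tied to the space by the saved objects framework, deleting the space leaves them in place, and recreating a space with the same ID makes them match the filter again.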

@nchaulet
Member Author

nchaulet commented Aug 2, 2024

@kpollich @cmacknz I am in the process of moving to multiple saved objects, and I would like to move forward a little more on the discussion of making unenrollment tokens global (and doing the namespace filtering outside of the saved object framework).

Having global unenrollment tokens will enable a recovery scenario: if a user recreates a deleted space, they will see the previously created unenrollment tokens and enrolled agents, and have a way to unenroll them.

@kpollich
Member

kpollich commented Aug 2, 2024

I'm in agreement that we should make unenrollment tokens global as a recovery mechanism, then apply filtering at the application level. Recreating the deleted space is a good thing to keep in mind, but honestly if the unenrollment tokens are fetchable via dev tools after the space has been deleted that will probably be good enough as a recovery mechanism.

@nimarezainia
Contributor

but honestly if the unenrollment tokens are fetchable via dev tools after the space has been deleted that will probably be good enough as a recovery mechanism.

especially if we had ample warning of the consequences before the user deletes the space. We can certainly document the recovery aspects of this.

@nchaulet
Member Author

nchaulet commented Sep 9, 2024

We can probably close this one for now. As part of #190741, uninstall tokens will not be deleted, so we will have a way to recover from a space deletion.

@nchaulet nchaulet closed this as completed Sep 9, 2024

6 participants