Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spire-Agent is failing to rotate its SVID due to Missing KeyManager directory #5488

Open
Krannthi opened this issue Sep 10, 2024 · 5 comments
Assignees
Labels
priority/backlog Issue is approved and in the backlog

Comments

@Krannthi
Copy link

We noticed that due to missing KeyManager directory on the host, spire-agent is failing to rotate its keys.

To avoid this we could add a check in the disk keymanager 'writeEntries' method: https://github.com/spiffe/spire/blob/main/pkg/agent/plugin/keymanager/disk/disk.go#L108

@Krannthi Krannthi changed the title Spire-Agent is failing to rotate its SVID due to Missing disk KeyManager directory Spire-Agent is failing to rotate its SVID due to Missing KeyManager directory Sep 10, 2024
@amartinezfayo amartinezfayo self-assigned this Sep 12, 2024
@amartinezfayo
Copy link
Member

We can have the check in the Configure function, so we can fail the agent startup if the directory does not exist. This is really a pre-requisite for the plugin to work.

@amartinezfayo amartinezfayo added the priority/backlog Issue is approved and in the backlog label Sep 12, 2024
@Krannthi
Copy link
Author

Krannthi commented Sep 13, 2024

That's true but we saw the issue at runtime when the directory path gets deleted for some reason. Our agents were healthy but couldn't rotate key due to missing directory.

@Krannthi
Copy link
Author

Krannthi commented Sep 24, 2024

@amartinezfayo Do we want to address both checks in this issue.

  • Check directory exists in plugin Configure function, do we wanna fail in this case or just create ?
  • Create directory if doesn't exist on key writes

@zmt
Copy link
Contributor

zmt commented Oct 16, 2024

@amartinezfayo Do we want to address both checks in this issue.

  • Check directory exists in plugin Configure function, do we wanna fail in this case or just create ?
    I think we should attempt to create dir if missing and fail if unable to create the configured data dir during Configure. Reasoning about Validate API complicates it a bit:
  • would we want the same stat check and create attempt or fail on Validate?
    • might be unexpected that Validate could have a side effect on the file system
    • might be just as unexpected for Validate to succeed, but Configure fail b/c lack of perms to create and/or write to data dir
  • would we want to fail Validate if the dir is missing?
  • Create directory if doesn't exist on key writes
  • what if the dir exists, but perms changed or filesystem is read-only at write?
  • what if we can't create the dir?

I think the answer to these 2 questions is log error (also emit metric?) and crash agent. I would argue that is better than quietly failing to rotate keys, but it is a different trade-off. Maybe we do want to keep failure to rotate keys behavior, but make more noise about it?

@amartinezfayo
Copy link
Member

I personally don't think that the key manager should be creating directories. If the key manager is unable to rotate the key (due to an error writing the key to disk or other reason), I think that we should keep the current logic of retrying with backoff, and not just crash, because it could be recoverable.
I agree that emitting a metric will help to better observe what's happening. We can add that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/backlog Issue is approved and in the backlog
Projects
None yet
Development

No branches or pull requests

3 participants