-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spire-agent: re-attest without restarting #4991
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @sorindumitru for this contribution!
pkg/agent/svid/rotator.go
Outdated
return fmt.Errorf("unexpected value type: %T", r.state.Value()) | ||
} | ||
|
||
if state.Reattestable && !r.c.DisableReattestToRenew { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably separate this in two different conditions and return separate errors so it's easier to understand and debug what happened.
pkg/agent/svid/rotator.go
Outdated
if state.Reattestable && !r.c.DisableReattestToRenew { | ||
err = r.reattest(ctx) | ||
} else { | ||
return fmt.Errorf("attestation method is not re-attestable or re-attestation disabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No formatting is needed here.
pkg/agent/manager/manager.go
Outdated
m.deleteSVID() | ||
return err | ||
} | ||
goto restart |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of a goto, can you add a for
before util.RunTasks
?
for {
err := util.RunTasks(ctx,
...
}
pkg/agent/manager/manager.go
Outdated
@@ -204,9 +205,14 @@ func (m *manager) Run(ctx context.Context) error { | |||
m.c.Log.Info("Cache manager stopped") | |||
return nil | |||
case nodeutil.ShouldAgentReattest(err): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: there is no unit test to verify this (the original ShouldAgentReattest),
so it is hard for me to ask you to add a unit test here...
in any case it is not a blocker but if you can add a unit test it will be great, if not we can add that later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if I can do something about this. The test doesn't use an attestor yet, is they some mock attestor available that can be used?
@@ -137,6 +138,29 @@ func (r *rotator) SetRotationFinishedHook(f func()) { | |||
r.rotationFinishedHook = f | |||
} | |||
|
|||
func (r *rotator) Reattest(ctx context.Context) (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add unit tests for this new function?
if !r.c.DisableReattestToRenew { | ||
err = r.reattest(ctx) | ||
} else { | ||
return errors.New("re-attestation is disabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about this error, since a user must DIsableReattesttion to get into this case...
Maybe it worth to have a single if?
if state.Reattestable && !r.c.DisableReattestToRenew {
and if that is the case this code is pretty much the same that is in rotateSVIDIfNeeded
and we can expose that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what i had before and @amartinezfayo asked me to change. Either way is fine with me.
rotateSVIDIfNeeded
can't be used directly because that only rotates if the SVID is expired.
@@ -1,21 +0,0 @@ | |||
#!/bin/bash | |||
|
|||
log-debug "deleting agent..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This IT is exercise:
- evict agent
- start again
- evict again
So our current test case is to evict and verify that agent was re-atested, but agent never restarted, and keep alive with without timeout,
may we update this IT to verify that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some checks that the agents was able to re-attest. Note that I've had to make a bunch of changes in this test because it was unreliable on my machine. E.g. a server was brought up and then the test wouldn't wait for it to be available before trying to use it.
check-attested-agents | ||
|
||
# spire-agent-a will re-attest but spire-agent-b won't because join_token implements trust on first use model. | ||
AGENT_A_SPIFFE_ID_PATH="/spire/agent/x509pop/$(fingerprint conf/agent/agent.crt.pem)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this PATH at the start and use it to create AGENT_A_SPIFFE_ID?
@@ -78,16 +78,6 @@ docker-compose exec -T spire-server \ | |||
docker-compose exec -T spire-server \ | |||
/opt/spire/bin/spire-server agent evict -spiffeID "$agentID1" | grep "Agent evicted successfully" || fail-now "failed to evict agent 1" | |||
|
|||
# Verify agent list after evict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: if I follow this... this is not longer required because attested agent was using a reattestable attestor,
and now we evict but agent recovers itself,
however, may we add a test case with a no reattestable node? so we can verify that agent was really reattested?
or catch some logs to verify we had a successful reattestation?
When an agent is evicted it can re-attest to reconnect to spire-server but it currently needs to restart to do that. To avoid unavailability periods, which can lead to latency in applications, reattest in process Signed-off-by: Sorin Dumitru <sdumitru@bloomberg.net>
Signed-off-by: Sorin Dumitru <sdumitru@bloomberg.net>
Signed-off-by: Sorin Dumitru <sdumitru@bloomberg.net>
Signed-off-by: Sorin Dumitru <sdumitru@bloomberg.net>
they didn't seem to need to be removed Signed-off-by: Sorin Dumitru <sdumitru@bloomberg.net>
Pull Request check list
Affected functionality
spire-agent behaviour in case of eviction or expired agent SVID
Description of change
When an agent is evicted it can re-attest to reconnect to spire-server but it currently needs to restart to do that. To avoid unavailability periods, which can lead to noticeable latency in workloads, reattest in process.