Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster certificate authority rotation, implements #1860 #1899

Merged
merged 1 commit into from
May 1, 2018

Conversation

klizhentas
Copy link
Contributor

@klizhentas klizhentas commented Apr 24, 2018

This PR implements #1860

    Teleport cert authority rotation.
    
    During the the rotation procedure issuing TLS and SSH
    certificate authorities are re-generated and all internal
    components of the cluster re-register with the new
    credentials.
    
    The rotation procedure is based on the distributed
    state machine algorithm - certificate authorities have
    explicit rotation state and all parts of the cluster sync
    local state machines following transitions between phases.
    
    Operator can launch CA rotation in auto or manual modes.
    
    In manual mode operator moves bewtween rotation states
    and watches the states of the components to sync.
    
    In auto mode state transitions are happening automatically
    on a specific schedule.
    
    The design documentation is embedded in the code:
    
    lib/auth/rotate.go

Teleport.e counterpart:

https://github.com/gravitational/teleport.e/pull/69

@klizhentas klizhentas force-pushed the sasha/rotate branch 2 times, most recently from 8178633 to 2dba9c1 Compare April 28, 2018 18:24
@klizhentas klizhentas changed the title WIP: Start work on rotation, updates #1860 Cluster certificate authority rotation, implements #1860 Apr 28, 2018
@klizhentas klizhentas force-pushed the sasha/rotate branch 2 times, most recently from 83c9494 to 58f9410 Compare April 28, 2018 18:52
@klizhentas
Copy link
Contributor Author

retest this please

@klizhentas klizhentas force-pushed the sasha/rotate branch 4 times, most recently from e728afe to cda402a Compare April 29, 2018 21:48
@klizhentas klizhentas requested review from russjones and r0mant and removed request for russjones April 29, 2018 21:49
@klizhentas klizhentas added this to the 2.6.0 "Austin" milestone Apr 29, 2018
@klizhentas
Copy link
Contributor Author

retest this please


func (a *AuthServer) autoRotate(ca services.CertAuthority) error {
rotation := ca.GetRotation()
// rotation mode is not manual, nothing to do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is not automatic"?

default:
return nil, trace.BadParameter("unsupported phase: %q", req.targetPhase)
}
return nil, trace.BadParameter("internal error")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very descriptive error. Moreover, it will never be reached, will it? You can instead get rid of the "default" clause and return "unsupported phase" error here.


// generate keys and certificates:
if len(req.privateKey) != 0 {
log.Infof("Generating CA in test mode.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this say "in test mode"? Is private key provided only in tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, updated

} else {
// Rotation sets the first key to be the new key
// and keep only public keys/certs for the new CA.
signingKeys = [][]byte{sshPrivPEM, signingKeys[0]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible panics here and in the following 2 lines since the slices' lengths are not checked above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the checks are performed one level up, to make sure these functions are called from certain states, so if there are not enough element in the list, the data is corrupted, so panic will be appropriate.

keyPairs = []services.TLSKeyPair{*tlsKeyPair}
// In case of forced rotation, rotation has been started and completed
// in the same step moving it to standby state.
rotation.State = services.RotationStateStandby
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "phase" or "state"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing, but will update for consistency.

// in case if value did not match
func (b *BoltBackend) CompareAndSwapVal(bucket []string, key string, newData []byte, prevData []byte, ttl time.Duration) error {
if len(prevData) == 0 {
return trace.BadParameter("missing prevVal parameter, to atomically create item, use CreateVal method")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"missing prevData parameter"

@@ -141,6 +142,51 @@ func (b *BoltBackend) CreateVal(bucket []string, key string, val []byte, ttl tim
return trace.Wrap(err)
}

// CompareAndSwapVal compares and swap values in atomic operation,
// succeeds if prevVal matches the value stored in the databases,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prevVal -> prevData (or rename function parameter)

return trace.ConvertSystemError(err)
}
if bytes.Compare(oldVal, prevVal) != 0 {
return trace.CompareFailed("%v/%v did not match expected value", dirPath, key)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add "expected value" to the message as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the expected value could be a large JSON object, so it will make a very hard to read error message.

// GenerateSchedule generates schedule based on the time period, using
// even time periods between rotation phases.
func GenerateSchedule(clock clockwork.Clock, gracePeriod time.Duration) (*RotationSchedule, error) {
if gracePeriod == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for < 0 too?

@@ -832,7 +953,7 @@ func startAndWait(process *service.TeleportProcess, expectedEvents []string) ([]
// register to listen for all ready events on the broadcast channel
broadcastCh := make(chan service.Event)
for _, eventName := range expectedEvents {
process.WaitForEvent(eventName, broadcastCh, make(chan struct{}))
process.WaitForEvent(context.TODO(), eventName, broadcastCh)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for the caller to pass context to "startAndWait?

@klizhentas klizhentas force-pushed the sasha/rotate branch 2 times, most recently from 081d1ff to 64a4317 Compare April 30, 2018 19:58
This commit implements #1860

During the the rotation procedure issuing TLS and SSH
certificate authorities are re-generated and all internal
components of the cluster re-register to get new
credentials.

The rotation procedure is based on a distributed
state machine algorithm - certificate authorities have
explicit rotation state and all parts of the cluster sync
local state machines by following transitions between phases.

Operator can launch CA rotation in auto or manual modes.

In manual mode operator moves cluster bewtween rotation states
and watches the states of the components to sync.

In auto mode state transitions are happening automatically
on a specified schedule.

The design documentation is embedded in the code:

lib/auth/rotate.go
@klizhentas
Copy link
Contributor Author

retest this please

@klizhentas klizhentas merged commit cb95a65 into master May 1, 2018
@klizhentas klizhentas deleted the sasha/rotate branch May 1, 2018 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants