Key Rotation

DRAFT

SUMMARY

As of 0.5.14, each munged daemon supports only a single cryptographic key. Transitioning to a new key may be desirable after a security incident involving a compromised host, or as a routine part of a local security policy. When transitioning an administrative realm to a new key, there will be a point in time where some hosts have received the new key while others are still using the old key. munged should be capable of transitioning to a new key without any downtime during which the service is unavailable. Support is needed to allow authentication between hosts to continue during a time interval where some hosts have only the old key while others have both the old and new keys. During this time interval, both keys are valid; after it has elapsed, the old key will be invalidated.

This document proposes a mechanism to transition to a new key within an administrative realm spanning multiple hosts. A new key is generated on one host, distributed to all hosts within the administrative realm, and added to each local munged daemon asynchronously. During this time, credentials will be encoded with both keys so that they can be decoded with either key. The underlying key rotation on individual hosts can thereby occur at different times within this time interval. After the new key has been distributed to all hosts and added to each munged daemon, the old key can be automatically removed, after which credentials will be encoded with only the new key. See #19.

DESIGN

Rotate from the current key k0 to a new key k1 on host h0:

Set the expiration time of key k0 such that k0:valid-end = current time t + 1 day. This specifies the end of the time interval where both the old and new keys are valid.
Create key k1 such that k1:valid-start < k0:valid-end. k1:valid-start is presumed to be the current time, but it could also be a future time when the key would become valid; that would allow the time interval where both keys are valid to be very short.
Signal munged via mungekey to begin the key rotation.
Upon receipt of the signal, munged reloads its keyring.
munged identifies k0 and k1 as being valid at the current time t.
munged sets a timer to expire k0 at k0:valid-end, at which point k0 will be disabled.
munged sets a timer to expire k1 at k1:valid-end, at which point k1 will be disabled.
munged encodes new credentials during this time interval with both k0 and k1.
At time k0:valid-end, munged will disable k0; new credentials will then be encoded with only k1.
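
The validity-interval logic in the steps above can be sketched as follows. This is a minimal illustration, not munged's implementation: the Key class, its field names, and the helper functions are hypothetical, and k1 is assumed to have no expiration set yet.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Key:
    kid: str                       # key ID (hypothetical representation)
    valid_start: datetime          # time at which the key becomes valid
    valid_end: Optional[datetime]  # None means no expiration has been set


def valid_keys(keyring: List[Key], now: datetime) -> List[Key]:
    """Return the keys on the keyring that are valid at time `now`."""
    return [
        k for k in keyring
        if k.valid_start <= now and (k.valid_end is None or now < k.valid_end)
    ]


def next_expiry(keyring: List[Key], now: datetime) -> Optional[datetime]:
    """Return the earliest valid-end among currently valid keys (timer target)."""
    ends = [k.valid_end for k in valid_keys(keyring, now) if k.valid_end is not None]
    return min(ends) if ends else None


# Rotation on host h0 at time t: k0 expires in one day, k1 is valid immediately.
t = datetime(2024, 1, 1, 12, 0, 0)
k0 = Key("k0", valid_start=t - timedelta(days=90), valid_end=t + timedelta(days=1))
k1 = Key("k1", valid_start=t, valid_end=None)
keyring = [k0, k1]

during = [k.kid for k in valid_keys(keyring, t + timedelta(hours=1))]
after = [k.kid for k in valid_keys(keyring, t + timedelta(days=1))]
timer = next_expiry(keyring, t)
```

In this sketch, credentials encoded at a time in `during` would carry one packet per key in the returned list, and `timer` is the point at which k0 would be disabled.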

Transfer the entire keyring from host h0 to host h1:

Copy the keyring to host h1.
    rsync /etc/munge/munge.keyring h1:/etc/munge/munge.keyring
Signal munged on h1 to reload its keyring.
    ssh h1 mungekey --reload

Transfer k0 (with the updated expiration time) and k1 from host h0 to host h1:

Obtain the list of keys and their corresponding key IDs (kid).
    mungekey --list
Export the old key k0 with its updated expiration time.
    mungekey --export --id <k0:kid> --output munge.keyring.new
Export the new key k1.
    mungekey --export --id <k1:kid> --output munge.keyring.new --append
Copy the new keys to host h1.
    rsync munge.keyring.new h1:/etc/munge/munge.keyring.new
Import the new keys.
    ssh h1 mungekey --import /etc/munge/munge.keyring.new
Signal munged on h1 to reload its keyring.
    ssh h1 mungekey --reload
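
The reason k0 is exported along with k1 is that h1 must learn k0's updated expiration time. A minimal sketch of the import step, assuming that an imported key replaces any existing entry with the same kid (this merge behavior, the Entry type, and the example timestamp are assumptions, not something the proposal specifies):

```python
from typing import Dict, NamedTuple, Optional


class Entry(NamedTuple):
    secret: bytes
    valid_end: Optional[int]  # epoch seconds; None means no expiration set


def import_keys(keyring: Dict[str, Entry], imported: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge imported keys into a keyring; same-kid entries are replaced."""
    merged = dict(keyring)
    merged.update(imported)
    return merged


# Before the import, h1 holds only k0 with no expiration.
h1_keyring = {"k0": Entry(b"old-secret", None)}

# munge.keyring.new carries k0 (now expiring) and the new key k1.
exported = {
    "k0": Entry(b"old-secret", 1_700_086_400),
    "k1": Entry(b"new-secret", None),
}

h1_keyring = import_keys(h1_keyring, exported)
```

After the merge, h1 holds both keys and k0 carries the same valid-end as on h0, so both hosts would disable k0 at the same time.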

Decode a new credential having multiple data-encryption-key packets:

If the host has not updated its keyring, it will only have key k0. When decoding a new credential, the k0:data-encryption-key packet with k0 will successfully decode the credential. The k1:data-encryption-key packet is not needed. However, all data-encryption-key packets within the credential should be processed to mitigate potential timing attacks.
If the host has updated its keyring, it will have both k0 and k1. When decoding a new credential, the k0:data-encryption-key packet with k0 will successfully decode the credential. The k1:data-encryption-key packet is not needed. However, all data-encryption-key packets within the credential should be processed to mitigate potential timing attacks.
After k0:valid-end, k0 will be disabled. When decoding a new credential, the k0:data-encryption-key packet will fail to decode the credential, but the k1:data-encryption-key packet with k1 will succeed.
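
The three cases above can be illustrated with a toy sketch. It is not munge's credential format or cryptography: the packet layout, the HMAC-based wrapping, and all function names are assumptions made for illustration. The point it shows is that the decoder walks every data-encryption-key packet without an early exit, whichever packet succeeds. It does not claim to be strictly constant-time.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List, Optional, Tuple

Packet = Tuple[str, bytes, bytes]  # (kid, wrapped DEK, tag); hypothetical layout


def _pad(key: bytes, nonce: bytes) -> bytes:
    # Toy keystream for illustration only; not a real key-wrap algorithm.
    return hmac.new(key, b"wrap" + nonce, hashlib.sha256).digest()


def _tag(key: bytes, nonce: bytes, wrapped: bytes) -> bytes:
    return hmac.new(key, b"tag" + nonce + wrapped, hashlib.sha256).digest()


def wrap_dek(dek: bytes, nonce: bytes, keys: Dict[str, bytes]) -> List[Packet]:
    """Create one data-encryption-key packet per currently valid key."""
    packets = []
    for kid, key in keys.items():
        wrapped = bytes(a ^ b for a, b in zip(dek, _pad(key, nonce)))
        packets.append((kid, wrapped, _tag(key, nonce, wrapped)))
    return packets


def unwrap_dek(packets: List[Packet], nonce: bytes, keys: Dict[str, bytes]) -> Optional[bytes]:
    """Process every packet, even after one has succeeded, then return the DEK."""
    dummy = bytes(32)
    result = None
    for kid, wrapped, tag in packets:
        key = keys.get(kid, dummy)  # unknown kid: still perform the same work
        tag_ok = hmac.compare_digest(tag, _tag(key, nonce, wrapped))
        candidate = bytes(a ^ b for a, b in zip(wrapped, _pad(key, nonce)))
        if tag_ok and kid in keys and result is None:
            result = candidate
    return result


dek = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
k0, k1 = secrets.token_bytes(32), secrets.token_bytes(32)
packets = wrap_dek(dek, nonce, {"k0": k0, "k1": k1})

old_host = unwrap_dek(packets, nonce, {"k0": k0})            # keyring not yet updated
new_host = unwrap_dek(packets, nonce, {"k0": k0, "k1": k1})  # keyring updated
after_expiry = unwrap_dek(packets, nonce, {"k1": k1})        # k0 disabled
no_keys = unwrap_dek(packets, nonce, {})                     # nothing valid
```

In all three cases from the list the same data-encryption-key is recovered; only a host with neither key fails.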

MITIGATIONS

systemd socket activation: Currently, munged reads the key at start-up and then creates a thread work crew to process requests. Reloading the key without restarting the daemon requires pausing the work crew and/or protecting key access with a mutex, either of which may adversely affect transaction processing speed. Another option would be to restart the daemon while minimizing the window between service shutdown and restart during which requests could be dropped. On Linux, systemd socket activation [1, 2, 3] allows the service to be restarted while its listening socket is kept open, thereby ensuring no incoming requests will be dropped.
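
For reference, a socket-activated service finds its inherited sockets through the LISTEN_PID and LISTEN_FDS environment variables, with descriptors starting at fd 3. The following is a minimal sketch of that check; the function name is hypothetical, and munged itself would do this in C (for example via sd_listen_fds).

```python
import os
from typing import List, Mapping

SD_LISTEN_FDS_START = 3  # first inherited descriptor under systemd's protocol


def inherited_fds(environ: Mapping[str, str], pid: int) -> List[int]:
    """Return the file descriptors passed by systemd, or [] if none apply."""
    # LISTEN_PID must name this process; otherwise the variables were
    # inherited from a parent and are not meant for us.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


# In a real service: use the inherited socket if present, else create one.
fds = inherited_fds(os.environ, os.getpid())
```

Each returned descriptor could then be wrapped with `socket.socket(fileno=fd)`. Because systemd holds the listening socket across the restart, connections that arrive while the daemon is down wait in the socket's backlog rather than being refused.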

NOTES

The overall size of the credential will increase since the data-encryption-key can no longer be implicitly derived from the message-authentication-code.
During the key rotation time interval where multiple keys are valid, the creation of an additional data-encryption-key packet will incur a small overhead in terms of encoding time, credential size, and decoding time. Consequently, the length of this interval should be kept as short as possible (but no shorter!) based on the time required to propagate the new key throughout the administrative realm.
The mechanism for mungekey to signal munged is yet to be determined. The simplest would be to send a SIGHUP to the appropriate munged process. The pid could be determined from the advisory lock on the appropriate socket. But if SIGHUP is already being used for something conventional like re-opening the logfile, a separate signaling mechanism may be required. This could be accomplished with the addition of a RELOAD_KEYRING command to the client/server protocol.
The addition of a protocol command would necessitate changes to libmunge to support having mungekey signal munged.
munged needs to handle the case where no keys on its keyring are currently valid. The simplest action to take would be to exit, but it should instead remain running so it can be signaled by mungekey again to reload its keyring. During this time, requests to encode or decode a credential would return an error to the client.
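
A rough sketch of the SIGHUP option combined with the no-valid-keys case follows. The Daemon class, its methods, and the error strings are illustrative stand-ins, not munged's interfaces; the process signals itself here only to keep the example self-contained, whereas mungekey would signal the munged pid.

```python
import os
import signal
from typing import Callable, List, Tuple


class Daemon:
    """Toy stand-in for munged; names here are illustrative only."""

    def __init__(self, load_valid_kids: Callable[[], List[str]]) -> None:
        self._load = load_valid_kids  # hypothetical keyring loader
        self.kids = load_valid_kids()
        self.reload_requested = False

    def on_sighup(self, signum, frame) -> None:
        # Keep the handler minimal: record the request and return.
        self.reload_requested = True

    def poll(self) -> None:
        # Called from the main loop; performs the reload outside the handler.
        if self.reload_requested:
            self.reload_requested = False
            self.kids = self._load()

    def encode(self, payload: bytes) -> Tuple[str, str]:
        # With no valid key, fail the request but keep running.
        if not self.kids:
            return ("error", "no valid key on keyring")
        return ("ok", ",".join(self.kids))


on_disk: List[str] = []  # keyring with no currently valid keys
daemon = Daemon(lambda: list(on_disk))
signal.signal(signal.SIGHUP, daemon.on_sighup)

first = daemon.encode(b"hello")       # daemon stays up, request fails

on_disk.append("k1")                  # a valid key is installed
os.kill(os.getpid(), signal.SIGHUP)   # stand-in for mungekey's signal
daemon.poll()
second = daemon.encode(b"hello")      # request now succeeds
```

If SIGHUP turns out to be reserved for something else, the same flag-and-poll structure would apply to a RELOAD_KEYRING protocol command, with the command handler setting the flag instead of the signal handler.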
