Certificate rotation for long running clients #474

Open
jean-airoldie opened this issue Oct 14, 2019 · 7 comments

Comments

@jean-airoldie
Contributor

Currently the server's certificate can only be set once, at the creation of the server (AFAIK). This means that certificate rotation is basically impossible for long-running servers without restarting the socket. This might be unacceptable in use cases where certificates are rotated quickly (every few hours).

An alternative solution would be hitless rotation. The strategy involves periodically updating the server's certificate so that new connections use that new certificate and old connections continue using the old certificate.

Do you think that updating an Endpoint's rustls::ServerSession over time would achieve this effect?

@djc
Member

djc commented Oct 14, 2019

It would be the ServerConfig rather than the ServerSession, but basically, yes (this thing here: https://github.com/djc/quinn/blob/master/quinn-proto/src/shared.rs#L260). This would be a cool feature to have!

@jean-airoldie
Contributor Author

jean-airoldie commented Oct 14, 2019

Cool. Do you think exposing the proto::Endpoint::server_config via quinn::Endpoint::set_server_config would make sense?

https://github.com/djc/quinn/blob/89cd5d06e5b59b2b8774dfa5a1508e572b149f78/quinn-proto/src/endpoint.rs#L55

Edit: Never mind, the Endpoint is already behind a mutex, so the RwLock would not be needed.
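To make the idea concrete, here is a minimal sketch of what such a setter might look like. Everything in it is a hypothetical stand-in, not quinn's actual code: `ServerConfig`, `Connection`, `EndpointInner`, `accept`, and the `cert_serial` field are invented for illustration, and only the name `set_server_config` comes from the proposal above. It assumes the endpoint state sits behind a mutex and that each connection keeps its own `Arc` to the config it was accepted with.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the server configuration (not quinn's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct ServerConfig {
    /// Invented field used only to tell certificates apart in this sketch.
    pub cert_serial: u64,
}

/// Hypothetical connection that pins the config it was accepted with.
pub struct Connection {
    config: Arc<ServerConfig>,
}

impl Connection {
    pub fn cert_serial(&self) -> u64 {
        self.config.cert_serial
    }
}

/// Hypothetical protocol-level endpoint state.
struct EndpointInner {
    server_config: Arc<ServerConfig>,
}

/// Hypothetical endpoint handle; the mutex mirrors the one mentioned above.
#[derive(Clone)]
pub struct Endpoint {
    inner: Arc<Mutex<EndpointInner>>,
}

impl Endpoint {
    pub fn new(config: ServerConfig) -> Self {
        let inner = EndpointInner {
            server_config: Arc::new(config),
        };
        Endpoint {
            inner: Arc::new(Mutex::new(inner)),
        }
    }

    /// Replace the config used for connections accepted from now on.
    /// Existing connections keep the Arc they already hold.
    pub fn set_server_config(&self, config: ServerConfig) {
        self.inner.lock().unwrap().server_config = Arc::new(config);
    }

    /// Accept a connection; it shares whichever config is current right now.
    pub fn accept(&self) -> Connection {
        let inner = self.inner.lock().unwrap();
        Connection {
            config: Arc::clone(&inner.server_config),
        }
    }
}
```

With this shape, a connection accepted before the call to `set_server_config` keeps reporting the old certificate, while one accepted afterwards sees the new one, which is the hitless behaviour described in the first comment.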

@djc
Member

djc commented Oct 14, 2019

I would first investigate what can be done with Arc itself (around the wider ServerConfig; not sure if that's a problem?). I think it has some facilities in this direction. But if that doesn't work out, a RwLock could definitely make sense. Exposing Endpoint::set_server_config() seems like the right approach, though beware the naming around ServerConfig vs. CryptoSession::ServerConfig.

(In general it would probably be cool to update the wider ServerConfig and not just the crypto aspects of it, though.)

@jean-airoldie
Contributor Author

Alright, I'll take a more in-depth look once I get the time.

@Ralith
Collaborator

Ralith commented Oct 14, 2019

Arc::make_mut is relevant.
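As a rough illustration of why it is relevant: `Arc::make_mut` clones the inner value when other `Arc`s still point at it, so whoever holds the old `Arc` keeps seeing the old value. The `ServerConfig` struct, its `cert_serial` field, and the function name below are hypothetical placeholders, not quinn types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the server configuration (not quinn's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct ServerConfig {
    pub cert_serial: u64,
}

/// Returns (serial seen by an existing connection, serial seen by the endpoint).
pub fn rotate_with_make_mut() -> (u64, u64) {
    let mut endpoint_config = Arc::new(ServerConfig { cert_serial: 1 });

    // An existing connection shares the endpoint's current config.
    let connection_config = Arc::clone(&endpoint_config);

    // Because another Arc is alive, make_mut clones the inner value and
    // repoints endpoint_config at the copy before handing out `&mut`.
    Arc::make_mut(&mut endpoint_config).cert_serial = 2;

    // The two Arcs no longer share an allocation.
    debug_assert!(!Arc::ptr_eq(&endpoint_config, &connection_config));

    (connection_config.cert_serial, endpoint_config.cert_serial)
}
```

If no other `Arc` were alive at the time of the call, `make_mut` would mutate in place without cloning, so the copy is only paid for when old connections are still around.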

A more general solution is to place a routing service in front of your application that can be used to direct new connections to a new instance of your application. Highly-available services will need something like this regardless to support graceful upgrades. QUIC is designed to support this case gracefully by allowing data (e.g. a phase bit) to be encrypted into the local connection ID to coordinate with external routing systems. Quinn does not currently provide any way to do this, but it's something we'll want to explore eventually. I don't believe anybody's working on standardizing encoding of information into connection IDs, so it may take substantial effort (e.g. a custom-written load balancer) to deploy in practice.

@jean-airoldie
Contributor Author

> Arc::make_mut is relevant.

Cool, wasn't aware of this.

> A more general solution is to place a routing service in front of your application that can be used to direct new connections to a new instance of your application. Highly-available services will need something like this regardless to support graceful upgrades.

I think this makes sense for updates that require an application restart (OS or application updates), but for certificates I'm not sold. If you run your own internal certificate authority and issue short-lived certificates that you rotate often, this would be a headache. For web-facing certificates, which are usually long-lived, it wouldn't be an issue.

Moreover, for graceful updates of long-running applications you still need to migrate your connected peers to the new server. For instance, you would have to send a notification telling each peer to connect to the new server, maybe finish processing the pending requests, and then gracefully shut down.

> QUIC is designed to support this case gracefully by allowing data (e.g. a phase bit) to be encrypted into the local connection ID to coordinate with external routing systems.

I'm afraid I'm not following. How would this phase bit be of use?

@Ralith
Collaborator

Ralith commented Oct 15, 2019

> for certificates I'm not sold

Yeah, it's a big hammer. I'm not opposed to being able to update things live. Be aware that there are subtle 0-RTT correctness implications to changing some configuration parameters live; replacing the crypto configuration is probably fine (though we should make sure), but allowing modification of the TransportConfig will require taking care to reject 0-RTT data associated with an incompatible configuration.
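One way the compatibility check could look, sketched under the assumption that the relevant rule is "a server accepting 0-RTT must not offer lower limits than the ones the client remembered from the earlier connection". The `Limits` struct, its fields, and `can_accept_zero_rtt` are hypothetical names chosen for this example and only cover a few of the transport parameters involved.

```rust
/// Hypothetical subset of flow-control and stream limits a client may have
/// remembered from an earlier connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Limits {
    pub max_data: u64,
    pub max_stream_data: u64,
    pub max_bidi_streams: u64,
    pub max_uni_streams: u64,
}

/// Returns true if 0-RTT data sent under `remembered` limits cannot violate
/// the `current` limits, i.e. no limit has been reduced since then.
pub fn can_accept_zero_rtt(remembered: &Limits, current: &Limits) -> bool {
    current.max_data >= remembered.max_data
        && current.max_stream_data >= remembered.max_stream_data
        && current.max_bidi_streams >= remembered.max_bidi_streams
        && current.max_uni_streams >= remembered.max_uni_streams
}
```

Under this sketch, raising a limit in a live config update keeps 0-RTT acceptable, while lowering any of them means early data from clients holding the older parameters would have to be rejected.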

> I'm afraid I'm not following. How would this phase bit be of use?

To allow a stateless packet router to direct packets to the instance associated with their connection. Simple stateful routing based on storing connection IDs does not work because connections may change ID unpredictably.
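A toy sketch of that routing idea, with loud caveats: the 8-byte connection ID length, the choice of the top bit of the first byte, and all names (`stamp_phase`, `route`, `Instance`) are assumptions made up for this example, and the bit is stored in the clear here, whereas a real deployment would encrypt it into the connection ID as described above.

```rust
/// Hypothetical fixed connection ID length for this sketch.
pub const CID_LEN: usize = 8;

/// Hypothetical position of the phase bit: top bit of the first byte.
const PHASE_MASK: u8 = 0b1000_0000;

/// Which application instance owns a connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Instance {
    Old,
    New,
}

/// Server side: stamp the issuing instance's phase into a freshly generated
/// connection ID. Every ID an instance issues carries the same phase, so the
/// routing survives unpredictable connection ID changes.
pub fn stamp_phase(mut cid: [u8; CID_LEN], instance: Instance) -> [u8; CID_LEN] {
    match instance {
        Instance::Old => cid[0] &= !PHASE_MASK,
        Instance::New => cid[0] |= PHASE_MASK,
    }
    cid
}

/// Router side: choose the destination from the connection ID alone,
/// without storing any per-connection state.
pub fn route(cid: &[u8; CID_LEN]) -> Instance {
    if cid[0] & PHASE_MASK != 0 {
        Instance::New
    } else {
        Instance::Old
    }
}
```

The router never needs a table of live connection IDs: any ID issued by the new instance routes to the new instance, and any ID issued by the old one routes to the old one, even after a connection switches to a different ID.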
