SNI-based cert selection in TLS transport socket #21739

Closed
LuyaoZhong opened this issue Jun 16, 2022 · 19 comments · Fixed by #22036

@LuyaoZhong
Contributor

Title: SNI-based cert selection in TLS transport socket

Description:

Currently, Envoy selects a certificate by selecting a filter chain based on SNI; it does not support SNI-based certificate selection within a single TLS transport socket. However, different services may be accessed through one filter chain. TLS bumping is one such case, since we attach multiple mimic certificates to one TLS transport socket. Therefore, we need to implement SNI-based cert selection in the transport socket.

cc @ggreenway @mattklein123

LuyaoZhong added the enhancement and triage labels Jun 16, 2022
@LuyaoZhong
Contributor Author

LuyaoZhong commented Jun 16, 2022

Continuing discussion from #1984 and #18928:
As you know,

message CommonTlsContext {
......
  // Only one of *tls_certificates*, *tls_certificate_sds_secret_configs*,
  // and *tls_certificate_provider_instance* may be used.
  repeated TlsCertificate tls_certificates = 2;

  repeated SdsSecretConfig tls_certificate_sds_secret_configs = 6
      [(validate.rules).repeated = {max_items: 2}];

  // [#not-implemented-hide:]
  CertificateProviderPluginInstance tls_certificate_provider_instance = 14;
......
}

If we get certificates via one of these three providers, we need to modify the existing cert selection logic to support different SNIs.
If anyone uses a custom handshaker to provide certificates, then the custom handshaker should define the selection logic and call SSL_CTX_set_select_certificate_cb() itself.

So I propose to modify the current cert selection logic directly. I'm not sure whether we need to introduce a flag to indicate that cert selection should be based on SNI in the transport socket.

What do you think about it?

@mattklein123
Member

I would definitely want to hear from @ggreenway here, but I think this comes up often enough that an in-tree handshake provider that can do cert selection would be useful. I'm not sure if this would be implemented as a cert provider or something else.

@soulxu
Member

soulxu commented Jun 16, 2022

It doesn't seem like the cert selection logic is part of the handshaker:

if (!config.capabilities().provides_certificates) {
  SSL_CTX_set_select_certificate_cb(
      tls_contexts_[0].ssl_ctx_.get(),
      [](const SSL_CLIENT_HELLO* client_hello) -> ssl_select_cert_result_t {
        return static_cast<ServerContextImpl*>(
                   SSL_CTX_get_app_data(SSL_get_SSL_CTX(client_hello->ssl)))
            ->selectTlsContext(client_hello);
      });
}

Even if we have a custom handshaker for selection, I'd guess that most of it would be the same as the default handshaker except for the selection logic.
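The lambda in the snippet above is captureless because BoringSSL's SSL_CTX_set_select_certificate_cb() takes a plain C function pointer, so the ServerContextImpl has to be recovered from the SSL_CTX app data rather than captured. The standalone sketch below models that trampoline pattern with hypothetical FakeCtx / FakeServerContext types standing in for the real BoringSSL and Envoy types. It is an illustration, not Envoy code.

```cpp
#include <cassert>

// Hypothetical stand-in for an SSL_CTX: an opaque app-data pointer plus a
// plain function-pointer callback, mirroring the shape of
// SSL_CTX_set_select_certificate_cb() / SSL_CTX_get_app_data().
struct FakeCtx {
  void* app_data = nullptr;
  int (*select_cb)(FakeCtx* ctx, const char* sni) = nullptr;
};

// Hypothetical stand-in for ServerContextImpl.
class FakeServerContext {
public:
  explicit FakeServerContext(FakeCtx& ctx) {
    ctx.app_data = this;
    // A captureless lambda converts implicitly to a plain function pointer,
    // so `this` is recovered from the app data instead of being captured.
    ctx.select_cb = [](FakeCtx* c, const char* sni) -> int {
      return static_cast<FakeServerContext*>(c->app_data)->selectTlsContext(sni);
    };
  }

  // Toy selection: 1 if an SNI was provided, 0 otherwise.
  int selectTlsContext(const char* sni) {
    ++calls_;
    return sni != nullptr ? 1 : 0;
  }

  int calls() const { return calls_; }

private:
  int calls_ = 0;
};
```

Because the callback only receives the context pointer, any per-instance state needed for SNI-based selection (certificate names, key types) has to live on the object reachable through the app data.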

@mattklein123
Member

Even if we have a custom handshaker for selection, I'd guess that most of it would be the same as the default handshaker except for the selection logic.

Yeah, I don't know the right answer here without digging in myself. There will need to be either a new extension point or significant code sharing. I'm not sure what the right solution is; it will need to be investigated.

@soulxu
Member

soulxu commented Jun 16, 2022

So I propose to modify the current cert selection logic directly. I'm not sure whether we need to introduce a flag to indicate that cert selection should be based on SNI in the transport socket.

If the only use case for multiple certs is supporting both RSA and ECDSA, then I'm fine without a flag.

And the existing RSA/ECDSA selection should be kept alongside the SNI selection behavior:
https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/tls.proto#extensions-transport-sockets-tls-v3-commontlscontext

Only a single TLS certificate is supported in client contexts. In server contexts, the first RSA certificate is used for clients that only support RSA and the first ECDSA certificate is used for clients that support ECDSA.

But I don't know how a custom validator uses multiple certs; at least there is a link for it: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/security/ssl#arch-overview-ssl-cert-select

@ggreenway
Contributor

In general, I'd be fine with adding support for multiple certs and selecting the correct one based on SNI. There are a few questions we'd need to sort out:

  • What happens if the SNI value doesn't match the names in any of the certs, or there is no SNI? Which cert do we choose?
  • What if certs have overlapping names? Is that a config-load error, or do we allow it and have some criteria for which cert to use?
  • What if there's a cert with the correct name but incorrect type (RSA vs EC), but not a complete match (name and type)?

@LuyaoZhong
Contributor Author

And the existing RSA/ECDSA selection should be kept alongside the SNI selection behavior:

sure

But I don't know how a custom validator uses multiple certs; at least there is a link for it: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/security/ssl#arch-overview-ssl-cert-select

A custom validator doesn't have the multiple-certs issue. The multiple certs in this feature are identity certs used for the TLS handshake, whereas a validator uses a trusted CA to validate the peer certificate.

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jun 17, 2022

In general, I'd be fine with adding support for multiple certs and selecting the correct one based on SNI. There are a few questions we'd need to sort out:

  • What happens if the SNI value doesn't match the names in any of the certs, or there is no SNI? Which cert do we choose?

If there is no SNI, we can fall back to RSA/ECDSA selection only; this will not break the existing logic.
If no certs match the SNI, we should let the selection fail.

  • What if certs have overlapping names? Is that a config-load error, or do we allow it and have some criteria for which cert to use?

I think it should be a config-load error. It sounds weird that multiple certs have the same SNI.

  • What if there's a cert with the correct name but incorrect type (RSA vs EC), but not a complete match (name and type)?

ssl_select_cert_result_t
ServerContextImpl::selectTlsContext(const SSL_CLIENT_HELLO* ssl_client_hello) {
  const bool client_ecdsa_capable = isClientEcdsaCapable(ssl_client_hello);
  const bool client_ocsp_capable = isClientOcspCapable(ssl_client_hello);
  // Fallback on first certificate.
  const TlsContext* selected_ctx = &tls_contexts_[0];
  auto ocsp_staple_action = ocspStapleAction(*selected_ctx, client_ocsp_capable);
  for (const auto& ctx : tls_contexts_) {
    // Skip certificates whose key type doesn't match the client's capability.
    if (client_ecdsa_capable != ctx.is_ecdsa_) {
      continue;
    }
    // Skip certificates for which the OCSP stapling policy yields Fail.
    auto action = ocspStapleAction(ctx, client_ocsp_capable);
    if (action == OcspStapleAction::Fail) {
      continue;
    }
    selected_ctx = &ctx;
    ocsp_staple_action = action;
    break;
  }
  // ... (remainder of the function not quoted)
}

Only a single TLS certificate is supported in client contexts. In server contexts, the first RSA certificate is used for clients that only support RSA and the first ECDSA certificate is used for clients that support ECDSA.
According to the code and comment, it assumes that an RSA cert always exists, and falls back to the RSA cert when the client supports EC but no EC cert is found. We can keep this logic when matching the SNI.

What's your suggestion? @ggreenway

RyanTheOptimist added the area/tls label and removed the triage label Jun 17, 2022
@ggreenway
Contributor

In general, I'd be fine with adding support for multiple certs and selecting the correct one based on SNI. There are a few questions we'd need to sort out:

  • What happens if the SNI value doesn't match the names in any of the certs, or there is no SNI? Which cert do we choose?

If there is no SNI, we can fall back to RSA/ECDSA selection only; this will not break the existing logic. If no certs match the SNI, we should let the selection fail.

But if there are many certs, and no SNI (or no matching SNI), which of the many certs should be used? We'd probably need to specify which one, maybe the first in the list of configured certs.

  • What if certs have overlapping names? Is that a config-load error, or do we allow it and have some criteria for which cert to use?

I think it should be a config-load error. It sounds weird that multiple certs have the same SNI.

Sounds good to me. You'll also need to account for wildcards, and make sure to use a more-specific cert before considering a wildcard.
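To make "more-specific cert before wildcard" concrete, here is a minimal standalone sketch (not Envoy code). It assumes each certificate has been reduced to a list of DNS names, that a wildcard covers only the single left-most label (a common convention: `*.example.com` matches `a.example.com` but not `example.com` or `a.b.example.com`), and that names compare ASCII case-insensitively. `matchBySni`, `wildcardMatches`, and `toLowerAscii` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// DNS names compare case-insensitively; normalize to lower-case ASCII.
std::string toLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// True if `pattern` is "*.<suffix>" and `host` is exactly one extra label in
// front of <suffix>.
bool wildcardMatches(const std::string& pattern, const std::string& host) {
  if (pattern.size() < 3 || pattern.compare(0, 2, "*.") != 0) {
    return false;
  }
  const std::string suffix = pattern.substr(1);  // e.g. ".example.com"
  const std::size_t dot = host.find('.');
  if (dot == std::string::npos || dot == 0) {
    return false;
  }
  return host.substr(dot) == suffix;
}

// Returns the index of the cert to use for `sni`: any exact match wins over
// any wildcard match; ties go to the earlier cert. std::nullopt if none match.
std::optional<std::size_t> matchBySni(const std::vector<std::vector<std::string>>& cert_names,
                                      const std::string& sni) {
  const std::string host = toLowerAscii(sni);
  for (std::size_t i = 0; i < cert_names.size(); ++i) {  // pass 1: exact
    for (const auto& name : cert_names[i]) {
      if (toLowerAscii(name) == host) return i;
    }
  }
  for (std::size_t i = 0; i < cert_names.size(); ++i) {  // pass 2: wildcard
    for (const auto& name : cert_names[i]) {
      if (wildcardMatches(toLowerAscii(name), host)) return i;
    }
  }
  return std::nullopt;
}
```

With certs `{"*.example.com"}` and `{"api.example.com"}`, the SNI `api.example.com` selects the second cert even though the wildcard cert comes first, while `www.example.com` falls through to the wildcard.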

  • What if there's a cert with the correct name but incorrect type (RSA vs EC), but not a complete match (name and type)?

Only a single TLS certificate is supported in client contexts. In server contexts, the first RSA certificate is used for clients that only support RSA and the first ECDSA certificate is used for clients that support ECDSA.
According to the code and comment, it assumes that an RSA cert always exists, and falls back to the RSA cert when the client supports EC but no EC cert is found. We can keep this logic when matching the SNI.
What's your suggestion? @ggreenway

No, it is not safe to assume there is always an RSA cert. But currently, there will be at most one RSA cert and one EC cert. You'll need to decide on and clearly document the matching criteria/algorithm, including both SNI and type.

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jun 22, 2022

But if there are many certs, and no SNI (or no matching SNI), which of the many certs should be used? We'd probably need to specify which one, maybe the first in the list of configured certs.

I'm OK with selecting the first cert as the default.

I think it should be a config-load error. It sounds weird that multiple certs have the same SNI.

Sounds good to me. You'll also need to account for wildcards, and make sure to use a more-specific cert before considering a wildcard.

After thinking about this for a while, it should be possible for multiple certs to have the same SNI, e.g. an EC cert and an RSA cert that both serve the same SNI.
For wildcards, do you mean SAN (subjectAltName) certificates? A server certificate may have both a subject common name and subjectAltNames. We should use the subject common name for exact SNI matching first, and if there is no match, use the subjectAltNames for wildcard matching. Is that correct?
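One possible way to reconcile "overlapping names are a config-load error" with an EC cert and an RSA cert legitimately sharing a name is to reject a name only when it repeats for the same key type. A minimal sketch of such a load-time check, with hypothetical types (`CertNames`, `validateNoAmbiguousNames`) rather than Envoy's:

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-cert summary gathered at config load.
struct CertNames {
  std::vector<std::string> names;
  bool is_ecdsa;
};

// Rejects a config only when the same name appears on two certs of the same
// key type; an RSA cert and an ECDSA cert may share a name. (As written, a
// name listed twice within one cert is also rejected.)
void validateNoAmbiguousNames(const std::vector<CertNames>& certs) {
  std::set<std::pair<std::string, bool>> seen;
  for (const auto& cert : certs) {
    for (const auto& name : cert.names) {
      if (!seen.insert({name, cert.is_ecdsa}).second) {
        throw std::invalid_argument("ambiguous certificate name for one key type: " + name);
      }
    }
  }
}
```

Keying the set on the (name, key type) pair is what allows the RSA/ECDSA pair through while still catching two RSA certs claiming the same name.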

No, it is not safe to assume there is always an RSA cert. But currently, there will be at most one RSA cert and one EC cert. You'll need to decide on and clearly document the matching criteria/algorithm, including both SNI and type.

I mean the current code assumes there is always an RSA cert, because the comment says "the first RSA certificate is used...", but the code always falls back to the first certificate without checking whether it is RSA.

   // Fallback on first certificate. 
   const TlsContext* selected_ctx = &tls_contexts_[0]; 

So I just propose to keep this logic when doing type matching.

Based on your suggestion, criteria v1 should be:
If there is an SNI in the CLIENT_HELLO, we do SNI matching to get a cert list. If that list is not empty, we then do type matching on it.
Otherwise (no SNI in the CLIENT_HELLO, or SNI matching returns an empty list), we do type matching on the whole cert list.
(Type matching falls back to the first cert in the list.)
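Criteria v1 can be sketched as follows. This is a simplified model under stated assumptions: `CertInfo` is a hypothetical stand-in for TlsContext, names are compared exactly (no wildcards), OCSP handling is left out, and the cert list is assumed to be non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for TlsContext: only the fields the criteria need.
struct CertInfo {
  std::vector<std::string> names;  // exact DNS names only, for brevity
  bool is_ecdsa;
};

// Type matching as in the existing selectTlsContext(): the first candidate
// whose key type matches the client, else the first candidate.
// Precondition: `candidates` is non-empty.
std::size_t typeMatch(const std::vector<CertInfo>& certs,
                      const std::vector<std::size_t>& candidates, bool client_ecdsa_capable) {
  for (std::size_t i : candidates) {
    if (certs[i].is_ecdsa == client_ecdsa_capable) return i;
  }
  return candidates.front();
}

// Criteria v1: SNI matching narrows the candidate list; if there is no SNI or
// it matches nothing, type matching runs over the whole list.
// Precondition: `certs` is non-empty.
std::size_t selectCertV1(const std::vector<CertInfo>& certs,
                         const std::optional<std::string>& sni, bool client_ecdsa_capable) {
  std::vector<std::size_t> candidates;
  if (sni) {
    for (std::size_t i = 0; i < certs.size(); ++i) {
      for (const auto& name : certs[i].names) {
        if (name == *sni) {
          candidates.push_back(i);
          break;
        }
      }
    }
  }
  if (candidates.empty()) {
    for (std::size_t i = 0; i < certs.size(); ++i) candidates.push_back(i);
  }
  return typeMatch(certs, candidates, client_ecdsa_capable);
}
```

For example, with an RSA cert for `a.example.com` followed by RSA and ECDSA certs for `b.example.com`, an ECDSA-capable client sending `b.example.com` gets the ECDSA cert, while an ECDSA-capable client sending `a.example.com` falls back to the only SNI candidate, the RSA cert.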

@ggreenway

@ggreenway
Contributor

I mean the current code assumes there is always an RSA cert, because the comment says "the first RSA certificate is used...", but the code always falls back to the first certificate without checking whether it is RSA.

This isn't correct. The current code tries to find a match, and if one isn't found, it uses certificate 0. There's no guarantee that this is an RSA cert. But I agree that we can keep this same logic.
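This can be checked against a simplified model of the loop in selectTlsContext() quoted earlier. `CtxModel` and `selectExisting` are hypothetical; `ocsp_fails` stands in for `ocspStapleAction(...) == OcspStapleAction::Fail`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified model of one TlsContext entry.
struct CtxModel {
  bool is_ecdsa;
  bool ocsp_fails;  // stands in for ocspStapleAction(...) == OcspStapleAction::Fail
};

// Mirrors the quoted loop: default to index 0, then take the first context
// whose key type matches the client and whose OCSP action is not Fail.
std::size_t selectExisting(const std::vector<CtxModel>& ctxs, bool client_ecdsa_capable) {
  std::size_t selected = 0;  // "Fallback on first certificate": no type check.
  for (std::size_t i = 0; i < ctxs.size(); ++i) {
    if (ctxs[i].is_ecdsa != client_ecdsa_capable) continue;
    if (ctxs[i].ocsp_fails) continue;
    selected = i;
    break;
  }
  return selected;
}
```

If the only configured cert is ECDSA, an RSA-only client still gets index 0, which is not an RSA cert; the fallback is positional, not type-based.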

For wildcards, do you mean SAN (subjectAltName) certificates? A server certificate may have both a subject common name and subjectAltNames. We should use the subject common name for exact SNI matching first, and if there is no match, use the subjectAltNames for wildcard matching. Is that correct?

That is incorrect. The SANs should be checked first, and if there are none, then the CN should be used, according to RFC 6125. https://www.rfc-editor.org/rfc/rfc6125#section-6.4.4
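A minimal sketch of the SAN-first rule, assuming a hypothetical `CertIdentity` that already holds the parsed dNSName SAN entries and the subject CN. It simplifies RFC 6125 section 6.4.4, which also rules out the CN when the certificate presents SRV-ID or URI-ID identifiers; only DNS SANs are modeled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parsed view of a server certificate's identity fields.
struct CertIdentity {
  std::vector<std::string> dns_sans;  // subjectAltName dNSName entries
  std::string common_name;            // subject CN, may be empty
};

// Names to compare against the SNI: SANs take precedence, and the CN is
// consulted only when the certificate has no DNS SANs.
std::vector<std::string> namesForMatching(const CertIdentity& cert) {
  if (!cert.dns_sans.empty()) {
    return cert.dns_sans;  // CN is ignored entirely in this case
  }
  if (!cert.common_name.empty()) {
    return {cert.common_name};
  }
  return {};
}
```

The CN is never combined with the SANs: a certificate with SANs `a.example.com` and `b.example.com` and CN `legacy.example.com` would not match SNI `legacy.example.com`.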

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jun 23, 2022

That is incorrect. The SANs should be checked first, and if there are none, then the CN should be used, according to RFC 6125. https://www.rfc-editor.org/rfc/rfc6125#section-6.4.4

So for SNI matching:

  1. How do we organize the certs/tls contexts when loading config?
    When loading the cert with SANs and constructing TlsContext, add it to the map<SAN, list<TlsContext>>
    When loading the cert without SANs and constructing TlsContext, add it to the map<CN, list<TlsContext>>
  2. How do we select the cert based on SNI when doing TLS handshake?
    First, do exact matching with map<CN, list<TlsContext>>
    if no tls context matched, do wildcard matching with map<SAN, list<TlsContext>>

After the SNI matching:
If the returned list is not empty, we do type matching on that list (falling back to the first one if none matches).
If the returned list is empty, we do type matching on all TLS contexts (falling back to the first one if none matches).
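Step 1 of this plan (organizing contexts at load time) might look like the sketch below. The types are hypothetical: `LoadedCert` holds parsed SANs and CN, and indices stand in for TlsContext entries. Only exact-name lookup is shown; the wildcard pass is left out, and the returned list simply concatenates SAN-indexed and CN-indexed hits. Mapping each name to a list of indices leaves room for an RSA and an EC cert that share a name.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Hypothetical parsed certificate identity.
struct LoadedCert {
  std::vector<std::string> sans;
  std::string cn;
};

// Two indexes, as in step 1: SAN -> contexts, and CN -> contexts.
struct SniIndex {
  std::map<std::string, std::vector<std::size_t>> by_san;
  std::map<std::string, std::vector<std::size_t>> by_cn;
};

SniIndex buildIndex(const std::vector<LoadedCert>& certs) {
  SniIndex index;
  for (std::size_t i = 0; i < certs.size(); ++i) {
    if (!certs[i].sans.empty()) {
      for (const auto& san : certs[i].sans) index.by_san[san].push_back(i);
    } else if (!certs[i].cn.empty()) {
      index.by_cn[certs[i].cn].push_back(i);  // CN only when there are no SANs
    }
  }
  return index;
}

// Exact lookup across both indexes.
std::vector<std::size_t> exactLookup(const SniIndex& index, const std::string& sni) {
  std::vector<std::size_t> result;
  for (const auto* m : {&index.by_san, &index.by_cn}) {
    auto it = m->find(sni);
    if (it != m->end()) result.insert(result.end(), it->second.begin(), it->second.end());
  }
  return result;
}
```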

A few questions:

  1. Can the CN be a wildcard?
    RFC 6125 says that "Common Name contains a string whose form matches that of a fully qualified DNS domain name."
    https://www.rfc-editor.org/rfc/rfc6125#section-2.3
    But I saw in some blogs that the CN can be a wildcard
    https://aboutssl.org/how-to-generate-csr-for-wildcard-ssl-certificate/

  2. Matching SNI with SAN?
    I'm not sure how to do this. There are several identifier types within SAN entries:
    DNS-ID, SRV-ID, URI-ID
    But for SNI, RFC 6066 says that "Currently, the only server names supported are DNS hostnames"
    https://datatracker.ietf.org/doc/html/rfc6066#section-3
    So does that mean we only care about DNS-IDs and should ignore SRV-IDs and URI-IDs when doing SNI matching against SANs?
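If the answer to question 2 is that only DNS-IDs take part in SNI matching, the filtering step is straightforward. A sketch with a hypothetical `SanEntry` type, assuming the SAN extension has already been parsed. SRV-IDs (RFC 4985 SRVName) are encoded as otherName entries, which fall under `SanType::Other` here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tag for the GeneralName kinds that can appear in a subjectAltName.
enum class SanType { Dns, Uri, Ip, Email, Other };

struct SanEntry {
  SanType type;
  std::string value;
};

// SNI carries only DNS hostnames (RFC 6066, section 3), so under this reading
// only dNSName entries (DNS-IDs) are candidates for SNI matching.
std::vector<std::string> dnsIdsForSni(const std::vector<SanEntry>& sans) {
  std::vector<std::string> out;
  for (const auto& san : sans) {
    if (san.type == SanType::Dns) out.push_back(san.value);
  }
  return out;
}
```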

@LuyaoZhong
Contributor Author

@ggreenway kindly ping :)

@LuyaoZhong
Contributor Author

That is incorrect. The SANs should be checked first, and if there are none, then the CN should be used, according to RFC 6125. https://www.rfc-editor.org/rfc/rfc6125#section-6.4.4

So for SNI matching:

  1. How do we organize the certs/tls contexts when loading config?
    When loading the cert with SANs and constructing TlsContext, add it to the map<SAN, list<TlsContext>>
    When loading the cert without SANs and constructing TlsContext, add it to the map<CN, list<TlsContext>>
  2. How do we select the cert based on SNI when doing TLS handshake?
    First, do exact matching with map<CN, list<TlsContext>>
    if no tls context matched, do wildcard matching with map<SAN, list<TlsContext>>

After thinking about this for a while, we don't need to create a map. We can just add SANs and CN members to TlsContext, like is_ecdsa_, and then iterate over the TlsContext objects to do SNI matching.

After the SNI matching, if the returned list is not empty, we do type matching on that list (falling back to the first one if none matches); if the returned list is empty, we do type matching on all TLS contexts (falling back to the first one if none matches).
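Storing the names on each context, as suggested here, could look like the minimal sketch below. `TlsContextModel` is a hypothetical trimmed-down TlsContext (the real class has other members); matching is exact and case-sensitive for brevity, and the CN is consulted only for contexts without DNS SANs, per the RFC 6125 rule mentioned earlier in the thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical trimmed-down TlsContext: alongside is_ecdsa_, it records the
// certificate's DNS SANs and subject CN once, when the certificate is loaded.
struct TlsContextModel {
  bool is_ecdsa_;
  std::vector<std::string> dns_sans_;
  std::string cn_;
};

// Linear scan over the contexts (no separate map). Returns every context whose
// names match `sni`, preserving configuration order for later type matching.
std::vector<const TlsContextModel*> contextsMatchingSni(
    const std::vector<TlsContextModel>& contexts, const std::string& sni) {
  std::vector<const TlsContextModel*> matches;
  for (const auto& ctx : contexts) {
    bool matched = false;
    if (!ctx.dns_sans_.empty()) {
      for (const auto& san : ctx.dns_sans_) {
        if (san == sni) {
          matched = true;
          break;
        }
      }
    } else {
      matched = ctx.cn_ == sni;  // CN only when there are no DNS SANs
    }
    if (matched) matches.push_back(&ctx);
  }
  return matches;
}
```

With only a handful of certs per transport socket, a linear scan is likely cheap enough, and it avoids keeping a second structure in sync with tls_contexts_.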

A few questions:

  1. Can the CN be a wildcard?
    RFC 6125 says that "Common Name contains a string whose form matches that of a fully qualified DNS domain name."
    https://www.rfc-editor.org/rfc/rfc6125#section-2.3
    But I saw in some blogs that the CN can be a wildcard
    https://aboutssl.org/how-to-generate-csr-for-wildcard-ssl-certificate/
  2. Matching SNI with SAN?
    I'm not sure how to do this. There are several identifier types within SAN entries:
    DNS-ID, SRV-ID, URI-ID
    But for SNI, RFC 6066 says that "Currently, the only server names supported are DNS hostnames"
    https://datatracker.ietf.org/doc/html/rfc6066#section-3
    So does that mean we only care about DNS-IDs and should ignore SRV-IDs and URI-IDs when doing SNI matching against SANs?

@ggreenway What do you think about matching criteria? And for above questions do you have any experience about that?

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jul 6, 2022

@ggreenway could you have a look at the code? Let us nail down the matching criteria

I didn't introduce any cert-loading error, because I think users or the control plane should be responsible for which certificates they provide. It's also hard to define which conditions should raise a loading error: two certs with overlapping SANs, or with overlapping subject common names? It should be enough that we select the correct cert that the client wants. What are your suggestions?

For the cert selection criteria, please look at the code; I left many comments there and updated the doc to explain the matching process.

@github-actions

github-actions bot commented Aug 6, 2022

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added the stale label Aug 6, 2022
@ggreenway
Contributor

Not stale; PR in progress: #22036

ggreenway removed the stale label Aug 8, 2022
@github-actions

github-actions bot commented Sep 7, 2022

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added and removed the stale label Sep 7, 2022
@LuyaoZhong
Contributor Author

Not stale; the PR will be updated soon.

5 participants