
Kubernetes SNI support for clusters can generate illegal DNS names #2766

Closed
webvictim opened this issue Jun 10, 2019 · 6 comments

@webvictim
Contributor

webvictim commented Jun 10, 2019

What happened:
The Kubernetes SNI support (introduced in #2619) doesn't work when it generates a DNS name containing a label longer than 63 characters (such as k67757374657374696e676e65772d656173742e6772617669746174696f6e616c2e636f.gustestingnew-main.gravitational.co)

$ kubectl get pods
Unable to connect to the server: dial tcp: lookup k67757374657374696e676e65772d656173742e6772617669746174696f6e616c2e636f.gustestingnew-main.gravitational.co: no such host
------------------------------------------------------------
$ host k67757374657374696e676e65772d656173742e6772617669746174696f6e616c2e636f.gustestingnew-main.gravitational.co
host: 'k67757374657374696e676e65772d656173742e6772617669746174696f6e616c2e636f.gustestingnew-main.gravitational.co' is not a legal IDN name (domain label longer than 63 characters), use +noidnin
------------------------------------------------------------
$ echo "k67757374657374696e676e65772d656173742e6772617669746174696f6e616c2e636f.gustestingnew-main.gravitational.co" | wc -c
108

k67757374657374696e676e65772d656173742e6772617669746174696f6e616c2e636f here is the hex encoding of gustestingnew-east.gravitational.co, which is a fairly standard cluster name.

The effect of these labels longer than 63 characters is that the connection fails and Kubernetes support can't be used for those clusters. 63 characters is pretty easy to exceed in many standard multi-region setups which include the region in the DNS name.

In my experience so far, hex-encoding the trusted cluster name doubles its length, so realistically any trusted cluster name longer than about 30 characters will be unusable, and this threshold shrinks further depending on the length of the main cluster's domain name.
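To make the length arithmetic concrete, here's a minimal sketch of the encoding as it appears in the example label above (the `k` prefix plus hex-of-name layout is inferred from that label, not taken from Teleport's source):

```python
import binascii

# Hex-encode a trusted cluster name into a single DNS label,
# mirroring the k<hex(name)> pattern visible in the example above.
name = "gustestingnew-east.gravitational.co"
label = "k" + binascii.hexlify(name.encode()).decode()

print(label[:9])   # k67757374 ("gust" hex-encoded, with the k prefix)
print(len(label))  # 71 = 1 + 2*35, well over the 63-character label limit
```

Since hex encoding emits two characters per input byte, any cluster name longer than 31 characters pushes the label past 63.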

The only workaround so far is "use shorter cluster names", which isn't really doable in many cases.

What you expected to happen:
Kubernetes SNI support to work as described in the PR. The static encoding of the SNI name probably needs to be replaced with some kind of stateful encoding which doesn't limit cluster name length and actively generates DNS labels shorter than 63 characters.

How to reproduce it (as minimally and precisely as possible):
Set up a trusted cluster with a name that contains a few subdomains (such as gustestingnew-east.gravitational.co above)

Environment:

  • Teleport version (use teleport version): Teleport Enterprise v3.2.4 git:v3.2.4-0-g339827c6 go1.11.5
  • Tsh version (use tsh version): Teleport v3.2.4 git:v3.2.4-0-g339827c6 go1.11.5
  • OS (e.g. from /etc/os-release): Fedora 29
@webvictim webvictim added the bug label Jun 10, 2019
@kontsevoy kontsevoy added this to the 4.1 "Seattle" milestone Jun 11, 2019
@kontsevoy
Contributor

Looks like we have an enterprise evaluation that depends on it. Let's schedule it for a 4.1 point-release.

@klizhentas
Contributor

It would be impossible to completely solve the problem, given that it would still be possible to create a cluster name of 252 characters (against a max DNS name length of 253), and even with a shortener that would exceed the max DNS name.

However, we can solve the 63-character label limit by using a hash function instead of variable-length hex encoding.

I suggest using SHA1; for the example above it will look like this:

>>> import hashlib
>>> hashlib.sha1(b"gustestingnew-main.gravitational.co").hexdigest()
'c2ead9d108feec21599547c7cee5f62e3c3ce684'

The collision probability on our set of trusted clusters (~1K range) is negligible, so I'm leaning towards this option.

I've picked SHA1 because it generates 40-character hex dumps, which is nice given our 63-character limit.
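As a rough sanity check of the collision claim (Python 3 syntax; the birthday-bound arithmetic here is my own back-of-the-envelope, not from the thread):

```python
import hashlib

name = "gustestingnew-main.gravitational.co"
label = hashlib.sha1(name.encode()).hexdigest()
print(len(label))  # 40 hex characters, comfortably under 63

# Birthday-bound approximation: probability of any collision among
# n random 160-bit digests is roughly n*(n-1) / 2^161.
n = 1000
p = n * (n - 1) / 2 ** 161
print(p)  # on the order of 1e-43
```

At ~1K trusted clusters the chance of two names hashing to the same label really is negligible.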

@fspmarshall @russjones thoughts?

@knisbet
Contributor

knisbet commented Jul 6, 2019

I'm not sure I understand Teleport or this problem well enough to provide useful feedback. I'm also not sure whether this is a problem only for the Kubernetes integration, or whether other protocols/integrations have similar problems.

Assuming it's only Kubernetes/kubectl/client-go as described by Gus, one area I would explore is whether Teleport uses a single x509 certificate for all proxied clusters, or whether it creates (or is able to create) a certificate per cluster. If it creates a certificate per cluster, I would consider embedding the state for which cluster to access into the certificate, instead of trying to signal it over SNI with wildcard DNS. This way the kubeconfig can just point to the proxy, and the proxy can make its routing decision based on a field embedded in the Subject, something like /CN=kevin/O=gustestingnew-west.gravitational.co. If the current behaviour is that logging in has Teleport sign a single certificate which kubectl is then configured to use for all clusters, this obviously won't work. Another benefit is that it might remove the need for wildcard DNS, which I assume adds some friction to getting Teleport set up in some cases.
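A toy illustration of the Subject-driven routing idea (the OpenSSL-style DN string and the choice of the O field are hypothetical, not Teleport's actual certificate format):

```python
# Parse an OpenSSL-style subject string like /CN=kevin/O=<cluster>
# and route based on the Organization field.
def parse_dn(dn):
    parts = [p for p in dn.split("/") if p]
    return dict(p.split("=", 1) for p in parts)

subject = parse_dn("/CN=kevin/O=gustestingnew-west.gravitational.co")
target_cluster = subject["O"]
print(target_cluster)  # gustestingnew-west.gravitational.co
```

The point is simply that a field in the Subject can carry the full cluster name with no DNS label-length constraint at all.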

Some other random thoughts:

  • I wouldn't use SHA1. For this use case there's nothing wrong with it; it's just that, due to its weaknesses, scanners/audits etc. may take it out of context and decide Teleport has a security issue.
  • If this doesn't require cryptographic properties, I would probably pick a non-crypto hash so as not to imply that it requires strong cryptography. I'm not up to date on non-crypto hashes, but CityHash/SipHash/SpookyHash could be looked into.
  • If it does require a crypto hash, probably for better collision resistance, I would then go with something from the SHA-2 family (which can be truncated) or BLAKE2b. I'm not sure if BLAKE2b should be used, though, as it may not be compatible with the FIPS-compliant builds.
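For illustration, truncated SHA-256 also fits under the label limit (my sketch of the option named above, not a decided design):

```python
import hashlib

name = "gustestingnew-east.gravitational.co"
# Truncating SHA-256 to its first 20 bytes gives a 40-character hex
# label, matching the length of the SHA1 proposal earlier in the thread.
label = hashlib.sha256(name.encode()).hexdigest()[:40]
print(len(label))  # 40
```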

@klizhentas
Contributor

I like your Subject-driven routing idea; it's very nice. That's not how Teleport works at the moment - the user gets a single client cert per cluster.

I agree re: SHA1; it's better to stick with a non-crypto hash to signal that it's just for routing. I will look into it.

@fspmarshall
Contributor

I'm still getting caught up on the multi-cluster stuff, but I take it that the reason for not requiring a cryptographic hash would be because adding a cluster requires administrative privileges, yeah? And, in theory, a hash collision could cause a client to be routed to a different cluster than intended?

Assuming the above to be true, and assuming a large dose of professional paranoia, I'd opt for using a cryptographic hash (barring any performance issues). Clearly there is no way to prevent a malicious administrator from doing damage, but I still think it is preferable to reduce the ability of malicious administrators to pull off subtle attacks. Inducing a hash collision is a less obvious attack than re-routing traffic explicitly, and almost surely would be harder to detect during an audit.

On the subject of maximum paranoia: SHAKE128 with a digest length of 31 bytes would result in a 62-character hex string (just under the limit), is FIPS-compliant, and offers 124 bits of collision resistance at that configuration.
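The SHAKE128 variant described above can be sketched directly with the standard library (a minimal illustration, not the implementation):

```python
import hashlib

name = "gustestingnew-east.gravitational.co"
# SHAKE128 is an extendable-output function: hexdigest takes the
# desired digest length in bytes. 31 bytes -> 62 hex characters.
label = hashlib.shake_128(name.encode()).hexdigest(31)
print(len(label))  # 62
```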

@webvictim
Contributor Author

webvictim commented Jul 9, 2019

Unless I'm missing something, I don't see how we could reverse SHA1/SHA2/any one-way hash to extract the real name of the target cluster from an SNI request. The beauty of hex encoding was that it was easily reversible to get back to a real cluster name. I suppose we could just hash every remote cluster name until we find one that fits...
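The hash needn't be reversible if the proxy already knows its set of trusted clusters: it can precompute a digest-to-name map and look incoming labels up, which is the "hash every remote cluster name" idea made cheap. A sketch (the cluster list is hypothetical):

```python
import hashlib

# Hypothetical set of trusted cluster names known to the proxy.
clusters = [
    "gustestingnew-east.gravitational.co",
    "gustestingnew-west.gravitational.co",
]
# Precompute digest -> name once; rebuild when trusted clusters change.
by_digest = {hashlib.sha1(c.encode()).hexdigest(): c for c in clusters}

def resolve(sni_label):
    """Map a hashed SNI label back to a trusted cluster name, or None."""
    return by_digest.get(sni_label)

probe = hashlib.sha1(b"gustestingnew-east.gravitational.co").hexdigest()
print(resolve(probe))  # gustestingnew-east.gravitational.co
```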

I like Kevin's suggestion of encoding the cluster name into the certificate.

klizhentas added a commit that referenced this issue Jul 12, 2019
This commit fixes issue #2766.

The prior logic in Kubernetes module used
SNI to route requests to the target kubernetes cluster.

This approach created problems with long cluster names
exceeding the 63-character DNS label limit, and it
required setting up DNS wildcard records.

This commit changes the routing to use the metadata
encoded in the client's x509 certificate to route the
request to the target cluster.

The SNI approach will be supported for several versions
to preserve backwards compatibility.
klizhentas added a commit that referenced this issue Jul 16, 2019