-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
underscores should be stripped from hostnames generated for apt config #3628
Comments
Launchpad user Dan Watkins(oddbloke) wrote on 2020-03-20T14:33:28.768520+00:00 The current behaviour is that cloud-init will use the patterns defined in /etc/cloud/cloud.cfg: primary:
to determine the mirror to use. In this case, it will be one of the two latter patterns, depending on exactly how the data source in question presents "zone_01". Either way, the problem is the same. Once this mirror URL is generated, cloud-init tests that it resolves before using it. This is where the problem lies: *.clouds.archive.ubuntu.com will always resolve, but the newly-deployed Apache servers will no longer serve every domain that resolves. Arguably this is a misconfiguration of the archive servers (why resolve something that you can't serve?), but cloud-init should handle this case gracefully regardless. There are (at least) a couple of ways in which we could address this issue in cloud-init: (a) rewrite the generated URL (or the variables which we are substituting into the pattern) to only include valid URI characters While both of these would address the immediate issue, only implementing (b) would mean that all instances in such zones would fallback to using archive.ubuntu.com, so I think we should do some form of (a) regardless. One obvious downside to (b) is that it will introduce an additional HTTP request to each boot on a Debian/Ubuntu host; this could be a concern both from a client boot speed perspective, but perhaps more importantly from a server load perspective. (My gut feel is that the cost in both cases wouldn't be significantly noticeable: most Debian/Ubuntu instances that come up will perform many HTTP requests to the archive hosts, so one additional one isn't likely to be noticed. We should consider this more deeply before we implement this, however.) (As an aside, we should do some research to confirm that the non-ASCII encoding described in the linked RFC 3986 section won't be affected by our filtering. For example, if we currently rely on the libraries we use to convert non-ASCII hostnames to the defined percent-encoding, then we would regress non-ASCII hostnames by applying a naive filter before we pass the name to those libraries.) |
Launchpad user Dan Watkins(oddbloke) wrote on 2020-03-20T15:51:32.599784+00:00 I've just done some research (by stepping through with a debugger), and socket.getaddrinfo does perform the encoding of non-ASCII characters: In [7]: socket.getaddrinfo('www.\u2603.com', None)[0][4][0] It does so using the 'idna' encoding: In [2]: "www.☃.com".encode('idna') which (unsurprisingly, given this bug) doesn't do anything to underscores: In [4]: "www_foo.☃.com".encode('idna') So I believe the correct implementation of (a) would be to encode the URL ourselves, and then drop any invalid characters out. (We should check if there is any stdlib/requests functionality that already does this.) |
Launchpad user Dan Watkins(oddbloke) wrote on 2020-03-23T16:15:04.922236+00:00 The cloud-init team chatted just now and we plan to implement (a), and reach out to the security and archive teams to confirm that they are comfortable with this path forward. I'll reiterate that proposal here for ease of review by those teams: The Background: To enable the provisioning of cloud-specific mirrors via only DNS changes, cloud-init configures apt with a .clouds.archive.ubuntu.com archive URL by default when launching Ubuntu instances. The specific "" name comes from the cloud (in slightly different ways depending on the cloud in question, but they all boil down to region/availability zone names). There is a *.clouds.archive.ubuntu.com DNS rule which resolves all of these archive URLs to the archive.ubuntu.com servers (in the absence of specific DNS rules for a region/cloud). The Problem: When instances are launched in regions/availability zones containing characters that are invalid in URIs (e.g. zone_01), cloud-init will currently generate and configure apt host names which are invalid URIs (e.g. zone_01.clouds.archive.ubuntu.com). These invalid URIs are now rejected by the archive.ubuntu.com web hosts (which were recently upgraded to bionic and its Apache), which means that new instances launched in these zones are configured with non-functional mirrors (the configured mirrors will return 400 for all requests). The Proposed Solution: cloud-init will remove invalid URI characters from the generated hostnames before configuring apt with them. (For example, zone_01.clouds.archive.ubuntu.com would be rewritten to zone01.clouds.archive.ubuntu.com.) Special care will be taken to ensure that IDNA names are not regressed. |
Launchpad user Seth Arnold(seth-arnold) wrote on 2020-03-24T02:58:49.996545+00:00 It would be nice to address the wildcard DNS entries, as those have a potential for abuse, and can be endlessly confusing if you're not prepared for them. In the meantime though, this plan sounds good to me. I'm worried about collisions, where multiple providers may use us_west_2, us%west2, uswest~2, etc. Some phrases read differently if the spacing is removed. The usual examples are powergen_italia and experts_exchange, but perhaps there's more realistic phrases for region names. (This seems quite small problem compared to the overall wildcard DNS entries, though.) Reversible transformations are usually better but since we're presumably doing this with business partners, the trouble cases may fall under "don't do that" kinds of categories. Are these actual problems? Probably it's fine but I thought I'd mention them just in case someone else with more context or creativity can make more of them. Thanks |
Launchpad user Junien Fridrick(axino) wrote on 2020-03-24T08:02:28.839221+00:00 I do wonder about the actual risks of using "HttpProtocolOptions Unsafe" on the archive servers, which are only serving public, static files. |
Launchpad user Dan Watkins(oddbloke) wrote on 2020-03-24T14:10:45+00:00 On Tue, Mar 24, 2020 at 02:58:49AM -0000, Seth Arnold wrote:
The wildcard DNS entries are required for the system as-designed to (We generate per-region mirrors for, I believe, a couple of reasons.
OK, good!
To some extent, we do have this problem to solve already, as clouds
It wouldn't be reversible, but we could convert invalid characters into, (I wonder if there are URI/hostname length boundaries that we would risk
The input is certainly welcome! |
Launchpad user Dimitri John Ledkov(xnox) wrote on 2020-03-25T11:02:37.036902+00:00 in punnycode -- is the encoding of -, thus i wouldn't want to use that. In [36]: "-☃.com".encode('idna') I wish that idna module would do something sensible, and allow invalid chars to be encoded into xn-* string as unicode codepoints, but nothing seems to do that in python. Maybe time for an RFC update?! Maybe it is worth talking to the cloud admins about using idna-safe zone names. Most clouds use '-' for region name separators. We don't need round trip safety, as one can always query the zone name from the cloud-init metadata. Thus cloud init imho, should be replacing any invalid chars with '-' (slightly better than dropping them, as the string length remains the same) In terms of applying "HttpProtocolOptions Unsafe" on the cloud-mirror side, we should do this to support all the already launched instances, and those that are getting launched with cloud-init versions that generate unsafe urls, because otherwise those instances do not receive any updates. (I hope i understand the option right, and support strategy) |
Launchpad user Dan Watkins(oddbloke) wrote on 2020-03-25T21:00:12.385015+00:00
This isn't the case; "xn--" is the prefix used to indicate a domain name label is encoded with Punycode. I don't believe Punycode modifies ASCII characters in its input (though it of course does change their position (and order relative to non-ASCII characters)): In [17]: "-\u3061".encode('idna') == "a\u3061".encode('idna').replace(b'a', b'-') Regardless, I think increasing the length of the string is asking for problems, so I think "--" would a mistake. Replacing them with "-" also poses a problem: "-" isn't a valid character at the start or end of domain name labels; rewriting "foo_.example.com" to "foo-.example.com" isn't any more valid (and, in fact, as "-"s are explicitly disallowed there, I bet it's actually worse in practical terms as naive implementations may reject the latter but not the former). I think I'm going to go for "replace non-LDH characters with a single dash, then strip leading/trailing dashes". (It won't be too costly to modify, so if someone has a better idea please let me know.) |
Launchpad user Chad Smith(chad.smith) wrote on 2020-03-26T23:14:26.834061+00:00 +1 on this case. I worry about extending hostnames too as that typically has been a problem in the past at 63 chars disappear quickly. The only alternative I could think of was replacing a leading or trailing '-' with an arbitrary 0 to avoid collisions with other hostnames that don't start or end with '-' if we strip characters. |
Launchpad user Paul Collins(pjdc) wrote on 2020-04-15T04:45:19.170283+00:00 We've made some changes to the archive.ubuntu.com environment, and now requests to e.g. under_score.clouds.archive.ubuntu.com are directed to a dedicated vhost that has "HTTPProtocolOptions unsafe" enabled. |
Launchpad user Chad Smith(chad.smith) wrote on 2020-05-09T03:02:10.584272+00:00 This bug is believed to be fixed in cloud-init in version 20.2. If this is still a problem for you, please make a comment and set the state back to New Thank you. |
Launchpad user Launchpad Janitor(janitor) wrote on 2020-05-13T22:38:19.088559+00:00 This bug was fixed in the package cloud-init - 20.2-20-gd10ce3ec-0ubuntu1 cloud-init (20.2-20-gd10ce3ec-0ubuntu1) groovy; urgency=medium
-- Chad Smith chad.smith@canonical.com Mon, 11 May 2020 20:17:06 -0600 |
Launchpad user Seth Arnold(seth-arnold) wrote on 2020-05-20T23:50:28.112617+00:00 I believe this is fixed via c478d0b I read through this patch and liked what I saw, thanks. |
This bug was originally filed in Launchpad as LP: #1868232
Launchpad details
Launchpad user Tom Haddon(mthaddon) wrote on 2020-03-20T09:04:54.664199+00:00
In a ticket filed in the Ubuntu RT instance we were made aware of an issue where if a cloud is configured with an “” in the region name, cloud-init will generate an apt configuration that also includes that “” in the name.
So for example if the region name is zone_01, apt will be configured to use zone_01.clouds.archive.ubuntu.com.
On Friday March 13th we deployed some new archive servers on 18.04 using Apache 2.4.29-1ubuntu4.13. This version of apache has more strict protocol options than previous versions, per https://httpd.apache.org/docs/2.4/mod/core.html#httpprotocoloptions and the result is that a request to zone_01.clouds.archive.ubuntu.com returns a 400 Bad Request.
Could cloud-init be updated to remove non-permitted characters including “_” per https://tools.ietf.org/html/rfc3986#section-3.2.2 ?
The text was updated successfully, but these errors were encountered: