MSC3922: Removing SRV records from homeserver discovery #3922
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
# MSC3922: Removing SRV records from homeserver discovery | ||
|
||
Currently when [resolving server names](https://spec.matrix.org/v1.4/server-server-api/#resolving-server-names), | ||
homeservers (or any implementation trying to locate a server, such as integration managers or widgets | ||
using [OpenID Connect validation](https://spec.matrix.org/v1.4/server-server-api/#openid)) must be | ||
able to resolve SRV DNS records. Aside from this being difficult for some implementations (widgets, for example), | ||
SRV records typically cause deployment issues because they do not work the way server administrators expect. | ||
|
||
In addition to SRV records not "properly" supporting CNAMEs, TLS certificates are difficult to configure | ||
correctly and often lead to issues with the wrong certificate being presented. These sorts of issues | ||
come up often enough that [Synapse's documentation](https://matrix-org.github.io/synapse/v1.70/delegate.html#srv-dns-record-delegation) | ||
doesn't even explain how to use SRV records, instead referencing the specification itself and citing that | ||
.well-known is often what administrators are looking for. The documentation additionally calls it | ||
"SRV delegation", further indicating that the use of SRV records is complex (it's not true delegation, | ||
unlike what is possible with .well-known). | ||
|
||
This proposal removes all references to SRV records from the homeserver discovery specification, and | ||
lays out a plan to handle the rollout of such an invasive change. | ||
|
||
## Proposal | ||
|
||
In short, the [current rules](https://spec.matrix.org/v1.4/server-server-api/#resolving-server-names) | ||
which reference SRV records are deleted. This leads to the following discovery mechanism: | ||
|
||
*Note*: Some details, such as caching and certificate presentation, are excluded. They are unchanged. | ||
|
||
1. If the hostname is an IP literal, then that IP address should be used. If a port number is given then | ||
it should be used; otherwise port 8448 is used. The `Host` header in the request is set to the server name | ||
(which is the IP address), with port number if explicitly given. | ||
2. If the hostname is *not* an IP literal, but does have an explicit port, resolve the name using A or | ||
AAAA records to an IP and use that with the explicit port. The `Host` header in the request is set to | ||
the server name, with port number. | ||
3. If the hostname is *not* an IP literal, a regular HTTPS request is made to the .well-known endpoint | ||
on that domain. The hostname presented by this endpoint is called the "delegated hostname", and discovery | ||
steps 1 & 2 above are repeated for it. Step 3 (this step) is not repeated, as that could cause infinite loops | ||
or needless delays in discovery. | ||
4. Server discovery fails and the server is presumed offline or invalid if it has not been resolved to | ||
a usable IP and port by this step. | ||
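
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `split_host_port`, `is_ip_literal`, `parse_well_known`, `resolve` and the injected `fetch_well_known` callable are hypothetical, and the `m.server` field comes from the linked v1.4 specification rather than from this proposal. A/AAAA resolution is left to whatever connection layer consumes the result, and caching and certificate handling are omitted, as in the note above.

```python
import ipaddress
import json
from typing import Callable, Optional, Tuple

DEFAULT_PORT = 8448

def split_host_port(server_name: str) -> Tuple[str, Optional[int]]:
    """Split 'host', 'host:port', '[v6]' or '[v6]:port' into (host, port)."""
    if server_name.startswith("["):
        end = server_name.index("]")
        host, rest = server_name[: end + 1], server_name[end + 1 :]
        return host, int(rest[1:]) if rest.startswith(":") else None
    host, sep, port = server_name.rpartition(":")
    if sep:
        return host, int(port)
    return server_name, None

def is_ip_literal(host: str) -> bool:
    try:
        ipaddress.ip_address(host.strip("[]"))
        return True
    except ValueError:
        return False

def parse_well_known(body: str) -> Optional[str]:
    """Return the delegated hostname from a .well-known response body, if any."""
    try:
        data = json.loads(body)
    except ValueError:
        return None
    value = data.get("m.server") if isinstance(data, dict) else None
    return value if isinstance(value, str) and value else None

def resolve(
    server_name: str,
    fetch_well_known: Callable[[str], Optional[str]],
    allow_well_known: bool = True,
) -> Optional[Tuple[str, int, str]]:
    """Return (connect_host, port, host_header), or None if discovery fails."""
    host, port = split_host_port(server_name)
    # Step 1: IP literal, with the explicit port or the default 8448.
    if is_ip_literal(host):
        return host, port if port is not None else DEFAULT_PORT, server_name
    # Step 2: hostname with an explicit port (A/AAAA lookup happens on connect).
    if port is not None:
        return host, port, server_name
    # Step 3: .well-known lookup; only steps 1 and 2 are repeated for the result.
    if allow_well_known:
        delegated = fetch_well_known(host)
        if delegated:
            return resolve(delegated, fetch_well_known, allow_well_known=False)
    # Step 4: nothing produced a usable address and port.
    return None
```

Passing `fetch_well_known` in as a callable keeps the sketch free of network code, so it can be exercised with a plain dictionary lookup. Read literally, the list means that a delegated hostname which is neither an IP literal nor carries an explicit port falls through to step 4; the sketch follows that reading, which may or may not be the intent.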
|
||
Clearly this would cause disruption in the larger ecosystem as some servers might still be using SRV | ||
records to identify themselves. Readers of this proposal are encouraged to proactively change over to | ||
.well-known to identify if there are legitimate reasons for keeping SRV records, even if this proposal | ||
is still in a draft/unapproved state.

Review comment: On the whole I worry that this will create a significant headache if the Matrix federation protocol changes enough to no longer be HTTP-centric (which I really hope it will). I have been advocating for some research into a binary federation protocol for ages now to get around the computational expense, wire bloat and signing difficulties that JSON has. This puts quite a roadblock in the way of that, as it still mandates that the discovery part of the stack is HTTP+TLS even if no other part of the stack is. DNS SRV might be awkward to configure but it's otherwise protocol-agnostic. I don't think it is wise at all to remove it.

Review comment: This MSC is part 1 of a fairly long series tied in with the IETF/MIMI work we're doing. Specifically, we're aiming to separate transport from the protocol by defining a "Matrix over HTTP+JSON" thing, which would naturally include discovery. The discovery mechanism itself might still be HTTP-centric; however, this does not require the entire federation protocol to be over HTTP.

Review comment: It feels to me like a different transport mechanism would necessitate a different discovery mechanism. SRV records don't provide a way for a server to say "actually you can talk to me over CoAP instead of HTTPS", so I don't really understand what benefit they provide in the current ecosystem.

On the other hand, one of the reasons SRV records currently suck is that they interact poorly with HTTPS. (Or at least, they don't interact in the intuitive way.) Everyone has expectations about how HTTPS and DNS interact, and SRV records don't follow the pattern. With a different transport mechanism, perhaps those expectations will be different, in which case we can consider using SRV records for that transport. But I don't see why that means we need to keep SRV records for HTTP-based transports.

Review comment: I would like to point out that SRV is currently problematic for e.g. Dendrite and all other implementations using golang. Go does not try to handle domain name compression gracefully when resolvers and/or authoritative NSes act against the RFC with TXT and SRV records. I've run into this issue with multiple HSes in the past when debugging obscure cases of broken federation. Some of the relevant details of the behaviour can be found e.g. here: golang/go#10622

Additionally, I've seen cases where SRV works "by accident" because an admin didn't request separate certificates as they had originally planned to, so the required hostname happened to be included in the certificate even though their intention was to use separate certificates. The point being, SRV is hard to get right as it doesn't follow the conventions familiar from HTTPS.

An additional or differently formatted SRV record would be required for anything other than the current HTTPS-based federation service discovery anyway, so designing and adding such a method back later is IMO a rather obvious possibility, but not a blocker for this proposal. Getting rid of SRV for HTTPS-based federation would avoid many problems and much wasted time, so the relevant consideration is whether all practical use-cases can be met with well-known based discovery alone.

I would suggest adding a configuration option to log HSes that were discovered over SRV, enabling easy gathering of real-life usage information by participating HS admins. Additionally, the log would help detect those cases where Go DNS resolution fails to serve its purpose and an admin is debugging federation problems encountered by e.g. Dendrite. Unfortunately the mishandling of domain name compression can happen at either end, so this cannot be fully solved by adding test functionality to federationtester only. A properly working caching resolver can hide the issue, or a badly behaving one could introduce it regardless of the correct response from the authoritative NS.

Unfortunately I don't have statistics or further information on the NSes that were involved in the cases I have had to debug. Golang has decided not to mitigate the issue, unlike practically every other implementation seems to, so yes, it's (still) the DNS, again. |
||
|
||
In order to not cause massive breaking changes in the ecosystem, this proposal first deprecates SRV | ||
discovery for a minimum of 1 calendar year from the time of the spec release itself. Afterwards, at the | ||
discretion of the Spec Core Team (SCT), SRV discovery can be removed without notice. | ||
|
||
Homeserver authors (Synapse, Dendrite, Conduit, etc) are encouraged to use the deprecation period to | ||
help their users transition to .well-known discovery. Those reading this proposal before it is accepted | ||
are also encouraged to help identify any legitimate reason to keep SRV discovery in the specification (for | ||
example, if a user of theirs is completely unable to switch to .well-known, the case would be discussed to | ||
determine whether it is a reasonable blocker for this proposal). | ||
|
||
The Matrix.org Foundation would also be engaged in helping users move over to .well-known through the | ||
normal channels (blog posts, changelog on Synapse, social media, etc). | ||
|
||
## Potential issues | ||
|
||
As identified, this change could impact legitimate usage of SRV records for discovery - this proposal | ||
exists to give readers, homeserver authors, etc time to identify these cases before the proposal is put | ||
up for final comment period (thus proposing it for acceptance). If legitimate cases of SRV records are | ||
found, this proposal may be declined or rejected (per normal process). | ||
Review comment: There are a couple nice things about SRV that we don't get (at least currently) from the

The first, and one that has always made me fret a bit about

Broadly though: it makes me a bit nervous to tie availability of federation to the availability of your front door web server (which is a natural target for e.g. DDoS and the like). Especially for smaller deployments, where it generally doesn't take much for their website to go down.

Secondly, SRV records have inbuilt support for returning multiple servers with different priorities and weights. This is not currently used by anyone (AFAIK), but may prove to be very helpful in the event we get a HA Matrix server. We see these options used heavily in SMTP land (via MX records), where you have your primary set of SMTP servers and then your backup set of servers in cases of severe outages of your primary set. This can be added to

I do have sympathy for the argument that we shouldn't overly worry about this as it's just not used currently, but personally I think we need to maintain half an eye on ensuring that it would be possible to implement HA in federation sanely.

Most of the above arguments were made when we introduced |
||
|
||
## Alternatives | ||
|
||
This may be a good time to design new discovery mechanisms; however, that would have an even larger | ||
impact on the ecosystem. Additionally, .well-known appears to be the (current) industry standard | ||
for this mechanism. | ||
|
||
## Security considerations | ||
|
||
Removing SRV discovery could mean a higher rate of homeservers being delegated to third party providers | ||
or being targets of takeover attempts. However, given that Synapse (the most widely deployed homeserver implementation) | ||
already strongly recommends .well-known over SRV, this issue is considered trivial in nature. | ||
|
||
## Unstable prefix | ||
|
||
No unstable prefix is possible for this proposal. Instead, a migration period is explicitly proposed | ||
as an alternative. | ||
|
||
## Dependencies | ||
|
||
None relevant. |
Review comment: for abundance of clarity, this MSC is currently extremely low on the priorities list, and is leaning towards rejection rather than acceptance.
Review comment: This MSC just came up in #conduit:fachschaften.org, and as an SDK & power user client developer, SRV discovery is something that I would absolutely love to see removed, as I have functionality that e.g. explicitly depends on /_matrix/federation/v1/version. Getting this in a browser is currently a big pain point.
Adding that endpoint to the Client-Server API would help, but that doesn't solve e.g. being able to look up server keys or other future extensions to the federation APIs.