This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Sanity check identity server passed to bind/unbind. #9802

Merged
merged 5 commits into from
Apr 19, 2021
Changes from 3 commits
1 change: 1 addition & 0 deletions changelog.d/9802.bugfix
@@ -0,0 +1 @@
+Add some sanity checks to identity server passed to 3PID bind/unbind endpoints.
27 changes: 24 additions & 3 deletions synapse/handlers/identity.py
@@ -16,7 +16,6 @@
# limitations under the License.

"""Utilities for interacting with Identity Servers"""

import logging
import urllib.parse
from typing import Awaitable, Callable, Dict, List, Optional, Tuple
@@ -35,7 +34,11 @@
 from synapse.types import JsonDict, Requester
 from synapse.util import json_decoder
 from synapse.util.hash import sha256_and_url_safe_base64
-from synapse.util.stringutils import assert_valid_client_secret, random_string
+from synapse.util.stringutils import (
+    assert_valid_client_secret,
+    parse_and_validate_server_name,
+    random_string,
+)

 from ._base import BaseHandler

@@ -173,6 +176,11 @@ async def bind_threepid(
                 server with, if necessary. Required if use_v2 is true
             use_v2: Whether to use v2 Identity Service API endpoints. Defaults to True

+        Raises:
+            SynapseError: On any of the following conditions
+                - the supplied id_server is not a valid Matrix server name
+                - we failed to contact the supplied identity server
+
         Returns:
             The response from the identity server
         """
@@ -182,6 +190,11 @@ async def bind_threepid(
         if id_access_token is None:
             use_v2 = False

+        try:
+            parse_and_validate_server_name(id_server)
+        except ValueError:
+            raise SynapseError(400, "id_server must be a valid Matrix server name")
Member

it's not (necessarily) supposed to be a valid Matrix server name.

Identity servers don't use the matrix server-name resolution algorithm (fully specced at https://matrix.org/docs/spec/server_server/r0.1.4#resolving-server-names): rather they just go straight into an https url.

so I think "what is valid here" is "what is valid in the 'authority' part of an HTTPS URL", which I think is defined by RFC3986 ?

It's also not entirely clear to me that the IS API must be exposed at the root of the path hierarchy (I don't think this is mandated anywhere in the spec), so maybe id_server should be allowed to include / components?
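
To make the point about HTTPS URLs concrete: `try_unbind_threepid_with_id_server` pastes the value straight into `"https://%s/_matrix/identity/api/v1/3pid/unbind" % (id_server,)`, so everything before the first `/` is parsed as an RFC 3986 authority. The minimal sketch below is not Synapse code; it uses a hypothetical helper, `split_id_server`, to show how Python's standard `urllib.parse.urlsplit` reads such values, including the userinfo form that a Matrix server name would not permit.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_id_server(id_server: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Hypothetical helper: parse id_server the way the unbind URL would be parsed.

    Everything between "https://" and the next "/" is the RFC 3986 authority,
    i.e. [userinfo "@"] host [":" port].
    """
    parts = urlsplit("https://%s/_matrix/identity/api/v1/3pid/unbind" % (id_server,))
    # .hostname is lower-cased with IPv6 brackets stripped;
    # .port is None when no explicit port was given.
    return parts.netloc, parts.hostname, parts.port


print(split_id_server("matrix.org:8090"))   # ('matrix.org:8090', 'matrix.org', 8090)
print(split_id_server("[::1]:8090"))        # ('[::1]:8090', '::1', 8090)
print(split_id_server("user@example.com"))  # ('user@example.com', 'example.com', None)
```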

Member Author

Uh, sorry, I missed this. My email rules still need a bit of tweaking, it seems.

> Identity servers don't use the matrix server-name resolution algorithm (fully specced at https://matrix.org/docs/spec/server_server/r0.1.4#resolving-server-names): rather they just go straight into an https url.

Ah, I missed this. I did check the C-S API spec, and the "hostname plus optional port" language made me think of the Matrix server name algorithm. The language used in the spec seems inconsistent/colourful enough that this now looks to me like a bit of a spec bug:

> The identity server to unbind all of the user's 3PIDs from.

> The hostname+port of the identity server which should be used for third party identifier lookups.

> The hostname of the identity server to communicate with. May optionally include a port.

> The identity server to use.

> url of identity server authed with, e.g. 'matrix.org:8090'

> It's also not entirely clear to me that the IS API must be exposed at the root of the path hierarchy (I don't think this is mandated anywhere in the spec), so maybe id_server should be allowed to include / components?

Per the above, only one place mentions the word "url" (which wouldn't work if followed literally), but then proceeds to give a hostname+port example. In other places, the wording seems to flat out disallow using a hostname + path components hybrid. I also don't see this added flexibility as being too useful practically, but maybe I'm missing some scenarios?

> so I think "what is valid here" is "what is valid in the 'authority' part of an HTTPS URL", which I think is defined by RFC3986 ?

The differences between RFC 3986's authority rules and the Matrix server name rules seem very exotic:

  • Allowing the optional userinfo "@" part (which seems nonsensical in this context)
  • Allowing percent-encoded characters
  • Allowing sub-delims: "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
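
The three differences listed above are easy to detect mechanically. A minimal sketch, assuming only these three features matter; `rfc3986_extras` is a hypothetical helper, not a full RFC 3986 parser and not part of Synapse.

```python
import re
from typing import List

# RFC 3986 sub-delims, as listed above.
SUB_DELIMS = frozenset("!$&'()*+,;=")
# A percent-encoded octet: "%" followed by two hex digits.
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def rfc3986_extras(authority: str) -> List[str]:
    """Hypothetical helper: name the RFC 3986 authority features used in
    `authority` that a Matrix server name does not allow."""
    found = []
    if "@" in authority:
        found.append("userinfo")
        # Check only the host[:port] part from here on.
        authority = authority.rsplit("@", 1)[1]
    if PCT_ENCODED.search(authority):
        found.append("percent-encoding")
    if any(ch in SUB_DELIMS for ch in authority):
        found.append("sub-delims")
    return found


print(rfc3986_extras("matrix.org:8090"))    # []
print(rfc3986_extras("user@ma%74rix.org"))  # ['userinfo', 'percent-encoding']
```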

I'm wondering whether the best course of action would be to change the spec to mandate a Matrix server name here, since it seems overwhelmingly likely that this is what everyone running an IS is doing.

Member Author: @dkasak, Apr 16, 2021

I checked, and passing userinfo, percent-encoded characters or sub-delims characters fails shortly afterwards, with either Twisted or treq complaining that it isn't a valid hostname. So it seems the remaining question is whether a root path should be allowed.

Given the existing wording of the spec, it seems unlikely to me that anyone is using that. The spec should be clarified in either case.

Member

> The language used in the spec seems inconsistent/colourful enough that this now looks to me like a bit of a spec bug:

The spec may be unclear, but Synapse has always been very clear that we do not use the Matrix federation routing algorithm for ISes. For example, if no explicit port is given, we use port 443 (federation uses 8448). Since Synapse has done this forever, that behaviour is de facto part of the spec, even if the actual words of the spec don't make that clear.
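
The port difference can be shown in a few lines. A minimal sketch, assuming the plain-HTTPS behaviour described above for identity servers; `endpoint` is a hypothetical helper, and the federation case is reduced to its 8448 fallback, ignoring the .well-known and SRV steps of the real server-name resolution algorithm.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

IS_DEFAULT_PORT = 443  # identity servers: plain HTTPS
FEDERATION_FALLBACK_PORT = 8448  # federation: last-resort default only


def endpoint(name: str, default_port: int) -> Tuple[Optional[str], int]:
    """Hypothetical helper: host and port to connect to for `name`,
    using `default_port` when no explicit port is given."""
    parts = urlsplit("https://" + name)
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port


print(endpoint("example.com", IS_DEFAULT_PORT))           # ('example.com', 443)
print(endpoint("example.com", FEDERATION_FALLBACK_PORT))  # ('example.com', 8448)
print(endpoint("example.com:8090", IS_DEFAULT_PORT))      # ('example.com', 8090)
```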

> In other places, the wording seems to flat out disallow using a hostname + path components hybrid.

Could you cite any specific examples?

> I also don't see this added flexibility as being too useful practically, but maybe I'm missing some scenarios?

It's just that sometimes it's easier to deploy things at a sub-path. There are parallels to the ongoing debate about pusher URLs; we've even heard of people setting up Synapse's C-S API at a subpath (#9574).

I'm on the fence about this, tbh. I could go either way. We should make a decision and then be explicit about it, though.

> I'm wondering whether the best course of action would be to change the spec to mandate a Matrix server name here, since it seems overwhelmingly likely that this is what everyone running an IS is doing.

I wouldn't have any particular objection to defining a grammar for IS names which happens to be the same as the grammar for Matrix server_names - as you say the current differences are minor. However, I would oppose the use of any phrasing like "must be a valid Matrix server name", because server_names bring with them the implication of using the "Resolving server names" algorithm - which, as above, is not what happens.

Member Author: @dkasak, Apr 16, 2021

> Could you cite any specific examples?

My reading of e.g. https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-account-password-email-requesttoken, which says the following for the id_server parameter:

> The hostname of the identity server to communicate with. May optionally include a port.

is that it doesn't allow a path.

> I'm on the fence about this, tbh. I could go either way. We should make a decision and then be explicit about it, though.

Agreed. Since this used to allow specifying a root path, I'm going to change the PR so that it doesn't forbid it. Once a decision has been made, we can easily make it stricter again.
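
Since `try_unbind_threepid_with_id_server` builds its URL as `"https://%s/_matrix/identity/api/v1/3pid/unbind" % (id_server,)`, a value like `example.com/is` already yields a sub-path URL; not forbidding it amounts to validating only the part before the first `/`. A minimal sketch of that split, with hypothetical helper names (`split_authority_and_prefix`, `unbind_url`); it is not the code in this PR.

```python
from typing import Tuple

UNBIND_PATH = "/_matrix/identity/api/v1/3pid/unbind"


def split_authority_and_prefix(id_server: str) -> Tuple[str, str]:
    """Hypothetical helper: split "host[:port][/prefix]" into its two parts."""
    authority, slash, rest = id_server.partition("/")
    # Normalise the prefix to "" or "/some/path" (no trailing slash).
    prefix = (slash + rest).rstrip("/")
    return authority, prefix


def unbind_url(id_server: str) -> str:
    authority, prefix = split_authority_and_prefix(id_server)
    # Only `authority` would go through host[:port] validation here; how "?",
    # "#" or percent-escapes in `prefix` should be treated is left open.
    return "https://%s%s%s" % (authority, prefix, UNBIND_PATH)


# https://example.com/_matrix/identity/api/v1/3pid/unbind
print(unbind_url("example.com"))
# https://example.com:8443/is/_matrix/identity/api/v1/3pid/unbind
print(unbind_url("example.com:8443/is/"))
```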

> I wouldn't have any particular objection to defining a grammar for IS names which happens to be the same as the grammar for Matrix server_names - as you say the current differences are minor. However, I would oppose the use of any phrasing like "must be a valid Matrix server name", because server_names bring with them the implication of using the "Resolving server names" algorithm - which, as above, is not what happens.

Gotcha. I pushed a new commit which does away with this wording (but still uses the parse_and_validate_server_name function internally to enforce the same rules).
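
A self-contained approximation of that approach: keep a server-name-style grammar check internally, but phrase the error without calling the value a Matrix server name. The grammar here is a simplification of the Matrix spec's server name grammar (a DNS name, IPv4 address or bracketed IPv6 literal, plus an optional port of 1 to 5 digits) and may differ in edge cases from Synapse's `parse_and_validate_server_name`; `parse_is_name`, `validate_id_server`, `InvalidIdServer` and the error text are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Simplified server-name grammar: a DNS name (which also loosely covers dotted
# IPv4) or a bracketed IPv6 literal, optionally followed by ":" and 1-5 digits.
DNS_NAME = re.compile(r"\A[0-9A-Za-z.-]{1,255}\Z")
IPV6_LITERAL = re.compile(r"\A\[[0-9A-Fa-f:.]{2,45}\]\Z")
PORT = re.compile(r"\A[0-9]{1,5}\Z")


class InvalidIdServer(Exception):
    """Hypothetical stand-in for SynapseError(400, ...)."""


def parse_is_name(name: str) -> Tuple[str, Optional[int]]:
    """Split `name` into (host, port); raise ValueError if it does not parse."""
    host, port = name, None
    # A ":" introduces a port unless the name ends in an IPv6 literal's "]".
    if ":" in name and not name.endswith("]"):
        host, _, port_str = name.rpartition(":")
        if not PORT.match(port_str):
            raise ValueError("invalid port: %r" % (port_str,))
        port = int(port_str)
    if not (DNS_NAME.match(host) or IPV6_LITERAL.match(host)):
        raise ValueError("invalid host: %r" % (host,))
    return host, port


def validate_id_server(id_server: str) -> None:
    try:
        parse_is_name(id_server)
    except ValueError:
        # Same rules as a server name, but no claim that it *is* one.
        raise InvalidIdServer(
            "id_server must be a hostname with an optional port"
        ) from None


validate_id_server("matrix.org:8090")  # passes silently
```

With this split, `validate_id_server("user@matrix.org")` raises `InvalidIdServer`, while the underlying `ValueError` detail stays internal.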


         # Decide which API endpoint URLs to use
         headers = {}
         bind_data = {"sid": sid, "client_secret": client_secret, "mxid": mxid}
@@ -270,12 +283,20 @@ async def try_unbind_threepid_with_id_server(
             id_server: Identity server to unbind from

         Raises:
-            SynapseError: If we failed to contact the identity server
+            SynapseError: On any of the following conditions
+                - the supplied id_server is not a valid Matrix server name
+                - we failed to contact the supplied identity server

         Returns:
             True on success, otherwise False if the identity
             server doesn't support unbinding
         """
+
+        try:
+            parse_and_validate_server_name(id_server)
+        except ValueError:
+            raise SynapseError(400, "id_server must be a valid Matrix server name")
+
         url = "https://%s/_matrix/identity/api/v1/3pid/unbind" % (id_server,)
         url_bytes = "/_matrix/identity/api/v1/3pid/unbind".encode("ascii")
