MSC2346: Bridge information state event #2346

Open
wants to merge 23 commits into
base: old_master
241 changes: 241 additions & 0 deletions proposals/2346-bridge-info-state-event.md
@@ -0,0 +1,241 @@
# MSC 2346: Bridge information state event

Many rooms across the Matrix network are bridged into third-party networks.
However, the spec does not contain a method that works across federation to determine which networks are
bridged into a given room.

This can be done in a local setting by using the
[/thirdparty/location](https://matrix.org/docs/spec/application_service/r0.1.2#get-matrix-app-v1-thirdparty-protocol-protocol)
API, but this creates a split-brain view across the federation, which is an unacceptable situation.
Member

This MSC doesn't solve what the /thirdparty APIs are trying to do though, as they're trying to answer the question of "what third party networks are bridged on the server and what are their rooms", whereas this MSC is trying to answer the question of "which room is this bridged to?".


Many users have taken to peeking at the list of aliases for a giveaway alias like `#freenode_` or
looking for bridge bots or users with a `@_discord_` prefix. This is an unacceptable situation,
as it assumes prior knowledge of these networks and an understanding of how bridges operate.

## Proposal

This proposal attempts to address this problem by providing a single state event for each bridge in a room
to announce which channels have been bridged into a room.

It should be noted that this MSC is intended to provide the baseline needed to display information about
a bridge, and nothing more. See the "Future MSCs" section for more information.

This proposal is heavily based upon my previous attempt [#1410](https://github.com/matrix-org/matrix-doc/issues/1410)
albeit with a notably reduced set of features. The aim of this proposal is to offer information about the
bridged network and nothing more.

### `m.bridge`

```js
{
"state_key": "org.matrix.appservice-irc://{protocol.id}/{network.id}/{channel.id}",
Member

What should be the format of the state key if network ID/channel ID are omitted? e.g. foo.bar.appservice-skype://skype//someid or foo.bar.appservice-skype://skype/someid? (do we just drop the component and keep all the slashes, or do we collapse slashes?)

Contributor

It probably makes sense for the format of what comes after :// to be dependent on the value before ://. Here {protocol.id}/{network.id}/{channel.id} would be the format for org.matrix.appservice-irc://; but for foo.bar.appservice-skype:// it would always be {protocol.id}/{chat.id}

"type": "m.bridge",
"content": {
"bridgebot": "@appservice-irc:matrix.org",
Contributor

Should this also be optional? You can imagine that some bridges could just use the bridge users as bridge bots. For example if you bridge a 1:1 chat from Telegram you can just use the remote user as the "bridge bot".

"creator": "@alice:matrix.org", // Optional
Member

why do we need this? Widgets already suffer from this being unreliable and unhelpful, to the point of us ignoring it.

Contributor Author

> to the point of us ignoring it.

[image]

How's that going for you?

This is actually the same use case that widgets are using it for right now, it's just sugar to point at whoever added the bridge, if it was added by a user.

Contributor Author

Oh, you may have been referring to it duplicating the sender field. This is intentional, for the plumbing use case. I don't see any reason why this would be unreliable though.

Member

It's not really trusted information, though I guess if somehow the bridge is limited to being able to send it then it can be trusted. The reason we don't use creatorUserId for widgets, even if someone else edits the widget, is because it is displayed so prominently and can cause lies to be shown to the user.

Member

The major concern here is essentially a room admin/mod hopping into the room and changing the state event to say that @bob:example.org is responsible for the bridge, who may or may not even be a real user ID. One of the downsides for Bob in that case would be spam about bridging to somewhere controversial. In a general scenario though, as an informational thing it's probably okay, but it would be nice to acknowledge the privacy/social risks in the security section.

"protocol": {
"id": "irc",
"displayname": "IRC", // Optional
"avatar_url": "mxc://foo/bar", // Optional
"external_url": "https://example.com" // Optional
},
"network": { // Optional
"id": "freenode",
"displayname": "Freenode", // Optional
"avatar_url": "mxc://foo/bar", // Optional
"external_url": "irc://chat.freenode.net" // Optional
},
"channel": {
Contributor

Nit: Why are we calling this a channel? In Matrix we call these rooms so doesn't it make sense to use the local terminology. Basically "the thing in the source protocol which maps to a room in Matrix". For example if the remote network has groups and Matrix has rooms why are we calling the field channel?

"id": "#friends",
"displayname": "Friends", // Optional
"avatar_url": "mxc://foo/bar", // Optional
"external_url": "irc://chat.freenode.net/#friends" // Optional
}
},
"sender": "@appservice-irc:matrix.org"
}
```

The `state_key` must consist of the bridge's prefix, followed by the `protocol.id`, followed by the `network.id`,
followed by the `channel.id`. Any `/`s must be escaped as `%2F`. The bridge prefix can be anything, but should uniquely
Contributor

We also need to escape % to %25.

Member

I guess we should say that the IDs should be URL-encoded, including encoding / to %2F?

identify the bridge software. For example, the matrix.org IRC bridge `matrix-org/matrix-appservice-irc`
becomes `org.matrix.appservice-irc`. This helps distinguish two bridges running different software which might otherwise conflict.
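
As a rough illustration of the `state_key` rule described here, the sketch below joins the four parts and escapes `/` inside each ID. The helper names `escapeComponent` and `buildStateKey` are hypothetical and not part of the proposal. The sketch follows only the `/` to `%2F` rule from the proposal text, and it takes all three IDs as given, since the format for an omitted `network` is still being discussed in review.

```js
// Sketch only: `escapeComponent` and `buildStateKey` are hypothetical helpers,
// not names defined by the proposal.

// Escape "/" so that an ID cannot be mistaken for a path separator.
// Only the "/" -> "%2F" rule from the proposal text is applied here.
function escapeComponent(id) {
  return id.replace(/\//g, "%2F");
}

// Join the bridge prefix, protocol.id, network.id and channel.id into a state_key.
function buildStateKey(bridgePrefix, protocolId, networkId, channelId) {
  const path = [protocolId, networkId, channelId].map(escapeComponent).join("/");
  return `${bridgePrefix}://${path}`;
}

const githubKey = buildStateKey(
  "uk.half-shot.matrix-github",
  "github",
  "matrix-org/matrix-doc",
  "2346"
);
console.log(githubKey);
```

With the GitHub values used elsewhere in this proposal, this should produce the same `state_key` as the GitHub example.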

The `bridgebot` should be the MXID of the bridge bot. It is important to note that `sender` should not be presumed to be
the bridge bot. This is because room upgrades, other bridges or admins could also set the state in the room on behalf of
the bridge bot.

The `creator` field is the MXID of the *user* who provisioned the bridge. For alias-based bridges, where the
creator is not known, it should be omitted.

The `protocol` field describes the protocol that is being bridged. For example, it may be "IRC", "Slack", or "Discord". This
field does not describe the low-level protocol the bridge is using to access the network, but a common, user-recognisable
name.

The `network` field should be information about the specific network the bridge is connected to.
It's important to make the distinction here that this does *NOT* describe the protocol name, but the specific network
the user is on. For protocols that do not have the concept of a network, this field may be omitted.

The `channel` field should be information about the specific channel the room is connected to.
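
To make the field descriptions concrete, here is a minimal sketch of how a client or bridge might sanity-check the `content` of an `m.bridge` event. The helper `checkBridgeContent` is hypothetical. Which fields count as required is an inference from the `// Optional` markers in the example event, not something the proposal states as a validation rule.

```js
// Sketch only: `checkBridgeContent` is a hypothetical helper. The required
// fields are inferred from the "// Optional" markers in the example event.
function checkBridgeContent(content) {
  const problems = [];

  if (typeof content.bridgebot !== "string") {
    problems.push("bridgebot is missing");
  }

  // `protocol` and `channel` are not marked optional, and each needs an `id`.
  for (const key of ["protocol", "channel"]) {
    if (typeof content[key]?.id !== "string") {
      problems.push(`${key}.id is missing`);
    }
  }

  // `network` may be omitted entirely, but if present it needs an `id`.
  if (content.network !== undefined && typeof content.network?.id !== "string") {
    problems.push("network.id is missing");
  }

  return problems;
}

console.log(checkBridgeContent({ protocol: { id: "irc" } }));
```

An empty array means no problems were found. The `creator` field is not checked because it is optional.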

The `id` field is case-insensitive and should be lowercase. Uppercase characters should be escaped (e.g. using QP encoding
or similar). The purpose of the `id` field is not to be human-readable but only to allow comparison within the same bridge type,
hence no encoding standard will be enforced in this proposal.
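
Since no encoding is mandated, the following is only one possible scheme, loosely modelled on QP encoding, that a bridge could use to keep an `id` lowercase while remaining reversible. The function names `encodeId` and `decodeId` are hypothetical, and the sketch only handles ASCII uppercase letters. A real bridge would need to decide how to treat other characters.

```js
// Sketch only: one possible QP-style scheme, not mandated by the proposal.
// ASCII uppercase letters and "=" itself become "=" plus two lowercase hex digits.
function encodeId(id) {
  return id.replace(/[A-Z=]/g, (ch) =>
    "=" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  );
}

// Reverse of encodeId: turn each "=xx" escape back into its character.
function decodeId(encoded) {
  return encoded.replace(/=([0-9a-f]{2})/g, (_match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(encodeId("#Friends"));
console.log(decodeId(encodeId("#Friends")));
```

Escaping `=` as well keeps the encoding unambiguous, so two channel names that differ only by case map to different `id` values.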

The `network`, `channel` and `protocol` fields can contain `displayname` and `avatar_url` keys. The `displayname` is meant to
be a human-readable identifier for the item in question, whereas the `id` should be a unique identifier relevant to the protocol.
If no `displayname` is given, the `id` should be used in its place. The `avatar_url` key is an MXC URI which refers to an image
file, similar to a user or room avatar.
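
A client-side sketch of the fallback rule: use `displayname` when present, otherwise fall back to `id`. The helpers `labelFor` and `describeBridge` are hypothetical, and how a client actually lays out these labels is left open by the proposal.

```js
// Sketch only: `labelFor` and `describeBridge` are hypothetical client helpers.

// Prefer the human-readable displayname, falling back to the id.
function labelFor(item) {
  return item.displayname ?? item.id;
}

// Collect labels from protocol down to channel, skipping an omitted network.
function describeBridge(content) {
  return [content.protocol, content.network, content.channel]
    .filter((item) => item !== undefined)
    .map(labelFor);
}

const labels = describeBridge({
  protocol: { id: "github", displayname: "GitHub" },
  network: { id: "matrix-org/matrix-doc" },
  channel: { id: "2346", displayname: "MSC2346: Bridge information state event" },
});
console.log(labels);
```

Here the network has no `displayname`, so its `id` is shown instead.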

The `external_url` key is an optional link to a connected channel, network or protocol that works in much the same way as
`external_url` works for bridged messages in the AS spec.

In terms of hierarchy, the protocol can contain many networks, which can contain many channels.

Member

Can we have examples of the protocol.id/network.id/channel.id part of the state key? e.g. for XMPP, would it be XMPP/xsf@muc.xmpp.org or XMPP//xsf@muc.xmpp.org (i.e. do we drop a / if there's no network field?) What if there are more levels (e.g. protocol, network, community, room)? Is the network supposed to be human-readable like "Freenode" or something more like "freenode.org"? Maybe it would be better to just say that it's a path representing the hierarchy starting from the protocol and ending in the room? And then instead of hard-coding protocol, network, and channel as keys in the content, make it an array where the first element is the protocol and the last element is the channel?

Also, if a protocol/network name has a / in it, does it get escaped?

Contributor Author

I have added several examples of the event body to the description that might help make this more understandable.

Contributor Author

> And then instead of hard-coding protocol, network, and channel as keys in the content, make it an array where the first element is the protocol and the last element is the channel.

Potentially, though I am worried this might make it harder for clients to show a sensible UI. How would Riot render:

["Github", "matrix-org", "matrix-doc", "2346"]
vs
{"protocol": "GitHub", "network": "matrix-org/matrix-doc", "channel": "2346"}

(Simplified for readability, and using a deliberately complex example).

In the first example, there are 4 keys and it's hard for a client to decide how to format this in a settings page. Joining them with a delimiter is too ugly (to me). There are probably examples which are restricted by the 3 component limit, but I am struggling to come up with any?

Member

Does the client actually need to care about the state key? The information embedded into the state key is already available in the body, and clients would want to use a more friendly name anyways. I think we can just use an empty state key and let clients figure it out.

If you're running multiple bridges off the same bridge bot, don't.

Contributor Author

> If you're running multiple bridges off the same bridge bot, don't.

Or if you are running several bridges off different bridge bots, you will still need different state events and therefore different keys.

Why can't we use user_ids? We could, but that forever ties your bridge to using one user_id for life, when the actual thing the bridge is "keyed" off is the protocol, network and channel.

Contributor Author

The client doesn't need to give a heck about the state key, other than it being more readable for those who want to work on it. Given there isn't really a downside to having a schema for the state key, and it gives more readability, I don't see why not.

Contributor

I would much rather that we don't prescribe the state key. Just say that it needs to start with the bridge ID (unique prefix) and the rest is implementation defined. Any meaningful values should be delegated to the context of the state object.

### Removing a bridge

To remove a bridge, send a new state event with the same `state_key` and a `content` of `{}`. This
is because Matrix does not yet have a mechanism to remove a state event in its entirety.
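
A minimal sketch of both sides of this convention: the event a bridge would send to remove itself, and how a client might skip such emptied events when listing bridges. The helper names are hypothetical, and actually sending the event is left to whichever client or bridge library is in use.

```js
// Sketch only: hypothetical helpers illustrating the empty-content convention.

// Build the event that marks the bridge with this state_key as removed.
function removalEvent(stateKey) {
  return { type: "m.bridge", state_key: stateKey, content: {} };
}

// An m.bridge event whose content is empty is treated as removed.
function isRemoved(event) {
  return Object.keys(event.content ?? {}).length === 0;
}

// Keep only bridge events that still describe a live bridge.
function activeBridges(stateEvents) {
  return stateEvents.filter((ev) => ev.type === "m.bridge" && !isRemoved(ev));
}

const roomState = [
  {
    type: "m.bridge",
    state_key: "org.matrix.appservice-irc://irc/freenode/#friends",
    content: { bridgebot: "@appservice-irc:matrix.org", protocol: { id: "irc" }, channel: { id: "#friends" } },
  },
  removalEvent("org.matrix.matrix-bifrost://xmpp/muc.xmpp.org/xsf@muc.xmpp.org"),
];
console.log(activeBridges(roomState).map((ev) => ev.state_key));
```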

### Example Content

#### XMPP

An example of a straightforward messaging bridge, such as the XMPP (bifrost) bridge:

```js
{
    "state_key": "org.matrix.matrix-bifrost://xmpp/muc.xmpp.org/xsf@muc.xmpp.org",
    "type": "m.bridge",
    "content": {
        "creator": "@alice:matrix.org",
        "bridgebot": "@xmpp:matrix.org",
        "protocol": {
            "id": "xmpp",
            "displayname": "XMPP"
        },
        "network": {
            "id": "muc.xmpp.org",
            "displayname": "XSF",
            "external_url": "xmpp:muc.xmpp.org"
        },
        "channel": {
            "id": "xsf@muc.xmpp.org",
            "displayname": "XSF Discussion",
            "external_url": "xmpp:xsf@muc.xmpp.org"
        }
    },
    "sender": "@xmpp:matrix.org"
}
```

#### GitHub

An example of a non-messaging bridge, such as the GitHub bridge:

```js
{
    "state_key": "uk.half-shot.matrix-github://github/matrix-org%2Fmatrix-doc/2346",
    "type": "m.bridge",
    "content": {
        "creator": "@alice:matrix.org",
        "bridgebot": "@github:matrix.org",
        "protocol": {
            "id": "github",
            "displayname": "GitHub"
        },
        "network": {
            "id": "matrix-org/matrix-doc",
            "external_url": "https://github.com/matrix-org/matrix-doc"
        },
        "channel": {
            "id": "2346",
            "displayname": "MSC2346: Bridge information state event",
            "external_url": "https://github.com/matrix-org/matrix-doc/pull/2346"
        },
        "uk.half-shot.matrix-github.merged": false,
        "uk.half-shot.matrix-github.opened_by": "Half-Shot"
    },
    "sender": "@github:matrix.org"
}
```

#### Mastodon feed

An example of a feed oriented bridge.

```js
{
    "state_key": "org.matrix-org.matrix-mastodon://mastodon/mastodon.matrix.org/@matrix",
    "type": "m.bridge",
    "content": {
        "creator": "@alice:matrix.org",
        "bridgebot": "@mastodon:matrix.org",
        "protocol": {
            "id": "mastodon",
            "displayname": "Mastodon"
        },
        "network": {
            "id": "mastodon.matrix.org",
            "external_url": "https://mastodon.matrix.org"
        },
        "channel": {
            "id": "@matrix",
            "displayname": "Matrix.org",
            "external_url": "https://mastodon.matrix.org/@matrix"
        },
        "org.matrix-org.matrix-mastodon.bio": "An open standard for decentralised persistent communication. Toots by @matthew, @Amandine & co.",
        "org.matrix-org.matrix-mastodon.joined": "May 2017"
    },
    "sender": "@mastodon:matrix.org"
}
```

Note the `@` in this case helps distinguish the type of channel. Here the protocol used is "Mastodon" rather than "ActivityPub".
While the underlying protocol might indeed be ActivityPub, the choice of name should be recognisable to users.

## Potential issues
Member

Missing: what if the bridge doesn't have permission to send state events? (a completely valid thing to do)

Contributor Author

Such is life. I don't think we can engineer around users not giving their bridges permission to add state. It's not great, but it's better than adding lots of sketchy outside-of-the-room data (though matrix.org bridges in both portal and plumbed rooms have PL50 by default, so this is relatively unlikely).

Member

I'm a lot more concerned with the bridges that aren't on matrix.org, given the popularity of bridges in the last year alone (and the year prior to this thread). Most bridges don't ask for permissions, and a quick poll shows that most non-matrix.org bridges appear to be running without appropriate permissions in those rooms.

This can be considered a room configuration error, but it's still a valid issue that this MSC needs to acknowledge. There's a point where we can't just write off issues as "users should be better".

Member

Missing: what if the bridge doesn't have a bridge bot? (puppet bridges, transparent bridges, etc)

Contributor Author

Again, this is more of a "well, meh". There may be a way to solve this in a future MSC, but let's leave it for then.

Member

not sure that's the best option: can we figure out a way to publish this info?

Contributor Author

Sure. Either a user the bridge has ownership over publishes the information, or a user publishes the information on behalf of the bridge. I don't think it's worth speccing this though, as it's down to implementation how they want to insert the event.

Member

Doesn't have to be a state event. Could start publishing this over EDUs or some other DAG

Contributor Author

But having it in the room dag and a state event allows us to reuse pretty much all logic across the homeservers/clients/bridges. This is important, because we can not only reuse the same API routes when synchronizing the information across clients and bridges, but also re-use the same access control semantics as with other information in the room like names and topics (using PL events).

I would like to hear from others if they also think supporting the use case of a bridgebot-less bridge is important and requires us to invent our own non-room dag or EDU structure. Personally, it feels like a lot of faff for little gain.

Member

A note in the MSC about how the non-bridge-bot user can publish this would be great.


### We do not specify a "bridge type".

The proposal intentionally sidesteps the 'bridge type' problem: codifying what portal, plumbed and gatewayed bridges look
like in Matrix. For the time being, the event will not contain information about the type of bridge in a room, but merely
information about what it is connected to.

### Anyone* can send this event into the room.

This applies to *any* event in Matrix: there is no way to identify a bridge across federation. The difference here
is that we at least require the user to have the ability to send state events into the room. If you are allowed to send
arbitrary state events into the room, it's assumed you are somewhat trusted.

## Alternatives

Some thought has been given to using the third party bridge routes in the AS API to get bridge info
by calling a specialised endpoint. There are many issues with this, such as the routes not presently working
over federation, as well as requiring the bridge to be online. Using a state event ensures the data is scoped
per room, and can be synchronised and updated over federation.

## Future MSCs

(This section is for the benefit of readers, to explain why this MSC doesn't contain X feature.)

This proposal forms the basis for bridges to become more interactive with clients as first class citizens rather
than relying upon users having prior knowledge about which users are bridged users, or where a room is bridged to.

Future MSCs could expand the /publicRooms response format to show what network a room is bridged to before the
user attempts to join it. Another potential MSC could allow users to see which bridges they are connected to
via an account settings page, rather than relying on PMs to the bridge bot.

## Security considerations

Anybody with the correct PLs to post state events will be able to manipulate a room by sending a bridge
event into a room, even if the bridge is not present or does not exist. It goes without saying that if
you let people modify your room state, you need to trust them not to mess around. A future MSC may allow
users to "trust" some mxids as bridges, rather than relying on just PLs to convey trustworthiness.


## Implementation notes

This proposal is partially implemented by [Riot](https://github.com/vector-im/riot-web) and the
[IRC Bridge](https://github.com/matrix-org/matrix-appservice-irc) using the `uk.half-shot.*` namespace
until this becomes stable. Therefore `m.bridge` becomes `uk.half-shot.bridge`.
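
As a sketch of what this means for a consumer of the event, a client could recognise both the stable and the unstable event type while the proposal is in flux. Whether a given client should accept both is an assumption made for illustration, and `BRIDGE_EVENT_TYPES` and `isBridgeEvent` are hypothetical names.

```js
// Sketch only: accepting both types is an assumption, not a rule from the proposal.
const BRIDGE_EVENT_TYPES = ["m.bridge", "uk.half-shot.bridge"];

// True for state events of either the stable or the unstable bridge type.
function isBridgeEvent(event) {
  return BRIDGE_EVENT_TYPES.includes(event.type) && typeof event.state_key === "string";
}

console.log(isBridgeEvent({ type: "uk.half-shot.bridge", state_key: "org.matrix.appservice-irc://irc/freenode/#friends", content: {} }));
console.log(isBridgeEvent({ type: "m.room.topic", state_key: "", content: { topic: "hello" } }));
```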