Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3266: Room summary API #3266

Open
wants to merge 36 commits into
base: old_master
Choose a base branch
from

Conversation

deepbluev7
Copy link
Contributor

@deepbluev7 deepbluev7 commented Jul 4, 2021

Rendered.

Somewhat related to MSC2946 this provides an API to get a summary for a specific room either from the local server or over federation.

Useful for a few cases, where you need to show a summary for a room like matrix.to, traveler bots, showing spaces, lightweight clients, etc.

Open design questions looking for feedback: see this comment chain: #3266 (comment) resolved as of 2021/10/06

Implementations:

Signed-off-by: Nicolas Werner nicolas.werner@hotmail.de


SCT stuff:

Shepherd: @richvdh
FCP tickyboxes

Checklist: #3266 (comment)

Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
@deepbluev7 deepbluev7 changed the title Room summary proposal MSC3266: Room summary proposal Jul 4, 2021
@deepbluev7 deepbluev7 changed the title MSC3266: Room summary proposal MSC3266: Room summary Jul 4, 2021
@turt2live turt2live added client-server Client-Server API kind:feature MSC for not-core and not-maintenance stuff needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal proposal-in-review labels Jul 5, 2021
@deepbluev7 deepbluev7 changed the title MSC3266: Room summary MSC3266: Room summary API Jul 11, 2021
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
…est of the path separate

Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
That way the requesting server knows, if any user would have access to
that room and it can forward the room to the user.

Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
@turt2live turt2live requested a review from KitsuneRal January 9, 2025 05:58
deepbluev7 and others added 3 commits January 10, 2025 20:56
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
It suffers from the same knock_restricted issue.
@ara4n
Copy link
Member

ara4n commented Jan 15, 2025

I've given this a green tick in order not to block its progress, given the MSC as a whole is definitely useful and very proven at this point. @deepbluev7 can you please deal with the remaining open PR comments though (e.g. spelling out what 'public' means; etc)?

@cyb3rko
Copy link

cyb3rko commented Jan 20, 2025

Mautrix has also implemented it now: https://github.com/mautrix/go/releases/tag/v0.23.0

@richvdh richvdh self-requested a review February 7, 2025 10:32
@richvdh
Copy link
Member

richvdh commented Feb 18, 2025

This has got stuck waiting for updates once more, so in the interests of pushing it over the line, the Spec Core Team are going to make the final few updates.

richvdh and others added 3 commits February 18, 2025 14:42
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
guest_can_join: false,
name: "CHEESE",
num_joined_members: 37,
topic: "Tasty tasty cheese",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is an entirely new API, could it make sense to use the rich topic format from #3765 here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the value of having rich text in the room summary. The way this API is utilised implies brief representation. As a client author, I would prefer the topic reduced to plain text. Maybe it's just me though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3765 argued along the same lines though that was partially motivated by not wanting to touch the existing endpoints. 🙈

I think it's probably fine to keep this as plain text. Just wanted to raise it because changing it now would still be easy due to this being a new API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think basically, we need to keep /room_summary in sync with GET /_matrix/federation/v1/hierarchy/{roomId}, so that the information available for remote rooms matches that for local rooms.

Given we haven't yet changed the spec for GET /_matrix/federation/v1/hierarchy/{roomId} to use rich topics (beyond preferring the plain-text representation of m.topic over topic), I don't think we should do so for /room_summary either. In future, we can update both together.

That said, this MSC probably does need clarifying that it should prefer m.topic over topic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason why this MSC should add support for rich text topics, when they are not supported by the hierarchy API and public rooms API. It would make more sense to me to add that to all of those APIs in a targeted MSC, if there is a use case for it. I don't really see one right now.

I also don't see why this MSC needs to clarify if it is using m.topic over topic now. That stuff didn't exist when this MSC was written and this MSC already clarifies where it is taking the fields from (the hierarchy API and related to the public rooms API). At some point we need to stop clarifying every little detail, it causes needless churn on the MSC and most of those things are already obvious enough. This MSC doesn't care about those changes and the rich text topic MSC explicitly mentions that servers should switch to m.topic. The MSC was merged yesterday and is not part of the spec yet. This MSC references the spec. If I now update it to reference some MSC, I will get to update it in a few months again to reference the spec and so on. At some point people will just have to consider some context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't see why this MSC needs to clarify if it is using m.topic over topic now ... this MSC already clarifies where it is taking the fields from (the hierarchy API and related to the public rooms API)

Yup, this is fair, on closer inspection: this MSC just says "These are the same fields as those returned by /publicRooms ..." which seems clear enough.

Comment on lines +113 to +117
It should be possible to call this API without authentication, but servers may
rate limit how often they fetch information over federation more heavily, if the
user is unauthenticated. Being able to call this API unauthenticated is
beneficial to avoid third parties registering guest users for one-shot API
calls. Restricting this API to guests only would provide no security benefit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should server admins be allowed to disable returning results on this API similar to how profile look-ups can be disabled?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be an implementation detail for servers, no? Nothing would change in the API whether a server has such an internal control or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't change the API (beyond error codes maybe). But we do explicitly call out other cases where admins can deny look-up such as: https://spec.matrix.org/unstable/client-server-api/#server-behaviour

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should call it out, and indeed I am very hesitant to recommend unauthenticated access by default. We probably do want to allow this to be unauthenticated for e.g. matrix.org as a service to the community, but in general I don't think homeservers should not be supporting third-party services out-of-the-box. Allowing unauthenticated access opens up the server to abuse, which not all server admins can or want to deal with.

I think there are two benefits from having this unauthenticated:

  1. Third-party services that want to piggy-back of a well known public home server
  2. A client wants to show a preview of a room before the user logs in.

Either way, these depend on using a pre-configured default server, which can then be specifically configured to allow unauthenticated access.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. a sane invite experience not exclusive to publicly federating homeservers via a landing page like matrix.to - otherwise the experience will be very normie unfriendly with just a room ID visible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that all of these are solvable issues (and there is some disagreement about how much each actually matters), but I think the larger point stands that we should only consider adding unauthenticated APIs if there is a need to do so.

The need to do so is demonstrated by @t3chguy's point about wanting to offer a good experience in matrix.to. We've established that there are alternatives, but none of them are great. So we're not just adding unauthenticated APIs for a laugh: we have a usecase. The question is, do the downsides outweigh the upsides? It's not as black and white as this sentence implies.

  1. Information leaking: fundamentally you're giving information out to potential attackers.

Yeah, this is compelling. As things currently stand: suppose I run a homeserver where I have disabled federation. On that homeserver I have a few rooms where any of my users can join (join_rules: public). I trust my users, but I don't want randoms on the internet to see my room names. This MSC makes that room metadata available to anyone on the internet, which I think would be a surprise to most admins in that situation.

For me, this is an absolute blocker to landing the MSC as it currently stands.

  1. It makes it easy to rack up costs.

I think the actual cost is debatable, and caching/ratelimiting could be improved iteratively later. This is not a compelling argument for me.

  1. This endpoint can trigger federation requests,

This feels like a specific case of point 2.

  1. ... create a public room with illicit content on e.g. matrix.org. Then any server with unauthenticated summary endpoint enabled would return (and potentially cache) that data.

I don't think MSC4108 is particularly relevant to this discussion, but the second half of your point is important. This feels a lot like one of the problems we tried to fix in authenticated media: I don't want randoms in the internet pulling evil room topics onto my server.

  1. From a server admin PoV, in most deployments you mostly trust your own users so you don't need to worry too much about abuse. If you see abuse its easy enough to lock out a particular user and move on with your life. Admining a server that can be used by unauthenticated third-party services can be more tedious if you have to deal with IP-locking out things, or abuse requests from your server provider, etc. We to an extent see these problems already with the federation APIs.

Given we already have these problems with federation APIs, maybe we don't need to worry about it too much? Or is your point that it will even affect admins who have disabled federation (and thus hope to be exempt from such abuse?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking on behalf of the SCT: we don't have significant time to spend on this MSC, either in terms of changing the Synapse implementation, or in terms of changing matrix.to. Nevertheless, the current situation, of matrix.org supporting an unstable endpoint is, IMHO, untenable: we need to find a way forward, and in a way that does not block this MSC for another 3.5 years.

I can see for and against allowing unauthenticated access, but I think there are some real blockers to it, and right now, I am minded to proceed as suggested earlier.

Instead, it is proposed to recommend that homeservers make the endpoint require authentication by default. Homeservers would remain free to allow anonymous access if they wish, and anonymous access would remain enabled on matrix.org.

I appreciate that this may not be the ideal outcome for some , especially matrix.to, but I think it's acceptable, and critically, it is no worse, by any measure, than where we are today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fundamentally disagree, that this endpoint should be available only with authentication by default. There are several users of the API already, that are using it unauthenticated (shields.io, matrix.to). So there clearly is a use case. Those users also all allow you to overwrite which server you want to use for the preview, because even if those services are centralized, Matrix is not. Having every preview go through matrix.org violates the design principles behind Matrix and matrix.to. matrix.to goes through a lot of effort to not leak the room addresses or other data to the server hosting matrix.to. As such people should generally have the choice to use their own servers for previews.

The load generated by this API is a LOT lower than what we had before with guest users for previews. This API was already mentioned as a replacement for those use cases, where people were using guest users before. Requiring authentication for this API now would violate that announcement (although yes, matrix.org could make an exception).

If servers want to disable unauthenticated access for this in a private federation, that is fine. But in an open federation we generally consider public rooms public. You aren't providing any additional security by requiring authentication for this API. All of that data is already available to anyone, who can set up a homeserver. While that is a minuscule amount of work and as such harder than calling an unauthenticated API directly, it won't prevent anyone from getting access to this data.

And the same issue exists in the other direction. Yes, unauthenticated access might cache unwelcome data for a certain amount of time. But to access that data someone would need to call the same API. And that data is deleted automatically, since it is just a cache, not persisted forever like media used to be for most people. Additionally that data can be pushed to a server also by simply sending an invite.

So my opinion is, that if servers want to disable unauthenticated access because they have special federation needs, that is absolutely fine. But they will have to do more work to actually prevent any of the abuse mentioned so far (like only allowing certain domains to federate like in the TI-Messenger federation). Otherwise requiring authentication doesn't technically provide any additional security, doesn't make less data available, but does make Matrix more centralized and this API less useful for use cases already relying on this API. If you want to provide more security, you need to lock down the federation API more and provide restrictions for invites and similar APIs. Requiring authentication for this endpoint makes only a superficial difference.

The authentication requirement of this API should be decided upon by whoever hosts the server and I don't think the spec can make a recommendation either way. It depends too much on what group you are running the server for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If servers want to disable unauthenticated access for this in a private federation, that is fine. But in an open federation we generally consider public rooms public. You aren't providing any additional security by requiring authentication for this API. All of that data is already available to anyone, who can set up a homeserver. While that is a minuscule amount of work and as such harder than calling an unauthenticated API directly, it won't prevent anyone from getting access to this data.

This is true but I think allowing unauthenticated access via the client-server API lowers the bar substantially. Everyone with a browser can query this endpoint. Critically, in a private federation those are users you don't control and may not want to share data with. I don't have a strong opinion on whether we default to authenticated access with the option to allow unauthenticated access or the other way around. But I think as an admin I should be able to lock down this API on my own server.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we can accommodate both points of view by simply making it clear that allowing unauthenticated access is left up to the server implementer and admin, as @Johennes stated at the start of this thread.

Comment on lines +113 to +117
It should be possible to call this API without authentication, but servers may
rate limit how often they fetch information over federation more heavily, if the
user is unauthenticated. Being able to call this API unauthenticated is
beneficial to avoid third parties registering guest users for one-shot API
calls. Restricting this API to guests only would provide no security benefit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should call it out, and indeed I am very hesitant to recommend unauthenticated access by default. We probably do want to allow this to be unauthenticated for e.g. matrix.org as a service to the community, but in general I don't think homeservers should not be supporting third-party services out-of-the-box. Allowing unauthenticated access opens up the server to abuse, which not all server admins can or want to deal with.

I think there are two benefits from having this unauthenticated:

  1. Third-party services that want to piggy-back of a well known public home server
  2. A client wants to show a preview of a room before the user logs in.

Either way, these depend on using a pre-configured default server, which can then be specifically configured to allow unauthenticated access.

(1) The `membership` field will not be present when called unauthenticated, but
is required when called authenticated. It should be `leave` if the server
doesn't know about the room, since for all other membership states the server
would know about the room already.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the server doesn't know about the room, don't we return 04 as per the previous paragraph?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It looks like the 404 paragraph was added in ed79007 (and later clarified in 424189c), but this leave requirement was forgotten.

Looking at the Synapse implementation, it looks like it indeed returns a 404 if the server doesn't know about the room. I propose to remove this bit about leave.

@deepbluev7: any objections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is specifically about when the room info is fetched over federation. In those cases no membership information is available on the local server, but it can reasonably assume none of its local users are joined or invited. While that is obvious, I thought it deserved spelling this out explicitly, since servers shouldn't just omit the membership key in that case. It probably makes more sense to change doesn't know about the room to is not participating in the room or does not have any local users in the room or something about having had to fetch the summary over federation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok well, let's change it to "It should be leave if the server does not have any local users in the room" if you think it's important to have something here, though IMHO it's clearer with nothing.

These are the same fields as those returned by [`/publicRooms`](https://spec.matrix.org/v1.13/client-server-api/#get_matrixclientv3publicrooms) or
[`/hierarchy`](https://spec.matrix.org/v1.13/client-server-api/#get_matrixclientv1roomsroomidhierarchy)
, with a few additions: `membership`, `room_version`,
`encryption` and `allowed_room_ids`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowed_room_ids isn't mentioned in the format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, which format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean in the example above? Yes, because the join_rule is public. It is explained in the table below though, which also says that the field is optional. Does this need an example specifically with the room_ids, if the /hierarchy API already uses that field?

Comment on lines +43 to +45
Note that rooms the user has been invited to or knocked at might result in
outdated or partial information, since the the homeserver may not have access
to the current state of the room.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Wouldn't the server just go & get the latest data the same as it would if they weren't invited?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. @deepbluev7: can you shed any light here?

Looking at the synapse implementation, the logic goes:

  • If the server has at least one user whose membership is join, generate the summary locally
  • Otherwise, call out to GET /_matrix/federation/v1/hierarchy/{roomId} (or the local cache).

I don't see any reason that rooms the user has been invited to or knocked at would be special here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way to fetch a room summary over federation, that isn't publicly joinable and a user has been invited to. How would it fetch a more up to date state? That is not how invites currently work on Matrix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmhmm. So, in the current implementation: if you you have been invited to a remote room that isn't publicly joinable, /summary will return an 404 M_NOT_FOUND for that room, right?

Which I guess is kinda "outdated or partial information".

Comment on lines +198 to +215
### The `sync` API

For joined rooms, the `/sync` API can be used to get a summary for all joined
rooms. Apart from not working for unjoined rooms, like knocks, invites and space
children, `/sync` is very heavy for the server and the client needs to cobble
together information from the `state`, `timeline` and
[`summary`](https://github.com/matrix-org/matrix-doc/issues/688) sections to
calculate the room name, topic and other fields provided in this MSC.

Furthermore, the membership counts in the summary field are only included, if
the client is using lazy loading. This MSC provides similar information as
calling `/sync`, but to allow it to work for unjoined rooms it only uses information
from the stripped state. Additionally, it excludes `m.heroes` as well as membership
events, since those are not included in the stripped state of a room. (A client
can call `/joined_members` to receive those if needed. It may still make sense
to include heroes so that clients could construct a human-friendly room display
name in case both the name and the canonical alias are absent; but solving the
security implications with that may better be left to a separate MSC.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is really an alternative, or at least not one that needs 2 paragraphs of discussion. It's not a problem that it's here, but just as feedback for the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mentioned as an alternative, because people brought up "just call sync" when I initially discussed this MSC with people. This section is an argument in that discussion. It obviously isn't a good alternative, but people needed that spelled out at the time and I didn't want to have this argument multiple times.

federation API and a client could then easily request a summary of all joined
rooms. It could still request the summary of a single room by just including
only a single room in the POST or a convenience GET could be provided by the
server (that looks like this proposal).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like the conclusion is that we should do this instead? Why wouldn't we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess just because there isn't a usecase for a batch API here, and frankly opening up the endpoint to use more resources makes it an even harder sell to land.

@deepbluev7 any other insights?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far nobody said they would benefit from a batch API apart from one use cases, that wanted an easy way to show a room list for the /joined_rooms response. The vast majority of use cases I have seen only query one room. Batching is simply more complicated since you also might not be able to pass vias as query parameters, etc. Unless someone provides a use case, I don't see why a batched API is a better design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fair not to specify a batch API on the premise of lacking use cases and increased complexity. The alternative, as written, doesn't include this reasoning, however. In my understanding the main reason for writing down alternatives is to prove that the proposed solution is the best solution by way of showing that all known alternatives are inferior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guessed this was probably the answer, I was more pointing out that your "alternative" sections should probably end with why you didn't choose this option (even if that reason is something like, "it doesn't matter at all but we had to pick one so we picked this one").

@richvdh
Copy link
Member

richvdh commented Feb 25, 2025

@mscbot concern should unauthenticated access be enabled by default?

@mscbot mscbot added the unresolved-concerns This proposal has at least one outstanding concern label Feb 25, 2025
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API disposition-merge kind:feature MSC for not-core and not-maintenance stuff proposal A matrix spec change proposal proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. unresolved-concerns This proposal has at least one outstanding concern
Projects
Status: Ready for FCP ticks
Development

Successfully merging this pull request may close these issues.