From 3b378675c3d10d6f96f9c4026fab4a034f7148e5 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Wed, 28 Jun 2023 18:14:31 -0500
Subject: [PATCH] Update FAQ to explain `world_readable` only (#277)

Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/239
---
 docs/faq.md | 66 ++++++++++++++++++++++++++---------------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/docs/faq.md b/docs/faq.md
index 3c5fda0..6725558 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -19,54 +19,54 @@ messages from any given date and day-by-day navigation.
 
 ## Why did the archive bot join my room?
 
-Only public Matrix rooms with `shared` or `world_readable` [history
-visibility](https://spec.matrix.org/latest/client-server-api/#room-history-visibility) are
-accessible in the Matrix Public Archive. In some clients like Element, the `shared`
-option equates to "Members only (since the point in time of selecting this option)" and
-`world_readable` to "Anyone" under the **room settings** -> **Security & Privacy** ->
-**Who can read history?**.
+Only Matrix rooms with `world_readable` [history
+visibility](https://spec.matrix.org/latest/client-server-api/#room-history-visibility)
+are accessible in the Matrix Public Archive and indexed by search engines.
 
 But the archive bot (`@archive:matrix.org`) will join any public room because it doesn't
-know the history visibility without first joining. Any room without `world_readable` or
-`shared` history visibility will lead a `403 Forbidden`. And if the public room is in
-the room directory, it will be listed in the archive but will still lead to a `403
-Forbidden` in that case.
+know the history visibility without first joining. Any room that doesn't have
+`world_readable` history visibility will lead to a `403 Forbidden`.
 
 The Matrix Public Archive doesn't hold onto any data (it's stateless) and requests the
 messages from the homeserver every time. The
 [archive.matrix.org](https://archive.matrix.org/) instance has some caching in place, 5
 minutes for the current day, and 2 days for past content.
 
-The Matrix Public Archive only allows rooms with `world_readable` history visibility to
-be indexed by search engines. See the [opt
-out](#how-do-i-opt-out-and-keep-my-room-from-being-indexed-by-search-engines) topic
-below for more details.
-
-### Why does the archive user join rooms instead of browsing them as a guest?
-
-Guests require `m.room.guest_access` to access a room. Most public rooms do not allow
-guests because even the `public_chat` preset when creating a room does not allow guest
-access. Not being able to view most public rooms is the major blocker on being able to
-use guest access. The idea is if I can view the messages from a Matrix client as a
-random user, I should also be able to see the messages in the archive.
-
-Guest access is also a much different ask than read-only access since guests can also
-send messages in the room which isn't always desirable. The archive bot is read-only and
-does not send messages.
+See the [opt out
+section](#how-do-i-opt-out-and-keep-my-room-from-being-indexed-by-search-engines) below
+for more details.
 
 ## How do I opt out and keep my room from being indexed by search engines?
 
-Only public Matrix rooms with `shared` or `world_readable` history visibility are
-accessible to view in the Matrix Public Archive. But only rooms with history visibility
-set to `world_readable` are indexable by search engines.
+Only Matrix rooms with `world_readable` [history
+visibility](https://spec.matrix.org/latest/client-server-api/#room-history-visibility)
+are accessible in the Matrix Public Archive and indexed by search engines. One easy way
+to opt out is to change your room's history visibility to something else if you don't
+intend for your room to be world readable.
 
-Also see https://github.com/matrix-org/matrix-public-archive/issues/47 to track better
-opt out controls.
+Dedicated opt-out controls are being tracked in
+[#47](https://github.com/matrix-org/matrix-public-archive/issues/47).
 
-As a workaround for [archive.matrix.org](https://archive.matrix.org/) today, you can ban
-the `@archive:matrix.org` user if you don't want your room content to be shown in the
+As a workaround for [archive.matrix.org](https://archive.matrix.org/), you can ban the
+`@archive:matrix.org` user if you don't want your room content to be shown in the
 archive at all.
 
+### Why does the archive user join rooms instead of peeking in the room or using guests?
+
+Since the archive only displays rooms with `world_readable` history visibility, we could
+peek into the rooms without joining. This is being explored in
+[#272](https://github.com/matrix-org/matrix-public-archive/pull/272). But peeking
+doesn't work when the server doesn't know about the room already (this is commonly
+referred to as federated peeking) which is why we have to fall back to joining the room
+in any case. We could solve the federated peeking problem and avoid the join with
+[MSC3266 room summaries](https://github.com/matrix-org/matrix-spec-proposals/pull/3266)
+to check whether the room is `world_readable` even over federation.
+
+Guests are a completely separate concept controlled by the `m.room.guest_access` state
+event in the room. Guest access is also a much different ask than read-only access since
+guests can also send messages in the room which isn't always desirable. The archive bot
+is read-only and does not send messages.
+
 ## Technical details
 
 The main readme has a [technical overview](../README.md#technical-overview) of the