Don't allow previewing `shared` history rooms #239
Only `world_readable` can be considered as opting into having history publicly on the web. Anything else must not be archived until there's a dedicated state event for opting into archiving.
// Only `world_readable` or `shared` rooms that are `public` are viewable in the archive
const allowedToViewRoom =
  roomData.historyVisibility === 'world_readable' ||
  (roomData.historyVisibility === 'shared' && roomData.joinRule === 'public');
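For contrast with the check quoted above, here is a minimal sketch of the stricter rule the PR description asks for, where only `world_readable` counts as opting in. The helper name `isRoomViewableInArchive` is hypothetical and the actual change in this PR may be written differently; `roomData.historyVisibility` is taken from the quoted diff.

```javascript
// Hypothetical helper (not from the PR): only `world_readable` rooms are
// treated as having opted into being publicly viewable on the web.
function isRoomViewableInArchive(roomData) {
  return roomData.historyVisibility === 'world_readable';
}

// A public room with `shared` history is no longer viewable under this rule.
console.log(isRoomViewableInArchive({ historyVisibility: 'shared', joinRule: 'public' }));
console.log(isRoomViewableInArchive({ historyVisibility: 'world_readable', joinRule: 'public' }));
```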
Having public `shared` rooms viewable but not indexed by search engines is by design.
My reply in the opt-out issue probably explains this the best so far:
"archived" is a bit of an overloaded term here but given that this project is called "Matrix Public Archive" I can see where the confusion may be coming from. Any public room should be viewable in Matrix Public Archive. The idea is if a random Matrix user can view the room, then it should be viewable in the archive. But only `history_visibility: "world_readable"` rooms are indexable by search engines.

The Matrix Public Archive doesn't hold onto any data (it's stateless) and requests the messages from the homeserver every time (it archives nothing). The archive.matrix.org instance has some caching in place, 5 minutes for the current day, and 2 days for past content.

I've tried to clarify more of this in the FAQ document and added more details on why not guest access/peeking.
Banning `@archive:matrix.org` will prevent the room from showing up on archive.matrix.org and the cache will expire after 5-minutes/2-days for any content that is showing there now. Adding better opt-out controls like this issue is discussing is on the list 👍. I've updated the description with the current MSC proposals out there.
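To make the caching windows mentioned above concrete, here is a rough sketch of picking a cache lifetime per page. Only the two durations (5 minutes for the current day, 2 days for past content) come from the comment; the function names and the choice to compare days in UTC are assumptions for illustration, not how archive.matrix.org is actually configured.

```javascript
const FIVE_MINUTES_IN_SECONDS = 5 * 60;
const TWO_DAYS_IN_SECONDS = 2 * 24 * 60 * 60;

// Assumption: "current day" is decided by comparing calendar days in UTC.
function isSameUtcDay(a, b) {
  return (
    a.getUTCFullYear() === b.getUTCFullYear() &&
    a.getUTCMonth() === b.getUTCMonth() &&
    a.getUTCDate() === b.getUTCDate()
  );
}

// Hypothetical helper: short lifetime for today's page (still changing),
// longer lifetime for pages showing past days.
function cacheMaxAgeSeconds(pageDate, now = new Date()) {
  return isSameUtcDay(pageDate, now) ? FIVE_MINUTES_IN_SECONDS : TWO_DAYS_IN_SECONDS;
}
```

Under these assumptions, content from a room that bans `@archive:matrix.org` would drop out of the cache once the applicable window has elapsed.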
We plan to move ahead with this PR to remove `shared` public rooms from the archive ⏩
The points in favor of keeping `shared` accessible are summarized below. In the end, though, while the archive bot respected the technical implications and doesn't expose messages to any wider audience (random people), there is a social obligation to consider. This option has been presented as "members only" in many clients, which doesn't leave room for nuance.
Otherwise, the main idea was if I can view the messages from a Matrix client as a random user, I should also be able to see the messages in the archive. In both the native Matrix client and archive cases, it’s the same result when a random user wants to view a `shared` room:
- A random Matrix user accesses the room, they see the history
- A random user accesses the room in the archive, they see the history
- Search engines are not allowed in either case (that only applies to `world_readable` rooms)
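The distinction in the list above (viewable versus indexable) can be sketched as follows. This describes the previous design being argued for here, not the behaviour after this PR; the helper name `describeArchiveAccess` and the returned field names are hypothetical, while `historyVisibility` and `joinRule` follow the diff quoted earlier in the conversation.

```javascript
// Hypothetical sketch of the earlier design: public `shared` rooms were
// viewable in the archive, but only `world_readable` rooms were indexable.
function describeArchiveAccess({ historyVisibility, joinRule }) {
  const isWorldReadable = historyVisibility === 'world_readable';
  const isPublicShared = historyVisibility === 'shared' && joinRule === 'public';

  return {
    viewable: isWorldReadable || isPublicShared,
    indexable: isWorldReadable,
  };
}

console.log(describeArchiveAccess({ historyVisibility: 'shared', joinRule: 'public' }));
```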
The join is mostly a technical detail to anyone trying to view the room. While I don't think the join event provides much value to the room in the normal cases, it could have benefit in tracing bad actors for moderation.
From the spec:
- `world_readable` - All events while this is the `m.room.history_visibility` value may be shared by any participating homeserver with anyone, regardless of whether they have ever joined the room.
- `shared` - Previous events are always accessible to newly joined members. All events in the room are accessible, even those sent when the member was not a part of the room.
Removing `shared` rooms does mean we’re re-introducing friction for a portion of people, which the archive eliminates (which homeserver do I choose, which client, why do I even need an account, how do I view this on mobile, how do I reference and share this message to someone not already in the Matrix ecosystem, etc). But people can update their room to be `world_readable` as they see fit now to regain these benefits for their community.
@@ -155,7 +155,6 @@ const fetchRoomData = traceFunction(async function (
  stateCanonicalAliasResDataOutcome,
Thanks for the contribution and patience @tulir 🙇 🐦
Happens to address part of #271 but made primarily as a follow-up to #239

---

Only 42% of rooms on the `matrix.org` room directory are `world_readable`, which means we will get pages of rooms that are half-empty most of the time if we just naively fetch 9 rooms at a time. Ideally, we would be able to just add a filter directly to `/publicRooms` in order to only grab the `world_readable` rooms and still get full pages, but the filter option doesn't allow us to slice by `world_readable` history visibility. Instead, we have to paginate until we get a full grid of 9 rooms, then make a final `/publicRooms` request to backtrack to the exact continuation point so the next page won't skip any rooms in between.

---

We had empty spaces in the grid before because some rooms in the room directory are private, which we filtered out before. But that was a much rarer experience since only 2% of rooms were private.
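The paginate-then-backtrack approach described above could look roughly like this. It is a sketch under stated assumptions, not the code from the follow-up PR: `fetchWorldReadablePage` is a hypothetical name, and `fetchPublicRooms({ since, limit })` is an assumed wrapper around `/publicRooms` that resolves to `{ chunk, next_batch }`, with each room in `chunk` carrying a `world_readable` flag. It also assumes the directory returns the same ordering when the same `since` token is requested again.

```javascript
// Hypothetical helper: collect one full grid of `world_readable` rooms and
// return the token where the next page should continue.
async function fetchWorldReadablePage(fetchPublicRooms, { since, pageSize = 9 } = {}) {
  const rooms = [];
  let token = since;

  while (true) {
    const requestToken = token;
    const { chunk, next_batch: nextBatch } = await fetchPublicRooms({
      since: requestToken,
      limit: pageSize,
    });

    for (let i = 0; i < chunk.length; i++) {
      if (!chunk[i].world_readable) continue;
      rooms.push(chunk[i]);

      if (rooms.length === pageSize) {
        // The grid filled up on the last room of this response, so the
        // response's own token is already the exact continuation point.
        if (i === chunk.length - 1) {
          return { rooms, nextToken: nextBatch };
        }
        // Otherwise backtrack: repeat the request from the same token but
        // stop right after the last room we used, so the next page does
        // not skip the unused rooms left in this response.
        const backtrack = await fetchPublicRooms({ since: requestToken, limit: i + 1 });
        return { rooms, nextToken: backtrack.next_batch };
      }
    }

    // Directory exhausted before the grid was full.
    if (!nextBatch) {
      return { rooms, nextToken: undefined };
    }
    token = nextBatch;
  }
}
```

The trade-off in this sketch is one extra request per page in exchange for a continuation token that lines up exactly with the last room shown.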
Only `world_readable` can be considered as opting into having history publicly on the web. Anything else must not be ~~archived~~ viewable without login until there's a dedicated state event for opting into archiving.

See #47