This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse doesn't remove text search vectors for redacted messages #13122

Open
actx-1 opened this issue Jun 28, 2022 · 3 comments
Labels
A-Message-Search Searching messages S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@actx-1

actx-1 commented Jun 28, 2022

Description

When a message event is sent in an unencrypted room, a text search vector of the message's content is written to the table event_search, so that the message will appear in future search results.
If the message event is later redacted, Synapse removes the event's content from the event_json table after a retention period (set by redaction_retention_period in homeserver.yaml, which defaults to 7 days as far as I understand). However, Synapse does not appear to remove or clear the associated text search entry in the event_search table.

Steps to reproduce

  • Send a message in an unencrypted room
  • Record the ID of the message
  • Redact the message
  • Wait for your homeserver's redaction_retention_period
  • Verify that the message event's content has been cleared from the table event_json
  • Select the row in event_search corresponding to your redacted message event and confirm that its vector column still contains information from the original message
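
The final check can be sketched as a single query. This is a minimal illustration against toy in-memory SQLite tables, not a real Synapse database: the table and column names (redactions.redacts, redactions.have_censored, event_search.event_id, key, vector) are the ones that appear in the log output below, but the schemas are cut down to those columns and the event IDs are made up. The helper name find_stale_search_rows is hypothetical.

```python
import sqlite3

# Toy stand-ins for the tables named in this issue. The real Synapse
# schema has more columns; only the ones used here are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE redactions (redacts TEXT, have_censored BOOLEAN);
    CREATE TABLE event_search (event_id TEXT, key TEXT, vector TEXT);
    """
)

# "$msg1" was redacted and already censored; "$msg2" was never redacted.
conn.execute("INSERT INTO redactions VALUES (?, ?)", ("$msg1", 1))
conn.executemany(
    "INSERT INTO event_search VALUES (?, ?, ?)",
    [
        ("$msg1", "content.body", "'uniquestringvalue14':1"),
        ("$msg2", "content.body", "'hello':1"),
    ],
)

# Search rows that still exist for events whose content has been censored.
STALE_ROWS_SQL = """
    SELECT es.event_id, es.key, es.vector
    FROM event_search AS es
    JOIN redactions AS r ON r.redacts = es.event_id
    WHERE r.have_censored
"""


def find_stale_search_rows(conn):
    """Return (event_id, key, vector) for every lingering search row."""
    return conn.execute(STALE_ROWS_SQL).fetchall()


stale = find_stale_search_rows(conn)
print(stale)
```

Any row this returns corresponds to a message whose content is gone from event_json but whose search entry survives.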

Homeserver

Local Instance

Synapse Version

1.61.0

Installation Method

pip (from PyPI)

Platform

Raspberry Pi 4 Model B running Raspberry Pi OS bullseye from 2022-01-28

Relevant log output

synapse=# SELECT have_censored FROM redactions WHERE redacts='$PsAQJtiPWds6zO-ZECgcP4jB83zHRycgmFiFJOJ05G0';
 have_censored 
---------------
 t
(1 row)



synapse=# SELECT json::jsonb -> 'content' FROM event_json WHERE event_id='$PsAQJtiPWds6zO-ZECgcP4jB83zHRycgmFiFJOJ05G0';
 ?column? 
----------
 {}
(1 row)



synapse=# SELECT key,vector FROM event_search WHERE event_id='$PsAQJtiPWds6zO-ZECgcP4jB83zHRycgmFiFJOJ05G0';
     key      |         vector          
--------------+-------------------------
 content.body | 'uniquestringvalue14':1
(1 row)

Anything else that would be useful to know?

Despite text search vectors appearing to persist in the database past message redaction, the redacted message is not returned in search results. However, the approximate number of results displayed does account for the persistent search vector:
[Screenshot: search results for a room where a message containing the string uniquestringvalue14 was sent and then redacted roughly 7 days ago. No search results are returned, but the number of results found is displayed as (~1 result).]

I have only tested this with a PostgreSQL database. If memory serves, however, the corresponding text search entry in an SQLite database will likely contain the full text content of a redacted message.

@richvdh
Member

richvdh commented Jul 1, 2022

related #1454

@erikjohnston erikjohnston added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. S-Minor Blocks non-critical functionality, workarounds exist. and removed X-Needs-Discussion labels Jul 1, 2022
@erikjohnston
Member

We should delete the rows when we get a redaction, and add a background job to remove existing rows.

Since this doesn't leak the redacted contents, we don't consider this high priority, but we would very much accept patches that fix this.
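
For anyone picking this up, the two halves of that fix might look roughly like the sketch below. It is an illustration only, not Synapse's storage layer or its background-update framework: it runs against toy SQLite tables reduced to the columns named in this issue, both function names are hypothetical, and it ignores edge cases such as redactions that turn out to be rejected.

```python
import sqlite3


def delete_search_rows_for(conn, redacted_event_id):
    """Part 1 (hypothetical helper): drop search rows as soon as a redaction arrives."""
    conn.execute("DELETE FROM event_search WHERE event_id = ?", (redacted_event_id,))
    conn.commit()


def purge_redacted_search_rows(conn, batch_size=100):
    """Part 2 (hypothetical helper): remove pre-existing rows in small batches.

    Returns the total number of event_search rows deleted.
    """
    total = 0
    while True:
        # Pick up to batch_size redacted events that still have search rows.
        ids = [
            row[0]
            for row in conn.execute(
                """
                SELECT DISTINCT es.event_id
                FROM event_search AS es
                JOIN redactions AS r ON r.redacts = es.event_id
                LIMIT ?
                """,
                (batch_size,),
            )
        ]
        if not ids:
            return total
        placeholders = ",".join("?" for _ in ids)
        cur = conn.execute(
            f"DELETE FROM event_search WHERE event_id IN ({placeholders})", ids
        )
        conn.commit()
        total += cur.rowcount


# Toy data: three redacted events ($a, $b, $c) and one live event ($d).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE redactions (redacts TEXT, have_censored BOOLEAN);
    CREATE TABLE event_search (event_id TEXT, key TEXT, vector TEXT);
    """
)
conn.executemany("INSERT INTO redactions VALUES (?, 1)", [("$a",), ("$b",), ("$c",)])
conn.executemany(
    "INSERT INTO event_search VALUES (?, 'content.body', ?)",
    [("$a", "x"), ("$b", "y"), ("$c", "z"), ("$d", "w")],
)

deleted = purge_redacted_search_rows(conn, batch_size=2)
remaining = conn.execute("SELECT event_id FROM event_search").fetchall()
print(deleted, remaining)
```

Deleting in small batches keeps each transaction short, which is the usual reason for doing this kind of cleanup as a background job rather than as one large DELETE.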

@richvdh richvdh added the A-Message-Search Searching messages label Aug 1, 2022
@richvdh
Member

richvdh commented Aug 1, 2022

Seems closely related to #8686
