Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Internal error when posting an event with \u0000 #9341

Closed
erdnaxeli opened this issue Feb 6, 2021 · 16 comments · Fixed by #10820
Closed

Internal error when posting an event with \u0000 #9341

erdnaxeli opened this issue Feb 6, 2021 · 16 comments · Fixed by #10820
Assignees
Labels
good first issue Good for newcomers S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@erdnaxeli
Copy link
Contributor

erdnaxeli commented Feb 6, 2021

Description

Although "\u0000" is a perfectly valid JSON string, it seems to fail when storing in the database.

Steps to reproduce

HS_URL=
ROOM_ID=
ACCESS_TOKEN=
curl "https://$HS_URL/_matrix/client/r0/rooms/$ROOM_ID/send/m.room.message" -X POST H 'Content-Type: application/json' -H "Authorization: Bearer $ACCESS_TOKEN" --data-raw '{"msgtype":"m.text","body":"\u0000"}'

Error seen in the log file:

Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/persist_events.py", line 172, in handle_queue_loop
    ret = await per_item_callback(item)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/persist_events.py", line 325, in persisting_queue
    item.events_and_contexts, backfilled=item.backfilled
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/persist_events.py", line 537, in _persist_events
    backfilled=backfilled,
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/events.py", line 177, in _persist_events_and_state_updates
    new_forward_extremeties=new_forward_extremeties,
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/database.py", line 664, in runInteraction
    **kwargs,
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/database.py", line 740, in runWithConnection
    self._db_pool.runWithConnection(inner_func, *args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/enterprise/adbapi.py", line 306, in _runWithConnection
    compat.reraise(excValue, excTraceback)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/python/compat.py", line 464, in reraise
    raise exception.with_traceback(traceback)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/twisted/enterprise/adbapi.py", line 297, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/database.py", line 734, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/database.py", line 534, in new_transaction
    r = func(cursor, *args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/logging/utils.py", line 71, in wrapped
    return f(*args, **kwargs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/events.py", line 394, in _persist_events_txn
    backfilled=backfilled,
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/events.py", line 1362, in _update_metadata_tables_txn
    self._store_room_message_txn(txn, event)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/events.py", line 1653, in _store_room_message_txn
    txn, event, "content.body", event.content["body"]
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/events.py", line 1707, in store_event_search_txn
    origin_server_ts=event.origin_server_ts,
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/search.py", line 66, in store_search_entries_txn
    txn.executemany(sql, args)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/database.py", line 291, in executemany
    self._do_execute(self.txn.executemany, sql, *args)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/database.py", line 314, in _do_execute
    return func(sql, *args)
ValueError: A string literal cannot contain NUL (0x00) characters.

Version information

  • Homeserver: cloud.cervoi.se

If not matrix.org:

  • Version: 1.26.0

  • Install method: apt

  • Platform: debian 10
@erdnaxeli
Copy link
Contributor Author

For the record I got this when trying to send logs from the docker api to matrix. Docker adds an header on each line to determine if it comes from stdout or stderr, and this header contains NULL bytes, which got encoded by my code to \u0000.

@erikjohnston erikjohnston added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Feb 8, 2021
@clokep
Copy link
Member

clokep commented Feb 17, 2021

We're also seeing this on matrix.org occassionally: https://sentry.matrix.org/sentry/synapse-matrixorg/issues/166204/

Not quite sure what the solution is though.

@erdnaxeli
Copy link
Contributor Author

I guess you could check at the validation step (if there is) if the string contains a null byte, and return an error then.

@ShadowJonathan
Copy link
Contributor

Shouldn't we store JSON data as bytes, anyways? It's untrusted data.

@reivilibre
Copy link
Contributor

The problem seems to be related to room message search: this is not related to the JSON encoding. This occurs when Synapse inserts the plaintext of a message into the event_search table — well, actually, in Postgres it uses to_tsvector to turn it into a text-search vector (see Postgres docs about full-text search for more information). SQLite seems to have an equivalent, but I'm not familiar with it.

We could either omit search index entries for messages with null bytes or perhaps replace them with e.g. a space character (to preserve it being a word boundary character — "hi\u0000bob" would then give the search words hi and bob).

I think replacing the message text with spaces for the event_search table insertion would be a more appropriate solution. This doesn't seem like it'd be a large change.

@reivilibre reivilibre added the good first issue Good for newcomers label Aug 27, 2021
@DMRobertson
Copy link
Contributor

Some relevant bits of postgres' documentation:

to_tsvector accepts any text as its input, where text is defined here. The latter page says:

The characters that can be stored in any of these data types are determined by the database character set, which is selected when the database is created. Regardless of the specific character set, the character with code zero (sometimes called NUL) cannot be stored.

and refers to this page as a reference on character sets.

Aside: I think if we wanted to go full pedant on this:

  • The spec defines strings to be any sequence of Unicode code points
  • I don't think we prescribe a particular character set when server owners configure postgres? We could recommend that server owners configure postgres so that its text types can store as many codepoints as possible (probably UTF-8?), but maybe this is the sensible default for most server OSes anyway.

But that's going by the letter of the law. Replacing U+0000 with good old U+0020 space is probably the best use of our time!

@DMRobertson
Copy link
Contributor

Another thought: has anyone tried registering a user with a null code point?

@clokep
Copy link
Member

clokep commented Sep 7, 2021

I don't think we prescribe a particular character set when server owners configure postgres? We could recommend that server owners configure postgres so that its text types can store as many codepoints as possible (probably UTF-8?), but maybe this is the sensible default for most server OSes anyway.

The docs for what we ask people to do for postgres are at https://github.com/matrix-org/synapse/blob/251cfc4e09c36c87227771f1a3aa305c4b0a2121/docs/postgres.md#set-up-database

They seem to suggest an encoding of UTF-8 and C-locale. Not using those is known to cause issues. Does postgres have an additional charset configuration beyond that?

@reivilibre
Copy link
Contributor

The docs recommend: createdb --encoding=UTF8 --locale=C --template=template0 --owner=synapse_user synapse.
Synapse makes a fuss about the locale being set to C — not sure how that influences things.

I think 0x00 → 0x20 is the sensible choice.

Another thought: has anyone tried registering a user with a null code point?

I guess you're talking about the user directory? (Equally, the room directory will be potentially affected too)

@clokep
Copy link
Member

clokep commented Sep 7, 2021

But that's going by the letter of the law. Replacing U+0000 with good old U+0020 space is probably the best use of our time!

I think this assumes that the data going in is readable text? I don't think this is valid. It could make more sense to escape it, but than you're dealing with needing to have escape logic which is always a pain. 😢

@reivilibre
Copy link
Contributor

reivilibre commented Sep 7, 2021

I think this assumes that the data going in is readable text? I don't think this is valid. It could make more sense to escape it, but than you're dealing with needing to have escape logic which is always a pain. cry

How would you escape it, FWIW?

Treating NUL as a delimiter (word boundary) seems like the pragmatic approach (that's all that converting it to a space before feeding it to to_tsvector means, right?)

(Though since the message is unlikely to be readable text, I can see the option of just exempting it from message search altogether to be sensible, too.)

@jplatte
Copy link
Contributor

jplatte commented Sep 22, 2021

This internal error just happened to sbd. when joining #conduit:fachschaften.org (the main Conduit room) with Synapse:

8ZhxlGdY - Error attempting to resolve state at missing prev_events
Sep 22 10:50:55 debian matrix-synapse[3275]: Traceback (most recent call last):
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 710, in _resolve_state_at_missing_prevs
Sep 22 10:50:55 debian matrix-synapse[3275]:     remote_state = await self._get_state_after_missing_prev_event(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 817, in _get_state_after_missing_prev_event
Sep 22 10:50:55 debian matrix-synapse[3275]:     await self._get_events_and_persist(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 1171, in _get_events_and_persist
Sep 22 10:50:55 debian matrix-synapse[3275]:     await self._auth_and_persist_fetched_events(destination, room_id, roots)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 1240, in _auth_and_persist_fetched_events
Sep 22 10:50:55 debian matrix-synapse[3275]:     await self.persist_events_and_notify(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 1767, in persist_events_and_notify
Sep 22 10:50:55 debian matrix-synapse[3275]:     events, max_stream_token = await self._storage.persistence.persist_events(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/persist_events.py", line 322, in persist_events
Sep 22 10:50:55 debian matrix-synapse[3275]:     ret_vals = await yieldable_gather_results(enqueue, partitioned.items())
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/persist_events.py", line 239, in handle_queue_loop
Sep 22 10:50:55 debian matrix-synapse[3275]:     ret = await self._per_item_callback(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/persist_events.py", line 577, in _persist_event_batch
Sep 22 10:50:55 debian matrix-synapse[3275]:     await self.persist_events_store._persist_events_and_state_updates(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 174, in _persist_events_and_state_updates
Sep 22 10:50:55 debian matrix-synapse[3275]:     await self.db_pool.runInteraction(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/database.py", line 686, in runInteraction
Sep 22 10:50:55 debian matrix-synapse[3275]:     result = await self.runWithConnection(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/database.py", line 791, in runWithConnection
Sep 22 10:50:55 debian matrix-synapse[3275]:     return await make_deferred_yieldable(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/twisted/python/threadpool.py", line 238, in inContext
Sep 22 10:50:55 debian matrix-synapse[3275]:     result = inContext.theWork()  # type: ignore[attr-defined]
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/twisted/python/threadpool.py", line 254, in <lambda>
Sep 22 10:50:55 debian matrix-synapse[3275]:     inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/twisted/python/context.py", line 118, in callWithContext
Sep 22 10:50:55 debian matrix-synapse[3275]:     return self.currentContext().callWithContext(ctx, func, *args, **kw)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/twisted/python/context.py", line 83, in callWithContext
Sep 22 10:50:55 debian matrix-synapse[3275]:     return func(*args, **kw)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection
Sep 22 10:50:55 debian matrix-synapse[3275]:     compat.reraise(excValue, excTraceback)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction
Sep 22 10:50:55 debian matrix-synapse[3275]:     return function(*args, **kwargs)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/twisted/python/compat.py", line 404, in reraise
Sep 22 10:50:55 debian matrix-synapse[3275]:     raise exception.with_traceback(traceback)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection
Sep 22 10:50:55 debian matrix-synapse[3275]:     result = func(conn, *args, **kw)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/database.py", line 786, in inner_func
Sep 22 10:50:55 debian matrix-synapse[3275]:     return func(db_conn, *args, **kwargs)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/database.py", line 554, in new_transaction
Sep 22 10:50:55 debian matrix-synapse[3275]:     r = func(cursor, *args, **kwargs)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/logging/utils.py", line 69, in wrapped
Sep 22 10:50:55 debian matrix-synapse[3275]:     return f(*args, **kwargs)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 396, in _persist_events_txn
Sep 22 10:50:55 debian matrix-synapse[3275]:     self._update_metadata_tables_txn(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 1528, in _update_metadata_tables_txn
Sep 22 10:50:55 debian matrix-synapse[3275]:     self._store_room_members_txn(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 1666, in _store_room_members_txn
Sep 22 10:50:55 debian matrix-synapse[3275]:     self.db_pool.simple_insert_many_txn(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/database.py", line 931, in simple_insert_many_txn
Sep 22 10:50:55 debian matrix-synapse[3275]:     txn.execute_values(sql, vals, fetch=False)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/database.py", line 293, in execute_values
Sep 22 10:50:55 debian matrix-synapse[3275]:     return self._do_execute(
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/database.py", line 331, in _do_execute
Sep 22 10:50:55 debian matrix-synapse[3275]:     return func(sql, *args)
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/synapse/storage/database.py", line 294, in <lambda>
Sep 22 10:50:55 debian matrix-synapse[3275]:     lambda *x: execute_values(self.txn, *x, fetch=fetch), sql, *args
Sep 22 10:50:55 debian matrix-synapse[3275]:   File "/home/synapse/synapse/env/lib/python3.9/site-packages/psycopg2/extras.py", line 1267, in execute_values
Sep 22 10:50:55 debian matrix-synapse[3275]:     parts.append(cur.mogrify(template, args))
Sep 22 10:50:55 debian matrix-synapse[3275]: ValueError: A string literal cannot contain NUL (0x00) characters.

@DMRobertson
Copy link
Contributor

This internal error just happened to sbd. when joining #conduit:fachschaften.org (the main Conduit room) with Synapse:

That error looks to be coming from a different transaction to the one discussed above:

self.db_pool.simple_insert_many_txn(
txn,
table="room_memberships",
values=[
{
"event_id": event.event_id,
"user_id": event.state_key,
"sender": event.user_id,
"room_id": event.room_id,
"membership": event.membership,
"display_name": str_or_none(event.content.get("displayname")),
"avatar_url": str_or_none(event.content.get("avatar_url")),
}
for event in events
],
)

@jplatte
Copy link
Contributor

jplatte commented Sep 22, 2021

@DMRobertson … which probably means this issue being closed by #10820 does not mean that the underlying issue for that Conduit <> Synapse issue was fixed, right?

@clokep
Copy link
Member

clokep commented Sep 22, 2021

@DMRobertson … which probably means this issue being closed by #10820 does not mean that the underlying issue for that Conduit <> Synapse issue was fixed, right?

I'd recommend filing a new issue, it seems unrelated enough to the above conversation that we should discuss it separately!

@DMRobertson
Copy link
Contributor

@DMRobertson … which probably means this issue being closed by #10820 does not mean that the underlying issue for that Conduit <> Synapse issue was fixed, right?

Agreed!

I'd recommend filing a new issue, it seems unrelated enough to the above conversation that we should discuss it separately!

Yes please!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
good first issue Good for newcomers S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants