Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Use ijson to parse the response to /send_join, reducing memory usage. #9958

Merged
merged 7 commits into from
May 20, 2021

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented May 10, 2021

Instead of parsing the full response to /send_join into Python objects (which can be huge for large rooms) and then parsing that into events, we instead use ijson to stream parse the response directly into EventBase objects.

Question: Do we want to make this an optional dependency? It should be easy (just a case of adding a JsonParser subclass), though does mean we have more things to test

@erikjohnston erikjohnston force-pushed the erikj/streaming_parser branch from 15c481e to 1fc9348 Compare May 10, 2021 15:15
@erikjohnston erikjohnston force-pushed the erikj/streaming_parser branch from 1fc9348 to 36cd6b8 Compare May 10, 2021 15:23
@erikjohnston erikjohnston requested a review from a team May 10, 2021 15:51
@clokep
Copy link
Member

clokep commented May 11, 2021

A general question -- how strict is ijson at parsing JSON (does it bypass some of the strictness we do:

# Create a custom decoder to reject Python extensions to JSON.
json_decoder = json.JSONDecoder(parse_constant=_reject_invalid_json)

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally lgtm, modulo a few thoughts.

As @clokep says: it would be good to check how it handles non-standard json (inf, NaN, etc).

synapse/http/matrixfederationclient.py Show resolved Hide resolved
synapse/http/matrixfederationclient.py Show resolved Hide resolved
synapse/http/matrixfederationclient.py Outdated Show resolved Hide resolved
synapse/http/matrixfederationclient.py Outdated Show resolved Hide resolved
synapse/http/matrixfederationclient.py Outdated Show resolved Hide resolved
synapse/federation/transport/client.py Outdated Show resolved Hide resolved
@erikjohnston
Copy link
Member Author

A general question -- how strict is ijson at parsing JSON (does it bypass some of the strictness we do:

# Create a custom decoder to reject Python extensions to JSON.
json_decoder = json.JSONDecoder(parse_constant=_reject_invalid_json)

I tested this manually for all the ijson backends and they all reject NaN, Infinity and `-Infinity', so I think we're good on that front.

However, it will silently ignore if the state_events or auth keys are missing or have the wrong type, e.g. list(ijson.items('{"foo": 1}', "foo.item")) will produce [] rather than failing.

erikjohnston and others added 3 commits May 13, 2021 10:30
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
@erikjohnston erikjohnston requested a review from a team May 13, 2021 10:04
Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. I wonder if there's other places we should be using this? (For some reason the receiving end of /send for federation jumps out to me.)

synapse/http/matrixfederationclient.py Show resolved Hide resolved
@erikjohnston
Copy link
Member Author

Seems reasonable. I wonder if there's other places we should be using this? (For some reason the receiving end of /send for federation jumps out to me.)

Maybe, though I think /send_join is pretty much the only API (apart from deprecated /state) which is quite so big. /send only has a max of 50 events and 100 EDUs, so they're not that big 🤷

@erikjohnston erikjohnston merged commit 64887f0 into develop May 20, 2021
@erikjohnston erikjohnston deleted the erikj/streaming_parser branch May 20, 2021 15:11
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jun 5, 2021
Synapse 1.35.1 (2021-06-03)
===========================

Bugfixes
--------

- Fix a bug introduced in v1.35.0 where invite-only rooms would be shown to all users in a space, regardless of if the user had access to it. ([\#10109](matrix-org/synapse#10109))


Synapse 1.35.0 (2021-06-01)
===========================

Note that [the tag](https://github.com/matrix-org/synapse/releases/tag/v1.35.0rc3) and [docker images](https://hub.docker.com/layers/matrixdotorg/synapse/v1.35.0rc3/images/sha256-34ccc87bd99a17e2cbc0902e678b5937d16bdc1991ead097eee6096481ecf2c4?context=explore) for `v1.35.0rc3` were incorrectly built. If you are experiencing issues with either, it is recommended to upgrade to the equivalent tag or docker image for the `v1.35.0` release.

Deprecations and Removals
-------------------------

- The core Synapse development team plan to drop support for the [unstable API of MSC2858](https://github.com/matrix-org/matrix-doc/blob/master/proposals/2858-Multiple-SSO-Identity-Providers.md#unstable-prefix), including the undocumented `experimental.msc2858_enabled` config option, in August 2021. Client authors should ensure that their clients are updated to use the stable API (which has been supported since Synapse 1.30) well before that time, to give their users time to upgrade. ([\#10101](matrix-org/synapse#10101))

Bugfixes
--------

- Fixed a bug causing replication requests to fail when receiving a lot of events via federation. Introduced in v1.33.0. ([\#10082](matrix-org/synapse#10082))
- Fix HTTP response size limit to allow joining very large rooms over federation. Introduced in v1.33.0. ([\#10093](matrix-org/synapse#10093))


Internal Changes
----------------

- Log method and path when dropping request due to size limit. ([\#10091](matrix-org/synapse#10091))


Synapse 1.35.0rc2 (2021-05-27)
==============================

Bugfixes
--------

- Fix a bug introduced in v1.35.0rc1 when calling the spaces summary API via a GET request. ([\#10079](matrix-org/synapse#10079))


Synapse 1.35.0rc1 (2021-05-25)
==============================

Features
--------

- Add experimental support to allow a user who could join a restricted room to view it in the spaces summary. ([\#9922](matrix-org/synapse#9922), [\#10007](matrix-org/synapse#10007), [\#10038](matrix-org/synapse#10038))
- Reduce memory usage when joining very large rooms over federation. ([\#9958](matrix-org/synapse#9958))
- Add a configuration option which allows enabling opentracing by user id. ([\#9978](matrix-org/synapse#9978))
- Enable experimental support for [MSC2946](matrix-org/matrix-spec-proposals#2946) (spaces summary API) and [MSC3083](matrix-org/matrix-spec-proposals#3083) (restricted join rules) by default. ([\#10011](matrix-org/synapse#10011))


Bugfixes
--------

- Fix a bug introduced in v1.26.0 which meant that `synapse_port_db` would not correctly initialise some postgres sequences, requiring manual updates afterwards. ([\#9991](matrix-org/synapse#9991))
- Fix `synctl`'s `--no-daemonize` parameter to work correctly with worker processes. ([\#9995](matrix-org/synapse#9995))
- Fix a validation bug introduced in v1.34.0 in the ordering of spaces in the space summary API. ([\#10002](matrix-org/synapse#10002))
- Fixed deletion of new presence stream states from database. ([\#10014](matrix-org/synapse#10014), [\#10033](matrix-org/synapse#10033))
- Fixed a bug with very high resolution image uploads throwing internal server errors. ([\#10029](matrix-org/synapse#10029))


Updates to the Docker image
---------------------------

- Fix bug introduced in Synapse 1.33.0 which caused a `Permission denied: '/homeserver.log'` error when starting Synapse with the generated log configuration. Contributed by Sergio Miguéns Iglesias. ([\#10045](matrix-org/synapse#10045))


Improved Documentation
----------------------

- Add hardened systemd files as proposed in [#9760](matrix-org/synapse#9760) and added them to `contrib/`. Change the docs to reflect the presence of these files. ([\#9803](matrix-org/synapse#9803))
- Clarify documentation around SSO mapping providers generating unique IDs and localparts. ([\#9980](matrix-org/synapse#9980))
- Updates to the PostgreSQL documentation (`postgres.md`). ([\#9988](matrix-org/synapse#9988), [\#9989](matrix-org/synapse#9989))
- Fix broken link in user directory documentation. Contributed by @junquera. ([\#10016](matrix-org/synapse#10016))
- Add missing room state entry to the table of contents of room admin API. ([\#10043](matrix-org/synapse#10043))


Deprecations and Removals
-------------------------

- Removed support for the deprecated `tls_fingerprints` configuration setting. Contributed by Jerin J Titus. ([\#9280](matrix-org/synapse#9280))


Internal Changes
----------------

- Allow sending full presence to users via workers other than the one that called `ModuleApi.send_local_online_presence_to`. ([\#9823](matrix-org/synapse#9823))
- Update comments in the space summary handler. ([\#9974](matrix-org/synapse#9974))
- Minor enhancements to the `@cachedList` descriptor. ([\#9975](matrix-org/synapse#9975))
- Split multipart email sending into a dedicated handler. ([\#9977](matrix-org/synapse#9977))
- Run `black` on files in the `scripts` directory. ([\#9981](matrix-org/synapse#9981))
- Add missing type hints to `synapse.util` module. ([\#9982](matrix-org/synapse#9982))
- Simplify a few helper functions. ([\#9984](matrix-org/synapse#9984), [\#9985](matrix-org/synapse#9985), [\#9986](matrix-org/synapse#9986))
- Remove unnecessary property from SQLBaseStore. ([\#9987](matrix-org/synapse#9987))
- Remove `keylen` param on `LruCache`. ([\#9993](matrix-org/synapse#9993))
- Update the Grafana dashboard in `contrib/`. ([\#10001](matrix-org/synapse#10001))
- Add a batching queue implementation. ([\#10017](matrix-org/synapse#10017))
- Reduce memory usage when verifying signatures on large numbers of events at once. ([\#10018](matrix-org/synapse#10018))
- Properly invalidate caches for destination retry timings every (instead of expiring entries every 5 minutes). ([\#10036](matrix-org/synapse#10036))
- Fix running complement tests with Synapse workers. ([\#10039](matrix-org/synapse#10039))
- Fix typo in `get_state_ids_for_event` docstring where the return type was incorrect. ([\#10050](matrix-org/synapse#10050))
babolivier added a commit to matrix-org/synapse-dinsic that referenced this pull request Sep 1, 2021
Synapse 1.35.0 (2021-06-01)
===========================

Note that [the tag](https://github.com/matrix-org/synapse/releases/tag/v1.35.0rc3) and [docker images](https://hub.docker.com/layers/matrixdotorg/synapse/v1.35.0rc3/images/sha256-34ccc87bd99a17e2cbc0902e678b5937d16bdc1991ead097eee6096481ecf2c4?context=explore) for `v1.35.0rc3` were incorrectly built. If you are experiencing issues with either, it is recommended to upgrade to the equivalent tag or docker image for the `v1.35.0` release.

Deprecations and Removals
-------------------------

- The core Synapse development team plan to drop support for the [unstable API of MSC2858](https://github.com/matrix-org/matrix-doc/blob/master/proposals/2858-Multiple-SSO-Identity-Providers.md#unstable-prefix), including the undocumented `experimental.msc2858_enabled` config option, in August 2021. Client authors should ensure that their clients are updated to use the stable API (which has been supported since Synapse 1.30) well before that time, to give their users time to upgrade. ([\#10101](matrix-org/synapse#10101))

Bugfixes
--------

- Fixed a bug causing replication requests to fail when receiving a lot of events via federation. Introduced in v1.33.0. ([\#10082](matrix-org/synapse#10082))
- Fix HTTP response size limit to allow joining very large rooms over federation. Introduced in v1.33.0. ([\#10093](matrix-org/synapse#10093))

Internal Changes
----------------

- Log method and path when dropping request due to size limit. ([\#10091](matrix-org/synapse#10091))

Synapse 1.35.0rc2 (2021-05-27)
==============================

Bugfixes
--------

- Fix a bug introduced in v1.35.0rc1 when calling the spaces summary API via a GET request. ([\#10079](matrix-org/synapse#10079))

Synapse 1.35.0rc1 (2021-05-25)
==============================

Features
--------

- Add experimental support to allow a user who could join a restricted room to view it in the spaces summary. ([\#9922](matrix-org/synapse#9922), [\#10007](matrix-org/synapse#10007), [\#10038](matrix-org/synapse#10038))
- Reduce memory usage when joining very large rooms over federation. ([\#9958](matrix-org/synapse#9958))
- Add a configuration option which allows enabling opentracing by user id. ([\#9978](matrix-org/synapse#9978))
- Enable experimental support for [MSC2946](matrix-org/matrix-spec-proposals#2946) (spaces summary API) and [MSC3083](matrix-org/matrix-spec-proposals#3083) (restricted join rules) by default. ([\#10011](matrix-org/synapse#10011))

Bugfixes
--------

- Fix a bug introduced in v1.26.0 which meant that `synapse_port_db` would not correctly initialise some postgres sequences, requiring manual updates afterwards. ([\#9991](matrix-org/synapse#9991))
- Fix `synctl`'s `--no-daemonize` parameter to work correctly with worker processes. ([\#9995](matrix-org/synapse#9995))
- Fix a validation bug introduced in v1.34.0 in the ordering of spaces in the space summary API. ([\#10002](matrix-org/synapse#10002))
- Fixed deletion of new presence stream states from database. ([\#10014](matrix-org/synapse#10014), [\#10033](matrix-org/synapse#10033))
- Fixed a bug with very high resolution image uploads throwing internal server errors. ([\#10029](matrix-org/synapse#10029))

Updates to the Docker image
---------------------------

- Fix bug introduced in Synapse 1.33.0 which caused a `Permission denied: '/homeserver.log'` error when starting Synapse with the generated log configuration. Contributed by Sergio Miguéns Iglesias. ([\#10045](matrix-org/synapse#10045))

Improved Documentation
----------------------

- Add hardened systemd files as proposed in [#9760](matrix-org/synapse#9760) and added them to `contrib/`. Change the docs to reflect the presence of these files. ([\#9803](matrix-org/synapse#9803))
- Clarify documentation around SSO mapping providers generating unique IDs and localparts. ([\#9980](matrix-org/synapse#9980))
- Updates to the PostgreSQL documentation (`postgres.md`). ([\#9988](matrix-org/synapse#9988), [\#9989](matrix-org/synapse#9989))
- Fix broken link in user directory documentation. Contributed by @junquera. ([\#10016](matrix-org/synapse#10016))
- Add missing room state entry to the table of contents of room admin API. ([\#10043](matrix-org/synapse#10043))

Deprecations and Removals
-------------------------

- Removed support for the deprecated `tls_fingerprints` configuration setting. Contributed by Jerin J Titus. ([\#9280](matrix-org/synapse#9280))

Internal Changes
----------------

- Allow sending full presence to users via workers other than the one that called `ModuleApi.send_local_online_presence_to`. ([\#9823](matrix-org/synapse#9823))
- Update comments in the space summary handler. ([\#9974](matrix-org/synapse#9974))
- Minor enhancements to the `@cachedList` descriptor. ([\#9975](matrix-org/synapse#9975))
- Split multipart email sending into a dedicated handler. ([\#9977](matrix-org/synapse#9977))
- Run `black` on files in the `scripts` directory. ([\#9981](matrix-org/synapse#9981))
- Add missing type hints to `synapse.util` module. ([\#9982](matrix-org/synapse#9982))
- Simplify a few helper functions. ([\#9984](matrix-org/synapse#9984), [\#9985](matrix-org/synapse#9985), [\#9986](matrix-org/synapse#9986))
- Remove unnecessary property from SQLBaseStore. ([\#9987](matrix-org/synapse#9987))
- Remove `keylen` param on `LruCache`. ([\#9993](matrix-org/synapse#9993))
- Update the Grafana dashboard in `contrib/`. ([\#10001](matrix-org/synapse#10001))
- Add a batching queue implementation. ([\#10017](matrix-org/synapse#10017))
- Reduce memory usage when verifying signatures on large numbers of events at once. ([\#10018](matrix-org/synapse#10018))
- Properly invalidate caches for destination retry timings every (instead of expiring entries every 5 minutes). ([\#10036](matrix-org/synapse#10036))
- Fix running complement tests with Synapse workers. ([\#10039](matrix-org/synapse#10039))
- Fix typo in `get_state_ids_for_event` docstring where the return type was incorrect. ([\#10050](matrix-org/synapse#10050))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants