Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

json-serializing a sync response can block the reactor for tens of seconds #6998

Closed
richvdh opened this issue Feb 26, 2020 · 7 comments · Fixed by #8013
Closed

json-serializing a sync response can block the reactor for tens of seconds #6998

richvdh opened this issue Feb 26, 2020 · 7 comments · Fixed by #8013
Assignees
Labels
A-Performance Performance, both client-facing and admin-facing Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution z-p2 (Deprecated Label)

Comments

@richvdh
Copy link
Member

richvdh commented Feb 26, 2020

A response to an initial /sync can easily run to hundreds of megabytes of JSON, which takes tens of seconds of CPU time to serialise from the objects. This monopolises the Twisted reactor thread, causing all sorts of other problems (including dropped replication connections for worker-mode deployments).

To make matters worse, although we have a cache which is supposed to avoid redoing a lot of the hard work for duplicate requests to initialsync (though its effectiveness is disputed, see #3880), we cache the de-serialised version, so have to re-serialise the response for each request. (One reason we have to do this is to populate the age field).

@richvdh
Copy link
Member Author

richvdh commented Feb 26, 2020

A simple solution might be to farm out the json-serialisation to a threadpool?

@richvdh
Copy link
Member Author

richvdh commented Jul 21, 2020

Actually, I have a much better idea. JSONEncoder implements an iterencode method. Rather than try to generate all the JSON at once, we could use that to generate JSON incrementally.

One approach to this would be to implement, say, a JsonProducer as an implementation of Twisted's IProducer, which we then attach to the request instead of the NoRangeStaticProducer we use at the moment. Every time resumeProducing gets called, we just send the next chunk from the iterencode result instead.

A slight complication would be that we wouldn't know the content-length ahead of time, so we'd have to use chunked transfer-encoding for the response; but that's easy enough to do.

@richvdh
Copy link
Member Author

richvdh commented Jul 21, 2020

@auscompgeek this feels like the sort of thing that might interest you. Any interest in picking it up?

@richvdh richvdh added the Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution label Jul 21, 2020
@auscompgeek
Copy link
Contributor

I would agree that using iterencode would be the best solution here. I can't say I know enough Twisted to work on this though.

A small concern:

Every time resumeProducing gets called, we just send the next chunk from the iterencode result instead.

Between each value, iterencode would yield single-char strings, for example:

>>> list(enc.iterencode({'a': 'b', 'c': 'd'}))                                                                  
['{', '"a"', ':', '"b"', ',', '"c"', ':', '"d"', '}']

Does Twisted's HTTP chunking coalesce small chunks together?

@richvdh
Copy link
Member Author

richvdh commented Jul 22, 2020

I would agree that using iterencode would be the best solution here. I can't say I know enough Twisted to work on this though.

I don't think you should let that be the thing that puts you off picking this up - the amount of Twisted knowledge needed should be minimal [1]. I understand you will have other demands on your time though!

Does Twisted's HTTP chunking coalesce small chunks together?

I don't think Twisted has any support for generating chunked transfer-encoding (or if it does, I haven't found it), so we'd be implementing that part ourselves. You're right though, we'll need to coalesce small chunks, to avoid generating thousands of tiny HTTP chunks each of which get passed through openssl and send out as tiny TCP segments.

[1]: currently, to send responses, we wrap the json bytes in a BytesIO and then use NoRangeStaticProducer, which reads from a file-like object and sends the contents out. As you can see from the source it's pretty simple and it should be easy to implement a replacement which reads from a JSON iterencoder.

@clokep
Copy link
Member

clokep commented Jul 22, 2020

I don't think Twisted has any support for generating chunked transfer-encoding (or if it does, I haven't found it), so we'd be implementing that part ourselves.

Twisted uses chunked encoding automatically if you don't set a content-length header.

I really thought the Twisted docs had a minimal example of hooking incremental JSON producing up to an IProducer, but I think I'm thinking of an old project. It should be pretty straightforward to hook up the APIs.

@clokep
Copy link
Member

clokep commented Jul 31, 2020

I was curious to take a look at this...and ended up doing it, oops. See #8013. I haven't tested it to see if it helps perf wise.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Performance Performance, both client-facing and admin-facing Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution z-p2 (Deprecated Label)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants