
Memoized packing for pushing identical messages to large groups #276

Closed · mhuebert opened this issue Oct 25, 2016 · 7 comments
@mhuebert (Collaborator) commented Oct 25, 2016

Is there any reason the transit packer doesn't include memoization of payloads?

The recommended way to broadcast a message to a group of subscribers is to loop through a list of users:

  (doseq [uid (pubsub/subscribers :my-topic)]
    (chsk-send! uid [:my-app/action {:text "This Is Not A Tweet", :payload some-data}]))

But if we have 1,000 subscribers, the message data is independently serialized for transport 1,000 times. Since it's relatively easy to have a data payload that takes 1ms to pack, those milliseconds add up.

In a quick test, I added memoize to this line, and that did the trick. To minimize memory usage, I imagine encore/memoize* with a small cache size might be a good choice?
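
For concreteness, here's a minimal sketch of the idea (`pack` is a hypothetical stand-in for whatever fn Sente actually uses to serialize an event for the wire, not its real internals):

(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayOutputStream])

(defn pack
  "Hypothetical packer: serializes a Clojure value to a Transit/JSON string."
  [clj]
  (let [out    (ByteArrayOutputStream. 4096)
        writer (transit/writer out :json)]
    (transit/write writer clj)
    (.toString out "UTF-8")))

;; A memoized packer serializes identical payloads only once, no matter how
;; many subscribers the broadcast loop visits. Note that clojure.core/memoize
;; caches unboundedly, hence the suggestion of encore/memoize* with a bounded
;; cache.
(def pack-memoized (memoize pack))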

Ideas/feedback welcome, maybe I'm missing something.

@ptaoussanis (Member) commented:

Hey Matt, thanks for pinging about this!

You've spotted a definite inefficiency (just something I hadn't considered), and a good fix (with encore/memoize*).

One potential minor difficulty with using memoize* is that its caching/gc parameters aren't easily tuneable when used in a library.

In this case, I think it might actually make sense to consider adding a new API fn for bulk broadcasting that'll just generate the payload once before looping (roughly as sketched below).

I just have my hands quite full atm, but I'll think about it a little and get back to you.

Again, this was a good (and valuable) catch - and a nice clear report: much appreciated!

Cheers :-)
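
A conceptual sketch of what such a bulk-broadcast fn might look like (this is not Sente's actual API; `pack` and `send-packed!` are hypothetical stand-ins for its internal serialization and send steps):

(defn broadcast!
  "Serialize `event` once, then push the same packed payload to every uid."
  [send-packed! pack uids event]
  (let [packed (pack event)] ; pay the serialization cost exactly once
    (doseq [uid uids]
      (send-packed! uid packed))))

Compared to the current pattern, which re-packs the event once per subscriber, the packing cost here is independent of the number of subscribers.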

ptaoussanis self-assigned this Oct 26, 2016
ptaoussanis added a commit that referenced this issue Oct 26, 2016
Objectives:
  - [#276] Support easy+efficient generic un/pack caching.
    - In particular: support cache-friendly server->client event buffering.
  - [#16] Support client->server buffering with arb cbs.
  - Read+write performance.
  - Bandwidth efficiency.
  - Flexibility for possible future extensions/changes.
  - Human-readable on the wire (handy for debugging, etc.).
  - Lay groundwork for possible broadcast API fn.

Downsides:
  - Would involve a breaking incompatibility with old clients.

@mhuebert (Collaborator, Author) commented:

Hey, thanks for the quick reply + commit!

I figured this would touch on other concerns, like buffering. Thanks for the informative comments in the commit.

A bit off topic (I'm not sure whether a separate issue is warranted, or if this is covered elsewhere): are there tests that aren't committed into the repo? I see a testing profile in profile.clj, but I don't see anything in the test directory. While experimenting locally, I was wondering how you approach testing for performance with many connected clients.

ptaoussanis added a commit that referenced this issue Oct 26, 2016
ptaoussanis added a commit that referenced this issue Oct 26, 2016
@theasp (Collaborator) commented Oct 26, 2016

I haven't looked at your changes, but I believe Sente has kept per-client Transit readers and writers since #161. Is it still safe to memoize this? I think the memoized result would not be valid for any client other than the one whose writer produced it, given how Transit replaces data with shorter tokens.

@ptaoussanis (Member) commented:

@theasp Thanks for pinging about this, Andrew.

The readers and writers are indeed cached, but I don't see how additionally caching particular reads or writes would be a problem?

Last I checked the implementation, Transit can be stateful for the duration of a particular read/write, but shouldn't automatically (to my knowledge) be stateful between separate reads/writes.

For example, it uses token replacement to help deduplicate content within a particular payload. Unless you're specifically requesting something like that (and providing the necessary data), I don't see how or why it'd try to provide deduplication between payloads.

Does that make sense?

I don't use Transit myself though, so may be missing something obvious?

@theasp (Collaborator) commented Oct 28, 2016

Yes, it looks like you are correct.

cljs.user=> (require '[cognitect.transit :as t])
nil
cljs.user=> (def writer (t/writer :json))
#'cljs.user/writer
cljs.user=> (t/write writer [:msg {:a 1 :b 2}])
"[\"~:msg\",[\"^ \",\"~:a\",1,\"~:b\",2]]"
cljs.user=> (t/write writer [:msg {:a 1 :b 2}])
"[\"~:msg\",[\"^ \",\"~:a\",1,\"~:b\",2]]"

@ptaoussanis (Member) commented:

@theasp Thanks for the confirmation :-)

BTW I'll note that there could (conceivably) be other instances where serialized output is not identical between calls with the same input, but that would still be okay so long as the deserialization doesn't depend on any out-of-band state.

If freeze(x)->y, then we only need thaw(y)->x; we don't otherwise care about the interim y value in Sente's case (or for read/write caching).
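
To put that property in code (reusing the hypothetical `pack` from the earlier sketch, with a matching hypothetical `unpack`): only the round trip has to hold; the interim packed string is never compared directly.

(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayInputStream])

(defn unpack
  "Hypothetical unpacker matching `pack`: Transit/JSON string -> Clojure value."
  [^String s]
  (let [in (ByteArrayInputStream. (.getBytes s "UTF-8"))]
    (transit/read (transit/reader in :json))))

(let [x [:my-app/action {:text "This Is Not A Tweet"}]]
  (= x (unpack (pack x)))) ;=> true, and that's all the caching relies on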

@ptaoussanis (Member) commented:

Closing in favour of #421
