When to recycle ZSTD_CCtx objects? #1796

Closed
QrczakMK opened this issue Sep 17, 2019 · 4 comments

@QrczakMK

Experiments indicate that recycling ZSTD_CCtx objects between separate streaming compression sessions (e.g. with ZSTD_CCtx_reset), instead of creating them anew, is much faster: compression takes microseconds instead of milliseconds.

It looks like this is because existing entries in a hash table can be cleared much faster than a newly allocated table can be filled with zeros. The most expensive part is a memset in ZSTD_reset_matchState; its size can exceed ten megabytes.

But recycling ZSTD_CCtx objects is helpful only if compression options match. If compression options differ, ZSTD_CCtx performs a costly reset of internal tables, and recycling them is as expensive as creating them anew, while losing the opportunity to reuse the object with matching compression options later.

Our library does the recycling automatically, maintaining a pool of idle objects, and looking for an existing object with matching options instead of creating it anew: https://github.com/google/riegeli/blob/master/riegeli/base/recycling_pool.h. It is also used for other kinds of objects, but ZSTD_CCtx is one where proper recycling matters the most.
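For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of such an options-keyed pool (hypothetical names and a deliberately simplified interface; riegeli's actual RecyclingPool is more general, with TTLs and thread safety):

```cpp
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified sketch: a pool of idle objects, bucketed by a key describing
// the options they were created with. Get() prefers an idle object whose
// key matches; otherwise it falls back to the supplied factory.
template <typename Key, typename Resource, typename Hash = std::hash<Key>>
class RecyclingPool {
 public:
  template <typename Factory>
  std::unique_ptr<Resource> Get(const Key& key, Factory factory) {
    auto it = idle_.find(key);
    if (it != idle_.end() && !it->second.empty()) {
      std::unique_ptr<Resource> resource = std::move(it->second.back());
      it->second.pop_back();
      return resource;  // recycled: options match, cheap to reuse
    }
    return factory();  // no match: create anew
  }

  // Return an object to the pool for later reuse under the same key.
  void Put(const Key& key, std::unique_ptr<Resource> resource) {
    idle_[key].push_back(std::move(resource));
  }

 private:
  std::unordered_map<Key, std::vector<std::unique_ptr<Resource>>, Hash> idle_;
};
```

In the real library the Resource would be a ZSTD_CCtx wrapper whose deleter calls ZSTD_CCtx_reset before returning it to the pool.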

The problem is that determining what it means for compression options to match, for the purpose of recycling ZSTD_CCtx objects, is not trivial. For example:

Extracting the key for comparing options is done here: https://github.com/google/riegeli/blob/8090178002b7f4e7fe5eba7cb756c526261921f0/riegeli/bytes/zstd_writer.h#L213-L233; https://github.com/google/riegeli/blob/8090178002b7f4e7fe5eba7cb756c526261921f0/riegeli/bytes/zstd_writer.cc#L95-L97.

This is fragile.

Recently I discovered that if the first ZSTD_compressStream2 call uses ZSTD_e_end, then the input size counts as specifying pledgedSrcSize for the purpose of recycling ZSTD_CCtx objects. This prompted a change to create the compressor lazily: once it is known whether the first compressed fragment is also the last one, the parameters for looking up an idle ZSTD_CCtx object to recycle can be adjusted accordingly: google/riegeli@8090178

Can this be done better somehow?

I am not sure how this should be designed properly. Here are two possibilities:

  1. Maybe zstd can somehow make that memset less costly, making the whole problem less important. Maybe there is an OS-specific way to obtain zeroed memory more efficiently. Maybe zstd could maintain an internal pool of these tables.

  2. If we keep the existing basic design (i.e. recycling ZSTD_CCtx objects is the responsibility of a library at a higher level than zstd), what I need is a way to extract a “key” of the compression parameters that matter for this purpose, a key with equality and hashing defined. Maybe zstd could offer a more reliable way to do this.
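To illustrate option 2, such a key might look like the following. This is purely a hypothetical sketch: which parameters actually affect ZSTD_CCtx table sizing is version-dependent, and the fields below are illustrative, not an authoritative list.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical "key" of compression parameters that matter for recycling
// a ZSTD_CCtx, with equality and hashing defined so it can be used in a
// hash-map-based pool. The field list is illustrative only.
struct ZstdCctxKey {
  int compression_level = 0;
  int window_log = 0;         // 0 = derived from level / source size
  uint64_t pledged_size = 0;  // only meaningful if pledged_size_known
  bool pledged_size_known = false;

  friend bool operator==(const ZstdCctxKey& a, const ZstdCctxKey& b) {
    return a.compression_level == b.compression_level &&
           a.window_log == b.window_log &&
           a.pledged_size == b.pledged_size &&
           a.pledged_size_known == b.pledged_size_known;
  }
};

struct ZstdCctxKeyHash {
  size_t operator()(const ZstdCctxKey& k) const {
    size_t h = std::hash<int>()(k.compression_level);
    h = h * 31 + std::hash<int>()(k.window_log);
    h = h * 31 + std::hash<uint64_t>()(k.pledged_size);
    h = h * 31 + std::hash<bool>()(k.pledged_size_known);
    return h;
  }
};
```

The fragility described above is exactly that nothing in zstd's API guarantees this field list is complete or stays complete across releases.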

@felixhandte felixhandte self-assigned this Sep 17, 2019
@felixhandte
Contributor

Hey!

Great write-up! We have the same pattern in a few places at Facebook (e.g. https://github.com/facebook/hhvm/blob/master/hphp/util/compression-ctx-pool.h), with exactly the same problem.

My changesets #1712 and #1780 address this scenario. When possible (i.e., when there's enough room in the existing workspace buffer), the CCtx avoids the memset even when compression parameters change. This functionality will be in the next zstd release.

I hope that's useful. (In fact, if you tried your use case on the dev branch, it would be great to get confirmation that you're seeing the expected speed up.)

@Cyan4973
Contributor

Cyan4973 commented Sep 17, 2019

Yeah, this is exactly the kind of topic we want to address in our next release; feedback will be highly appreciated.
@felixhandte recently made a big part of the required changes in the dev branch. We have another set of patches coming for some complementary scenarios, but it seems the scenarios you care about are already covered by Felix's merged work.

@QrczakMK
Author

Funny: https://github.com/facebook/hhvm/blob/master/hphp/util/compression-ctx-pool.h was submitted one day after https://github.com/google/riegeli/blob/master/riegeli/base/recycling_pool.h

I verified with the dev branch and indeed, changing our code to recycle ZSTD_CCtx objects unconditionally, without verifying compression options, makes microbenchmarks of performing tiny compression tasks about 3% faster. In contrast, with v1.4.0 the same change makes some of these microbenchmarks 350 times slower.

@QrczakMK
Author

The plan after #1780 is clear: always recycle, ignoring compression options. I am closing the issue. Thank you!

(Now I only have to wait for a zstd release, and then for us to update the version of our zstdlib dependency.)
