Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add (de)compression tracing functionality #2482

Merged
merged 1 commit into from
Feb 6, 2021
Merged

Conversation

terrelln
Copy link
Contributor

@terrelln terrelln commented Feb 4, 2021

This PR adds tracing hooks for zstd compression and decompression calls. Users can inject tracing by overriding the weakly linked tracing functions.

  • Add lib/common/trace.{h,c} - see this file for what is traced
  • Hook up tracing in lib/
  • Provide tracing for the zstd CLI in programs/zstdcli_tracing.{h,c} enabled by --trace FILE.

The tracing functionality in the CLI is optional, if we don't want it I can remove it. I figured it would both provide an example of how to hook up the tracing, and potentially be useful to measure (de)compression speed and output in a machine readable format in some scenarios.

E.g.

zstd --trace trace.csv -r path/to/dir
zstd --trace trace.csv -t path/to/dir
zstd --trace trace.csv -b1e19

/* We depend on the trace header to avoid duplicating the ZSTD_trace struct.
* But, we check the version so it is compatible with dynamic linking.
*/
#include "../lib/common/zstd_trace.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, is zstd_trace.h a new "public" API ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this API would eventually become public. I was planning on leaving it out of the default installation for now, but publishing it internally.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Feb 4, 2021

The tracing functionality in the CLI is optional, if we don't want it I can remove it. I figured it would both provide an example of how to hook up the tracing, and potentially be useful to measure (de)compression speed

Agreed, the CLI is a great place to use, show and experiment with this new trace interface


#if ZSTD_TRACE && ZSTD_HAVE_WEAK_SYMBOLS

ZSTD_WEAK_ATTR int ZSTD_trace_compress_begin(ZSTD_CCtx const* cctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, by default, these functions exist but don't do anything,
they need to be replaced in target binary by real functions that are actually doing something
which is possible thanks to weak linking.

I wonder if there is a (future) opportunity here to have some "good enough" default functions, instead of empty ones,
although to be fair I don't know the topic nearly enough to tell if it makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We (zstd) don't really know what to do with the data that is provided. So by default we just disable the tracing, so the cost is only one function call + one branch per (de)compression.

We could also introduce other methods of tracing in the future, like attaching the tracing functions to the cctx/dctx itself.

int ZSTD_trace_compress_begin(ZSTD_CCtx const* cctx)
{
int enabled = 0;
if (g_traceFile == NULL)
Copy link
Contributor

@Cyan4973 Cyan4973 Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this design mandates the use of global variables ?
Could it be mitigated by allowing passing some void* state parameter ?

Copy link
Contributor Author

@terrelln terrelln Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two options:

  1. Global state inside the tracing library.
  2. return void* from the begin() function, and consume it in the end() function.

I chose the former, because the void* would have to be malloc'd. That would mean we have to be certain never to leak the void*. Currently end() is only called on success, so we would leak it on errors. We could fix that by calling free() whenever we overwrite it with a new void* or free the context. But, that introduces extra complexity to both zstd and the tracing library. Because now, what if the tracing library is in C++. It couldn't use new to allocate the object, it would have to use malloc. And generally having one library allocate and another free seems complex.

So I decided to keep it simpler and have the tracing library keep track of the state. A more complex implementation would have a map<uintptr_t, State> to keep track of a separate state per-context, using the context pointer as a key.

Copy link
Contributor Author

@terrelln terrelln Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also change the zstd implementation to make it compulsory to call end(). Basically, if zstd detected that the void* tracingContext is not NULL when calling begin() or during ZSTD_freeCCtx() it would call end() and set the error member to some generic error (and we would have the flexibility to set a more specific error code later).

Copy link
Contributor

@Cyan4973 Cyan4973 Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the argument about simplicity.

It's just that, as it's a user API (even a hidden one), I'm easily worried that it could be more difficult to update it afterwards.

But overall, I'm fine with the idea of starting simple, experiment, and introduce complexity only when it feels more palatable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that, as it's a user API (even a hidden one), I'm easily worried that it could be more difficult to update it afterwards.

At this point I expect this API to change between versions. Once we have enough experience with it, we can think about what stability guarantees we want to provide.

@terrelln terrelln merged commit 856f38e into facebook:dev Feb 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants