feat: pass path to censor function #16
Conversation
Also, I'm certainly no V8 performance expert, but I tried to be relatively performance-minded in my implementation here. Still, if its performance is an issue, I imagine that a relatively simple way to take down the overhead further would be to test the …
Good work!!!
Have you checked the benchmarks? Is this change impacting them at all?
@mcollina Thanks for the review. I'll try to work on this more tonight.
As far as the benchmarks go, here's what I got:

Before

After

I honestly am not sure how to tell what's noise and what's meaningful. The above is just from one run of …
LGTM
@mcollina @davidmarkclements What's the next step for getting this merged?
Could you please create separate tests for path rather than adapting current ones?
Could you also add a benchmark with the censor function, and a benchmark that covers all the code added to `specialSet`, and post results?
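A sketch of what such a benchmark could look like, following the fastbench pattern the existing benchmarks appear to use (the benchmark name, paths, and object shape here are illustrative, not the actual additions):

```js
'use strict'

const bench = require('fastbench')
const fastRedact = require('fast-redact')

const censorFct = fastRedact({
  paths: ['a.b.c'],
  censor: (value) => 'xxx',
  serialize: false
})

const max = 500

const run = bench([
  function benchFastRedactCensorFunction (cb) {
    // fresh object per test, so prior runs don't skew hidden classes
    const obj = { a: { b: { c: 's3cr3t' } } }
    for (var i = 0; i < max; i++) {
      censorFct(obj)
    }
    setImmediate(cb)
  }
], max)

run(run)
```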
Growing the array later seems to be faster than allocating this string upfront and overwriting it later.
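Roughly, the two patterns being compared (names hypothetical):

```js
// Allocate the string's slot upfront, then overwrite it later:
function upfront (secret, suffix) {
  const parts = [secret, '']
  parts[1] = suffix
  return parts
}

// Grow the array later instead, which the comment suggests is faster:
function growLater (secret, suffix) {
  const parts = [secret]
  parts.push(suffix)
  return parts
}
```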
This gives back some of the performance cost of constructing the path, for people who pass in censor functions that only take one argument.
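Presumably this keys off `Function.prototype.length` (the declared parameter count); a minimal sketch of that kind of check:

```js
// Only pay for constructing the path when the censor function
// actually declares a second (path) parameter.
function censorNeedsPath (censor) {
  return typeof censor === 'function' && censor.length > 1
}

censorNeedsPath((value) => 'xxx')        // false: skip building the path
censorNeedsPath((value, path) => path)   // true: build and pass the path
```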
The `isCensorFct` prop is never passed in as part of `o` (see index.js), and the builder already had an object with a `secret` key, making the subsequent `builder.push` redundant.
This creates a new object before each test, which should make the benchmarks more reliable (since state/hidden-class transitions on the object from prior tests could affect the speed of the redaction). This doesn't go as far as creating a new object right before each redaction call (i.e., in the `for` loop), since a bigger chunk of each benchmark's work would then be object creation, adding noise and making the final results harder to interpret.

This also means that the `serialize: false` tests (and the pino-noir tests, which are effectively serialize-false) don't leave the object in a different state for the subsequent tests, which is required for adding tests for davidmarkclements#16.

Most of the benchmarks remained similar before and after this change, with some of the fast-redact benchmarks getting 10-30% faster (though some of that may be noise). A few benchmarks showed consistent differences of more than 30%:

- `benchNoirV2` (the first case): about 50% faster, consistently, for reasons I don't quite understand
- `benchFastRedactRestore`: about 50% faster
- `benchFastRedactIntermediateWild`: about 50% slower
- `benchNoirV2Wild`, `benchNoirV2CensorFunction`: 2-4x as fast
- `benchFastRedactCensorFunction`: 50-100% faster

Here are the full results:

# Before [ran twice, and the results were pretty similar]

```
benchNoirV2*500: 84.279ms
benchFastRedact*500: 14.280ms
benchFastRedactRestore*500: 22.773ms
benchNoirV2Wild*500: 105.580ms
benchFastRedactWild*500: 40.533ms
benchFastRedactWildRestore*500: 41.080ms
benchFastRedactIntermediateWild*500: 107.193ms
benchFastRedactIntermediateWildRestore*500: 92.896ms
benchJSONStringify*500: 323.536ms
benchNoirV2Serialize*500: 407.667ms
benchFastRedactSerialize*500: 337.723ms
benchNoirV2WildSerialize*500: 380.418ms
benchFastRedactWildSerialize*500: 372.057ms
benchFastRedactIntermediateWildSerialize*500: 417.458ms
benchFastRedactIntermediateWildMatchWildOutcomeSerialize*500: 572.464ms
benchFastRedactStaticMatchWildOutcomeSerialize*500: 348.632ms
benchNoirV2CensorFunction*500: 70.804ms
benchFastRedactCensorFunction*500: 59.476ms

benchNoirV2*500: 48.808ms
benchFastRedact*500: 11.550ms
benchFastRedactRestore*500: 13.436ms
benchNoirV2Wild*500: 61.383ms
benchFastRedactWild*500: 31.472ms
benchFastRedactWildRestore*500: 34.325ms
benchFastRedactIntermediateWild*500: 128.071ms
benchFastRedactIntermediateWildRestore*500: 128.602ms
benchJSONStringify*500: 317.474ms
benchNoirV2Serialize*500: 395.179ms
benchFastRedactSerialize*500: 323.449ms
benchNoirV2WildSerialize*500: 369.051ms
benchFastRedactWildSerialize*500: 356.204ms
benchFastRedactIntermediateWildSerialize*500: 426.994ms
benchFastRedactIntermediateWildMatchWildOutcomeSerialize*500: 531.789ms
benchFastRedactStaticMatchWildOutcomeSerialize*500: 340.540ms
benchNoirV2CensorFunction*500: 38.953ms
benchFastRedactCensorFunction*500: 53.361ms
```

# After [ran twice, and the results were pretty similar]

```
benchNoirV2*500: 54.057ms
benchFastRedact*500: 15.949ms
benchFastRedactRestore*500: 13.749ms
benchNoirV2Wild*500: 34.787ms
benchFastRedactWild*500: 44.642ms
benchFastRedactWildRestore*500: 42.949ms
benchFastRedactIntermediateWild*500: 162.311ms
benchFastRedactIntermediateWildRestore*500: 106.921ms
benchJSONStringify*500: 306.766ms
benchNoirV2Serialize*500: 403.440ms
benchFastRedactSerialize*500: 321.294ms
benchNoirV2WildSerialize*500: 366.650ms
benchFastRedactWildSerialize*500: 342.946ms
benchFastRedactIntermediateWildSerialize*500: 437.797ms
benchFastRedactIntermediateWildMatchWildOutcomeSerialize*500: 573.556ms
benchFastRedactStaticMatchWildOutcomeSerialize*500: 336.521ms
benchNoirV2CensorFunction*500: 27.110ms
benchFastRedactCensorFunction*500: 47.454ms

benchNoirV2*500: 38.437ms
benchFastRedact*500: 10.272ms
benchFastRedactRestore*500: 9.693ms
benchNoirV2Wild*500: 18.504ms
benchFastRedactWild*500: 30.266ms
benchFastRedactWildRestore*500: 35.108ms
benchFastRedactIntermediateWild*500: 131.794ms
benchFastRedactIntermediateWildRestore*500: 110.691ms
benchJSONStringify*500: 299.861ms
benchNoirV2Serialize*500: 384.236ms
benchFastRedactSerialize*500: 314.049ms
benchNoirV2WildSerialize*500: 365.485ms
benchFastRedactWildSerialize*500: 344.158ms
benchFastRedactIntermediateWildSerialize*500: 426.421ms
benchFastRedactIntermediateWildMatchWildOutcomeSerialize*500: 537.079ms
benchFastRedactStaticMatchWildOutcomeSerialize*500: 340.104ms
benchNoirV2CensorFunction*500: 16.021ms
benchFastRedactCensorFunction*500: 31.100ms
```
Add benchmarks for path arg to censor fn
This could affect the performance.
@davidmarkclements I've pushed a commit that adds these benchmarks. There was already one benchmark using a censor function, but I've added a couple of others that test the combination of a censor function (with and without the second argument). Here are the results.

Before

These results are with none of the changes from this PR, but with the rework to the benchmarks from #22. They are taken straight from the OP in #22.
After

Same benchmarks, but with the code changes from this PR.
The results look within noise to me, which I assume is because this PR is now pretty aggressive about not constructing/passing the path if censor isn't a function and (a new check) if its arity shows it doesn't accept the path argument.

For the wholly-new benchmarks, here are the results I got:
Done

@davidmarkclements Can you take another look at this? I'd love to get it merged.

@davidmarkclements Ping :)

sorry this took so long to get through @ethanresnick - thanks for the PR!

released as v3.0.0 (I know it's likely a minor, but we need to ensure we don't break anything in pino, so we'll do a manual upgrade over there)
This PR calls the user's censor function with the path being censored as the second argument. I'm trying to support two different use cases here:
1. Sometimes different keys need to be redacted in different ways. For example, on an object representing a bank account, you might censor the account number by `xxxx`-ing out most of the numbers (but still leaving a few for debugging), while censoring the account owner's name by removing it completely:
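A minimal sketch of what that might look like, assuming (per this PR) the path is passed to `censor` as an array of keys; the field names are just illustrative:

```js
const fastRedact = require('fast-redact')

const redact = fastRedact({
  paths: ['accountNumber', 'ownerName'],
  censor: (value, path) => {
    const key = path[path.length - 1]
    if (key === 'accountNumber') {
      // keep the last four digits for debugging; xxxx out the rest
      return 'xxxx-xxxx-' + String(value).slice(-4)
    }
    return undefined // remove the owner's name completely
  }
})

console.log(redact({ accountNumber: '12345678', ownerName: 'Jane Doe' }))
```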
2. Because fast-redact serializes the redacted object rather than returning a deep clone (for well-explained reasons), it's difficult to compose redaction logic. For example, suppose I have a few different business objects, like bank accounts, clients, etc., and that I want to define a separate redaction function for each object (in the same place where I define the object's schema, etc.). For example, I might define `redactAccount` as:
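Presumably something along these lines (fields illustrative):

```js
const fastRedact = require('fast-redact')

// Redaction rules for a bank account, defined alongside its schema.
const redactAccount = fastRedact({
  paths: ['accountNumber', 'ownerName']
})
// Note: like any fast-redact function with default options, this
// returns a serialized string, not a censored object.
```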
Now, suppose I need to log a redacted version of an object like:
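For instance (shape hypothetical):

```js
const toLog = {
  account: { accountNumber: '12345678', ownerName: 'Jane Doe' },
  client: { name: 'Acme Corp', taxId: '98-7654321' }
}
```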
If my `redactAccount` et al. functions returned censored objects, rather than strings, this would be as simple as:
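Something like this hypothetical snippet, with `redactClient` assumed to be defined analogously to `redactAccount`:

```js
// Only possible if each redactor returned a censored object,
// rather than a serialized string:
const redacted = {
  ...toLog,
  account: redactAccount(toLog.account),
  client: redactClient(toLog.client)
}
```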
So, my thought is that, by passing the path to `censor` functions, it becomes possible to support cases like this. I.e., I can now define a `composedRedact` function like so:
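A sketch, again assuming the path arrives as an array of keys; `censorAccountField` and `censorClientField` are hypothetical helpers holding each object's redaction rules:

```js
const fastRedact = require('fast-redact')

// Hypothetical per-object rules, keyed by the path below each prefix.
const censorAccountField = (value, path) =>
  path[0] === 'accountNumber'
    ? 'xxxx-xxxx-' + String(value).slice(-4)
    : undefined
const censorClientField = (value, path) => '[REDACTED]'

const composedRedact = fastRedact({
  paths: ['account.*', 'client.*'],
  censor: (value, path) => {
    // Route each value to the right rules via the first path segment.
    if (path[0] === 'account') return censorAccountField(value, path.slice(1))
    if (path[0] === 'client') return censorClientField(value, path.slice(1))
    return '[REDACTED]'
  }
})
```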
Fwiw, I explored approaches to composition that leveraged `serialize: false` and `restore()`, but ran up against the wall that the state stored internally for each redaction function returned by `fastRedact` only covers the last object redacted, so it wasn't possible to redact a bunch of objects without serialization, collect and serialize the results, and then restore all of them.

Given that limitation, I couldn't think of an approach for supporting composition that, at least absent a major refactor, wouldn't, under the hood, do more or less what the above `opts.censor` and `opts.paths` are doing explicitly. (Although I certainly can imagine some more convenient APIs that could be added to abstract that basic logic.)
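For example, one such API could be a helper that builds the routing censor from a map of per-prefix censors; a purely hypothetical sketch, not part of this PR:

```js
// Build one path-aware censor out of per-prefix censor functions.
function composeCensors (censorsByPrefix, fallback = '[REDACTED]') {
  return (value, path) => {
    const fn = censorsByPrefix[path[0]]
    return fn ? fn(value, path.slice(1)) : fallback
  }
}

// Usage, with the helpers from the previous sketch:
// fastRedact({
//   paths: ['account.*', 'client.*'],
//   censor: composeCensors({
//     account: censorAccountField,
//     client: censorClientField
//   })
// })
```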