removing sensitive data from json form #277
Comments
There are 2 serialization modes: "all" and "minimal". So what I would look into is introducing new Then the generated code for "conditional/minimal" serialization would inject these hardcoded methods for evaluation instead. I'm assuming we would want to allow this custom method per property, which would be used over the object-defined one when defined. Plus a way to make it always serialize without going through the code and checking it.
Hi, it would need more tests. I'm not sure that I have understood the relationship between the ObjectFormatPolicy and the setting in the DSL. Happy to discuss. There are a few extra methods added to the
|
I'll try to find some more time to review it this or next week, but on a first glance here are some comments:
All in all, you seem to be on a good track, but try to stick to these two rules: no allocation is allowed by default, and whatever is sealed or final should stay as such.
Hi, I haven't profiled it, but I don't think that there are allocations outside of the static initialiser. I was trying to leave the current path unaffected to retain performance. Injecting some property into the
The changes are incomplete and are here: 93e4f14. Adding this functionality without subclassing JsonWriter, and having a policy to allow filtering, is more invasive, and the code is only vetting the key values, not any writes to the values. Vetting the writes to the values would slow down the write path, and I would have to disambiguate the writes to be vetted from the writes that are 'safe'. The generated code is now a static block for all of the class data (which would be used in any option, I think)
and the generated controlled writer looks like
I would appreciate comments on whether this is the way that you want to proceed. I have left much of the other implementation in there still (I could not see a reason to remove it yet).
@zapov do you have any comments? |
Hi, was a bit busy, but it should be better now. So... I'm not a fan of introducing various structures into dsl-json which are meant for this logic. People have all kinds of use cases and they should be managing the logic via their own data structures. So ClassInfo and similar are not something I want to have in the library. But... I think we should go a bit slower (not only because the PR was too large; I would much prefer 2 PRs: one for the rename and the other for the logic)... so let's try to cover the basics first. Can you provide some usage examples of how people would configure that and use it within dsl-json (as pseudo-code though, not via a PR)? When we agree on that we can move forward. So, first of all... is the goal of this to have serialization which is:
or something in between? So for me... currently you have this logic in generated code
I would prefer an implementation which in this case mostly does something along the lines of
I'm even OK if there is an option to have that in the class (e.g. when an external serializer is not defined), which is aligned with the naming strategy
In the end... what seems to make sense to me is that you are able to define this rule on the property itself in a similar manner (but we can leave that for later).
Hi, I only introduced the So the requirements that I have are that the redaction/filtering of the data must happen at run time (we can do some compile-time filtering as well), but we have to be able to determine that we have to hide some data that is leaking, and remediate that without having to re-release code. The configuration of what to look for and how to replace it consists of regular expressions and some constant strings. We have full control of the DSL and the environment that is used for this writing, and we don't need these settings to affect all JSON serialisation on that JVM. It does have to be thread safe, as it is used in logging code. There are 2 types of data that we want to filter
We generally want to leave a marker to say that the data was redacted, rather than just hiding it; downstream processes then kick in, and we can potentially do the compile-time filtering in the slower cycle on the next release. We need to keep the JSON structure valid in the output, clearly. So in some cases we want to ignore a key/value entirely, and in some cases we want to write a different value, with some or all of the text changed, so it's harder than just Because you don't want to subclass the You asked to have this implementation leveraging the minimum logic, which meant that the logic needs to determine if the value is default (null, 0, etc.) without boxing, which causes some bloating of the interface. The reordering of the fields that I built isn't a requirement; some of it came for free with the other infra that I built, but we can drop that. Agreed that it would be good to have this on a property level, but that is a custom writer (as I see it) and it's not the pressing requirement that I have; it's an improvement (for me) to use something like
we would have to read |
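To make the run-time requirement above concrete, here is a minimal sketch of a redactor whose rules can be replaced while the application is running. It is not part of dsl-json: `Redactor`, `Rule`, `configure` and `redact` are hypothetical names, and the sketch only covers masking a string value with a marker, not omitting a key/value pair.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a dsl-json class: masks string values at run time.
final class Redactor {
    // One immutable rule: a compiled regex and the marker that replaces each match.
    static final class Rule {
        final Pattern pattern;
        final String marker;

        Rule(String regex, String marker) {
            this.pattern = Pattern.compile(regex);
            // Quote the marker so '$' and '\' in it are treated literally.
            this.marker = Matcher.quoteReplacement(marker);
        }
    }

    // The whole rule list is swapped atomically, so concurrent logging
    // threads see either the old list or the new one, never a mix.
    private final AtomicReference<List<Rule>> rules =
            new AtomicReference<>(List.of());

    // Replace the active rules without rebuilding or re-releasing code.
    void configure(List<Rule> newRules) {
        rules.set(List.copyOf(newRules));
    }

    // Apply every rule in order. The result is still a plain string, so the
    // caller can write it as an ordinary JSON string and the output stays valid.
    String redact(String value) {
        if (value == null) {
            return null;
        }
        String result = value;
        for (Rule rule : rules.get()) {
            result = rule.pattern.matcher(result).replaceAll(rule.marker);
        }
        return result;
    }
}
```

A caller would build one `Redactor` per logging configuration and call `redact` on each string before handing it to the JSON writer; how that call is wired into the generated code is exactly the open question in this thread.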
For me those are 2 separate problems. What dsl-json does not have ATM is this logic for when you don't want to put anything out. That's why I was suggesting to only cover that case. Technically, if we extend the API to pass in the writer, you can write something out and then tell dsl-json not to serialize the regular property. I'm kind of OK with it because I think it would work out of the box, although I think it would be better not to allow it for now (as some future need to prevent it might come up).
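A rough sketch of the "don't put anything out" case, assuming a per-property callback that receives the instance and the property name. `PropertyFilter`, `Login` and `LoginSerializer` are hypothetical names for illustration; the sketch builds a string instead of using dsl-json's writer and assumes the values need no JSON escaping.

```java
// Hypothetical callback: decides at run time whether a property is written.
interface PropertyFilter {
    boolean shouldWrite(Object instance, String property);
}

// Hypothetical data class used only for this example.
final class Login {
    final String user;
    final String token;

    Login(String user, String token) {
        this.user = user;
        this.token = token;
    }
}

// Hand-written stand-in for what generated code could roughly do.
final class LoginSerializer {
    static String write(Login login, PropertyFilter filter) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        if (filter.shouldWrite(login, "user")) {
            sb.append("\"user\":\"").append(login.user).append('"');
            first = false;
        }
        if (filter.shouldWrite(login, "token")) {
            // Only emit a separator if an earlier property was written.
            if (!first) {
                sb.append(',');
            }
            sb.append("\"token\":\"").append(login.token).append('"');
        }
        return sb.append('}').toString();
    }
}
```

Because the filter only answers yes or no, it can omit a property but cannot rename it or replace its value, which matches the narrower scope suggested in the comment above.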
I think that we are talking at cross purposes. For specific field types - For masking the values - You are right that there are 2 separate requirements, but your solution, if I understand it, only addresses one of them and requires a rebuild, which makes it inappropriate for our needs.
So what that sounds like to me is that we can certainly improve the signature of DslJson's tryFindReader, tryFindWriter and tryFindBinder to pass in more information (the location of the object, e.g. its class and property name), so you can handle these rules whenever something is about to be written. This way you can instead return a writer which will mask the value, while knowing in which object and for which property this is, to be able to apply rules such as: do not leak this property of this object. The signature I suggested for deciding if we want to write out the field at all would certainly let you do the same, as you have the instance (so you can take the instance class), it passes the property name, and you have the DslJson instance, which can contain these rules (as you can provide your own DslJson instance).
So, what are you expecting the generated code to look like (roughly)?
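The idea of a lookup that knows the owning class and the property name could be sketched as below. This does not use the real tryFindWriter signature, which is not shown in this thread; `PropertyLocation`, `MaskingLookup`, `findStringWriter` and `Account` are hypothetical, and the "writer" is reduced to a function from a string value to its JSON text.

```java
import java.util.Set;
import java.util.function.Function;

// Hypothetical description of where a value lives: owning class + property name.
final class PropertyLocation {
    final Class<?> owner;
    final String property;

    PropertyLocation(Class<?> owner, String property) {
        this.owner = owner;
        this.property = property;
    }
}

// Hypothetical owner type used only for this example.
final class Account {
}

// Hypothetical lookup: returns a masking writer for sensitive locations
// and a plain writer for everything else.
final class MaskingLookup {
    // Entries have the form "SimpleClassName.property", e.g. "Account.password".
    private final Set<String> sensitive;

    MaskingLookup(Set<String> sensitive) {
        this.sensitive = Set.copyOf(sensitive);
    }

    Function<String, String> findStringWriter(PropertyLocation location) {
        String key = location.owner.getSimpleName() + "." + location.property;
        if (sensitive.contains(key)) {
            // Leave a marker instead of the real value.
            return value -> "\"***\"";
        }
        return MaskingLookup::quote;
    }

    // Minimal JSON quoting: escapes only backslash and double quote;
    // control characters are not handled in this sketch.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Since the set of sensitive locations is plain data handed to the lookup, it could be loaded from configuration at run time, though this sketch does not show reloading it.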
Are you asking about I would add
deprecate the old methods and redirect them to this one.
have something like
By the way, you didn't say whether this addresses your needs in full (at least the masking needs).
I presume you meant this would allow us to control the writing, and allow us to whitelist some types (e.g. for values that are numbers and booleans). It would mean that we end up duplicating the writing code for arrays, maps etc., as we would need to validate the data prior to writing it, rather than intercepting the write. I don't think that there is a lot of code here though. It would mean that it would not work with a custom writer as I see it, e.g. one that writes something not generated by the plugin, but handcrafted. It would also mean that we could not omit a field, or change the property name written. Looking in Have I understood what you are proposing? I just want to confirm that we are talking about the same thing before I spend much effort on thinking and coding.
Yes, while I'm mostly trying to confirm that extending the tryFind API in such a way does resolve your use case (your second point in the requirements). As for the second part, I've looked over the API and it gets a bit more complicated with managing boolean hasWritten if we allow passing the writer around, so I would avoid that for now. This does prevent you from writing some other name instead, but seems simpler to reason about and good enough. Anyway... I would support this by having
which would point to class such as Generated code would look something like
The other question is should we just leave Unrelated, but by looking at the code it seems to me that there is a bug in this serialization code: a check to ensure the size of the buffer is missing. E.g. in front of
there should be
just to make sure that when flushed to stream we can still go back one place due to the conditional ending.
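For readers unfamiliar with the "go back one place" trick, a simplified sketch follows. It uses a `StringBuilder` rather than dsl-json's writer, `MinimalObjectWriter` is a hypothetical name, and it assumes `name` needs no JSON escaping. Each non-default property is written with a trailing comma, and the last comma is then overwritten with the closing brace.

```java
// Hypothetical illustration of minimal-mode output with a conditional ending.
final class MinimalObjectWriter {
    static String write(String name, int age, boolean active) {
        StringBuilder sb = new StringBuilder("{");
        // Properties holding their default value are skipped entirely.
        if (name != null) {
            sb.append("\"name\":\"").append(name).append("\",");
        }
        if (age != 0) {
            sb.append("\"age\":").append(age).append(',');
        }
        if (active) {
            sb.append("\"active\":true,");
        }
        int last = sb.length() - 1;
        if (sb.charAt(last) == ',') {
            // Step back one place: turn the trailing comma into the closing brace.
            sb.setCharAt(last, '}');
        } else {
            // Nothing was written after '{', so just close the object.
            sb.append('}');
        }
        return sb.toString();
    }
}
```

With a buffer that can be flushed to a stream, the step back only works while the trailing comma is still in the buffer, which is the concern behind the missing size check mentioned above.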
It's a bit similar to the problem in #266.
I have two requirements to avoid exposing sensitive data
Ideally we would do (2) based on data at runtime
I could imagine that we could do (1) by some annotation that runs at compile time (but I am new to this library), though currently this is also expected to be a runtime setting. Maybe it could be both.
For (2) I would imagine that the easiest solution would be subclassing JsonWriter (but this is a final class).
Happy to work on this with someone if that helps.
My first thought is that if we could make JsonWriter non-final, I could do (2).
And if we then exposed the field that we are writing (as a compile switch?), this could do (1) as well, but I could probably cope without this.
What are your thoughts?