-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(data-masking): add support for Pydantic models, dataclasses, and standard classes #6413
base: develop
Are you sure you want to change the base?
feat(data-masking): add support for Pydantic models, dataclasses, and standard classes #6413
Conversation
…and standard classes (aws-powertools#3473)
@leandrodamascena, for now I applied the prepare_data() function as you suggested in the Issue, but I have an idea for making the function more robust and covering more edge cases, it would be like
If that is more relevant, will implement this one. |
Hey @VatsalGoel3, can you show me an example with some pseucode? Is the idea here like a raw dict containing keys that can be dict, Pydantic models and data class models? If so, I like the idea and would like to see an example. |
Hi @VatsalGoel3, thanks a lot for another great contribution addressing complex issues in Powertools that will help customers. I'll review this tomorrow. I was wondering if you are aware of the AWS Community Builder program. It sounds like you might want to check out this program and maybe apply. This program is for people who are helping the entire AWS ecosystem grow, creating content, making contributions, and for sure your contributions have actually helped customers using Powertools in TypeScript and Python. Please note that I don't run this program, so I'm not saying whether you'll get accepted or not, but it's definitely worth checking out. |
@leandrodamascena Yes, exactly – the idea is to take an input that might be a raw dictionary with keys whose values could be dictionaries, Pydantic models, dataclass instances, or even custom objects with a dict() method, and recursively convert all of them into plain dictionaries or simple types. Here’s some pseudocode to illustrate the concept:
|
@leandrodamascena, thank you for letting me know, I was not aware of this, I have just applied while I believe the applications for this year is over, would love to be part of the program next year, also is there any way I can DM you for some advice. Thank you |
Thanks for sharing this! I really like this idea! We have something like this in this method https://github.com/aws-powertools/powertools-lambda-python/blob/develop/aws_lambda_powertools/event_handler/openapi/encoders.py#L29. In this case, we call this function recursively for each item in the JSON, I don't know if it makes sense here. What do you think? |
Sure, send me an email at aws-powertools-maintainers@amazon.com and I’ll be more than happy to share my calendar with you. We can then schedule a meeting to talk about your contributions to Powertools, how we build community at Powertools, your challenges building workloads on AWS, and any other topics you’d like to share and we can help with. |
@leandrodamascena Yes, I think a recursive approach is exactly the right idea. It ensures that every nested element is processed and converted into a plain type that the data masking logic can handle. |
Super nice, please go ahead! Just please try to comment the code of this function to make it easier to understand for future changes. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #6413 +/- ##
========================================
Coverage 96.33% 96.33%
========================================
Files 243 243
Lines 11758 11770 +12
Branches 871 874 +3
========================================
+ Hits 11327 11339 +12
Misses 337 337
Partials 94 94 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ta() with and updated tests
@leandrodamascena, I have the code with the function as we discussed also uodated the test code to provide more robust checking, I am unclear if I need to update any documentation for this, please let me know, If I can help with that |
In this section we say that we don't support Pydantic/Dataclass and other data types, so it would be nice if we updated this with examples using Pydantic, Dataclass and other things. You can submit a first version of the modification and then I can review it to refine it. Thanks again for this fantastic work. |
📄 Documentation UpdateUpdate the "Current limitations" section under
|
There is room to improve this, but just sent the commit and we can work together, ok? |
…ustom classes and updated test code
@leandrodamascena, I have updated the docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @VatsalGoel3! I left some comments before we have another round of review.
@@ -26,7 +26,72 @@ | |||
|
|||
logger = logging.getLogger(__name__) | |||
|
|||
|
|||
def prepare_data(data: Any, _visited: set[int] | None = None) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ruff is complaining that this function is complex with too many returns (https://docs.astral.sh/ruff/rules/too-many-return-statements/). Although I understand that returning early avoids if
checks and stuff like that. Do you see room to improve this function? If you want, I can try to optimize this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have resolved the other issues, would you help me with this, I would love your input on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for resolving the comments @VatsalGoel3! Let me work in this function to see if I can improve it.
…ts for supported input types and updated codebase
…ts for supported input types
|
Issue number: #3473
Summary
Changes
This PR adds support to the DataMasking utility to handle complex Python input types such as:
.dict()
methodTo support this, a new
prepare_data
function was introduced, which performs type introspection and converts the input data into a dictionary before processing.This function is now invoked at the beginning of the
erase
,encrypt
, anddecrypt
methods, allowing these methods to seamlessly accept structured objects in addition to primitive types likedict
,str
,list
, etc.User experience
Before:
After:
This allows customers to use the utility directly with modern application architectures that use type-safe data structures.
Checklist
Is this a breaking change?
RFC issue number: N/A
Checklist:
Acknowledgment
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.