feat(data-masking): add support for Pydantic models, dataclasses, and standard classes #6413

VatsalGoel3 · 2025-04-06T09:22:59Z

Issue number: #3473

Summary

Changes

This PR adds support to the DataMasking utility to handle complex Python input types such as:

Pydantic models
Dataclasses
Standard Python classes with .dict() method

To support this, a new prepare_data function was introduced, which performs type introspection and converts the input data into a dictionary before processing.

This function is now invoked at the beginning of the erase, encrypt, and decrypt methods, allowing these methods to seamlessly accept structured objects in addition to primitive types like dict, str, list, etc.
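As a rough illustration of the conversion step described above (not the code in this PR), the sketch below inspects the input type and returns a dictionary where it can. The name `to_dict_sketch`, the example classes, and the order of the checks are assumptions made for this example; it relies only on `dataclasses.asdict`, a duck-typed `model_dump()` as exposed by Pydantic v2 models, and a user-defined `dict()` method.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


def to_dict_sketch(data: Any) -> Any:
    """Hypothetical helper: convert one structured object into a dict."""
    # Dataclass instances (not the class itself) convert with asdict().
    if is_dataclass(data) and not isinstance(data, type):
        return asdict(data)
    # Pydantic v2 models expose model_dump(); detected by duck typing here.
    if callable(getattr(data, "model_dump", None)):
        return data.model_dump()
    # Plain classes that implement their own dict() method.
    if callable(getattr(data, "dict", None)) and not isinstance(data, dict):
        return data.dict()
    # dict, str, list and other primitive inputs pass through unchanged.
    return data


@dataclass
class Customer:
    name: str
    age: int


class Account:
    """Hypothetical plain class with a dict() method."""

    def __init__(self):
        self.owner = "powertools"

    def dict(self):
        return {"owner": self.owner}


print(to_dict_sketch(Customer(name="powertools", age=5)))
print(to_dict_sketch(Account()))
print(to_dict_sketch({"already": "a dict"}))
```

Calling a helper like this at the top of each public method would let the rest of the masking logic keep working on plain dictionaries.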

User experience

Before:

from aws_lambda_powertools.utilities.data_masking import DataMasking
from pydantic import BaseModel

class MyModel(BaseModel):
    name: str
    age: int

data = MyModel(name="powertools", age=5)
masker = DataMasking()
masked = masker.erase(data, fields=["age"])  # ❌ This raised errors or did not work

After:

# ✅ Now works correctly and returns: {'name': 'powertools', 'age': '*****'}
masked = masker.erase(data, fields=["age"])

This allows customers to use the utility directly with modern application architectures that use type-safe data structures.

Checklist

Meet tenets criteria
I have performed a self-review of this change
Changes have been tested
Changes are documented
PR title follows conventional commit semantics

Is this a breaking change?

RFC issue number: N/A

Checklist:

Migration process documented
Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

…and standard classes (aws-powertools#3473)

VatsalGoel3 · 2025-04-06T09:24:02Z

@leandrodamascena, for now I applied the prepare_data() function as you suggested in the issue, but I have an idea for making the function more robust and covering more edge cases. Its docstring would look like this:

"""
Recursively convert complex objects into dictionaries (or simple types) so that they can be
processed by the data masking utility. This function handles:

- Dataclasses (using dataclasses.asdict)
- Pydantic models (using model_dump)
- Custom classes with a dict() method
- Fallback to using __dict__ if available
- Recursively traverses dicts, lists, tuples, and sets
- Guards against circular references

Parameters
----------
data : Any
    The input data which may be a complex type.
_visited : set, optional
    Internal set of visited object IDs to prevent infinite recursion on cyclic references.

Returns
-------
Any
    A primitive type, or a recursively converted structure (dict, list, etc.)
"""

If that is more relevant, I will implement this one.

leandrodamascena · 2025-04-06T11:22:51Z

Hey @VatsalGoel3, can you show me an example with some pseudocode? Is the idea here something like a raw dict containing keys whose values can be dicts, Pydantic models, and dataclass instances? If so, I like the idea and would like to see an example.

leandrodamascena · 2025-04-06T11:29:07Z

Hi @VatsalGoel3, thanks a lot for another great contribution addressing complex issues in Powertools that will help customers. I'll review this tomorrow.

I was wondering if you are aware of the AWS Community Builder program. It sounds like you might want to check out this program and maybe apply. This program is for people who are helping the entire AWS ecosystem grow, creating content, making contributions, and for sure your contributions have actually helped customers using Powertools in TypeScript and Python.

Please note that I don't run this program, so I'm not saying whether you'll get accepted or not, but it's definitely worth checking out.

VatsalGoel3 · 2025-04-06T18:22:02Z

@leandrodamascena Yes, exactly – the idea is to take an input that might be a raw dictionary with keys whose values could be dictionaries, Pydantic models, dataclass instances, or even custom objects with a dict() method, and recursively convert all of them into plain dictionaries or simple types.

Here’s some pseudocode to illustrate the concept:

from dataclasses import asdict, is_dataclass


def prepare_data(data, _visited=None):
    # Initialize the set of IDs of objects currently being converted, used to detect circular references.
    if _visited is None:
        _visited = set()

    # If data is a simple type (str, int, float, bool, None), return it immediately.
    if isinstance(data, (str, int, float, bool, type(None))):
        return data

    # If this object is already being converted further up the call stack, it is a
    # circular reference: return it as is to avoid infinite recursion.
    if id(data) in _visited:
        return data
    _visited.add(id(data))

    try:
        # If data is a dataclass instance (not the class itself), use dataclasses.asdict()
        # and recursively process the result.
        if is_dataclass(data) and not isinstance(data, type):
            return prepare_data(asdict(data), _visited=_visited)

        # If data is a Pydantic model, call model_dump() and process recursively.
        if callable(getattr(data, "model_dump", None)):
            return prepare_data(data.model_dump(), _visited=_visited)

        # If data has a dict() method and isn't already a dict, use that.
        if callable(getattr(data, "dict", None)) and not isinstance(data, dict):
            return prepare_data(data.dict(), _visited=_visited)

        # If data is a dict, recursively process keys and values.
        if isinstance(data, dict):
            return {prepare_data(key, _visited=_visited): prepare_data(value, _visited=_visited)
                    for key, value in data.items()}

        # If data is an iterable (list, tuple, or set), process each element recursively.
        if isinstance(data, (list, tuple, set)):
            return type(data)(prepare_data(item, _visited=_visited) for item in data)

        # If data has __dict__, use that as a fallback.
        if hasattr(data, "__dict__"):
            return prepare_data(vars(data), _visited=_visited)

        # If none of the above, return data as is.
        return data
    finally:
        # Forget the object once it is converted, so a second (non-circular) reference
        # to the same object elsewhere in the input is converted as well.
        _visited.discard(id(data))
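To look at the circular-reference guard in isolation, here is a minimal sketch that handles only dicts and lists. The function name `convert` and the `_active` parameter are hypothetical, and the approach shown (tracking only the objects currently being converted, and removing each one on the way out) is one possible design rather than the code that was merged.

```python
from typing import Any, Optional, Set


def convert(data: Any, _active: Optional[Set[int]] = None) -> Any:
    """Hypothetical sketch: copy nested dicts/lists, stopping at cycles."""
    if _active is None:
        _active = set()
    if not isinstance(data, (dict, list)):
        return data
    # An id already in _active means we are inside this very object: a cycle.
    if id(data) in _active:
        return data
    _active.add(id(data))
    try:
        if isinstance(data, dict):
            return {key: convert(value, _active) for key, value in data.items()}
        return [convert(item, _active) for item in data]
    finally:
        # Leaving the object: a later, non-cyclic reference is converted again.
        _active.discard(id(data))


shared = {"token": "abc"}
payload = {"first": shared, "second": shared}
payload["self"] = payload  # circular reference

result = convert(payload)
print(result["first"] == shared, result["first"] is shared)  # True False
print(result["self"] is payload)  # True
```

Removing the id on exit matters for inputs like `payload` above: `shared` appears twice without being circular, and both occurrences are still copied.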

VatsalGoel3 · 2025-04-06T18:23:48Z

@leandrodamascena, thank you for letting me know; I was not aware of this. I have just applied, although I believe applications for this year are closed. I would love to be part of the program next year. Also, is there any way I can DM you for some advice?

Thank you

leandrodamascena · 2025-04-06T19:17:02Z

Thanks for sharing this! I really like this idea! We have something like this in this method https://github.com/aws-powertools/powertools-lambda-python/blob/develop/aws_lambda_powertools/event_handler/openapi/encoders.py#L29. In this case, we call this function recursively for each item in the JSON, I don't know if it makes sense here. What do you think?

leandrodamascena · 2025-04-06T19:19:18Z

Sure, send me an email at aws-powertools-maintainers@amazon.com and I’ll be more than happy to share my calendar with you. We can then schedule a meeting to talk about your contributions to Powertools, how we build community at Powertools, your challenges building workloads on AWS, and any other topics you’d like to share and we can help with.

VatsalGoel3 · 2025-04-06T19:29:05Z

@leandrodamascena Yes, I think a recursive approach is exactly the right idea. It ensures that every nested element is processed and converted into a plain type that the data masking logic can handle.

leandrodamascena · 2025-04-06T19:30:58Z

Super nice, please go ahead! Just please try to comment the code of this function to make it easier to understand for future changes.

codecov · 2025-04-06T19:38:22Z

Codecov Report

Attention: Patch coverage is 86.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 96.32%. Comparing base (ccdd002) to head (095f625).
Report is 2 commits behind head on develop.

Files with missing lines	Patch %	Lines
...s_lambda_powertools/utilities/data_masking/base.py	86.66%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #6413      +/-   ##
===========================================
- Coverage    96.33%   96.32%   -0.02%     
===========================================
  Files          243      243              
  Lines        11797    11812      +15     
  Branches       878      881       +3     
===========================================
+ Hits         11365    11378      +13     
- Misses         337      338       +1     
- Partials        95       96       +1


…ta() with and updated tests

VatsalGoel3 · 2025-04-07T07:30:53Z

@leandrodamascena, I have pushed the code with the function as we discussed and also updated the test code to provide more robust checking. I am unclear whether I need to update any documentation for this; please let me know if I can help with that.

leandrodamascena · 2025-04-07T07:38:57Z

In this section we say that we don't support Pydantic/Dataclass and other data types, so it would be nice if we updated this with examples using Pydantic, Dataclass and other things. You can submit a first version of the modification and then I can review it to refine it.

Thanks again for this fantastic work.

VatsalGoel3 · 2025-04-07T08:12:32Z

📄 Documentation Update

Update the "Current limitations" section under `### Choosing parts of your data`

Replace:

We support JSON data types only - see data serialization for more details

With:

We support JSON data types and common Python objects such as Pydantic models, Dataclasses, and custom classes with dict() or __dict__.

✅ Add a dedicated `### Supported input types` section

Supported input types

You can now use the erase operation on a variety of common Python object types. These are recursively converted into dictionaries so their fields can be masked appropriately.

Supported input types:

✅ Dictionaries & JSON strings
✅ Dataclasses
✅ Pydantic models (v2+ via .model_dump())
✅ Custom classes implementing dict() method
✅ Custom classes with __dict__ attribute

Pydantic Example

from pydantic import BaseModel
from aws_lambda_powertools.utilities.data_masking import DataMasking

class User(BaseModel):
    username: str
    password: str

masked = DataMasking().erase(User(username="test", password="123"), fields=["password"])
# Output: {'username': 'test', 'password': '*****'}

Dataclass Example

from dataclasses import dataclass
from aws_lambda_powertools.utilities.data_masking import DataMasking

@dataclass
class Customer:
    name: str
    ssn: str

masked = DataMasking().erase(Customer(name="Jane", ssn="123-45-6789"), fields=["ssn"])
# Output: {'name': 'Jane', 'ssn': '*****'}

Custom Class with dict()

from aws_lambda_powertools.utilities.data_masking import DataMasking

class MyClass:
    def __init__(self):
        self.secret = "top"
        self.name = "public"

    def dict(self):
        return {"secret": self.secret, "name": self.name}

masked = DataMasking().erase(MyClass(), fields=["secret"])
# Output: {'secret': '*****', 'name': 'public'}
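The list of supported input types also mentions custom classes that only have a `__dict__` attribute, which none of the examples above covers. The sketch below shows what that fallback amounts to using the standard library alone. `Session` and `erase_fields_sketch` are hypothetical stand-ins for illustration and are not part of the DataMasking API.

```python
import copy


class Session:
    """Hypothetical class with no dict() method, only instance attributes."""

    def __init__(self):
        self.user = "jane"
        self.token = "s3cr3t"


def erase_fields_sketch(obj, fields, mask="*****"):
    """Stand-in for illustration only; this is not the DataMasking API."""
    # vars(obj) returns the live __dict__, so copy it before masking.
    data = copy.deepcopy(vars(obj))
    for field in fields:
        if field in data:
            data[field] = mask
    return data


session = Session()
print(erase_fields_sketch(session, fields=["token"]))
# {'user': 'jane', 'token': '*****'}
print(session.token)  # s3cr3t (the original object is left untouched)
```

Copying before masking is the design point here: writing into `vars(obj)` directly would change the caller's object as a side effect.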

@leandrodamascena, I think these would be good. I did not want to make the changes directly in the repo; let me know what you think of this as a first version.

leandrodamascena · 2025-04-07T08:30:22Z

There is room to improve this, but just send the commit and we can work together, ok?

…ustom classes and updated test code

VatsalGoel3 · 2025-04-07T08:58:55Z

@leandrodamascena, I have updated the docs

leandrodamascena

Hey @VatsalGoel3! I left some comments before we have another round of review.

aws_lambda_powertools/utilities/data_masking/base.py

docs/utilities/data_masking.md

…ts for supported input types and updated codebase

…ts for supported input types

leandrodamascena · 2025-04-09T22:09:09Z

I'm going to review this tomorrow morning.

sonarqubecloud · 2025-04-11T14:08:43Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

leandrodamascena

Hey @VatsalGoel3 thanks a lot for your contribution here, this is a super nice improvement!!

VatsalGoel3 · 2025-04-11T16:25:26Z

@leandrodamascena, you added the finishing touches. Thank you, it was fun working with you on this.

… standard classes (#6413)

* feat(data-masking): support masking of Pydantic models, dataclasses, and standard classes (#3473)
* feat(data_masking): support complex input types via robust prepare_data() with and updated tests
* docs(data-masking): add support docs for Pydantic, dataclasses, and custom classes and updated test code
* docs(data-masking): update examples to use Lambda function entry points for supported input types and updated codebase
* refactoring prepare_data method

---------

Co-authored-by: Leandro Damascena <lcdama@amazon.pt>

feat(data-masking): support masking of Pydantic models, dataclasses, …

d7b1881

…and standard classes (aws-powertools#3473)

VatsalGoel3 requested a review from a team as a code owner April 6, 2025 09:23

VatsalGoel3 requested a review from anafalcao April 6, 2025 09:23

boring-cyborg bot added the tests label Apr 6, 2025

pull-request-size bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 6, 2025

Merge branch 'develop' into feat/masking-input-types-support

69c5ade

leandrodamascena assigned VatsalGoel3 Apr 6, 2025

leandrodamascena linked an issue Apr 6, 2025 that may be closed by this pull request

Feature request: Add support to mask/encrypt/decrypt Pydantic models, Dataclasses, and standard Python classes in the DataMasking utility #3473

Closed

2 tasks

leandrodamascena changed the title ~~feat(data-masking): support masking of Pydantic models, dataclasses, and standard classes (#3473)~~ feat(data-masking): add support for Pydantic models, dataclasses, and standard classes Apr 6, 2025

leandrodamascena requested review from leandrodamascena and removed request for anafalcao April 6, 2025 19:33

github-actions bot added the feature New feature or functionality label Apr 6, 2025

feat(data_masking): support complex input types via robust prepare_da…

33e0a09

…ta() with and updated tests

pull-request-size bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 7, 2025

Merge branch 'develop' into feat/masking-input-types-support

cc7d4e8

Merge branch 'develop' into feat/masking-input-types-support

e32da30

docs(data-masking): add support docs for Pydantic, dataclasses, and c…

a472ec9

…ustom classes and updated test code

boring-cyborg bot added the documentation Improvements or additions to documentation label Apr 7, 2025

Merge branch 'develop' into feat/masking-input-types-support

9710740

leandrodamascena requested changes Apr 7, 2025

VatsalGoel3 added 2 commits April 7, 2025 12:19

docs(data-masking): update examples to use Lambda function entry poin…

cd76226

…ts for supported input types and updated codebase

docs(data-masking): update examples to use Lambda function entry poin…

1538a57

…ts for supported input types

boring-cyborg bot added commons dependencies Pull requests that update a dependency file labels Apr 7, 2025

Merge branch 'develop' into feat/masking-input-types-support

9e64a21

leandrodamascena added 2 commits April 11, 2025 11:45

refactoring prepare_data method

b9d44ee

refactoring prepare_data method

095f625

anafalcao approved these changes Apr 11, 2025

leandrodamascena self-requested a review April 11, 2025 14:51

leandrodamascena approved these changes Apr 11, 2025

leandrodamascena merged commit 86cfdae into aws-powertools:develop Apr 11, 2025
10 of 12 checks passed
