
feat(data-masking): add custom mask functionalities #5837

Merged
26 commits merged into aws-powertools:develop on Feb 11, 2025


anafalcao
Contributor

@anafalcao anafalcao commented Jan 7, 2025

Issue number:
#5826

Summary

This PR enhances the data masking tool by introducing flexible masking options. These new features allow for dynamic, pattern-based, and regex-based masking, giving users greater control over how sensitive data is obscured when using the erase() method.

Changes

New parameters for erase():

  • dynamic_mask (bool): When set to True, replaces each character with * while preserving the original length and structure (for example, spaces) of the text.
    Example: with dynamic_mask=True, 'Avenue St' becomes '****** **'.
  • custom_mask (str): Specifies a fixed string that replaces the entire original value, regardless of its length.
    For example, with a custom_mask of "XX-XX", the input "12345" becomes "XX-XX".
  • regex_pattern (str): Defines a regular expression that identifies the parts of the input string to mask, allowing more complex and flexible masking rules. It is used together with mask_format.
  • mask_format (str): Specifies the replacement for the parts of the string matched by regex_pattern. It can include backreferences (such as \1, \2) to captured groups in the pattern, so parts of the original string can be preserved.
    For example, 'example@email.com' could become 'e*****@email.com'.
  • masking_rules (dict): Maps each field name to its own combination of the options above, so different fields can be masked with different techniques.
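The per-value behaviour described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the Powertools implementation: the helper names mask_dynamic, mask_custom, and mask_regex are hypothetical, and treating whitespace as the only "structure" that dynamic_mask preserves is an assumption inferred from the 'Avenue St' example.

```python
import re


def mask_dynamic(value: str) -> str:
    # Hypothetical helper for dynamic_mask=True: every non-whitespace character
    # becomes "*", so length and spacing survive (assumed meaning of "structure").
    return re.sub(r"\S", "*", value)


def mask_custom(value: str, custom_mask: str) -> str:
    # Hypothetical helper for custom_mask: the original value is discarded and
    # the mask string is returned verbatim, whatever the input length.
    return custom_mask


def mask_regex(value: str, regex_pattern: str, mask_format: str) -> str:
    # Hypothetical helper for regex_pattern + mask_format: each match is rewritten
    # with mask_format, where \1, \2, ... reinsert the captured groups.
    return re.sub(regex_pattern, mask_format, value)


print(mask_dynamic("Avenue St"))  # ****** **
print(mask_custom("12345", "XX-XX"))  # XX-XX
print(mask_regex("example@email.com", r"(\w)[\w.-]+@([\w.-]+)", r"\1*****@\2"))  # e*****@email.com
```

Because re.sub replaces every match, a regex_pattern that matches several times in one value masks each occurrence in this sketch.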

User experience

Previously, users had limited options for masking sensitive data: erase() typically replaced entire fields or values with a fixed mask ('*****').
With the new masking options, users have much finer control over how their sensitive data is obscured. The enhanced erase() function supports several masking techniques and can apply a different technique to each field:

from __future__ import annotations

from aws_lambda_powertools.utilities.data_masking import DataMasking

masker = DataMasking()

masking_rules = {
    "credit_card": {"custom_mask": "XX"},
    "street": {"dynamic_mask": True},
    "email": {"regex_pattern": r"(\w)[\w.-]+@([\w.-]+)", "mask_format": r"\1****@\2"}
}
masked_data = masker.erase(
    data={"credit_card": "1234-5678-9012-3456", "street": "Avenue St", "email": "user@example.com"},
    masking_rules=masking_rules
)
# Result: {"credit_card": "XX", "street": "****** **", "email": "u****@example.com"}
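To make the documented result checkable without installing Powertools, here is a stdlib-only sketch of per-field dispatch over a flat dictionary. It is not the library's code: apply_masking_rules is a hypothetical name, and three behaviours are assumptions of this sketch: fields without a rule are left unchanged, rules for absent fields are ignored, and a rule that names no option falls back to the fixed '*****' mask.

```python
from __future__ import annotations

import re
from typing import Any

# Assumed fallback when a rule names no option, mirroring the fixed mask.
FIXED_MASK = "*****"


def apply_masking_rules(
    data: dict[str, Any], masking_rules: dict[str, dict[str, Any]]
) -> dict[str, Any]:
    """Hypothetical sketch: mask a flat dict field by field; returns a new dict."""
    masked = dict(data)  # fields without a rule are copied unchanged
    for field, rule in masking_rules.items():
        if field not in masked:
            continue  # assumption: rules for absent fields are ignored
        value = str(masked[field])
        if "custom_mask" in rule:
            # Fixed replacement string, independent of the original length.
            masked[field] = rule["custom_mask"]
        elif rule.get("dynamic_mask"):
            # Length-preserving: every non-whitespace character becomes "*".
            masked[field] = re.sub(r"\S", "*", value)
        elif "regex_pattern" in rule:
            # mask_format is assumed to accompany regex_pattern (\1, \2 = groups).
            masked[field] = re.sub(rule["regex_pattern"], rule["mask_format"], value)
        else:
            masked[field] = FIXED_MASK
    return masked


masking_rules = {
    "credit_card": {"custom_mask": "XX"},
    "street": {"dynamic_mask": True},
    "email": {"regex_pattern": r"(\w)[\w.-]+@([\w.-]+)", "mask_format": r"\1****@\2"},
}
data = {"credit_card": "1234-5678-9012-3456", "street": "Avenue St", "email": "user@example.com"}
print(apply_masking_rules(data, masking_rules))
# {'credit_card': 'XX', 'street': '****** **', 'email': 'u****@example.com'}
```

Checking custom_mask before dynamic_mask and regex_pattern is an arbitrary choice made for this sketch; the PR does not state how conflicting options within one rule are resolved.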

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 7, 2025
@github-actions github-actions bot added the feature New feature or functionality label Jan 7, 2025
@boring-cyborg boring-cyborg bot added documentation Improvements or additions to documentation tests labels Jan 8, 2025
@anafalcao
Contributor Author

Hi @leandrodamascena! Can I have your help here with mypy? Thanks!

@anafalcao anafalcao marked this pull request as ready for review January 13, 2025 18:09
@anafalcao anafalcao requested review from a team and leandrodamascena January 13, 2025 18:09
@anafalcao
Contributor Author

Hi @leandrodamascena!
I just marked this as ready for review. I've been running into mypy "Incompatible types in assignment" errors. Can you take a look?
I also created some tests for the new functionalities, but I may need to add more after fixing these type issues.

Contributor

@leandrodamascena leandrodamascena left a comment


Hey @anafalcao! Another round of review. This is nice work; we just need to fix a few things. 🚀

@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Jan 29, 2025
@boring-cyborg boring-cyborg bot added the documentation Improvements or additions to documentation label Jan 30, 2025
@boring-cyborg boring-cyborg bot added the documentation Improvements or additions to documentation label Feb 4, 2025
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Feb 4, 2025
Contributor

github-actions bot commented Feb 4, 2025

⚠️Large PR detected⚠️

Please consider breaking into smaller PRs to avoid significant review delays. Ignore if this PR has naturally grown to this size after reviews.

@boring-cyborg boring-cyborg bot added the documentation Improvements or additions to documentation label Feb 10, 2025
@leandrodamascena leandrodamascena self-requested a review February 11, 2025 10:14
Contributor

@leandrodamascena leandrodamascena left a comment


APPROVING it @anafalcao! THANK YOU SO MUCH!


sonarqubecloud bot commented Feb 11, 2025

Quality Gate passed

Issues
0 New issues
2 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

@leandrodamascena leandrodamascena merged commit 6ff9f11 into aws-powertools:develop Feb 11, 2025
12 checks passed
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Feb 11, 2025
sinofseven pushed a commit to sinofseven/powertools-lambda-python-my-extend that referenced this pull request Feb 13, 2025

* add custom mask functionalities

* change flags name to more intuitive

* fix type check error

* add draft documentation

* change doc examples

* style: format code with black

* fix format base

* add tests for new masks

* sub header for custom mask in docs

* masking rules to handle complex nest

* add test for masking rules

* modifications based on the feedback

* mypy and tests modification

* create more tests

* Refactoring tests

* Refactoring tests

* Refactoring tests

* Adding docstring + arg parameter

* Adding docstring + arg parameter

* Removing unnecessary code

* Removing unnecessary code

* Removing unnecessary code

---------

Co-authored-by: Leandro Damascena <lcdama@amazon.pt>
Labels
feature (New feature or functionality), size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.), tests
Development

Successfully merging this pull request may close these issues.

Feature request: support for custom masking with regx pattern or custom masking chars
3 participants