Adding document anonymizer #249

Merged


dristysrivastava
Collaborator

@dristysrivastava dristysrivastava commented Feb 29, 2024

Requirement: Anonymize document snippets in the Pebblo report. As Pebblo is being considered for environments beyond dev, anonymization makes it possible to distribute the report to more app stakeholders.

Issue Link: #224

Note:
To use this function:

from pebblo.entity_classifier.entity_classifier import EntityClassifier

text = "<Document text>"  # replace with the document text to scan
anonymize_all_entities = True  # flag named in this PR; the value here is only an example
entity_classifier_obj = EntityClassifier()
# Returns the detected entity groups, their total count, and the anonymized text
entities, total_count, anonymized_text = entity_classifier_obj.presidio_entity_classifier_and_anonymizer(text, anonymize_all_entities)
print(f"Entity Group: {entities}")
print(f"Entity Count: {total_count}")
print(f"Anonymized Text: {anonymized_text}")

Example:

text = """
Hello, My name is John. 
My AWS Access Key is: AKIAQIPT4PDORIRTV6PH
"""

Anonymized Response:
Hello, My name is  <PERSON>
My AWS Access Key is: <AWS_ACCESS_KEY>
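
The placeholder replacement shown in the example can be illustrated without Pebblo installed. The following is a minimal sketch using only the standard library, not the implementation in this PR: the `PATTERNS` table, the `anonymize` function, and both regular expressions are hypothetical names and toy detectors chosen for illustration (a real classifier would use NER for names, not a fixed phrase). It mirrors the three return values described above: entity counts, total count, and anonymized text.

```python
import re

# Toy detectors, assumed for this sketch only. They are NOT the recognizers
# used by Pebblo; the PERSON pattern only works for the sample sentence.
PATTERNS = {
    "PERSON": re.compile(r"(?<=My name is )[A-Z][a-z]+"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def anonymize(text, patterns=PATTERNS):
    """Return (entity_counts, total_count, anonymized_text)."""
    # Collect every match as a (start, end, label) span.
    spans = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))

    # Count detected entities per label.
    counts = {}
    for _, _, label in spans:
        counts[label] = counts.get(label, 0) + 1

    # Replace from the end of the text so earlier offsets stay valid.
    # Overlapping spans are not handled in this sketch.
    anonymized = text
    for start, end, label in sorted(spans, reverse=True):
        anonymized = anonymized[:start] + f"<{label}>" + anonymized[end:]
    return counts, len(spans), anonymized


sample = "Hello, My name is John.\nMy AWS Access Key is: AKIAQIPT4PDORIRTV6PH\n"
entities, total_count, anonymized_text = anonymize(sample)
print(f"Entity Group: {entities}")
print(f"Entity Count: {total_count}")
print(f"Anonymized Text: {anonymized_text}")
```

Replacing spans right to left keeps the recorded offsets correct even though each placeholder differs in length from the text it replaces.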

Collaborator

@shreyas-damle shreyas-damle left a comment


Please add one sample PDF report and/or local UI screenshots showing the anonymized contents.

pebblo/app/enums/enums.py (outdated, resolved)
pebblo/app/config/config.yaml (outdated, resolved)
pebblo/entity_classifier/entity_classifier.py (outdated, resolved)
pebblo/entity_classifier/entity_classifier.py (resolved)
pebblo/entity_classifier/entity_classifier.py (resolved)
pebblo/entity_classifier/entity_classifier.py (resolved)
@dristysrivastava dristysrivastava force-pushed the enhancement_document_anomyzer branch 2 times, most recently from 14a5f1c to a1bda3c Compare March 5, 2024 05:10
@dristysrivastava
Collaborator Author

Please add one sample PDF report and/or local UI screenshots showing the anonymized contents.

Adding report with anonymized document snippet
pebblo_report.pdf

@dristysrivastava dristysrivastava force-pushed the enhancement_document_anomyzer branch from 459daf3 to d69be22 Compare March 5, 2024 11:13
docs/gh_pages/docs/config.md (outdated, resolved)
pebblo/app/service/doc_helper.py (outdated, resolved)
tests/entity_classifier/mock_response.py (resolved)
pebblo/entity_classifier/entity_classifier.py (outdated, resolved)
@dristysrivastava dristysrivastava requested a review from Raj725 March 6, 2024 12:48
@dristysrivastava dristysrivastava requested a review from Raj725 March 7, 2024 06:09
@dristysrivastava dristysrivastava force-pushed the enhancement_document_anomyzer branch from 90ebb61 to 9bfa1f0 Compare March 7, 2024 06:34
@shreyas-damle shreyas-damle merged commit 43276a3 into daxa-ai:main Mar 7, 2024
12 of 16 checks passed
@dristysrivastava dristysrivastava changed the title from "Adding document anomyzer" to "Adding document anonymizer" Mar 7, 2024
3 participants