PROD-2134: Adds validation for taxonomy labels #4982

Merged
erosselli merged 7 commits into main from PROD-2134_validate_taxonomy_labels
Jun 14, 2024

@erosselli erosselli commented Jun 13, 2024

Closes PROD-2134

Description Of Changes

When creating or updating a System, we now validate that it references DataUses, DataCategories, and DataSubjects that both exist and have their active attribute set to True.
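
A minimal sketch of the check described above, assuming an in-memory dictionary in place of the database. `TaxonomyLabel`, `LabelValidationError`, and this simplified `validate_data_labels` signature are hypothetical stand-ins for illustration; the function in this PR is async and takes a database session and a SQL model. The error messages mirror the ones shown in the testing steps.

```python
from dataclasses import dataclass
from typing import Dict, List


class LabelValidationError(Exception):
    """Hypothetical stand-in for the 400 response returned by the endpoint."""


@dataclass
class TaxonomyLabel:
    """Hypothetical stand-in for a DataUse, DataCategory, or DataSubject row."""

    fides_key: str
    active: bool = True


def validate_data_labels(
    known: Dict[str, TaxonomyLabel], kind: str, labels: List[str]
) -> None:
    """Raise if any label is missing from `known` or is marked inactive."""
    for key in labels:
        label = known.get(key)
        if label is None:
            raise LabelValidationError(
                f"Invalid privacy declaration referencing unknown {kind} {key}"
            )
        if not label.active:
            raise LabelValidationError(
                f"Invalid privacy declaration referencing inactive {kind} {key}"
            )
```

Passing a single data use as a one-element list lets the same helper cover data uses, data categories, and data subjects.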

Code Changes

  • Updated the validate_privacy_declarations method to add validation for DataCategories and DataSubjects and to update the validation for DataUses
  • Added relevant unit tests

Steps to Confirm

Testing for invalid taxonomy labels

  1. Go to the POST api/v1/system endpoint in the Swagger UI, or alternatively make the request from Postman
  2. Use an invalid category for a privacy declaration in the request body. In the example JSON below, the invalid category is "user.contact.company_website":
{
    "system_type": "Third Party",
    "fides_key": "salesforce_system",
    "name": "Salesforce",
    "description": "Salesforce operates as a business development and lead generation tool (Salesforce Interaction Studio and Salesforce Sales Cloud), it is also used for customer operations (support) personnel management and ticket management for customers (Salesforce Service Cloud)!",
    "dataset_references": [],
    "tags": [
        "Customer-SM"
    ],
    "processes_personal_data": true,
    "exempt_from_privacy_regulations": false,
    "privacy_declarations": [
        {
            "name": "Marketing Campaigns",
            "data_categories": [
                "user.contact",
                "user.contact.company_website"
            ],
            "data_use": "marketing",
            "data_subjects": [
                "visitor"
            ],
            "dataset_references": [
                "saas_salesforce"
            ],
            "egress": null,
            "ingress": null,
            "features": [],
            "flexible_legal_basis_for_processing": false,
            "legal_basis_for_processing": "Consent",
            "impact_assessment_location": null,
            "retention_period": "No retention or erasure policy",
            "processes_special_category_data": false,
            "special_category_legal_basis": null,
            "data_shared_with_third_parties": false,
            "third_parties": null,
            "shared_categories": [],
            "cookies": [],
            "id": "pri_b688c9e9-59f1-44f7-a16b-33681bd3725b"
        }
    ],
    "vendor_id": null,
    "ingress": [
        {
            "fides_key": "platform_system",
            "type": "system",
            "data_categories": []
        },
        {
            "fides_key": "hubspot_system",
            "type": "system",
            "data_categories": []
        }
    ],
    "egress": [
        {
            "fides_key": "de_snowflake",
            "type": "system",
            "data_categories": []
        },
        {
            "fides_key": "outreach_system",
            "type": "system",
            "data_categories": null
        },
        {
            "fides_key": "hubspot_system",
            "type": "system",
            "data_categories": null
        }
    ],
    "meta": null,
    "fidesctl_meta": null,
    "organization_fides_key": "default_organization",
    "dpa_progress": null,
    "previous_vendor_id": null,
    "cookies": [],
    "uses_cookies": false,
    "cookie_refresh": false,
    "uses_non_cookie_access": false,
    "uses_profiling": false,
    "does_international_transfers": true,
    "legal_basis_for_transfers": [],
    "requires_data_protection_assessments": false,
    "legal_name": null,
    "legal_address": "",
    "responsibility": [
        "Controller"
    ],
    "dpo": "",
    "data_security_practices": "",
    "administrating_department": "Business Systems and Sales & Success Teams"
}

  3. You should receive a 400 response with the message:
{
  "detail": "Invalid privacy declaration referencing unknown DataCategory user.contact.company_website"
}

The same can be done for invalid data uses and data subjects.
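
For anyone who prefers a script over Swagger or Postman, the request can be prepared with the Python standard library. This is only a sketch: `BASE_URL`, the bearer-token header, and the `build_system_request` helper are assumptions for illustration, not something defined in this PR. The payload is the JSON body from the steps above.

```python
import json
import urllib.request

# Hypothetical address of a locally running Fides server; adjust as needed.
BASE_URL = "http://localhost:8080"


def build_system_request(payload: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the api/v1/system endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/system",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the server accepts a bearer token for authentication.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending the prepared request with `urllib.request.urlopen` raises `urllib.error.HTTPError` on a 400 response; the error's `code` attribute and `read()` method give access to the status and the JSON `detail` message.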

Testing for disabled taxonomy labels

  1. In the Admin UI, create the "user.contact.company_website" Data Category and mark it as disabled using the toggle button
  2. Go to the POST api/v1/system endpoint in the Swagger UI, or alternatively make the request from Postman
  3. Use the disabled category "user.contact.company_website" for a privacy declaration in the request body, as in the example JSON below:
{
    "system_type": "Third Party",
    "fides_key": "salesforce_system",
    "name": "Salesforce",
    "description": "Salesforce operates as a business development and lead generation tool (Salesforce Interaction Studio and Salesforce Sales Cloud), it is also used for customer operations (support) personnel management and ticket management for customers (Salesforce Service Cloud)!",
    "dataset_references": [],
    "tags": [
        "Customer-SM"
    ],
    "processes_personal_data": true,
    "exempt_from_privacy_regulations": false,
    "privacy_declarations": [
        {
            "name": "Marketing Campaigns",
            "data_categories": [
                "user.contact",
                "user.contact.company_website"
            ],
            "data_use": "marketing",
            "data_subjects": [
                "visitor"
            ],
            "dataset_references": [
                "saas_salesforce"
            ],
            "egress": null,
            "ingress": null,
            "features": [],
            "flexible_legal_basis_for_processing": false,
            "legal_basis_for_processing": "Consent",
            "impact_assessment_location": null,
            "retention_period": "No retention or erasure policy",
            "processes_special_category_data": false,
            "special_category_legal_basis": null,
            "data_shared_with_third_parties": false,
            "third_parties": null,
            "shared_categories": [],
            "cookies": [],
            "id": "pri_b688c9e9-59f1-44f7-a16b-33681bd3725b"
        }
    ],
    "vendor_id": null,
    "ingress": [
        {
            "fides_key": "platform_system",
            "type": "system",
            "data_categories": []
        },
        {
            "fides_key": "hubspot_system",
            "type": "system",
            "data_categories": []
        }
    ],
    "egress": [
        {
            "fides_key": "de_snowflake",
            "type": "system",
            "data_categories": []
        },
        {
            "fides_key": "outreach_system",
            "type": "system",
            "data_categories": null
        },
        {
            "fides_key": "hubspot_system",
            "type": "system",
            "data_categories": null
        }
    ],
    "meta": null,
    "fidesctl_meta": null,
    "organization_fides_key": "default_organization",
    "dpa_progress": null,
    "previous_vendor_id": null,
    "cookies": [],
    "uses_cookies": false,
    "cookie_refresh": false,
    "uses_non_cookie_access": false,
    "uses_profiling": false,
    "does_international_transfers": true,
    "legal_basis_for_transfers": [],
    "requires_data_protection_assessments": false,
    "legal_name": null,
    "legal_address": "",
    "responsibility": [
        "Controller"
    ],
    "dpo": "",
    "data_security_practices": "",
    "administrating_department": "Business Systems and Sales & Success Teams"
}

  4. You should receive a 400 response with the message:
{
  "detail": "Invalid privacy declaration referencing inactive DataCategory user.contact.company_website"
}
  5. If you enable the created Data Category, you should be able to re-send the request and have it succeed.

The same can be done for disabled data uses and data subjects.

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation:
    • documentation complete, PR opened in fidesdocs
    • documentation issue created in fidesdocs
    • if there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md
  • For API changes, the Postman collection has been updated
  • If there are any database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!

vercel bot commented Jun 13, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview Jun 14, 2024 2:30pm

Comment on lines -9 to +10
-from alembic import op
 import sqlalchemy as sa
+
+from alembic import op
Contributor Author

running the static_checks command caused this change

Contributor

Thanks for including this cleanup here 👍

-from fides.api.schemas.messaging.messaging import MessagingMethod, MessagingActionType
+from fides.api.schemas.messaging.messaging import MessagingActionType, MessagingMethod
Contributor Author

the same happened here

@@ -48,27 +52,40 @@ def get_system(db: Session, fides_key: str) -> System:
return system


async def validate_data_labels(
Contributor Author

Wasn't sure what the best name for this was so suggestions are appreciated 😄

Contributor

sounds good to me, a docstring added to this function will add further clarification

Contributor Author

@erosselli erosselli Jun 14, 2024

I have added the docstring in 61b7cea

cypress bot commented Jun 13, 2024

Passing run #8330 ↗︎


Details:

Merge f5091b2 into c9124dd...
Project: fides Commit: 179584e157 ℹ️
Status: Passed Duration: 00:34 💡
Started: Jun 14, 2024 2:41 PM Ended: Jun 14, 2024 2:41 PM

Review all test suite changes for PR #4982 ↗︎

codecov bot commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 73.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 86.59%. Comparing base (c9124dd) to head (f5091b2).

Files Patch % Lines
src/fides/api/db/system.py 69.23% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4982      +/-   ##
==========================================
- Coverage   86.59%   86.59%   -0.01%     
==========================================
  Files         349      349              
  Lines       21620    21628       +8     
  Branches     2867     2869       +2     
==========================================
+ Hits        18722    18728       +6     
- Misses       2394     2395       +1     
- Partials      504      505       +1     


@erosselli erosselli force-pushed the PROD-2134_validate_taxonomy_labels branch from 06e0963 to ce35806 Compare June 13, 2024 17:45
@erosselli erosselli marked this pull request as ready for review June 13, 2024 17:47
@pattisdr pattisdr self-requested a review June 13, 2024 17:56
@erosselli erosselli force-pushed the PROD-2134_validate_taxonomy_labels branch from 81d86af to 9df6aaf Compare June 13, 2024 18:42
@pattisdr
Contributor

Starting review -

@@ -48,27 +52,47 @@ def get_system(db: Session, fides_key: str) -> System:
return system


async def validate_data_labels(
db: AsyncSession, sql_model: Base, labels: List[FidesKey]
Contributor Author

technically I'd want to type sql_model with something more specific than Base since I'm assuming that resource has an active property but wasn't sure if this was something I could do 😅 I'm not too familiar with Python type annotations so don't know how complex I can get with the types

Contributor

most things inherit from Base so it's technically correct, but I think it might be too broad here, especially since Base isn't guaranteed to have the active property. For now, since you're calling this in one place, I might annotate with the known resources you're calling with, a data category, a data subject, and a use, like:

sql_model: Union[Type[DataUse], Type[DataSubject], Type[DataCategory]]

The Type is there because it's not an instance of a DataUse you're passing in, rather it's just a class definition
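
To illustrate the distinction between annotating a class and annotating an instance, here is a small self-contained sketch. The three empty classes are placeholders for the real SQL models, and `TaxonomyModel` and `model_name` are hypothetical names used only for this example.

```python
from typing import Type, Union


# Placeholder classes standing in for the real SQL models.
class DataUse: ...


class DataSubject: ...


class DataCategory: ...


# Alias for "one of these three classes", as opposed to an instance of them.
TaxonomyModel = Union[Type[DataUse], Type[DataSubject], Type[DataCategory]]


def model_name(sql_model: TaxonomyModel) -> str:
    """Accepts the class object itself, e.g. model_name(DataUse)."""
    return sql_model.__name__
```

A type checker accepts `model_name(DataUse)` but flags `model_name(DataUse())`, since the second call passes an instance rather than the class.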

Contributor Author

@erosselli erosselli Jun 14, 2024

I have updated this in 61b7cea 😄

Contributor

@pattisdr pattisdr left a comment

Excellent first PR - detailed test coverage and manual testing steps for QA, and a self-review to help the code reviewer focus more in certain areas. Love it.

A couple of very minor changes are requested. There are still a few failing tests, and you'll also need to add an entry to CHANGELOG.md.

status_code=HTTP_400_BAD_REQUEST,
detail=f"Invalid privacy declaration referencing unknown DataUse {privacy_declaration.data_use}",
)
await validate_data_labels(db, DataUse, [privacy_declaration.data_use])
Contributor

Nice, putting this in a list so validate_data_labels can be reused 👍

@erosselli erosselli force-pushed the PROD-2134_validate_taxonomy_labels branch from df721d6 to 2026d64 Compare June 14, 2024 14:29
@erosselli erosselli force-pushed the PROD-2134_validate_taxonomy_labels branch from 2026d64 to f5091b2 Compare June 14, 2024 14:30
Contributor

@pattisdr pattisdr left a comment

Great @erosselli! Test coverage looks good to me. This can be merged 👍

@erosselli erosselli merged commit 3b92387 into main Jun 14, 2024
41 of 42 checks passed
@erosselli erosselli deleted the PROD-2134_validate_taxonomy_labels branch June 14, 2024 16:56
cypress bot commented Jun 14, 2024

Passing run #8332 ↗︎


Details:

PROD-2134: Adds validation for taxonomy labels (#4982)
Project: fides Commit: 3b92387d4e
Status: Passed Duration: 00:34 💡
Started: Jun 14, 2024 5:07 PM Ended: Jun 14, 2024 5:08 PM

Review all test suite changes for PR #4982 ↗︎
