Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Policy multiple datatypes and tag values #12

Merged

Conversation

ChrisBRAC
Copy link
Contributor

resolves #

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

We are implementing Masking Policies to protect the PII in our Snowflake DWH and as we are using dbt we wanted to use it to be able to manage our tag-based masking policies. This package allows us to do that, but had some limitations which I have built new functionality for.

The first limitation was not being able to set multiple policies against a single tag. This is an issue for us because we have masking policies which do the exact same thing, but just have different input and output data types. For example, masking all date parts apart from the year value in a date such as a date of birth field. We have a DATE data type version and a TIMESTAMP_NTZ version, just in case any source systems we use have it as a datetime for whatever reason. Same thing for a generic NULL mask, we don't want to have a different tag and assigned policy for each different data type possible and have to figure out which tag to apply based on the data type of the column we're applying it to. We want a single tag that we apply every data type masking policy to and it just works, whichever data type the column is.
To fix this I have implemented a new optional list var in the project for policy data types. If you don't set it for any of the tags then it will default to how it worked before - looking for a MP with the same name as the tag. However, if you set a list of values for the tag in the var, then it will loop through those and add those to the suffix of the tag name and expect those as masking policies. I have added examples of this in the getting started guide.

The second limitation was not being able to set values for tags. We need this because we use the function system$get_tag_on_current_column() to be able to create a dynamic masking policy rather than creating multiple different masking policies. The example here is if we wanted to have a MP which only shows the last x characters, we don't want to create a masking policy for last 1 char, last 2 chars, last 3 chars, etc. Therefore this allows us to have a single masking policy and use the value assigned on the tag to make its application dynamic. Again, if you don't use the functionality that I have created in this new feature, then it defaults to how it worked before, assigning the name of the column as the tag value.
This is done by using the ~ separator when assigning the tag in the model schema yml file after the tag name, then defining the value. If you use the ~ character in any of your tag names, there is also an optional var defined which allows you to choose another character.

The unapply_mp_from_tags and drop_tags macros both did not work for their intended purpose (I'm basing this off their name), so I have fixed these and they now actually do as intended. This is as well as a few other macros and functions, which didn't seem to fully work as expected.

I also added some notes in the contribution guide as doing this in Windows, I came across multiple issues that took a while to figure out and resolve. I fixed the dbt-utils package version as I was having issues with running dbt deps, but this can be changed back if needed.

I have updated all documentation and tests, running through the integration tests multiple times to make sure it works as intended. Again, if you don't use any features in this release, then it works as it did prior to these new features being added.

Checklist

  • This code is associated with an Issue which has been triaged and accepted for development
  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • Snowflake
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)

@il-dat
Copy link
Collaborator

il-dat commented Jul 16, 2024

Thanks @ChrisBRAC

The first sign looks impressive! Great job here and thanks for actively looking into this new feat!
Let me review and get back to you as soon as possible!

Thanks & sorry for this late resp 🙌

Copy link
Collaborator

@il-dat il-dat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ChrisBRAC you did a really great job here! Thanks again for making this PR 👍

There are a few comments which should be a minor adjustment, can you help to look into them, I will ask for another peer review from my colleague if he/she is available by today while waiting for your updates.

By then, happy to release v1.1 this week 🚀

packages.yml Outdated Show resolved Hide resolved
macros/resources/tags/drop_tags.sql Outdated Show resolved Hide resolved
docs/getting-started.md Outdated Show resolved Hide resolved
macros/apply-tags/unapply_mps_from_tags.sql Outdated Show resolved Hide resolved
macros/apply-tags/apply_mps_to_tags.sql Outdated Show resolved Hide resolved
docs/getting-started.md Show resolved Hide resolved
@il-dat
Copy link
Collaborator

il-dat commented Jul 16, 2024

Thanks @ChrisBRAC again! I will recheck it and get back to you tmr. If it's all good, let go v1.1 right away!

I removed some of the information from the tag comment because it was related to the specific column it was found on in the config, however if it is applied to multiple columns, this is fairly useless info.

This is perfect, the prev intention is 1 tag - 1 masking policy, but with this new thing, I see it to be useless too

Also, not sure how useful the target name is in the comment of the tag either?

It's intentional to support multiple environments as we normally use dev,ci,uat,prod in the profiles.yml. Imagine when you'd like to manually clean up the tags, in order to mitigate the human mistake dropping prod tags, for example.

@il-dat
Copy link
Collaborator

il-dat commented Jul 17, 2024

Run CI manually and get ✅ result (TODO: be able to run CI with User keypair instead)

@thu-IL
Copy link
Collaborator

thu-IL commented Jul 17, 2024

Awesome works! All looks great to me except 2 things:

  1. verify_if_.. models are having duplicates, perhaps due to multiple data types in one masking policy, which I believe can be resolved quickly.
  2. unapply_mps_from_tags macro is triggering an error

Thank you very much @ChrisBRAC for your contribution.
cc: @il-dat

@il-dat
Copy link
Collaborator

il-dat commented Jul 17, 2024

FYI @ChrisBRAC @thu-IL

I just pushed some commits to:

  1. upgrade the dev deps
  2. fix the unapply & drop macros

For (2), the original intention is to help for cleaning up the existing DWH tags&masking policies, including ones which are not managed by dbt-tags package. Since they're not perfectly working though, so let's move forward with current limitation (already mentioned in docs) and I created #14 for handling this need later on.

@ChrisBRAC: Could you help to look at the changes again and Thu's comments, feel free to commit your new changes based on them if necessarily. I don't think there is a blocker now to release v1.1, but just let me know.

Thank you!

@ChrisBRAC
Copy link
Contributor Author

@il-dat
Don't know if you want to incorporate the unapply and drop macros as part of the poe dbt-tags-test run? I should've checked that it was actually in it as I assumed it was being tested and so missed these errors. Thanks for picking them up.

Thinking about that intention it might be risky to try and have a macro which removes all tags and MPs that aren't a part of this package, as it may unknowingly remove some manual tags to objects in Snowflake, non-pii related or it could remove some manually applied policies. So it might be worth keeping it to dbt-tags tags only for now?

@il-dat
Copy link
Collaborator

il-dat commented Jul 17, 2024

@il-dat Don't know if you want to incorporate the unapply and drop macros as part of the poe dbt-tags-test run? I should've checked that it was actually in it as I assumed it was being tested and so missed these errors. Thanks for picking them up.

Thinking about that intention it might be risky to try and have a macro which removes all tags and MPs that aren't a part of this package, as it may unknowingly remove some manual tags to objects in Snowflake, non-pii related or it could remove some manually applied policies. So it might be worth keeping it to dbt-tags tags only for now?

@ChrisBRAC make sense with the first point, will add those as part of poe dbt-tags-test command or a similar one
For the 2nd point, it should be separate macros to support that purpose, and yes, it should be keeping it to dbt-tags tags only for now 👍

@il-dat il-dat merged commit cf90c03 into infinitelambda:main Jul 17, 2024
@il-dat
Copy link
Collaborator

il-dat commented Jul 17, 2024

v1.1 should be GA now 🎉

release | docs | hub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

In Getting Started page, add how to label specific columns with tags
3 participants