-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Policy multiple datatypes and tag values #12
Feat: Policy multiple datatypes and tag values #12
Conversation
Thanks @ChrisBRAC The first sign looks impressive! Great job here and thanks for actively looking into this new feat! Thanks & sorry for this late resp 🙌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ChrisBRAC you did a really great job here! Thanks again for making this PR 👍
There are a few comments which should be a minor adjustment, can you help to look into them, I will ask for another peer review from my colleague if he/she is available by today while waiting for your updates.
By then, happy to release v1.1 this week 🚀
…ng of PR comments
Thanks @ChrisBRAC again! I will recheck it and get back to you tmr. If it's all good, let go v1.1 right away!
This is perfect, the prev intention is 1 tag - 1 masking policy, but with this new thing, I see it to be useless too
It's intentional to support multiple environments as we normally use dev,ci,uat,prod in the profiles.yml. Imagine when you'd like to manually clean up the tags, in order to mitigate the human mistake dropping prod tags, for example. |
Run CI manually and get ✅ result (TODO: be able to run CI with User keypair instead) |
Awesome works! All looks great to me except 2 things:
Thank you very much @ChrisBRAC for your contribution. |
integration_tests/models/behavior/verify_if_masking_policies_applied_correctly.sql
Show resolved
Hide resolved
FYI @ChrisBRAC @thu-IL I just pushed some commits to:
For (2), the original intention is to help for cleaning up the existing DWH tags&masking policies, including ones which are not managed by dbt-tags package. Since they're not perfectly working though, so let's move forward with current limitation (already mentioned in docs) and I created #14 for handling this need later on. @ChrisBRAC: Could you help to look at the changes again and Thu's comments, feel free to commit your new changes based on them if necessarily. I don't think there is a blocker now to release v1.1, but just let me know. Thank you! |
@il-dat Thinking about that intention it might be risky to try and have a macro which removes all tags and MPs that aren't a part of this package, as it may unknowingly remove some manual tags to objects in Snowflake, non-pii related or it could remove some manually applied policies. So it might be worth keeping it to dbt-tags tags only for now? |
@ChrisBRAC make sense with the first point, will add those as part of |
resolves #
This is a:
All pull requests from community contributors should target the
main
branch (default).Description & motivation
We are implementing Masking Policies to protect the PII in our Snowflake DWH and as we are using dbt we wanted to use it to be able to manage our tag-based masking policies. This package allows us to do that, but had some limitations which I have built new functionality for.
The first limitation was not being able to set multiple policies against a single tag. This is an issue for us because we have masking policies which do the exact same thing, but just have different input and output data types. For example, masking all date parts apart from the year value in a date such as a date of birth field. We have a DATE data type version and a TIMESTAMP_NTZ version, just in case any source systems we use have it as a datetime for whatever reason. Same thing for a generic NULL mask, we don't want to have a different tag and assigned policy for each different data type possible and have to figure out which tag to apply based on the data type of the column we're applying it to. We want a single tag that we apply every data type masking policy to and it just works, whichever data type the column is.
To fix this I have implemented a new optional list var in the project for policy data types. If you don't set it for any of the tags then it will default to how it worked before - looking for a MP with the same name as the tag. However, if you set a list of values for the tag in the var, then it will loop through those and add those to the suffix of the tag name and expect those as masking policies. I have added examples of this in the getting started guide.
The second limitation was not being able to set values for tags. We need this because we use the function system$get_tag_on_current_column() to be able to create a dynamic masking policy rather than creating multiple different masking policies. The example here is if we wanted to have a MP which only shows the last x characters, we don't want to create a masking policy for last 1 char, last 2 chars, last 3 chars, etc. Therefore this allows us to have a single masking policy and use the value assigned on the tag to make its application dynamic. Again, if you don't use the functionality that I have created in this new feature, then it defaults to how it worked before, assigning the name of the column as the tag value.
This is done by using the ~ separator when assigning the tag in the model schema yml file after the tag name, then defining the value. If you use the ~ character in any of your tag names, there is also an optional var defined which allows you to choose another character.
The unapply_mp_from_tags and drop_tags macros both did not work for their intended purpose (I'm basing this off their name), so I have fixed these and they now actually do as intended. This is as well as a few other macros and functions, which didn't seem to fully work as expected.
I also added some notes in the contribution guide as doing this in Windows, I came across multiple issues that took a while to figure out and resolve. I fixed the dbt-utils package version as I was having issues with running dbt deps, but this can be changed back if needed.
I have updated all documentation and tests, running through the integration tests multiple times to make sure it works as intended. Again, if you don't use any features in this release, then it works as it did prior to these new features being added.
Checklist