Feat: Policy multiple datatypes and tag values #12

Merged
2 changes: 2 additions & 0 deletions .github/workflows/ci-pr.yml
@@ -26,6 +26,8 @@ jobs:
poetry config installer.max-workers 1
poetry config virtualenvs.in-project true
poetry install
mkdir ~/.dbt
cp integration_tests/ci/ci.profiles.yml ~/.dbt/profiles.yml

- name: Code Quality
run: |
3 changes: 3 additions & 0 deletions dbt_project.yml
@@ -17,3 +17,6 @@ vars:
  # dbt_tags__schema: tags # optional, target.schema if not specified
  # dbt_tags__allowed_tags: [] # optional, allow all
  dbt_tags__resource_types: ["model", "snapshot", "source"] # mandatory, find tags for only configured dbt resource types
  # dbt_tags__policy_data_types: # optional, defaults to same policy name as tag name if not specified per policy
  #   - <tag-name>: ['datatype','list'] # list of tag names, assign list of datatypes. Suffixes policy name with datatypes on assigning policies to tags
  # dbt_tags__tag_name_separator: ~ # optional, the default value is a tilde (~). If you use ~ in your tag names, you can set a different separator character here
8 changes: 7 additions & 1 deletion docs/contributing.md
@@ -56,9 +56,11 @@ poetry shell
poe git-hooks
```

ℹ️ If `poetry install` fails while trying to install dbt-tags, create an empty file named `dbt-tags` in the root directory and try again; this should fix the issue.

### Get dbt profile ready

Please help to check [the sample script](https://github.com/infinitelambda/dbt-tags/blob/main/integration_tests/ci/sf-init.sql) to initialize Snowflake environment in `integreation_tests/ci` directory, and get your database freshly created.
Please help to check [the sample script](https://github.com/infinitelambda/dbt-tags/blob/main/integration_tests/ci/sf-init.sql) to initialize Snowflake environment in `integration_tests/ci` directory, and get your database freshly created.

Next, follow the [dbt profile instructions](https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles) to set up your dedicated profile. Again, you could [try our sample](https://github.com/infinitelambda/dbt-tags/blob/main/integration_tests/ci/profiles.yml) in the same directory.

@@ -89,6 +91,10 @@ See here for details for running existing integration tests and adding new ones:

Once you've added all of these files, in the `poetry shell`, you should be able to run:

ℹ️ If you are using Windows, make sure that you have [Developer Mode](https://learn.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development) activated on your machine. This allows the command line to create symlinks when installing the main `dbt-tags` project into the `integration_tests` sub-project. If it is not enabled, the install goes into an endless loop, installing the packages inside each other over and over until you run out of disk space.

ℹ️ If the command below fails with a conflict while running `dbt deps` from within `integration_tests` in VS Code, one or more extensions are the likely cause. Either disable the conflicting extensions in VS Code (such as dbt Power User), or close VS Code and run `dbt deps` directly from a command prompt after changing into the correct folder.

```bash
poe dbt-tags-test
```
79 changes: 75 additions & 4 deletions docs/getting-started.md
@@ -45,7 +45,7 @@ For each tag name, we need a corresponding macro that holds the masking policy d
For example, given a tag named `pii_name`, we'll create a macro file as below:

```sql
-- File path: /marcos/mp-ddl/create_masking_policy__pii_name.sql
-- File path: /macros/mp-ddl/create_masking_policy__pii_name.sql
{% macro create_masking_policy__pii_name(ns) -%}

create masking policy if not exists {{ ns }}.pii_name as (val string)
@@ -60,7 +60,78 @@ Given a sample, we have a tag named `pii_name`, we'll create a macro file as bel

> `{{ ns }}` or `ns` stands for the schema namespace, let's copy the same!

## 4. Deploy resources (tags, masking policies)
ℹ️ To attach multiple masking policies to a single tag, one per data type (each policy must be for a different data type), follow these steps:

For example, given a tag named `pii_null`, we'll create a macro file as below:

```sql
-- File path: /macros/mp-ddl/create_masking_policy__pii_null.sql
{% macro create_masking_policy__pii_null(ns) -%}

create masking policy if not exists {{ ns }}.pii_null_varchar as (val string)
returns string ->
case --/ your definition start here /--
when is_role_in_session('ROLE_HAS_PII_ACCESS') then val
else null
end;

create masking policy if not exists {{ ns }}.pii_null_number as (val number)
returns number ->
case --/ your definition start here /--
when is_role_in_session('ROLE_HAS_PII_ACCESS') then val
else null
end;

{%- endmacro %}
```

We then must modify the optional var `dbt_tags__policy_data_types` in the `dbt_project.yml` file:

```yml
vars:
  dbt_tags__policy_data_types:
    - pii_null: ['varchar','number']
```

Each data type listed here must exactly match the suffix appended to the masking policy name in the create macro (for example, `varchar` for `pii_null_varchar`). Both policies are then assigned to the single tag, so you don't have to manage several copies of the same tag for the different data types.

For any tag left out of the `dbt_tags__policy_data_types` var, the package expects a single masking policy with exactly the same name as the tag.
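
As a rough illustration of the naming rule above, the following Python sketch resolves which masking policy names are expected for a tag. The helper `resolve_policy_names` is hypothetical and not part of `dbt_tags`; it only assumes the var has the list-of-mappings shape shown in `dbt_project.yml`.

```python
from typing import Dict, List


# Hypothetical helper for illustration only; not part of dbt_tags.
def resolve_policy_names(
    tag: str, policy_data_types: List[Dict[str, List[str]]]
) -> List[str]:
    """Return the masking policy names expected for `tag`."""
    for entry in policy_data_types:
        if tag in entry:
            # One policy per listed data type, named <tag>_<datatype>.
            return [f"{tag}_{data_type}" for data_type in entry[tag]]
    # Tag not listed: a single policy with exactly the tag's name.
    return [tag]


# Same shape as the `dbt_tags__policy_data_types` var.
policy_data_types = [{"pii_null": ["varchar", "number"]}]

print(resolve_policy_names("pii_null", policy_data_types))
print(resolve_policy_names("pii_name", policy_data_types))
```

The first call yields the two suffixed names used in the macro above, while the unlisted `pii_name` falls back to a policy named after the tag.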

## 4. Set tags on columns

To assign tags to columns, follow the same process you would normally use to apply dbt tags to columns: add them in the model schema YAML files.

By default, this package assigns the name of the column as the value of the tag. dbt tags have no out-of-the-box way to carry values for Snowflake tags, so `dbt_tags` uses a separator (`~` by default) to facilitate this.

Setting a value for a tag can be useful for security governance queries in Snowflake. It can also be used within a masking policy to allow some dynamic behavior via the Snowflake function `system$get_tag_on_current_column('fully.qualified.tag-name')`.

Looking at a model's schema yaml file:

- If you don't need a tag value

```yml
columns:
  - name: first_name
    description: Customer's first name. PII.
    tags:
      - pii_name
```

- If you do need a tag value

```yml
columns:
  - name: membership_number
    description: Customer's membership number. PII.
    tags:
      - pii_mask_last_x_characters~4
```

The value is then available to use either in the masking policy or in Snowflake.
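
To make the separator convention concrete, here is a minimal Python sketch of how a column tag could be split into a tag name and a tag value. The function `parse_column_tag` is hypothetical; the package's actual parsing logic is not shown here, so treat this as an assumption-based illustration of the behavior described above (including the default of using the column name as the value).

```python
from typing import Tuple


# Hypothetical helper for illustration only; not part of dbt_tags.
def parse_column_tag(
    raw_tag: str, column_name: str, separator: str = "~"
) -> Tuple[str, str]:
    """Split a dbt column tag into (tag_name, tag_value)."""
    # Split on the first separator only; `found` is empty when it is absent.
    tag_name, found, tag_value = raw_tag.partition(separator)
    if not found:
        # No explicit value: fall back to the column name, as described above.
        return tag_name, column_name
    return tag_name, tag_value


print(parse_column_tag("pii_name", "first_name"))
print(parse_column_tag("pii_mask_last_x_characters~4", "membership_number"))
```

If your tag names contain `~`, passing a different `separator` mirrors what the `dbt_tags__tag_name_separator` var is meant to allow.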

ℹ️ `dbt_tags` will only deploy tags that have been set on columns. Tags or masking policies that aren't assigned to any column won't be deployed.

## 5. Deploy resources (tags, masking policies)

❗We don't want to repeat this step on every dbt run.

@@ -82,7 +153,7 @@ Instead, let's do it as a step in the Production Release process (or manually).
dbt run-operation create_masking_policies --args '{debug: true}'
```

## 5. Apply tags to columns
## 6. Apply tags to columns

ℹ️ Currently, only column tags are supported!

@@ -98,7 +169,7 @@ models:
{% endif %}
```

## 6. Apply masking policies to tags
## 7. Apply masking policies to tags

ℹ️ Skip this step if you decide not to use masking policies, but only tags!

1 change: 1 addition & 0 deletions docs/index.md
@@ -64,6 +64,7 @@ Here are the full list of built-in variables:
- `dbt_tags__schema`
- `dbt_tags__allowed_tags`
- `dbt_tags__resource_types`
- `dbt_tags__policy_data_types`

## How to Contribute ❤️

24 changes: 12 additions & 12 deletions docs/util-drop-tag.md
@@ -25,19 +25,19 @@ dbt run-operation drop_tags \

- It scans all the object tags that were created in the `analytics.demo` schema. The behind-the-scenes script is:

```sql
show tags in schema analytics.demo;
select "database_name" || '.' || "schema_name" || '.' || "name" as tag_name
from table(result_scan(last_query_id()))
where "database_name" || '.' || "schema_name" ilike 'analytics.demo';
```
```sql
show tags in schema analytics.demo;
select "database_name" || '.' || "schema_name" || '.' || "name" as tag_name
from table(result_scan(last_query_id()))
where "database_name" || '.' || "schema_name" ilike 'analytics.demo';
```

- If any tags exist:
- Create a dummy masking policy function (A)
- For each object tag:
- Set masking policy to tag with the above (A) with _Force_
- Unset masking policy from tag with (A)
- Drop the tag
- Drop (A)
- For each object tag:
- Check if there are multiple datatypes for the associated masking policy
- Unset masking policy(ies) from the tag
- Drop the tag

- Done!

Note that this currently only handles masking policies that were assigned to tags via the `dbt_tags` package. If you have manually assigned a masking policy to a tag, it will not be unset before the tag is dropped, and the drop will fail.
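
The updated steps can be sketched as a small statement generator. This is an illustration under assumptions, not the macro's implementation: `drop_tag_statements` is a hypothetical helper, the macro's real SQL is not shown here, and the generated statements simply follow Snowflake's `alter tag ... unset masking policy` and `drop tag` syntax.

```python
from typing import Dict, List


# Hypothetical helper for illustration only; not part of dbt_tags.
def drop_tag_statements(
    ns: str, tag: str, policy_data_types: List[Dict[str, List[str]]]
) -> List[str]:
    """Build the statements that unset a tag's policies and then drop the tag."""
    # Look up the data types configured for this tag, if any.
    data_types = next(
        (entry[tag] for entry in policy_data_types if tag in entry), None
    )
    # Multiple data types: one suffixed policy each. Otherwise the policy
    # is assumed to share the tag's name.
    policies = [f"{tag}_{dt}" for dt in data_types] if data_types else [tag]
    statements = [
        f"alter tag {ns}.{tag} unset masking policy {ns}.{policy};"
        for policy in policies
    ]
    statements.append(f"drop tag if exists {ns}.{tag};")
    return statements


for statement in drop_tag_statements(
    "analytics.demo", "pii_null", [{"pii_null": ["varchar", "number"]}]
):
    print(statement)
```

A policy attached outside of this naming scheme would not appear in the generated list, which matches the limitation described in the note above.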
7 changes: 2 additions & 5 deletions docs/util-unset-masking-policy.md
@@ -34,10 +34,7 @@ dbt run-operation unapply_mps_from_tags \
```

- If any tags exist:
- Create a dummy masking policy function (A)
- For each object tag:
- Set masking policy to tag with the above (A) with _Force_
- Unset masking policy from tag with (A)
- Drop (A)
- Check if the masking policy has multiple data types
- Unset the masking policy(ies) from the tag

- Done!
6 changes: 5 additions & 1 deletion integration_tests/ci/sf-init.sql
@@ -1,5 +1,5 @@
use role sysadmin;
use warehouse wh_compute;
use warehouse compute_wh;
create or replace database dbt_tags with comment = 'Database for dbt_tags';

use role accountadmin;
@@ -36,5 +36,9 @@ grant all privileges on future views in database dbt_tags to role role_dbt_tags;
grant usage, create schema on database dbt_tags to role role_dbt_tags;
grant role role_dbt_tags to role sysadmin;

use role accountadmin;
grant apply masking policy on account to role role_dbt_tags;
grant apply tag on account to role role_dbt_tags;

use role role_dbt_tags;
use database dbt_tags;
7 changes: 7 additions & 0 deletions integration_tests/dbt_project.yml
@@ -19,6 +19,13 @@ vars:
  dbt_tags__database: dbt_tags
  dbt_tags__schema: common
  dbt_tags__resource_types: ["model", "snapshot", "source"]
  dbt_tags__allowed_tags:
    - pii_name
    - pii_amount
    - pii_null
    - abc
  dbt_tags__policy_data_types:
    - pii_null: ['varchar','number']

models:
  dbt_tags_test:
@@ -0,0 +1,17 @@
{% macro create_masking_policy__pii_null(ns) -%}

create masking policy if not exists {{ ns }}.pii_null_varchar as (val string)
returns string ->
case
when is_role_in_session('ROLE_DBT_TAGS') then val
else null
end;

create masking policy if not exists {{ ns }}.pii_null_number as (val number)
returns number ->
case
when is_role_in_session('ROLE_DBT_TAGS') then val
else null
end;

{%- endmacro %}
@@ -8,22 +8,40 @@
{% if execute %}
{%- set dbt_project_tags = dbt_tags.get_dbt_tags() %}
{% endif %}
{% set policy_data_types_list = var('dbt_tags__policy_data_types', []) -%}
{%- set ns = dbt_tags.get_resource_ns() %}
{%- set database_name = ns.split('.')[0] %}
{%- set schema_name = ns.split('.')[1] %}

with dbt_project_tags as (

{%- for item in dbt_project_tags if ".column" in item['level'] -%}
{%- for item in dbt_project_tags if (".column" in item['level'] and dbt_tags.get_masking_policy_for_tag(item.tag)) -%}

{%- set masking_policy = dbt_tags.get_masking_policy_for_tag(item.tag) %}
{%- if masking_policy %}
{%- set masking_policy_name = masking_policy.get_name().split('__')[1] %}
{% for policy_data_types in policy_data_types_list if item.tag in policy_data_types.keys() %}
{% for datatype in policy_data_types.values() | first %}
{%- set masking_policy_name = masking_policy.get_name().split('__')[1] ~ "_" ~ datatype %}

select
lower('{{ item.tag }}') as tag,
lower('{{ masking_policy_name }}') as masking_policy_name,
lower('{{ database_name }}') as database_name,
lower('{{ schema_name }}') as schema_name

{%- if not loop.last %}
union all
{%- endif %}
{% endfor %}
{% else %}
{%- set masking_policy_name = masking_policy.get_name().split('__')[1] %}

select
lower('{{ item.tag }}') as tag,
lower('{{ masking_policy_name }}') as masking_policy_name,
lower('{{ database_name }}') as database_name,
lower('{{ schema_name }}') as schema_name
lower('{{ item.tag }}') as tag,
lower('{{ masking_policy_name }}') as masking_policy_name,
lower('{{ database_name }}') as database_name,
lower('{{ schema_name }}') as schema_name
{% endfor %}

{%- if not loop.last %}
union all
@@ -35,20 +53,37 @@ with dbt_project_tags as (

adapter_tags as (

{%- for item in dbt_project_tags if ".column" in item['level'] %}
{%- for item in dbt_project_tags if (".column" in item['level'] and dbt_tags.get_masking_policy_for_tag(item.tag)) %}

{%- set masking_policy = dbt_tags.get_masking_policy_for_tag(item.tag) %}
{%- if masking_policy %}
{% for policy_data_types in policy_data_types_list if item.tag in policy_data_types.keys() %}
{% for datatype in policy_data_types.values() | first %}
select
lower(ref_entity_name) as tag,
lower(policy_name) as masking_policy_name,
lower(ref_database_name) as database_name,
lower(ref_schema_name) as schema_name
from table(information_schema.policy_references(policy_name => '{{ database_name }}.{{ schema_name }}.{{ item.tag }}'))
from table(information_schema.policy_references(policy_name => '{{ database_name }}.{{ schema_name }}.{{ item.tag }}_{{ datatype }}'))
where true
and ref_entity_domain = 'TAG'
and lower(ref_entity_name) = '{{ item.tag }}'

{%- if not loop.last %}
union all
{%- endif %}
{% endfor %}
{% else %}
select
lower(ref_entity_name) as tag,
lower(policy_name) as masking_policy_name,
lower(ref_database_name) as database_name,
lower(ref_schema_name) as schema_name
from table(information_schema.policy_references(policy_name => '{{ database_name }}.{{ schema_name }}.{{ item.tag }}'))
where true
and ref_entity_domain = 'TAG'
and lower(ref_entity_name) = '{{ item.tag }}'
{% endfor %}
{%- if not loop.last %}
union all
{%- endif %}
@@ -60,7 +95,7 @@ adapter_tags as (

select config.database_name || '.' || config.schema_name || '.' || config.masking_policy_name as masking_policy,
config.database_name || '.' || config.schema_name || '.' || config.tag as dbt_project_tag,
actual.database_name || '.' || actual.schema_name || '.' || actual.masking_policy_name as adapter_tag
actual.database_name || '.' || actual.schema_name || '.' || actual.tag as adapter_tag
from dbt_project_tags as config
full join adapter_tags as actual
on actual.database_name = config.database_name
@@ -2,4 +2,4 @@ models:
- name: verify_if_masking_policies_applied_correctly
tests:
- dbt_utils.expression_is_true:
expression: adapter_tag is not null
expression: ifnull(adapter_tag, '') = dbt_project_tag
@@ -8,24 +8,40 @@
{% if execute %}
{%- set dbt_project_tags = dbt_tags.get_dbt_tags() %}
{% endif %}
{% set policy_data_types_list = var('dbt_tags__policy_data_types', []) -%}
{%- set ns = dbt_tags.get_resource_ns() %}
{%- set database_name = ns.split('.')[0] %}
{%- set schema_name = ns.split('.')[1] %}

with dbt_project_masking_policies as (

{%- for item in dbt_project_tags if ".column" in item['level'] %}
{%- for item in dbt_project_tags if (".column" in item['level'] and dbt_tags.get_masking_policy_for_tag(item.tag)) %}
{%- set masking_policy = dbt_tags.get_masking_policy_for_tag(item.tag) %}
{%- set masking_policy_name = masking_policy.get_name().split('__')[1] if masking_policy else "" %}
{% for policy_data_types in policy_data_types_list if item.tag in policy_data_types.keys() %}
{% for datatype in policy_data_types.values() | first %}
{%- set masking_policy_name = masking_policy.get_name().split('__')[1] ~ "_" ~ datatype %}
select
lower('{{ item.tag }}') as tag,
lower('{{ masking_policy_name }}') as masking_policy_name,
iff(masking_policy_name = '', null, lower('{{ database_name }}')) as database_name,
iff(masking_policy_name = '', null, lower('{{ schema_name }}')) as schema_name

{%- if not loop.last %}
union all
{%- endif %}
{%- if not loop.last %}
union all
{%- endif %}
{% endfor %}
{% else %}

{%- set masking_policy_name = masking_policy.get_name().split('__')[1] if masking_policy else "" %}
select
lower('{{ item.tag }}') as tag,
lower('{{ masking_policy_name }}') as masking_policy_name,
iff(masking_policy_name = '', null, lower('{{ database_name }}')) as database_name,
iff(masking_policy_name = '', null, lower('{{ schema_name }}')) as schema_name
{% endfor %}
{%- if not loop.last %}
union all
{%- endif %}

{%- endfor %}
),