Redshift Query Execution [#74] (#131)
* Add foundation to make Redshift access request queries.

* Fix connection config and dataset config tests that look at a new dataset being added.

* Add support for performing redshift erasures.

* Only mask state field in erasure test - contact is too broad, and was preventing the customer record from getting cleaned up.

* Remove unnecessary assertion and access values as attributes of RowProxy not tuples.

* Add Amazon Redshift example to docs.

* Remove breakpoint, argh.

* Use SQLAlchemy TextClause instead of passing in a raw string (see the sketch below).

* Add request attribute to mask_data and pass into generate_update_stmt.
pattisdr authored Dec 22, 2021
1 parent ed7ea41 commit 22a838c
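A minimal sketch, not taken from the commit itself, of what the TextClause bullet above refers to: building the generated SQL as a `sqlalchemy.text()` clause with bound parameters instead of handing the connection a raw string. The table, columns, and connection URL are illustrative.

```
# Hedged sketch: raw string vs. TextClause with bound parameters.
from sqlalchemy import create_engine, text

# Illustrative URL, matching the docs example added in this commit.
engine = create_engine("redshift+psycopg2://username@host.amazonaws.com:5439/database")

# Before (conceptually): connection.execute('UPDATE customer SET name = NULL WHERE id = 1')
stmt = text("UPDATE customer SET name = :name WHERE id = :id")

with engine.begin() as connection:  # begin() commits on success
    connection.execute(stmt, {"name": None, "id": 1})
```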
Showing 13 changed files with 547 additions and 27 deletions.
1 change: 1 addition & 0 deletions .github/workflows/unsafe_pr_checks.yml
@@ -17,5 +17,6 @@ jobs:
- name: Integration Tests (External)
env:
REDSHIFT_TEST_URI: ${{ secrets.REDSHIFT_TEST_URI }}
REDSHIFT_TEST_DB_SCHEMA: ${{ secrets.REDSHIFT_TEST_DB_SCHEMA }}
SNOWFLAKE_TEST_URI: ${{ secrets.SNOWFLAKE_TEST_URI }}
run: make pytest-integration-external
2 changes: 1 addition & 1 deletion Makefile
@@ -126,7 +126,7 @@ pytest-integration-erasure: compose-build
# These tests connect to external third-party test databases
pytest-integration-external: compose-build
@echo "Running tests that connect to external third party test databases"
@docker-compose run -e REDSHIFT_TEST_URI -e SNOWFLAKE_TEST_URI $(IMAGE_NAME) \
@docker-compose run -e REDSHIFT_TEST_URI -e SNOWFLAKE_TEST_URI -e REDSHIFT_TEST_DB_SCHEMA $(IMAGE_NAME) \
pytest $(pytestpath) -m "integration_external"
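The workflow and Makefile changes above thread `REDSHIFT_TEST_URI` and the new `REDSHIFT_TEST_DB_SCHEMA` from GitHub secrets through `docker-compose run -e` into the test container. A hedged sketch of how an external-integration test might pick them up; the fixture name is illustrative, not necessarily what this repo uses:

```
# Illustrative pytest fixture reading the env vars that CI and the Makefile pass through.
import os

import pytest


@pytest.fixture(scope="session")
def redshift_test_settings():
    uri = os.environ.get("REDSHIFT_TEST_URI")
    if not uri:
        pytest.skip("REDSHIFT_TEST_URI is not set; skipping external Redshift tests")
    return {
        "url": uri,
        "db_schema": os.environ.get("REDSHIFT_TEST_DB_SCHEMA", "public"),
    }
```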


225 changes: 225 additions & 0 deletions data/dataset/redshift_example_test_dataset.yml
@@ -0,0 +1,225 @@
dataset:
- fides_key: redshift_example_test_dataset
name: Redshift Example Test Dataset
description: Example of a Redshift dataset containing a variety of related tables like customers, products, addresses, etc.
collections:
- name: address
fields:
- name: city
data_categories: [user.provided.identifiable.contact.city]
- name: house
data_categories: [user.provided.identifiable.contact.street]
- name: id
data_categories: [system.operations]
fidesops_meta:
primary_key: True
- name: state
data_categories: [user.provided.identifiable.contact.state]
- name: street
data_categories: [user.provided.identifiable.contact.street]
- name: zip
data_categories: [user.provided.identifiable.contact.postal_code]

- name: customer
fields:
- name: address_id
data_categories: [system.operations]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: address.id
direction: to
- name: created
data_categories: [system.operations]
- name: email
data_categories: [user.provided.identifiable.contact.email]
fidesops_meta:
identity: email
data_type: string
- name: id
data_categories: [user.derived.identifiable.unique_id]
fidesops_meta:
primary_key: True
- name: name
data_categories: [user.provided.identifiable.name]
fidesops_meta:
data_type: string
length: 40

- name: employee
fields:
- name: address_id
data_categories: [system.operations]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: address.id
direction: to
- name: email
data_categories: [user.provided.identifiable.contact.email]
fidesops_meta:
identity: email
data_type: string
- name: id
data_categories: [user.derived.identifiable.unique_id]
fidesops_meta:
primary_key: True
- name: name
data_categories: [user.provided.identifiable.name]
fidesops_meta:
data_type: string

- name: login
fields:
- name: customer_id
data_categories: [user.derived.identifiable.unique_id]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: customer.id
direction: from
- name: id
data_categories: [system.operations]
fidesops_meta:
primary_key: True
- name: time
data_categories: [user.derived.nonidentifiable.sensor]

- name: order
fields:
- name: customer_id
data_categories: [user.derived.identifiable.unique_id]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: customer.id
direction: from
- name: id
data_categories: [system.operations]
fidesops_meta:
primary_key: True
- name: shipping_address_id
data_categories: [system.operations]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: address.id
direction: to

# order_item
- name: order_item
fields:
- name: order_id
data_categories: [system.operations]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: order.id
direction: from
- name: product_id
data_categories: [system.operations]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: product.id
direction: to
- name: quantity
data_categories: [system.operations]

- name: payment_card
fields:
- name: billing_address_id
data_categories: [system.operations]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: address.id
direction: to
- name: ccn
data_categories: [user.provided.identifiable.financial.account_number]
- name: code
data_categories: [user.provided.identifiable.financial]
- name: customer_id
data_categories: [user.derived.identifiable.unique_id]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: customer.id
direction: from
- name: id
data_categories: [system.operations]
fidesops_meta:
primary_key: True
- name: name
data_categories: [user.provided.identifiable.financial]
- name: preferred
data_categories: [user.provided.nonidentifiable]

- name: product
fields:
- name: id
data_categories: [system.operations]
fidesops_meta:
primary_key: True
- name: name
data_categories: [system.operations]
- name: price
data_categories: [system.operations]

- name: report
fields:
- name: email
data_categories: [user.provided.identifiable.contact.email]
fidesops_meta:
identity: email
data_type: string
- name: id
data_categories: [system.operations]
fidesops_meta:
primary_key: True
- name: month
data_categories: [system.operations]
- name: name
data_categories: [system.operations]
- name: total_visits
data_categories: [system.operations]
- name: year
data_categories: [system.operations]

- name: service_request
fields:
- name: alt_email
data_categories: [user.provided.identifiable.contact.email]
fidesops_meta:
identity: email
data_type: string
- name: closed
data_categories: [system.operations]
- name: email
data_categories: [system.operations]
fidesops_meta:
identity: email
data_type: string
- name: employee_id
data_categories: [user.derived.identifiable.unique_id]
fidesops_meta:
references:
- dataset: redshift_example_test_dataset
field: employee.id
direction: from
- name: id
data_categories: [system.operations]
fidesops_meta:
primary_key: True
- name: opened
data_categories: [system.operations]

- name: visit
fields:
- name: email
data_categories: [user.provided.identifiable.contact.email]
fidesops_meta:
identity: email
data_type: string
- name: last_visit
data_categories: [system.operations]
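The dataset above links collections through `references` (with `to`/`from` directions), flags primary keys, and marks the `email` fields used as identities for locating a data subject. A rough sketch, assuming PyYAML and the path from the diff header, that lists the identity entry points:

```
# Illustrative only: print which fields in the example dataset are identities.
import yaml  # assumes PyYAML is installed

with open("data/dataset/redshift_example_test_dataset.yml") as f:
    dataset = yaml.safe_load(f)["dataset"][0]

for collection in dataset["collections"]:
    for field in collection["fields"]:
        identity = (field.get("fidesops_meta") or {}).get("identity")
        if identity:
            print(f"{collection['name']}.{field['name']} -> identity: {identity}")
```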
16 changes: 16 additions & 0 deletions docs/fidesops/docs/guides/database_connectors.md
@@ -131,6 +131,22 @@ PUT api/v1/connection/my_mongo_db/secret`
}
```

#### Example 3: Amazon Redshift: Set URL and Schema

This Amazon Redshift example sets the database secrets as a `url` property and a `db_schema` property. Redshift
databases have one or more schemas; the default is named `public`. If you need to query a different schema,
specify it with `db_schema` and it will be set as the `search_path` when querying.


```
PUT api/v1/connection/my_redshift_db/secret
{
"url": "redshift+psycopg2://username@host.amazonaws.com:5439/database",
"db_schema": "my_test_schema"
}
```
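A hedged sketch of issuing that request from Python with `requests`; the host, port, and auth token are placeholders for your own deployment, not values from this commit:

```
# Hypothetical client call for the Redshift secrets example above.
import requests

response = requests.put(
    "http://localhost:8080/api/v1/connection/my_redshift_db/secret",
    params={"verify": "false"},  # optional, as in the earlier examples
    headers={"Authorization": "Bearer <access-token>"},  # placeholder auth
    json={
        "url": "redshift+psycopg2://username@host.amazonaws.com:5439/database",
        "db_schema": "my_test_schema",
    },
)
response.raise_for_status()
print(response.json())
```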

### Testing your connection

You can verify that a ConnectionConfig's secrets are valid at any time by calling the [Test a ConnectionConfig's Secrets](/fidesops/api#operations-Connections-test_connection_config_secrets_api_v1_connection__connection_key__test_get) operation:
@@ -14,6 +14,7 @@ class RedshiftSchema(ConnectionConfigSecretsSchema):
database: Optional[str] = None
user: Optional[str] = None
password: Optional[str] = None
db_schema: Optional[str] = None

_required_components: List[str] = ["host", "user", "password"]
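The new optional `db_schema` field on `RedshiftSchema` is what the docs example above sets. A sketch of one straightforward way to apply it as the `search_path` when querying; how fidesops wires this up internally is not shown in this hunk, so treat the mechanism below as an assumption:

```
# Hedged sketch: apply db_schema as the Redshift search_path for a connection.
from sqlalchemy import create_engine, text

url = "redshift+psycopg2://username@host.amazonaws.com:5439/database"
db_schema = "my_test_schema"  # value carried by RedshiftSchema.db_schema

engine = create_engine(url)
with engine.connect() as connection:
    if db_schema:
        # Per-connection; a connect-event listener or connect_args could make it global.
        connection.execute(text(f"SET search_path TO {db_schema}"))
    rows = connection.execute(text('SELECT * FROM "customer" LIMIT 1'))
```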

13 changes: 13 additions & 0 deletions src/fidesops/service/connectors/query_config.py
@@ -405,6 +405,19 @@ def get_formatted_update_stmt(
return f'UPDATE "{self.node.address.collection}" SET {",".join(update_clauses)} WHERE {" AND ".join(pk_clauses)}'


class RedshiftQueryConfig(SQLQueryConfig):
"""Generates SQL in Redshift's custom dialect."""

def get_formatted_query_string(
self,
field_list: str,
clauses: List[str],
) -> str:
"""Returns a query string with double quotation mark formatting for tables that have the same names as
Redshift reserved words."""
return f'SELECT {field_list} FROM "{self.node.node.collection.name}" WHERE {" OR ".join(clauses)}'


MongoStatement = Tuple[Dict[str, Any], Dict[str, Any]]
"""A mongo query is expressed in the form of 2 dicts, the first of which represents
the query object(s) and the second of which represents fields to return.
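`RedshiftQueryConfig.get_formatted_query_string` above double-quotes the collection name so that collections whose names collide with Redshift reserved words, such as `order` in the example dataset, still produce valid SQL. A toy illustration; the helper below mimics the f-string in the diff and is not the fidesops API:

```
# Illustrative only: why the generated SELECT double-quotes the table name.
# Unquoted, `SELECT id FROM order WHERE ...` is a syntax error because
# "order" is a reserved word; the quoted form parses.
from typing import List


def formatted_query(collection: str, field_list: str, clauses: List[str]) -> str:
    return f'SELECT {field_list} FROM "{collection}" WHERE {" OR ".join(clauses)}'


print(formatted_query("order", "id, customer_id", ["customer_id = :customer_id"]))
# SELECT id, customer_id FROM "order" WHERE customer_id = :customer_id
```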
