diff --git a/README.md b/README.md index 979d163c09..50da65c5d3 100644 --- a/README.md +++ b/README.md @@ -48,35 +48,6 @@ If you're looking for a more detailed introduction to Fides, we recommend follow Ping Successful! ``` -1. Now we can seed the database with the default taxonomy: - - ```bash - root@0f13dd1c5834:/fides/fidesctl# fidesctl apply default_taxonomy/ - Loading resource manifests from: default_taxonomy/ - Taxonomy successfully created. - ---------- - Processing data_subject resources... - CREATED 15 data_subject resources. - UPDATED 0 data_subject resources. - SKIPPED 0 data_subject resources. - ---------- - Processing data_qualifier resources... - CREATED 5 data_qualifier resources. - UPDATED 0 data_qualifier resources. - SKIPPED 0 data_qualifier resources. - ---------- - Processing data_use resources... - CREATED 18 data_use resources. - UPDATED 0 data_use resources. - SKIPPED 0 data_use resources. - ---------- - Processing data_category resources... - CREATED 77 data_category resources. - UPDATED 0 data_category resources. - SKIPPED 0 data_category resources. - ---------- - ``` - 1. Run `ls demo_resources/` to inspect the contents of the demo directory, which includes some pre-made examples of the core Fides resource files (systems, datasets, policies, etc.) 
```bash @@ -97,7 +68,7 @@ If you're looking for a more detailed introduction to Fides, we recommend follow data_categories: - user.provided.identifiable.contact - user.derived.identifiable.device.cookie_id - data_use: improve_product_or_service + data_use: improve.system data_subjects: - customer data_qualifier: identified_data @@ -113,7 +84,7 @@ If you're looking for a more detailed introduction to Fides, we recommend follow data_categories: #- user.provided.identifiable.contact # uncomment to add this category to the system - user.derived.identifiable.device.cookie_id - data_use: marketing_advertising_or_promotion + data_use: advertising data_subjects: - customer data_qualifier: identified_data @@ -177,7 +148,7 @@ If you're looking for a more detailed introduction to Fides, we recommend follow data_uses: inclusion: ANY values: - - marketing_advertising_or_promotion + - advertising data_subjects: inclusion: ANY values: @@ -188,66 +159,61 @@ If you're looking for a more detailed introduction to Fides, we recommend follow 1. Lastly, let's modify our annotations in a way that would fail this automated privacy policy: -- Edit `demo_resources/demo_system.yml` and uncomment the line that adds `provided_contact_information` to the list of `data_categories` for the `demo_marketing_system` -- Re-run `fidesctl evaluate demo_resources` which will raise an evaluation failure! 
- - ```bash - root@fa175a43c077:/fides/fidesctl# vim demo_resources/demo_system.yml - - root@fa175a43c077:/fides/fidesctl# git diff demo_resources/demo_system.yml - diff --git a/fidesctl/demo_resources/demo_system.yml b/fidesctl/demo_resources/demo_system.yml - index a707df4..e84a637 100644 - --- a/fidesctl/demo_resources/demo_system.yml - +++ b/fidesctl/demo_resources/demo_system.yml - @@ -24,7 +24,7 @@ system: - privacy_declarations: - - name: Collect data for marketing - data_categories: - - #- user.provided.identifiable.contact # uncomment to add this category to the system - + - user.provided.identifiable.contact # uncomment to add this category to the system - - user.derived.identifiable.device.cookie_id - data_use: marketing_advertising_or_promotion - data_subjects: - - root@fa175a43c077:/fides/fidesctl# fidesctl evaluate demo_resources - Loading resource manifests from: demo_resources - Taxonomy successfully created. - ---------- - Processing registry resources... - CREATED 0 registry resources. - UPDATED 0 registry resources. - SKIPPED 1 registry resources. - ---------- - Processing system resources... - CREATED 0 system resources. - UPDATED 1 system resources. - SKIPPED 1 system resources. - ---------- - Processing policy resources... - CREATED 0 policy resources. - UPDATED 0 policy resources. - SKIPPED 1 policy resources. - ---------- - Processing dataset resources... - CREATED 0 dataset resources. - UPDATED 0 dataset resources. - SKIPPED 1 dataset resources. - ---------- - Loading resource manifests from: demo_resources - Taxonomy successfully created. - Evaluating the following policies: - demo_privacy_policy - ---------- - Checking for missing resources... - Executing evaluations... 
- { - "status": "FAIL", - "details": [ - "Declaration (Collect data for marketing) of System (demo_marketing_system) failed Rule (Reject Direct Marketing) from Policy (demo_privacy_policy)" - ], - "message": null - } - ``` + - Edit `demo_resources/demo_system.yml` and uncomment the line that adds `provided_contact_information` to the list of `data_categories` for the `demo_marketing_system` + + ```diff + privacy_declarations: + - name: Collect data for marketing + data_categories: + - #- user.provided.identifiable.contact # uncomment to add this category to the system + + - user.provided.identifiable.contact # uncomment to add this category to the system + - user.derived.identifiable.device.cookie_id + data_use: marketing_advertising_or_promotion + data_subjects: + ``` + + - Re-run `fidesctl evaluate demo_resources` which will raise an evaluation failure! + + ```bash + root@fa175a43c077:/fides/fidesctl# fidesctl evaluate demo_resources + Loading resource manifests from: demo_resources + Taxonomy successfully created. + ---------- + Processing registry resources... + CREATED 0 registry resources. + UPDATED 0 registry resources. + SKIPPED 1 registry resources. + ---------- + Processing system resources... + CREATED 0 system resources. + UPDATED 1 system resources. + SKIPPED 1 system resources. + ---------- + Processing policy resources... + CREATED 0 policy resources. + UPDATED 0 policy resources. + SKIPPED 1 policy resources. + ---------- + Processing dataset resources... + CREATED 0 dataset resources. + UPDATED 0 dataset resources. + SKIPPED 1 dataset resources. + ---------- + Loading resource manifests from: demo_resources + Taxonomy successfully created. + Evaluating the following policies: + demo_privacy_policy + ---------- + Checking for missing resources... + Executing evaluations... 
+ { + "status": "FAIL", + "details": [ + "Declaration (Collect data for marketing) of System (demo_marketing_system) failed Rule (Reject Direct Marketing) from Policy (demo_privacy_policy)" + ], + "message": null + } + ``` At this point, you've seen some of the core concepts in place: declaring systems, evaluating policies, and re-evaluating policies on every code change. But there's a lot more to discover, so we'd recommend following [the tutorial](https://ethyca.github.io/fides/tutorial/) to keep learning. diff --git a/docs/fides/docs/ci_reference.md b/docs/fides/docs/ci_reference.md new file mode 100644 index 0000000000..c3337348f8 --- /dev/null +++ b/docs/fides/docs/ci_reference.md @@ -0,0 +1,3 @@ +# CI Reference Implementations + +(todo) \ No newline at end of file diff --git a/docs/fides/docs/css/stylesheet.css b/docs/fides/docs/css/stylesheet.css new file mode 100644 index 0000000000..9db96c35d6 --- /dev/null +++ b/docs/fides/docs/css/stylesheet.css @@ -0,0 +1,24 @@ +body { + font-family: 'Source Sans Pro', sans-serif !important; +} + +.md-header { + background-color: #0861ce; +} + +.md-header .md-logo img { + height: 1.2rem; + width: auto; +} + +.md-content h1, +.md-content h2 { + font-weight: 600; + color: #111439; +} + +.md-footer { + background-color: #303036; + padding: 30px 50px; + color: white; +} \ No newline at end of file diff --git a/docs/fides/docs/getting_started/docker.md b/docs/fides/docs/getting_started/docker.md index 5ff1a38181..2efcb02439 100644 --- a/docs/fides/docs/getting_started/docker.md +++ b/docs/fides/docs/getting_started/docker.md @@ -14,10 +14,10 @@ The recommended way to get Fidesctl running is to launch it using the supplied ` The following commands should all be run from the top-level `fides` directory (where the Makefile is): -1. `make init-db` -> Builds the required images, spins up the database, and runs the initialization scripts: +1. 
`make cli` -> This will spin up the entire project and open a shell within the `fidesctl` container, with the `fidesapi` being accessible. This command will "hang" for a bit, as `fidesctl` will wait for the API to be healthy before launching the shell. Once you see the `fidesctl#` prompt, you know you're ready to go: ```bash - ~/git/fides% make init-db + ~/git/fides% make cli Build the images required in the docker-compose file... ... Building fidesapi @@ -26,40 +26,27 @@ The following commands should all be run from the top-level `fides` directory (w ... Building docs ... - Reset the db and run the migrations... - ... - Tearing down the dev environment... - ... - Teardown complete + root@1a742083cedf:/fides/fidesctl# ``` -2. `make cli` -> This will spin up the entire project and open a shell within the `fidesctl` container, with the `fidesapi` being accessible. This command will "hang" for a bit, as `fidesctl` will wait for the API to be healthy before launching the shell. Once you see the `fidesctl#` prompt, you know you're ready to go: +1. `fidesctl init-db` -> Builds the required images, spins up the database, and runs the initialization scripts: ```bash - ~/git/fides% make cli - Build the images required in the docker-compose file... - ... - Building fidesapi - ... - Building fidesctl - ... - Building docs - ... - Check for new migrations to run... - ... - root@1a742083cedf:/fides/fidesctl# + ~/git/fides% fidesctl init-db + INFO [alembic.runtime.migration] Context impl PostgresqlImpl. + INFO [alembic.runtime.migration] Will assume transactional DDL. ``` -3. `fidesctl ping` -> This confirms that your `fidesctl` CLI can reach the server and everything is ready to go! +1. `fidesctl ping` -> This confirms that your `fidesctl` CLI can reach the server and everything is ready to go! ```bash root@796cfde906f1:/fides/fidesctl# fidesctl ping Pinging http://fidesapi:8080... - Ping Successful! + Fidesctl is healthy! 
``` ## Next Steps Now that you're up and running, you can use `fidesctl` from the shell to get a list of all the possible CLI commands. You're now ready to start enforcing privacy with Fidesctl! -See the [Tutorial](../tutorial.md) page for a step-by-step guide on setting up a Fidesctl data privacy workflow. +See the [Tutorial](../tutorial/tutorial.md) page for a step-by-step guide on setting up a Fidesctl data privacy workflow. diff --git a/docs/fides/docs/getting_started/local.md b/docs/fides/docs/getting_started/local.md index 8f500a210a..f6498dd5cb 100644 --- a/docs/fides/docs/getting_started/local.md +++ b/docs/fides/docs/getting_started/local.md @@ -8,4 +8,4 @@ The guide for getting Fidesctl up and running locally is the same as is describe ## Next Steps -See the [Tutorial](../tutorial.md) page for a step-by-step guide on setting up a Fides data privacy workflow. +See the [Tutorial](../tutorial/tutorial.md) page for a step-by-step guide on setting up a Fides data privacy workflow. diff --git a/docs/fides/docs/img/BestPizzaCo_DataEcosystem.png b/docs/fides/docs/img/BestPizzaCo_DataEcosystem.png new file mode 100644 index 0000000000..00ce14b49d Binary files /dev/null and b/docs/fides/docs/img/BestPizzaCo_DataEcosystem.png differ diff --git a/docs/fides/docs/img/BestPizzaCo_FidesModel.png b/docs/fides/docs/img/BestPizzaCo_FidesModel.png new file mode 100644 index 0000000000..64eba67154 Binary files /dev/null and b/docs/fides/docs/img/BestPizzaCo_FidesModel.png differ diff --git a/docs/fides/docs/index.md b/docs/fides/docs/index.md index be0b865b50..b591b108d3 100644 --- a/docs/fides/docs/index.md +++ b/docs/fides/docs/index.md @@ -83,4 +83,4 @@ Fides defines data privacy in four dimensions, called Data Privacy Classifiers. For further context on how to setup and configure Fides, visit the `Getting Started` page ([Getting Started with Docker](getting_started/docker.md) or [Getting Started Locally](getting_started/local.md)). 
-For an in-depth tutorial, visit the [Tutorial](tutorial.md) page. +For an in-depth tutorial, visit the [Tutorial](tutorial/tutorial.md) page. diff --git a/docs/fides/docs/taxonomy.md b/docs/fides/docs/taxonomy.md new file mode 100644 index 0000000000..69c282d2c3 --- /dev/null +++ b/docs/fides/docs/taxonomy.md @@ -0,0 +1,3 @@ +(todo) + +Add page here describing what the taxonomy is, why we created it, what it's used for, and how to use it (broadly). Points back to Tutorial/Taxonomy for detailed instructions of used diff --git a/docs/fides/docs/tutorial.md b/docs/fides/docs/tutorial.md deleted file mode 100644 index afcab525cf..0000000000 --- a/docs/fides/docs/tutorial.md +++ /dev/null @@ -1,176 +0,0 @@ -# Tutorial - -This tutorial walks you through the process of getting up and running with Fidesctl. - -## Getting Started - -Use either the [Docker](getting_started/docker.md) (**recommended**) or [Local](getting_started/local.md) guide to get Fidesctl up and running on your machine. - -## Writing Manifest Files - -The next step is to write the manifest files that describe your privacy data usage with the Fides privacy ontology. Manifest files are written in YAML and are used to create and update resources via the Fidesctl API. - -First create a directory for the manifests to live in: - -`mkdir fides_resources/` - -Next, you'll need to write a System manifest file and a Policy manifest file. These are the only two required resources for Fidesctl to function. For an exhaustive set of example manifests see the [Fides Resources](fides_resources.md) page. Included below are the examples we'll assume are being used for the sake of the tutorial. - -=== fides_resources/policy.yml - - ```yaml - policy: - - organization_fides_key: 1 - fides_key: demo_privacy_policy - name: Demo Privacy Policy - description: The main privacy policy for the organization. 
- rules: - - organization_fides_key: 1 - fides_key: reject_direct_marketing - name: Reject Direct Marketing - description: Disallow collecting any user contact info to use for marketing. - data_categories: - inclusion: ANY - values: - - user.provided.identifiable.contact - data_uses: - inclusion: ANY - values: - - marketing_advertising_or_promotion - data_subjects: - inclusion: ANY - values: - - customer - data_qualifier: identified_data - action: REJECT - ``` - -### fides_resources/dataset.yml - - ```yaml - dataset: - - organization_fides_key: 1 - fides_key: demo_users_dataset - name: Demo Users Dataset - description: Data collected about users for our analytics system. - dataset_type: MySQL - location: US East - fields: - - name: first_name - description: User's first name - path: demo_users_dataset.first_name - data_categories: - - user.provided.identifiable.name - - name: email - description: User's Email - path: demo_users_dataset.email - data_categories: - - user.provided.identifiable.contact.email - - name: state - description: User's State - path: demo_users_dataset.state - data_categories: - - user.provided.identifiable.contact.state - - name: food_preference - description: User's favorite food - path: demo_users_dataset.food_preference - data_categories: - - user.provided.nonidentifiable - - name: created_at - description: User's creation timestamp - path: demo_users_dataset.created_at - data_categories: - - system.operations - - name: uuid - description: User's unique ID - path: demo_users_dataset.uuid - data_categories: - - user.derived.identifiable.unique_id - ``` - -=== fides_resources/system.yml - - ```yaml - system: - - organization_fides_key: 1 - fides_key: demo_analytics_system - name: Demo Analytics System - description: A system used for analyzing customer behaviour. - system_type: Service - privacy_declarations: - - name: Analyze customer behaviour for improvements. 
- data_categories: - - user.provided.identifiable.contact - - user.derived.identifiable.device.cookie_id - data_use: improve_product_or_service - data_subjects: - - customer - data_qualifier: identified_data - dataset_references: - - demo_users_dataset - - - organization_fides_key: 1 - fides_key: demo_marketing_system - name: Demo Marketing System - description: Collect data about our users for marketing. - system_type: Service - privacy_declarations: - - name: Collect data for marketing - data_categories: - # - user.provided.identifiable.contact # uncomment to add this category to the system - - user.derived.identifiable.device.cookie_id - data_use: marketing_advertising_or_promotion - data_subjects: - - customer - data_qualifier: identified_data - ``` - -## Applying Manifest Files - -Once you've finished writing your manifest files, it's time to apply them to the server. This is done with a single `fidesctl` command that handles both creating _and_ updating resources. If a resource with the same type and fides_key already exists, that resource will be updated if a change has been made. - -If we assume the same directory name as before for where our manifests are located, the command would be: - -`fidesctl apply fides_resources/` - -This will load all files ending in either `.yaml` or `.yml` within that directory. Any invalid resource definitions within the manifests will be caught and shown to the user. - -## Evaluation - -Systems and Registries have a slightly different workflow as they are also designed to be incorporated into CI pipelines. - -Now that you've created your initial manifest files via the `apply` command, it's time to evaluate if that initial system is compliant. Use the following command to evaluate your system: - -`fidesctl evaluate system demo_analytics_system` - -If that command returns a PASS evaluation, then you're now in a known-good state and ready to set up automated CI workflows to make sure your application stays compliant with each PR. 
- -## Setting up CI/CD - -To set up CI/CD for Fides evaluations, there are a few suggested steps to follow: - -### Pull Request - - 1. Set up a new CI workflow that gets triggered whenever a system or registry file gets changed within a pull request. - 1. Configure the new workflow to run `fidesctl dry-evaluate fides_manifests/ ` when it gets triggered. - 1. The command will trigger a non-zero exit if the evaluation fails. - - Use the result of this job to determine whether or not a system change is safe to merge or not. If the command fails, check the error messages to see why the evaluation failed. - -### Merge Event - - 1. Set up a new CI workflow that gets triggered whenever something in your manifests directory changes and the branch gets merged to the main branch. - 1. Configure the new workflow to run two few jobs: - 1. `fidesctl apply fides_manifest/` - 1. `fidesctl evaluate system ` - - This will apply all of your manifests to the API and then evaluate the current state of your system on the main branch. - -## Next Steps - -Congratulations, you've walked through all of the steps to get a simple but complete Fidesctl instance running! Here are some possible next steps to continue building out your Fidesctl deployment: - -1. Set up more Systems -1. Create a Registry to assign systems to -1. Extend the Privacy Classifiers as needed -1. Add additional Policy Rules diff --git a/docs/fides/docs/tutorial/ci.md b/docs/fides/docs/tutorial/ci.md new file mode 100644 index 0000000000..4ea614169b --- /dev/null +++ b/docs/fides/docs/tutorial/ci.md @@ -0,0 +1,25 @@ +# Getting integrated with your CI tools +_In this section, we'll reference a few examples and best practices for setting up your CI._ + +Fides is meant to be a part of your CI pipeline jobs in order to enforce your organization's privacy policy on data before software is released to the world. We recommend setting up 2 different events to trigger during a CI pipeline run. + +## Pull Request + + 1. 
Set up a new CI workflow that gets triggered whenever a system or registry file gets changed within a pull request. + 2. Configure the new workflow to run `fidesctl evaluate --dry fides_manifests/` when it gets triggered. + 3. The command will trigger a non-zero exit if the evaluation fails. + + Use the result of this job to determine whether a system change is safe to merge. If the command fails, check the error messages to see why the evaluation failed. + +## Merge Event + + 1. Set up a new CI workflow that gets triggered whenever something in your manifests directory changes and the branch gets merged to the main branch. + 1. Configure the new workflow to run two jobs: + 1. `fidesctl apply fides_manifests/` + 1. `fidesctl evaluate system ` + + This will apply all of your manifests to the API and then evaluate the current state of your system on the main branch. + +## Additional Resources + +We have compiled a few reference implementations for some popular CI tools, which you can find here. diff --git a/docs/fides/docs/tutorial/dataset.md b/docs/fides/docs/tutorial/dataset.md new file mode 100644 index 0000000000..630fe0841e --- /dev/null +++ b/docs/fides/docs/tutorial/dataset.md @@ -0,0 +1,100 @@ +# Getting started with datasets +_In this section, we'll review what a dataset resource is, why it's needed, and how it's created and managed._ + +Fundamentally, your data ecosystem is built on data that is stored _somewhere_. In Fides, Datasets are used for granular, field-level annotations of exactly what data your systems are storing and where. For example, you might declare one dataset for your Postgres application database, a second dataset for your Mongo orders collection, and a third dataset for some CSV files in your storage buckets. The Dataset resource provides a database-agnostic way to annotate the fields stored in these systems with Data Categories, providing a metadata layer for other tooling to consume. 
+ +For Best Pizza Co, you can see that their 3 Datasets, `postgres appdb`, `firestore auth`, and `redshift analyticsdb`, are aligned with data storage services in their data ecosystem: +![alt text](img/BestPizzaCo_DataEcosystem.png "Best Pizza Co's Data Ecosystem") + +At Best Pizza Co, we'll have to create a `dataset` record for each of the 3 datasets above, starting with the first database, the Postgres Application DB. + +## Generating a Dataset Resource +First, let's retrieve the database schema of the dataset we want to annotate. The `generate-dataset` command connects directly to your database, but only to read its schema: +```bash +root@0419219d14e1:/fides/fidesctl# fidesctl generate-dataset postgresql://USERNAME:PASSWORD@best-pizza-co.cwiy9dtqovxa.us-east-1.rds.amazonaws.com:5432/postgres dataset1.yml +Generated dataset manifest written to dataset1.yml +``` +Fides has stored the structure of that database as a YAML file at the output path you passed to the command (`dataset1.yml` here). This file will serve as the first building block in creating Privacy as Code at the lowest level. 
+ +```yaml +dataset: +- fides_key: appdb + organization_fides_key: default_organization + name: Postgres App Database + description: 'Fides Generated Description for Dataset: Postgres App Database' + meta: null + data_categories: null + data_qualifiers: + - aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified + collections: + - name: users + description: 'Fides Generated Description for Table: users' + fields: + - name: first_name + description: 'Fides Generated Description for Column: first_name' + data_categories: [] + data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified + - name: zip_code + description: 'Fides Generated Description for Column: zip_code' + data_categories: [] + data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified +``` + +## Understanding the Dataset Resource +This YAML serves as the foundation of the Fides language; it answers the questions of "_What data and kinds of data do we have?_" and "_How is it organized?_". The language is built on declaring what types of data are found in storage for your organization. + +In traditional SQL, Fides defines the following: +* "datasets" as database schemas +* "collections" as database tables +* "fields" as database columns + +For NoSQL datasets, Fides defines the following: +* "dataset" +* "collection" as a logical grouping of data fields (i.e., in MongoDB, this is called a "Collection") +* "fields" as a reference to an individual data element (i.e., in MongoDB, this is called a "field") + +Further, fideslang has attributes that describe what kind of data is contained in this dataset. 
We use the following attributes to describe the data: + +| Name | Type | Description | +| --- | --- | --- | +| name | String | The name of this field | +| description | String | A description of what this field contains | +| data_categories | List[FidesKey] | The data categories, or types of sensitive data as defined in the taxonomy, that can be found in this field | +| data_qualifier | FidesKey | Data qualifier describes the level of deidentification for the dataset | + + +## Create Dataset Annotations +As you can see, `fidesctl generate-dataset` has already pre-filled the required attributes for this exported YAML file. We can update the YAML file with some information that might be appropriate for your organization, such as: + +```yaml +dataset: +- fides_key: appdb + organization_fides_key: default_organization + name: Postgres App Database + description: 'This is our primary web application database' + collections: + - name: users + description: 'Table that contains all user account data as entered by the user' + fields: + - name: first_name + description: 'Fides Generated Description for Column: first_name' + data_categories: + - user.provided.identifiable.name + data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified + - name: zip_code + description: 'Fides Generated Description for Column: zip_code' + data_categories: + - user.provided.identifiable.contact.postal_code + data_qualifier: aggregated.anonymized.unlinked_pseudonymized +``` + +--- +**PRO TIP** + +As you're progressing with the tutorial, we recommend installing our [Fides' VS Code plugin](https://marketplace.visualstudio.com/items?itemName=fidesctl-plugin-publisher.fidesctl-config-parser), which will validate the syntax in real-time as you're writing your resource files! + +--- + + +## Maintaining a Dataset Resource +As your business grows, you will add more databases and other services where you will be storing potentially sensitive data. 
We recommend that updating this resource file become a part of the development process when building a new feature. diff --git a/docs/fides/docs/tutorial/evaluate.md b/docs/fides/docs/tutorial/evaluate.md new file mode 100644 index 0000000000..dec92ae570 --- /dev/null +++ b/docs/fides/docs/tutorial/evaluate.md @@ -0,0 +1,63 @@ +# Evaluate your policy +_In this section, we'll review how to evaluate your policy and address any errors._ + +Now that we're done with all the setup, it's time to put your policy to the test! + + +## Run an Evaluation + +Running an evaluation locally is as easy as running a single-line command from the CLI. + +Since we're running Fides locally, you can use the following command to evaluate your system: + +```bash +fidesctl evaluate +``` + +Alternatively, if you'd like to see the evaluation result for just a single resource type, you might try: + +```bash +fidesctl evaluate system +``` + +If that command returns a *PASS* evaluation, then you're now in a known-good state and ready to set up automated CI workflows to make sure your application stays compliant with each PR. However, if that command returns a *FAIL* evaluation, you should have received feedback as to why it failed. + +```bash +root@fa175a43c077:/fides/fidesctl# fidesctl evaluate demo_resources +Loading resource manifests from: demo_resources +Taxonomy successfully created. +---------- +Processing registry resources... +CREATED 1 registry resources. +UPDATED 0 registry resources. +SKIPPED 0 registry resources. +---------- +Processing dataset resources... +CREATED 1 dataset resources. +UPDATED 0 dataset resources. +SKIPPED 0 dataset resources. +---------- +Processing policy resources... +CREATED 1 policy resources. +UPDATED 0 policy resources. +SKIPPED 0 policy resources. +---------- +Processing system resources... +CREATED 2 system resources. +UPDATED 0 system resources. +SKIPPED 0 system resources. 
+---------- +Loading resource manifests from: demo_resources +Taxonomy successfully created. +Evaluating the following policies: +demo_privacy_policy +---------- +Checking for missing resources... +Executing evaluations... +Sending the evaluation results to the server... +Evaluation passed! +``` + +Congratulations, you've now created your annotated datasets, system-level business cases, and your policy for enforcement. You've laid the groundwork for a comprehensive data privacy software program at your organization! This is a great starting point for training your peers and colleagues so they can evaluate their new code locally prior to committing any code to the repo. + +The final step will be to [integrate this with your CI](ci.md) so that you can fully realize Fides' potential. Allowing `fidesctl evaluate` calls to be triggered from your pipeline will be critical for automatically assessing compliance at build time going forward. diff --git a/docs/fides/docs/tutorial/overview.md b/docs/fides/docs/tutorial/overview.md new file mode 100644 index 0000000000..3cb37f7a9c --- /dev/null +++ b/docs/fides/docs/tutorial/overview.md @@ -0,0 +1,16 @@ +# Tutorial Overview + +In this tutorial, we'll imagine that your business sells pizza online. "Best Pizza Co" has an ecommerce web application that you sell pizza through and an analytics tool that you use to maintain a constant inventory of pizzas to send and to understand your buyers' market. Your data ecosystem might look like this: + +![alt text](img/BestPizzaCo_DataEcosystem.png "Best Pizza Co's Data Ecosystem") + +When looking to expand to other international markets, you've decided to be intentional in how you scale your technology. Using Fides, we will show you how to: + +1. Declare what categories of PII you have in your 3 databases using Dataset annotations +2. Create business-function related groupings for your applications using System privacy declarations +3. 
Build a set of rules dictating what Best Pizza Co deems to be allowed use of PII data in your Policy + +When you're done creating these Datasets, Systems, and Policy, your Fides data model might look something like this: +![alt text](img/BestPizzaCo_FidesModel.png "Best Pizza Co's modeled in Fides") + +Using these basic principles, you'll begin building a practice of data privacy awareness amongst the software teams at your company. Let's get started with the fundamentals, building from the data layer with [Datasets](dataset.md). \ No newline at end of file diff --git a/docs/fides/docs/tutorial/policy.md b/docs/fides/docs/tutorial/policy.md new file mode 100644 index 0000000000..18b936f5cb --- /dev/null +++ b/docs/fides/docs/tutorial/policy.md @@ -0,0 +1,82 @@ +# Writing a privacy policy (as code) + +In this section, we'll review what the Policy resource is, how to create it and what it's used for. + +More than likely, there is someone in your organization that is responsible for creating privacy policies for protecting the company from a legal standpoint. *The purpose of this privacy policy is to state what types of data are allowed for certain purposes of use.* + +## Understanding the policy +This policy is comprised of rules that your system's privacy declarations are evaluated against. You might be able to help your legal counsel make this, or you can handle the creation of this if you understand the legal requirements well enough. + +Fides' privacy declarations provide rich metadata about your systems, the data categories they process, the data uses for that data, etc. Policies allow you to declare constraints on these declarations to decide what combinations to allow or reject at your company, providing a layer of automation to control data privacy at the source. 
+
+For example, you might want policies like:
+- "we only allow systems that use anonymized data for product analytics purposes"
+- "we do not allow systems to combine user-derived demographic and location data for marketing use"
+- "we never collect biometric data"
+
+These are examples of "policies" that might be formal rules you follow today, or that might already be part of your code review or privacy review practices. Fides allows us to turn these into automated policy rules that are evaluated against your privacy declarations.
+
+
+| Name | Type | Description |
+| --- | --- | --- |
+| fides_key | FidesKey | A fides key is an identifier label that must be unique within your organization. A fides_key can only contain alphanumeric characters and '_' |
+| data_categories | List[DataRule] | The data categories, or types of sensitive data as defined in the taxonomy |
+| data_uses | List[DataRule] | Data use describes the various categories of data processing and operations at your organization |
+| data_subjects | List[DataRule] | The data subjects, or individual persons whose data your rule pertains to |
+| data_qualifier | String | The data qualifier describes the acceptable or non-acceptable level of deidentification |
+| action | Choice | A string, either `ACCEPT` or `REJECT` |
+
+## Writing your first policy
+
+To put these rules to the test, suppose you know that you cannot use identifiable contact information to market directly to your customers, but you can use anonymized data for analytics purposes. You might write something like this:
+
+```yaml
+policy:
+  - fides_key: main_privacy_policy
+    name: Main Privacy Policy
+    description: The main privacy policy for the organization.
+    rules:
+      - fides_key: reject_direct_marketing
+        name: Reject Direct Marketing
+        description: Do not allow collection or storage of any identifiable contact info to use for marketing.
+        data_categories:
+          inclusion: ANY
+          values:
+            - user.provided.identifiable.contact
+        data_uses:
+          inclusion: ANY
+          values:
+            - marketing_advertising_or_promotion
+        data_subjects:
+          inclusion: ANY
+          values:
+            - customer
+        data_qualifier: identified_data
+        action: REJECT
+      - fides_key: allow_anon_analytics
+        name: Use only anonymized data for analytics
+        description: Allow only anonymized data to be used for analytics purposes.
+        data_categories:
+          inclusion: ANY
+          values:
+            - user.provided.nonidentifiable
+        data_uses:
+          inclusion: ANY
+          values:
+            - improve_product_or_service
+        data_subjects:
+          inclusion: ANY
+          values:
+            - customer
+        data_qualifier: aggregated.anonymized
+        action: ACCEPT
+```
+
+Each rule evaluates the data subjects, data categories, and data qualifier of a privacy declaration against its data use, producing a boolean result that either allows or rejects the process.
+
+
+## Maintaining a Policy
+As global privacy laws change and your business scales, your company's policy will evolve with them. We recommend that updating this resource file become a regular part of the development planning process when building a new feature.
+
+
+In the next section, we'll put all the pieces together to see the policy execute against all your resources.
diff --git a/docs/fides/docs/tutorial/system.md b/docs/fides/docs/tutorial/system.md
new file mode 100644
index 0000000000..3e1a7cbc21
--- /dev/null
+++ b/docs/fides/docs/tutorial/system.md
@@ -0,0 +1,68 @@
+# Creating Systems
+_In this section, we'll review what a system resource is, why it's needed, and how it's created and managed._
+
+Now that we've annotated the underlying datasets, which describe how the data is stored and what type of data is there, we're going to start grouping these into application-level "systems", another critical Fides resource.
+
+For Best Pizza Co, you can see that they have 2 business-unit specific applications, `Web Application` and `Analytics`:
+![alt text](img/BestPizzaCo_DataEcosystem.png "Best Pizza Co's Data Ecosystem")
+
+At Best Pizza Co, we'll have to create a `system` resource for each of the 2 systems above.
+
+## Understanding Systems
+In Fides, Systems are used to model the applications, services, third-party APIs, etc. that process data for your organization. Systems describe how these datasets are used for business functions around your organization. These dataset groupings are not mutually exclusive and answer the question "_How and why are these datasets being used?_" At Best Pizza Co, you might also have a "Marketing" system and a "Financial data database" (separate from the other dbs!).
+
+Systems use the following attributes:
+
+| Name | Type | Description |
+| --- | --- | --- |
+| data_categories | List[FidesKey] | The data categories, or types of sensitive data as defined in the taxonomy |
+| data_subjects | List[FidesKey] | The data subjects, or individual persons whose data resides in your datasets |
+| data_use | FidesKey | Data use describes the various categories of data processing and operations at your organization |
+| data_qualifier | FidesKey | Data qualifier describes the level of deidentification for the dataset |
+| dataset_references | List[FidesKey] | The fides_key(s) of the dataset fields used in this Privacy Declaration. |
+
+As you can see, the System resource groups the lowest level of data (your datasets) with your business use cases and associates qualitative attributes describing what type of data is being used.
+
+## Creating a System Resource
+Let's take a look at the following system annotations for a web application and an analytics system:
+
+```yaml
+system:
+  - fides_key: web_app
+    name: Pizza Ordering Web Application
+    description: A system used to order pizza from Best Pizza Co
+    system_type: Service
+    privacy_declarations:
+      - name: Provide services and order tracking for customers.
+        data_categories:
+          - user.provided.identifiable.contact
+        data_use: provide_product_or_service
+        data_subjects:
+          - customer
+        data_qualifier: identified_data
+        dataset_references:
+          - appdb
+
+  - fides_key: analytics
+    name: Analytics system
+    description: Provide BI and insights on customer, order and inventory data
+    system_type: Service
+    privacy_declarations:
+      - name: Collect data for business intelligence
+        data_categories:
+          - user.provided.identifiable.contact
+          - user.derived.identifiable.device
+        data_use: improve_product_or_service
+        data_subjects:
+          - customer
+        data_qualifier: identified_data
+```
+
+As you can see, the system is composed of Privacy Declarations. These can be read colloquially as "This system uses sensitive data types of `data_categories` for `data_subjects` with the purpose of `data_use` at a deidentification level of `data_qualifier`".
+
+You can create as many systems as you'd like to cover all of your company's business applications.
+
+## Maintaining a System Resource
+As business use cases evolve, your systems' data subjects, data categories and data uses will change with them. We recommend that updating this resource file become a regular part of the development planning process when building a new feature.
+
+As you add more systems to your ever-changing data ecosystem, you might want to consider grouping your systems into another Fides resource type, called a "Registry". This is just a logical grouping of Systems.
\ No newline at end of file
diff --git a/docs/fides/docs/tutorial/taxonomy.md b/docs/fides/docs/tutorial/taxonomy.md
new file mode 100644
index 0000000000..722ab66715
--- /dev/null
+++ b/docs/fides/docs/tutorial/taxonomy.md
@@ -0,0 +1,44 @@
+# Getting Acquainted with the Fides Taxonomy
+_In this section, we'll review what the Fides taxonomy is, how it was created, and when and how it should be used._
+
+(todo link to taxonomy reference)
+
+The Fides taxonomy for data categories is a standard adapted from [ISO 19944](https://www.iso.org/standard/79573.html). This taxonomy provides descriptions of the types of sensitive, personally identifiable, or non-identifiable data that an organization could hold for any data subject. The hierarchical nature of the Fides taxonomy has a few notable benefits:
+
+* Consistency: the taxonomy is used as a shared resource across your Fidesctl deployment(s). Because the taxonomy is derived from an international standard, it enables interoperability inside and outside of your organization.
+* Natural inheritance: the hierarchy makes it easy to refer to multiple subcategories, or to handle uncertain categorizations, simply by using a higher-level (parent) data category.
+* Extensibility: if the taxonomy is missing any data categories specific to your business, you can extend the taxonomy with whatever new values you need.
+
+The Fides Taxonomy is used across the Fides ecosystem of projects: fidesctl and fidesops.
+
+## Why did we create the Fides Taxonomy?
+The Fides taxonomy was created because the industry is distinctly lacking a common definition of what Personal Data is, what identifiable data is, and how anonymized data must be to count as unidentifiable. The taxonomy provides this common classification and is a key component of implementing Privacy as Code.
+
+## How to use the Fides Taxonomy
+The Fides project comes with 4 taxonomies of privacy attributes by default:
+
+* Data Categories
+* Data Subjects
+* Data Uses
+* Data Qualifiers
+
+Fidesctl comes loaded with these taxonomies by default, and they can be found in `fidesctl/src/fideslang/default_taxonomy.py`. To extend this taxonomy for your business uses, you might want to add data categories to cover all the types of PII your business collects, or additional legal uses for the data. For example, since Best Pizza Co is expanding to new countries, we need to support Province as part of the user's provided address for delivery. We could accomplish this by adding the additional data category directly to `default_taxonomy.py`:
+
+  ```diff
+  + DataCategory(
+  +     fides_key="user.provided.identifiable.contact.province",
+  +     organization_fides_key="default_organization",
+  +     name="User Provided Province",
+  +     description="User's province.",
+  +     parent_key="user.provided.identifiable.contact",
+  + ),
+    DataCategory(
+        fides_key="user.provided.identifiable.contact.state",
+        organization_fides_key="default_organization",
+        name="User Provided State",
+        description="User's state level address data.",
+        parent_key="user.provided.identifiable.contact",
+    ),
+  ```
+
+This adds `user.provided.identifiable.contact.province` as a data category, a subcategory of `user.provided.identifiable.contact`, for your organization. You can add and remove any privacy attributes as you see fit for your organization. For a more in-depth definition of these privacy attributes, please refer to [the Fides Resources documentation](../fides_resources.md).
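The natural-inheritance property described for the taxonomy can be illustrated with a few lines of Python. This is a minimal sketch that assumes a key's parent can be read directly off its dot-separated name, as the `parent_key` values in the `DataCategory` entries above suggest; the helper functions are hypothetical and are not part of fidesctl.

```python
from typing import List, Optional


def parent_key(fides_key: str) -> Optional[str]:
    """Return the dot-separated parent of a key, or None for a top-level key."""
    head, sep, _ = fides_key.rpartition(".")
    return head if sep else None


def ancestors(fides_key: str) -> List[str]:
    """List every ancestor of a key, nearest parent first."""
    chain: List[str] = []
    current = parent_key(fides_key)
    while current is not None:
        chain.append(current)
        current = parent_key(current)
    return chain


def is_subcategory(child: str, parent: str) -> bool:
    """True when `parent` appears somewhere above `child` in the hierarchy."""
    return parent in ancestors(child)


# The new province category sits under the existing contact category.
province = "user.provided.identifiable.contact.province"
print(ancestors(province))
print(is_subcategory(province, "user.provided.identifiable.contact"))
```

Under this assumption, anything that refers to `user.provided.identifiable.contact` could also be taken to cover the new province category without listing it explicitly.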
diff --git a/docs/fides/mkdocs.yml b/docs/fides/mkdocs.yml
index 4440c440d3..b7f81e9ca2 100644
--- a/docs/fides/mkdocs.yml
+++ b/docs/fides/mkdocs.yml
@@ -7,7 +7,14 @@ nav:
   - Getting Started:
     - getting_started/docker.md
     - getting_started/local.md
-  - Tutorial: tutorial.md
+  - Tutorial:
+    - Tutorial Overview: tutorial/overview.md
+    - Understanding the Taxonomy: tutorial/taxonomy.md
+    - Annotate your Datasets: tutorial/dataset.md
+    - Create a System: tutorial/system.md
+    - Create a Policy: tutorial/policy.md
+    - Evaluate your Resources: tutorial/evaluate.md
+    - Integrate your CI: tutorial/ci.md
   - Deployment: deployment.md
   - CLI: cli.md
   - Development:
@@ -21,6 +28,8 @@ nav:
     - GitHub: community/github.md
     - Code of Conduct: community/code_of_conduct.md
 theme:
+  palette:
+    primary: blue
   name: material
   favicon: img/favicon.png
   logo: img/logo.png
@@ -40,4 +49,4 @@ markdown_extensions:
 extra_javascript:
   - https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.7.2/highlight.min.js
 extra_css:
-  - https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.7.2/styles/default.min.css
+  - css/stylesheet.css