Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

163 yaml datasets #284

Merged
merged 15 commits into from
May 18, 2022
Merged

163 yaml datasets #284

merged 15 commits into from
May 18, 2022

Conversation

nschoenbrot
Copy link
Contributor

@nschoenbrot nschoenbrot commented Mar 14, 2022

Purpose

Allow datasets to be patched using YAML in place of JSON.

Changes

-Creates a YAML path for patching datasets. Since FastAPI does not have built in support for YAML in the way it does for JSON, the YAML endpoint will directly access the request object and deserialize the body. But the JSON and YAML paths will still hit the same underlying code. (I wanted to try and route based on content type so that the same path could be shared, from what I found content type based routing has not be built into FastAPI yet)
-Was able to get a successful response from the YAML Datasets endpoint locally.
-TODO Add more tests to cover edge cases such as bad content type, bad YAML file, multiple items listed under the YAML dataset.
-TODO Fix type checking issues found by mypy. There are typing issues around the deserialized request body.
-TODO Update postman collection.
-TODO Come up with a strategy on how to generate custom OpenAPI spec for YAML endpoints.

Checklist

  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #163

@nschoenbrot nschoenbrot changed the title Draft 163 yaml datasets (Do not merge) Draft: 163 yaml datasets (Do not merge) Mar 14, 2022
@nschoenbrot nschoenbrot marked this pull request as draft March 14, 2022 16:00
@nschoenbrot
Copy link
Contributor Author

@seanpreston It looks like I will be working on this ticket again during this sprint.

For this ticket how do we want to handle documentation for the YAML endpoints? We could just leave it out of the Open API spec, or start manually writing the Open API spec? See: https://fastapi.tiangolo.com/advanced/extending-openapi/

including @conceptualshark since there is a documentation question here.

@seanpreston
Copy link
Contributor

We could just leave it out of the Open API spec

Great suggestion. Let's ship this as a v0, then follow-up with manually adding the OpenAPI spec as you've suggested as part of a subsequent ticket.

@nschoenbrot nschoenbrot force-pushed the 163_Yaml_Datasets branch from d423b5a to cc64433 Compare May 5, 2022 19:48
@nschoenbrot nschoenbrot force-pushed the 163_Yaml_Datasets branch from cc64433 to 3adbb2a Compare May 5, 2022 20:08
"script": {
"exec": [
"pm.collectionVariables.set(\"client_token\", pm.response.json()[\"access_token\"])"
],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script will take the access token from the response and update the collection variable. That way we don't need to manually copy and paste.

return BulkPutDataset(
succeeded=created_or_updated,
failed=failed,
)


def create_or_update_dataset(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was extracted from the Patch dataset function. So that both dataset PATCHing endpoints (JSON and YAML) can call the same underlying code.

@nschoenbrot nschoenbrot marked this pull request as ready for review May 11, 2022 15:16
@nschoenbrot nschoenbrot changed the title Draft: 163 yaml datasets (Do not merge) 163 yaml datasets May 11, 2022
@Kelsey-Ethyca Kelsey-Ethyca requested a review from galvana May 11, 2022 15:37
response_model=BulkPutDataset,
include_in_schema=False # Not including this path in the schema.
# Since this yaml function needs to access the request, the open api spec will not be generated correctly.
# To include this path, extend open api: https://fastapi.tiangolo.com/advanced/extending-openapi/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now but we should find a way to expose this information in our Swagger API UI if we add additional YAML endpoints. Can we document this endpoint somewhere if we haven't already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brought this up with @seanpreston & @conceptualshark. We can have a subsequent ticket.

I would think our solution would depend on how many resources and actions we will support in the yaml API. For me there are open questions:

  • How much will the JSON and YAML APIs diverge?
  • Is it going to be a large API? If so we might want to find a tool to autogenerate open api spec.
  • Do we want a separate spec for the yaml api?

Copy link
Collaborator

@galvana galvana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the change to help with future troubleshooting, looks good to me!

@@ -68,6 +69,9 @@
DATASETS = CONNECTION_BY_KEY + "/dataset"
DATASET_BY_KEY = CONNECTION_BY_KEY + "/dataset/{fides_key}"

# YAML Collection URLs
YAML_DATASETS = YAML + DATASETS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This will result in the URL /api/v1/yml/connection/{key}/dataset — could we append YAML here instead of prepend for the result /api/v1/connection/{key}/dataset/yml/, to reflect the hierarchy of what's being executed.

@nschoenbrot nschoenbrot merged commit 889a950 into main May 18, 2022
@nschoenbrot nschoenbrot deleted the 163_Yaml_Datasets branch May 18, 2022 14:40
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* 163 Initial commit showing that yaml dataset can be saved.

* 163 Refactor the yaml dataset endpoint and added pytests.

* 163 Exclude yaml dataset endpoint from open api spec.

Since this endpoint needs to access the request, the open api spec will not be generated correctly. To include this, extend open api: https://fastapi.tiangolo.com/advanced/extending-openapi/

* 163 Formatted with Black.

* 163 Fixes for mypy.

* 163 Added happy path test for patching yaml dataset.

* 163 respond with 415 if the wrong content type is specified a patch to the yaml dataset endpoint.

* 163 Respond with 400 when the dataset yaml cannot be loaded.

* 163 Added example of a YAML PATCH request for dataset. Named: Create/Update Postgres Dataset YAML

* 163 Added example of a YAML PATCH request for dataset. Named: Create/Update Postgres Dataset YAML

* 163 Test patching multiple datasets at a time.

* 163 Return specific yaml error in http response.

* 163 Ran make black.

* 163 Trying to get linter to pass.

Co-authored-by: Nelson Schoenbrot <nelson@ethyca.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support YAML format for configuring datasets on fidesops
4 participants