262 saas connector pagination strategies (#286)
* Implementations of offset, link, and cursor pagination

* Adding pagination to SaaS connector workflow
Updating documentation and Postman collection

* Fixing Pylint warning

* Updating unwrap postprocessor to accept lists in addition to dicts
Accounting for the use case where the list of objects is at the root level of the response and does not need a data_path

* Adding missing test case

Co-authored-by: Adrian Galvan <adrian@ethyca.com>
galvana and galvana authored Mar 17, 2022
1 parent 492fdce commit 96df91d
Showing 25 changed files with 1,211 additions and 289 deletions.
21 changes: 9 additions & 12 deletions data/saas/config/mailchimp_config.yml
@@ -36,10 +36,8 @@ saas_config:
                 - dataset: mailchimp_connector_example
                   field: conversations.id
                   direction: from
+          data_path: conversation_messages
           postprocessors:
-            - strategy: unwrap
-              configuration:
-                data_path: conversation_messages
             - strategy: filter
               configuration:
                 field: from_email
@@ -59,10 +57,13 @@ saas_config:
             - name: placeholder
               type: query
               identity: email
-          postprocessors:
-            - strategy: unwrap
-              configuration:
-                data_path: conversations
+          data_path: conversations
+          pagination:
+            strategy: offset
+            configuration:
+              incremental_param: offset
+              increment_by: 1000
+              limit: 10000
     - name: member
       requests:
         read:
@@ -71,11 +72,7 @@ saas_config:
             - name: query
               type: query
               identity: email
-              data_type: string
-          postprocessors:
-            - strategy: unwrap
-              configuration:
-                data_path: exact_matches.members
+          data_path: exact_matches.members
         update:
           path: /3.0/lists/<list_id>/members/<subscriber_hash>
           request_params:
16 changes: 5 additions & 11 deletions docs/fidesops/docs/guides/saas_config.md
@@ -55,10 +55,8 @@ saas_config:
                 - dataset: mailchimp_connector_example
                   field: conversations.id
                   direction: from
+          data_path: conversation_messages
           postprocessors:
-            - strategy: unwrap
-              configuration:
-                data_path: conversation_messages
             - strategy: filter
               configuration:
                 field: from_email
@@ -78,10 +76,7 @@ saas_config:
             - name: placeholder
               type: query
               identity: email
-          postprocessors:
-            - strategy: unwrap
-              configuration:
-                data_path: conversations
+          data_path: conversations
     - name: member
       requests:
         read:
@@ -91,10 +86,7 @@ saas_config:
               type: query
               identity: email
               data_type: string
-          postprocessors:
-            - strategy: unwrap
-              configuration:
-                data_path: exact_matches.members
+          data_path: exact_matches.members
         update:
           path: /3.0/lists/<list_id>/members/<subscriber_hash>
           request_params:
@@ -180,7 +172,9 @@ This is where we define how we are going to access and update each collection in
 - `references` These are the same as `references` in the Dataset schema. It is used to define the source of the value for the given request_param.
 - `identity` This denotes the identity value that this request_param should take.
 - `default_value` Hard-coded default value for a `request_param`. This is most often used for query params since a static path param can just be included in the `path`.
+- `data_path`: The expression used to access the collection information from the raw JSON response.
 - `postprocessors` An optional list of response post-processing strategies. We will ignore this for the example scenarios below but an in-depth explanation can be found under [SaaS Post-Processors](saas_postprocessors.md)
+- `pagination` An optional strategy used to get the next set of results from APIs with resources spanning multiple pages. Details can be found under [SaaS Pagination](saas_pagination.md)

 ## Example scenarios
 #### Dynamic path with dataset references
95 changes: 95 additions & 0 deletions docs/fidesops/docs/guides/saas_pagination.md
@@ -0,0 +1,95 @@
# SaaS Pagination

These pagination strategies allow Fidesops to incrementally retrieve content from SaaS APIs. APIs vary in how subsequent pages are accessed, so these configurable options aim to cover the majority of common use cases.

## Supported Strategies
- `offset`: Iterates through the available pages by incrementing the value of a query param.
- `link`: Uses links returned in the headers or the body to get to the next page.
- `cursor`: Uses a value from the last-retrieved object to use as a query param pointing to the next set of results.

### Offset
This strategy can be used to iterate through pages, or to define the offset for a batch of results. In either case, this strategy increments the specified query param by the `increment_by` value until no more results are returned or the `limit` is reached.

#### Configuration Details
- `incremental_param` (_str_): The query param whose value is incremented.
- `increment_by` (_int_): The amount to add to the `incremental_param` after each set of results.
- `limit` (_int_): The max value the `incremental_param` can reach.

#### Example
This example would take the `page` query param and increment it by 1 until the limit of 10 is reached or no more results are returned (whichever comes first).
```yaml
pagination:
  strategy: offset
  configuration:
    incremental_param: page
    increment_by: 1
    limit: 10
```
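
The loop described above can be sketched in Python roughly as follows. This is only an illustration, not the Fidesops implementation: `paginate_by_offset`, `fetch_page`, and `fake_api` are hypothetical names, and treating `limit` as an inclusive upper bound is an assumption.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = List[Dict[str, Any]]


def paginate_by_offset(
    fetch_page: Callable[[Dict[str, Any]], Page],
    params: Dict[str, Any],
    incremental_param: str,
    increment_by: int,
    limit: int,
) -> Iterator[Page]:
    """Yield pages until one comes back empty or the param would exceed `limit`.

    `params` must already contain a starting value for `incremental_param`.
    """
    current = dict(params)  # copy so the caller's params are not mutated
    while current[incremental_param] <= limit:  # assumption: limit is inclusive
        page = fetch_page(current)
        if not page:
            break  # no more results
        yield page
        current[incremental_param] += increment_by


# Hypothetical stand-in for an API that has three pages of results.
fake_api = {1: [{"id": 1}], 2: [{"id": 2}], 3: [{"id": 3}]}

pages = list(
    paginate_by_offset(
        fetch_page=lambda p: fake_api.get(p["page"], []),
        params={"page": 1},
        incremental_param="page",
        increment_by=1,
        limit=10,
    )
)
```

Here the loop stops after the third page because the fourth request returns no results, well before the limit of 10 is reached.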
### Link
This strategy is used when the link to the next page is provided as part of the API response. The link is read from the headers or the body and used to get the next page of results.
#### Configuration Details
- `source` (_str_): The location to get the link from, can be either `headers` or `body`.
- `path` (_str_): The expression used to refer to the location of the link within the headers or the body.

#### Examples
The source value of `headers` is meant to be used with responses following [RFC 5988](https://datatracker.ietf.org/doc/html/rfc5988#page-6).
```
Link: <https://api.host.com/conversations?page_ref=ad6f38r3>; rel="next",
<https://api.host.com/conversations?page_ref=gss8ap4g>; rel="prev"
```
Given this Link header, we can specify a path of `link.next` (case-insensitive). This indicates that we are looking in the `Link` header for the link with a `rel` of `next`.
```yaml
pagination:
  strategy: link
  configuration:
    source: headers
    path: link.next
```
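
To make the `link.next` lookup concrete, here is a minimal Python sketch of how such a path could be resolved against response headers. The helper names (`parse_link_header`, `next_link_from_headers`) are hypothetical, and the regular expression only handles the simple `<url>; rel="..."` form shown above, not every variation the RFC allows.

```python
import re
from typing import Dict, Optional

# Matches entries such as: <https://example.com/page2>; rel="next"
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="?([^",;]+)"?')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel value (lower-cased) to its target URL."""
    return {rel.lower(): url for url, rel in LINK_PATTERN.findall(value)}


def next_link_from_headers(headers: Dict[str, str], path: str) -> Optional[str]:
    """Resolve a path like 'link.next' case-insensitively against the headers."""
    header_name, rel = path.lower().split(".", 1)
    for name, value in headers.items():
        if name.lower() == header_name:
            return parse_link_header(value).get(rel)
    return None


headers = {
    "Link": '<https://api.host.com/conversations?page_ref=ad6f38r3>; rel="next", '
    '<https://api.host.com/conversations?page_ref=gss8ap4g>; rel="prev"'
}
next_url = next_link_from_headers(headers, "link.next")
```

With the example header, `next_url` is the `page_ref=ad6f38r3` link; a missing header or `rel` yields `None`, which a caller could treat as the end of pagination.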

We can also access links returned in the body. If we receive this value in the body:
```
{
  ...
  "next_page": {
    "url": "https://api.host.com/conversations?page_ref=ad6f38r3"
  }
  ...
}
```
We can use the path value of `next_page.url` as the expression to access the url.
```yaml
pagination:
  strategy: link
  configuration:
    source: body
    path: next_page.url
```
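
A dotted path such as `next_page.url` can be read as a sequence of nested dictionary keys. The following Python sketch shows one way to resolve it; `resolve_path` is a hypothetical helper written for this illustration and is not taken from the Fidesops code base.

```python
from typing import Any, Optional


def resolve_path(body: Any, path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    current = body
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


body = {
    "next_page": {
        "url": "https://api.host.com/conversations?page_ref=ad6f38r3"
    }
}
next_url = resolve_path(body, "next_page.url")
```

Returning `None` for a missing key gives the caller a simple signal that there is no further page to request.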
### Cursor
This strategy is used when a specific value from a response object is used as a cursor to determine the starting point for the next set of results.
#### Configuration Details
- `cursor_param` (_str_): The name of the query param to assign the cursor value to.
- `field` (_str_): The field to read from the most recently retrieved object to use as the cursor value.

#### Examples
If an API request returns the following:
```
{
  "messages": [
    {"id": 1, "msg": "this is"},
    {"id": 2, "msg": "a"},
    {"id": 3, "msg": "test"}
  ]
}
```
This strategy will take the field `id` from the last item returned and generate a new request with a query param of `after=3`.
```yaml
pagination:
  strategy: cursor
  configuration:
    cursor_param: after
    field: id
```
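
As a rough Python sketch of the same idea, the next request's query param can be derived from the last object of the current result set. `next_cursor_params` is a hypothetical helper, and returning `None` when there is nothing to paginate from is an assumption made for this example.

```python
from typing import Any, Dict, List, Optional


def next_cursor_params(
    rows: List[Dict[str, Any]], cursor_param: str, field: str
) -> Optional[Dict[str, Any]]:
    """Build the query param for the next request from the last retrieved object."""
    if not rows or field not in rows[-1]:
        return None  # assumption: no rows or no cursor field means no next page
    return {cursor_param: rows[-1][field]}


messages = [
    {"id": 1, "msg": "this is"},
    {"id": 2, "msg": "a"},
    {"id": 3, "msg": "test"},
]
params = next_cursor_params(messages, cursor_param="after", field="id")
```

For the sample response this produces the `after=3` param described above, expressed as the dictionary `{"after": 3}`.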