[No code connector builder] cannot use next_page_token in request JSON body interpolation #40697

Open
iliyasned opened this issue Jul 3, 2024 · 14 comments


@iliyasned

iliyasned commented Jul 3, 2024

Topic

Possible bug with next_page_token interpolation (Airbyte Cloud)

Relevant information

Issue description

When I use the next_page_token variable in a free-form JSON request body, it is not interpolated, so it is never set in the outgoing request.

Here is the request body:

{
  "query": {
    "operator": "AND",
    "value": [
      {
        "field": "created_at",
        "operator": ">",
        "value": "1306054154"
      }
    ]
  },
  "pagination": {
    "per_page": 20,
    "starting_after": "{{next_page_token}}" # I also tried {{ next_page_token['next_page_token'] }} and {{ next_page_token['next_page_token']['starting_after'] }}
  }
}

Here is the paginator:

type: DefaultPaginator
pagination_strategy:
  type: CursorPagination
  cursor_value: '{{ response.get("pages", {}).get("next", {}).get("starting_after", {}) }}'
  stop_condition: >-
    {{ not response.get("pages", {}).get("next", {}).get("starting_after", {})
    }}

Here is the response:

{
  "status": 200,
  "body": {
    "type": "ticket.list",
    "pages": {
      "type": "pages",
      "next": {
        "page": 2,
        "starting_after": "WzE3MTk5MTY5NzYwMDAsNTg1LDJd"
      },
      "page": 1,
      "per_page": 20,
      "total_pages": 11
    },
    "total_count": 219,
    "tickets": [... ]
  },
  "headers": {
    "Date": "Wed, 03 Jul 2024 10:07:30 GMT",
    "Content-Type": "application/json; charset=utf-8",
    "Transfer-Encoding": "chunked",
    "Connection": "keep-alive",
    "Status": "200 OK",
    "X-RateLimit-Limit": "1667",
    "X-RateLimit-Reset": "1720001250",
    "Vary": "Accept,Accept-Encoding",
    "X-RateLimit-Remaining": "1666",
    "X-Intercom-Version": "7c0b4cd723debb71533563e6ee48379600600ab2",
    "Content-Encoding": "gzip",
    "X-Request-Id": "001ms34bp7aj8elhac20",
    "ETag": "W/\"2486fdfbb7066d3ddaa912b2664da0a0\"",
    "X-Frame-Options": "SAMEORIGIN",
    "Cache-Control": "max-age=0, private, must-revalidate",
    "Strict-Transport-Security": "max-age=31556952; includeSubDomains; preload",
    "X-XSS-Protection": "1; mode=block",
    "X-Request-Queueing": "0",
    "Intercom-Version": "2.11",
    "X-Runtime": "2.640149",
    "X-Content-Type-Options": "nosniff",
    "Server": "nginx",
    "x-ami-version": "ami-03ba2b5f972368d27"
  }
}

Here is the request for the second page (which I would expect to have the starting_after WITHIN pagination, not outside of it as shown):

{

  "url": "https://api.intercom.io/tickets/search",
  "body": {
    "query": {
      "operator": "AND",
      "value": [
        {
          "field": "created_at",
          "operator": ">",
          "value": 1306054154
        }
      ]
    },
    "pagination": {
      "per_page": 20
    },
    "starting_after": "WzE3MTk5MTY5NzYwMDAsNTg1LDJd"
  },
  "headers": {
    "User-Agent": "python-requests/2.32.3",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "*/*",
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    "Intercom-Version": "2.11",
    "Authorization": "Bearer ****",
    "Content-Length": "186"
  },
  "http_method": "POST"
}
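
The top-level `starting_after` in this request is consistent with the paginator's `page_token_option` rather than with the interpolated body field. A minimal sketch of the two places the token is configured in this report, using only field names that appear in the configuration; the comments describe the behavior observed here, not documented guarantees:

```yaml
# Option A: paginator-level injection. With inject_into: body_json the token
# appears to be written as a top-level key of the body, as in the request above.
paginator:
  type: DefaultPaginator
  page_token_option:
    type: RequestOption
    inject_into: body_json
    field_name: starting_after

# Option B: interpolation inside the free-form body, which would place
# starting_after under pagination as expected. This is the part that is
# not being interpolated in this report.
request_body_json:
  pagination:
    per_page: 20
    starting_after: '{{ next_page_token }}'
```

Both fragments are excerpts; they are shown side by side only to make the difference in placement visible.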
Full YAML `version: 2.0.0

type: DeclarativeSource

check:
  type: CheckStream
  stream_names:
    - intercom tickets

definitions:
  streams:
    intercom tickets:
      type: DeclarativeStream
      next_page_token: '{{ $.response.pages.next.starting_after}}'
      name: intercom tickets
      retriever:
        type: SimpleRetriever
        requester:
          $ref: '#/definitions/base_requester'
          path: /tickets/search
          http_method: POST
          request_headers:
            Content-Type: application/json
            Intercom-Version: '2.11'
          request_body_json:
            query:
              operator: AND
              value:
                - field: created_at
                  operator: '>'
                  value: '1306054154'
            pagination:
              per_page: 20
              starting_after: '{{next_page_token}}'
        record_selector:
          type: RecordSelector
          extractor:
            type: DpathExtractor
            field_path:
              - tickets
        paginator:
          type: DefaultPaginator
          page_token_option:
            type: RequestOption
            inject_into: body_json
            field_name: starting_after
          pagination_strategy:
            type: CursorPagination
            cursor_value: >-
              {{ response.get("pages", {}).get("next", {}).get("starting_after",
              {}) }}
            stop_condition: >-
              {{ not response.get("pages", {}).get("next",
              {}).get("starting_after", {}) }}
      schema_loader:
        type: InlineSchemaLoader
        schema:
          $ref: '#/schemas/intercom tickets'
  base_requester:
    type: HttpRequester
    url_base: https://api.intercom.io/
    authenticator:
      type: BearerAuthenticator
      api_token: '{{ config["api_key"] }}'

streams:
  - $ref: '#/definitions/streams/intercom tickets'

spec:
  type: Spec
  connection_specification:
    type: object
    $schema: http://json-schema.org/draft-07/schema#
    required:
      - api_key
    properties:
      api_key:
        type: string
        order: 0
        title: API Key
        airbyte_secret: true
    additionalProperties: true

metadata:
  autoImportSchema:
    intercom tickets: true

schemas:
  intercom tickets:
    type: object
    $schema: http://json-schema.org/schema#
    additionalProperties: true
    properties:
      type:
        type:
          - string
          - 'null'
      admin_assignee_id:
        type:
          - string
          - 'null'
      category:
        type:
          - string
          - 'null'
      contacts:
        type:
          - object
          - 'null'
        properties:
          type:
            type:
              - string
              - 'null'
          contacts:
            type:
              - array
              - 'null'
            items:
              type:
                - object
                - 'null'
              properties:
                type:
                  type:
                    - string
                    - 'null'
                external_id:
                  type:
                    - string
                    - 'null'
                id:
                  type:
                    - string
                    - 'null'
      created_at:
        type:
          - number
          - 'null'
      id:
        type:
          - string
          - 'null'
      is_shared:
        type:
          - boolean
          - 'null'
      linked_objects:
        type:
          - object
          - 'null'
        properties:
          type:
            type:
              - string
              - 'null'
          data:
            type:
              - array
              - 'null'
            items:
              type:
                - object
                - 'null'
              properties:
                type:
                  type:
                    - string
                    - 'null'
                id:
                  type:
                    - string
                    - 'null'
          has_more:
            type:
              - boolean
              - 'null'
          total_count:
            type:
              - number
              - 'null'
      open:
        type:
          - boolean
          - 'null'
      team_assignee_id:
        type:
          - string
          - 'null'
      ticket_attributes:
        type:
          - object
          - 'null'
        properties:
          Order Identifier:
            type:
              - number
              - 'null'
          Partner / Sirdab:
            type:
              - string
              - 'null'
          Sirdab SKU:
            type:
              - number
              - 'null'
          Type of Order:
            type:
              - string
              - 'null'
          default_description:
            type:
              - string
              - 'null'
          default_title:
            type:
              - string
              - 'null'
      ticket_id:
        type:
          - string
          - 'null'
      ticket_parts:
        type:
          - object
          - 'null'
        properties:
          type:
            type:
              - string
              - 'null'
          ticket_parts:
            type:
              - array
              - 'null'
            items:
              type:
                - object
                - 'null'
              properties:
                type:
                  type:
                    - string
                    - 'null'
                assigned_to:
                  type:
                    - object
                    - 'null'
                  properties:
                    type:
                      type:
                        - string
                        - 'null'
                    id:
                      type:
                        - string
                        - 'null'
                attachments:
                  type:
                    - array
                    - 'null'
                  items:
                    type:
                      - object
                      - 'null'
                    properties:
                      type:
                        type:
                          - string
                          - 'null'
                      content_type:
                        type:
                          - string
                          - 'null'
                      filesize:
                        type:
                          - number
                          - 'null'
                      name:
                        type:
                          - string
                          - 'null'
                      url:
                        type:
                          - string
                          - 'null'
                author:
                  type:
                    - object
                    - 'null'
                  properties:
                    type:
                      type:
                        - string
                        - 'null'
                    email:
                      type:
                        - string
                        - 'null'
                    id:
                      type:
                        - string
                        - 'null'
                    name:
                      type:
                        - string
                        - 'null'
                body:
                  type:
                    - string
                    - 'null'
                created_at:
                  type:
                    - number
                    - 'null'
                external_id:
                  type:
                    - string
                    - 'null'
                id:
                  type:
                    - string
                    - 'null'
                part_type:
                  type:
                    - string
                    - 'null'
                previous_ticket_state:
                  type:
                    - string
                    - 'null'
                redacted:
                  type:
                    - boolean
                    - 'null'
                ticket_state:
                  type:
                    - string
                    - 'null'
                updated_at:
                  type:
                    - number
                    - 'null'
          total_count:
            type:
              - number
              - 'null'
      ticket_state:
        type:
          - string
          - 'null'
      ticket_state_external_label:
        type:
          - string
          - 'null'
      ticket_state_internal_label:
        type:
          - string
          - 'null'
      ticket_type:
        type:
          - object
          - 'null'
        properties:
          type:
            type:
              - string
              - 'null'
          archived:
            type:
              - boolean
              - 'null'
          category:
            type:
              - string
              - 'null'
          created_at:
            type:
              - number
              - 'null'
          description:
            type:
              - string
              - 'null'
          icon:
            type:
              - string
              - 'null'
          id:
            type:
              - string
              - 'null'
          is_internal:
            type:
              - boolean
              - 'null'
          name:
            type:
              - string
              - 'null'
          ticket_type_attributes:
            type:
              - object
              - 'null'
            properties:
              type:
                type:
                  - string
                  - 'null'
              data:
                type:
                  - array
                  - 'null'
                items:
                  type:
                    - object
                    - 'null'
                  properties:
                    type:
                      type:
                        - string
                        - 'null'
                    archived:
                      type:
                        - boolean
                        - 'null'
                    created_at:
                      type:
                        - number
                        - 'null'
                    data_type:
                      type:
                        - string
                        - 'null'
                    default:
                      type:
                        - boolean
                        - 'null'
                    description:
                      type:
                        - string
                        - 'null'
                    id:
                      type:
                        - string
                        - 'null'
                    input_options:
                      type:
                        - object
                        - 'null'
                      properties:
                        list_options:
                          type:
                            - array
                            - 'null'
                          items:
                            type:
                              - object
                              - 'null'
                            properties:
                              archived:
                                type:
                                  - boolean
                                  - 'null'
                              id:
                                type:
                                  - string
                                  - 'null'
                              label:
                                type:
                                  - string
                                  - 'null'
                        multiline:
                          type:
                            - boolean
                            - 'null'
                    name:
                      type:
                        - string
                        - 'null'
                    order:
                      type:
                        - number
                        - 'null'
                    required_to_create:
                      type:
                        - boolean
                        - 'null'
                    required_to_create_for_contacts:
                      type:
                        - boolean
                        - 'null'
                    ticket_type_id:
                      type:
                        - number
                        - 'null'
                    updated_at:
                      type:
                        - number
                        - 'null'
                    visible_on_create:
                      type:
                        - boolean
                        - 'null'
                    visible_to_contacts:
                      type:
                        - boolean
                        - 'null'
                    workspace_id:
                      type:
                        - string
                        - 'null'
          updated_at:
            type:
              - number
              - 'null'
          workspace_id:
            type:
              - string
              - 'null'
      updated_at:
        type:
          - number
          - 'null'
`

@natikgadzhi
Contributor

A few things going on here! First off, thank you A LOT for such a detailed report. I'm supporting our Connector Builder team, I'll take a look later today + tomorrow.

Out of curiosity, I see you're building against Intercom. Is there any reason why you can't use our source-intercom connector? Are any streams or columns missing there? I would love to help and add the missing pieces to Intercom proper, as we're migrating it to low-code.

And to the point — I'll try to get our sandbox credentials and run the queries you're trying to run and see if I can reproduce the problem.

@iliyasned
Author

Hi Natik, thanks for looking into this.
We're using the custom builder because Airbyte doesn't yet have ticket information streaming from Intercom out-of-the-box, so we were trying to use their REST API to get it working but we ran into this weirdly unsolvable 'next_page_token' issue.

@sherifnada
Contributor

sherifnada commented Jul 3, 2024

Looked into this and I think I found the root cause:

  • This stream is configured as full refresh. With the Resumable Full Refresh (RFR) feature recently added to Airbyte, the CDK tries to implement RFR "for free" by setting the cursor of full refresh streams to a ResumableFullRefreshCursor.
  • The problem is that if a stream uses a ResumableFullRefreshCursor, we only seem to pull one page of data. I'm pretty sure this understanding is incomplete, because the CDK does try to fetch more pages, though I'm not sure exactly where that happens. When I added this change, everything seemed to work as I expected (though I don't know whether that change breaks something about state management).

What does make me sure that the issue is somewhere on the line of code I shared above is that when I add incremental sync to the stream (and thus make it so that the CDK doesn't create a ResumableFullRefreshCursor for this stream) everything works exactly as expected. Here is the updated YAML that makes things work:

Basically all we added was this block to the stream definition:

incremental_sync:
  type: DatetimeBasedCursor
  cursor_field: created_at
  cursor_datetime_formats:
    - '%s'
  datetime_format: '%s'
  start_datetime:
    type: MinMaxDatetime
    datetime: '{{ config["start_date"] }}'
    datetime_format: '%Y-%m-%dT%H:%M:%SZ'

We should probably keep the issue open until the CDK fix is merged. But I think we are unblocked for now.

Full YAML:

version: 2.0.0

type: DeclarativeSource

check:
  type: CheckStream
  stream_names:
    - tickets

definitions:
  streams:
    tickets:
      type: DeclarativeStream
      name: tickets
      retriever:
        type: SimpleRetriever
        requester:
          $ref: '#/definitions/base_requester'
          path: /tickets/search?
          http_method: POST
          request_headers:
            Content-Type: application/json
            Intercom-Version: '2.11'
          request_body_json:
            sort:
              field: created_at
              order: ascending
            query:
              value:
                - field: created_at
                  value: >-
                    {{ stream_slice.get('start_date',
                    timestamp(config['start_date'])) }}
                  operator: '>'
              operator: AND
            pagination:
              per_page: 50
              starting_after: '{{ next_page_token[''next_page_token''] }}'
        record_selector:
          type: RecordSelector
          extractor:
            type: DpathExtractor
            field_path:
              - tickets
        paginator:
          type: DefaultPaginator
          pagination_strategy:
            type: CursorPagination
            cursor_value: >-
              {{ response.get("pages", {}).get("next", {}).get("starting_after",
              {}) }}
            stop_condition: >-
              {{ not response.get("pages", {}).get("next",
              {}).get("starting_after", {}) }}
      incremental_sync:
        type: DatetimeBasedCursor
        cursor_field: created_at
        cursor_datetime_formats:
          - '%s'
        datetime_format: '%s'
        start_datetime:
          type: MinMaxDatetime
          datetime: '{{ config["start_date"] }}'
          datetime_format: '%Y-%m-%dT%H:%M:%SZ'
      schema_loader:
        type: InlineSchemaLoader
        schema:
          $ref: '#/schemas/tickets'
  base_requester:
    type: HttpRequester
    url_base: https://api.intercom.io/
    authenticator:
      type: BearerAuthenticator
      api_token: '{{ config["api_key"] }}'

streams:
  - $ref: '#/definitions/streams/tickets'

spec:
  type: Spec
  connection_specification:
    type: object
    $schema: http://json-schema.org/draft-07/schema#
    required: []
    properties: {}
    additionalProperties: true

metadata:
  autoImportSchema:
    tickets: false
  yamlComponents:
    global:
      - authenticator

schemas:
  tickets:
    type: object
    $schema: http://json-schema.org/schema#
    additionalProperties: true
    properties:
      type:
        type:
          - string
          - 'null'
      admin_assignee_id:
        type:
          - string
          - 'null'
      category:
        type:
          - string
          - 'null'
      contacts:
        type:
          - object
          - 'null'
        properties:
          type:
            type:
              - string
              - 'null'
          contacts:
            type:
              - array
              - 'null'
            items:
              type:
                - object
                - 'null'
              properties:
                type:
                  type:
                    - string
                    - 'null'
                external_id:
                  type:
                    - string
                    - 'null'
                id:
                  type:
                    - string
                    - 'null'
      created_at:
        type:
          - number
          - 'null'
      id:
        type:
          - string
          - 'null'
      is_shared:
        type:
          - boolean
          - 'null'
      linked_objects:
        type:
          - object
          - 'null'
        properties:
          type:
            type:
              - string
              - 'null'
          data:
            type:
              - array
              - 'null'
          has_more:
            type:
              - boolean
              - 'null'
          total_count:
            type:
              - number
              - 'null'
      open:
        type:
          - boolean
          - 'null'
      team_assignee_id:
        type:
          - string
          - 'null'
      ticket_attributes:
        type:
          - object
          - 'null'
        properties:
          Order Identifier:
            type:
              - number
              - 'null'
          Type of Order:
            type:
              - string
              - 'null'
          default_description:
            type:
              - string
              - 'null'
          default_title:
            type:
              - string
              - 'null'
      ticket_id:
        type:
          - string
          - 'null'
      ticket_parts:
        type:
          - object
          - 'null'
        properties:
          type:
            type:
              - string
              - 'null'
          ticket_parts:
            type:
              - array
              - 'null'
            items:
              type:
                - object
                - 'null'
              properties:
                type:
                  type:
                    - string
                    - 'null'
                assigned_to:
                  type:
                    - object
                    - 'null'
                  properties:
                    type:
                      type:
                        - string
                        - 'null'
                    id:
                      type:
                        - string
                        - 'null'
                attachments:
                  type:
                    - array
                    - 'null'
                author:
                  type:
                    - object
                    - 'null'
                  properties:
                    type:
                      type:
                        - string
                        - 'null'
                    email:
                      type:
                        - string
                        - 'null'
                    id:
                      type:
                        - string
                        - 'null'
                    name:
                      type:
                        - string
                        - 'null'
                body:
                  type:
                    - string
                    - 'null'
                created_at:
                  type:
                    - number
                    - 'null'
                id:
                  type:
                    - string
                    - 'null'
                part_type:
                  type:
                    - string
                    - 'null'
                previous_ticket_state:
                  type:
                    - string
                    - 'null'
                redacted:
                  type:
                    - boolean
                    - 'null'
                ticket_state:
                  type:
                    - string
                    - 'null'
                updated_at:
                  type:
                    - number
                    - 'null'
          total_count:
            type:
              - number
              - 'null'
      ticket_state:
        type:
          - string
          - 'null'
      ticket_state_external_label:
        type:
          - string
          - 'null'
      ticket_state_internal_label:
        type:
          - string
          - 'null'
      ticket_type:
        type:
          - object
          - 'null'
        properties:
          type:
            type:
              - string
              - 'null'
          archived:
            type:
              - boolean
              - 'null'
          category:
            type:
              - string
              - 'null'
          created_at:
            type:
              - number
              - 'null'
          description:
            type:
              - string
              - 'null'
          icon:
            type:
              - string
              - 'null'
          id:
            type:
              - string
              - 'null'
          is_internal:
            type:
              - boolean
              - 'null'
          name:
            type:
              - string
              - 'null'
          ticket_type_attributes:
            type:
              - object
              - 'null'
            properties:
              type:
                type:
                  - string
                  - 'null'
              data:
                type:
                  - array
                  - 'null'
                items:
                  type:
                    - object
                    - 'null'
                  properties:
                    type:
                      type:
                        - string
                        - 'null'
                    archived:
                      type:
                        - boolean
                        - 'null'
                    created_at:
                      type:
                        - number
                        - 'null'
                    data_type:
                      type:
                        - string
                        - 'null'
                    default:
                      type:
                        - boolean
                        - 'null'
                    description:
                      type:
                        - string
                        - 'null'
                    id:
                      type:
                        - string
                        - 'null'
                    input_options:
                      type:
                        - object
                        - 'null'
                      properties:
                        list_options:
                          type:
                            - array
                            - 'null'
                          items:
                            type:
                              - object
                              - 'null'
                            properties:
                              archived:
                                type:
                                  - boolean
                                  - 'null'
                              id:
                                type:
                                  - string
                                  - 'null'
                              label:
                                type:
                                  - string
                                  - 'null'
                        multiline:
                          type:
                            - boolean
                            - 'null'
                    name:
                      type:
                        - string
                        - 'null'
                    order:
                      type:
                        - number
                        - 'null'
                    required_to_create:
                      type:
                        - boolean
                        - 'null'
                    required_to_create_for_contacts:
                      type:
                        - boolean
                        - 'null'
                    ticket_type_id:
                      type:
                        - number
                        - 'null'
                    updated_at:
                      type:
                        - number
                        - 'null'
                    visible_on_create:
                      type:
                        - boolean
                        - 'null'
                    visible_to_contacts:
                      type:
                        - boolean
                        - 'null'
                    workspace_id:
                      type:
                        - string
                        - 'null'
          updated_at:
            type:
              - number
              - 'null'
          workspace_id:
            type:
              - string
              - 'null'
      updated_at:
        type:
          - number
          - 'null'

@sherifnada
Contributor

I also made a probably-not-entirely-correct PR adding this stream to the existing Intercom connector

@natikgadzhi
Contributor

@brianjlai, any chance you can pick this up? /cc @girarda

@brianjlai
Contributor

@natikgadzhi probably not, i'm mostly focused on the automatic RFR stuff for the time being

@ivanlm

ivanlm commented Oct 24, 2024

I've had the same issue and found that the "next_page_token" attribute is available in the "stream_partition" object. I'm not sure whether this is a workaround or the new intended method. Reading it from the next_page_token object did not work for me either when I was updating a no-code custom connector with a Text (Free Form) request body.

@natikgadzhi
Contributor

/cc @lmossman I think this is an interesting one. Am I right to assume that a solution is to pass the next page token to the interpolation context dicts on the CDK side?

@lmossman
Contributor

lmossman commented Nov 5, 2024

@natikgadzhi based on the comment above it sounds like this may already be doable with {{ stream_partition['next_page_token'] }} in the request body
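
As a sketch of what that would look like in the request body from the original report (untested here; only the `starting_after` expression differs from the reporter's configuration):

```yaml
request_body_json:
  query:
    operator: AND
    value:
      - field: created_at
        operator: '>'
        value: '1306054154'
  pagination:
    per_page: 20
    # Workaround reported in this thread: read the token from stream_partition
    # instead of next_page_token.
    starting_after: "{{ stream_partition['next_page_token'] }}"
```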

@ivanlm

ivanlm commented Nov 6, 2024

> I've had the same issue and found the "next_page_token" attribute is available in the "stream_partition" object.

An update to the comment above: this workaround does not work when a "Parent Stream" or "Parameterized Requests" is being used. In that case the contents of "stream_partition" do not include the "next_page_token". This became a blocker for one particular use case we have.

@lmossman
Contributor

lmossman commented Nov 7, 2024

@natikgadzhi @maxi297 could you look into why the next_page_token is not available to request_body_json in the HttpRequester in the case the user described above?

@mannharleen

@natikgadzhi @maxi297 @lmossman - any update on this please?
This is surely a bug. The current workaround {{ stream_partition['next_page_token'] }} works, but only when parameterized requests are not used.

I have a feeling that instead of making the next_page_token available under the dict named "next_page_token", it is being made available under a dict named "stream_partition"

@mannharleen

I can take a stab at solving this if someone can guide me to where to look?

ossmht added a commit to ossmht/PyAirbyte that referenced this issue Dec 5, 2024
@justbeez
Contributor

I noticed this recently when rebuilding the Pardot source. On streams set to Incremental, {{ next_page_token.next_page_token }} worked fine; but on non-incremental streams, it doesn't. This was a problem with Pardot, because when their nextPageUrl is passed, you cannot pass any other parameters.

What I noticed, though, was that if I enabled Parameterized Requests with a dummy value, next_page_token.next_page_token was populated correctly. So I worked around things that way, since it let me use next_page_token.next_page_token in all cases without having to add fallback logic based on the features enabled on the stream. So it's a hack, but at least a hack that only appears in one place (the dummy Parameterized values, which don't get injected anyway).

Hopefully this helps track things down a bit more. I would argue that the right solution is fixing next_page_token.next_page_token, not fixing/relying on stream_partition.next_page_token (which I think just adds to the confusion since it isn't the documented way to handle this).
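
A rough sketch of the dummy-parameter workaround, applied to the request body from this issue rather than to Pardot. It assumes the Builder's Parameterized Requests feature is exported to YAML as a `ListPartitionRouter` on the retriever; the `dummy` value and the `dummy_partition` name are hypothetical placeholders, and required requester fields such as `url_base` and `path` are omitted:

```yaml
retriever:
  type: SimpleRetriever
  partition_router:
    type: ListPartitionRouter
    # Hypothetical placeholder values. No request_option is set, so the
    # value is not injected into the outgoing request.
    values:
      - dummy
    cursor_field: dummy_partition
  requester:
    type: HttpRequester
    request_body_json:
      pagination:
        # With the dummy partition in place, this expression was reported
        # to be populated on non-incremental streams as well.
        starting_after: '{{ next_page_token.next_page_token }}'
```

This only illustrates where the dummy value would live; it does not address the underlying interpolation bug.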
