-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Source Tempo: Avoid infinite loop for non-paginated APIs #16361
🐛 Source Tempo: Avoid infinite loop for non-paginated APIs #16361
Conversation
…nation and support more than DEFAULT_ITEMS_PER_PAGE results.
e6d7668
to
a00ac71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change process to look for next
in the API response payload
@@ -38,7 +38,7 @@ def lists(self, name, url, params, func, **kwargs): | |||
response = requests.get(f"{self.base_api_url}{url}?limit={params['limit']}&offset={params['offset']}", headers=self.headers) | |||
data = func(response.json()) | |||
yield from data | |||
if len(data) < self.DEFAULT_ITEMS_PER_PAGE: | |||
if not self.ENTITIES_MAP[name]["paginated"] or len(data) < self.DEFAULT_ITEMS_PER_PAGE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the safe way to handle this. As another API repo in python is listening on the next
field existing in the API response in order to continue. See https://github.com/stanislavulrych/tempo-api-python-client/blob/62418b72b10c625c7fcc9b9d1ea61f09a3c6a09d/tempoapiclient/client.py#L54
I took an example API call for worklogs and can confirm the response object next
is in there https://tempo-io.github.io/tempo-api-docs/v2.html#worklogs - This is the safest way and easier based on the data return vs. being explicit each time. Also there are chances when data may be more than the default and not have a page and this could cause errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worklogs
API call supports pagination and therefore has parameters such as offset, limit, next and previous.
However the accounts
and customers
API calls do not support pagination, and do not support these parameters (eg. see https://tempo-io.github.io/tempo-api-docs/v2.html#accounts). The API ignores the limit parameter (set to 100 by default) and returns all the results at once. If you have (for example) 120 accounts, then the existing code fails to exit.
This change puts a flag against the APIs calls that are paginated and those that are not. For non-paginated API calls it doesn't try to fetch further pages and just exits, avoiding an infinite loop.
It doesn't change the behaviour for paginated API calls (such as worklogs). I don't believe that checking the response size is < DEFAULT_ITEMS_PER_PAGE is a great way to check whether to exit or request further pages. But I'm unaware of an issue with the existing implementation (for the paginated APIs) and it isn't what I am trying to fix, so I've left it as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bstrawson thanks for explination. All is good 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bstrawson code changes are all good, we are just miasing one change - you need to edit the md files too when making changes. Let me know if you need me to point you the connector md with an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@koconder Apologies for missing that. If you could point me in the direction of the md with an example, I'd be grateful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bstrawson you need to edit this file:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bstrawson will get the team to check the integration test results which failed and report back. Should be able to get this review finalized soon
/test connector=connectors/source-tempo
Build FailedTest summary info:
|
/test connector=connectors/source-tempo
Build PassedTest summary info:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes completed, tests passing
/publish connector=connectors/source-tempo
if you have connectors that successfully published but failed definition generation, follow step 4 here |
…#16361) * 🐛 Source Tempo: Avoid infinite loop for APIs that do not support pagination and support more than DEFAULT_ITEMS_PER_PAGE results. * Source Tempo: Update changelog for version 0.2.6 * fix: update additionalProperties to true * fix: remove additionalProperties field in schema files Co-authored-by: Sajarin <sajarindider@gmail.com>
…#16361) * 🐛 Source Tempo: Avoid infinite loop for APIs that do not support pagination and support more than DEFAULT_ITEMS_PER_PAGE results. * Source Tempo: Update changelog for version 0.2.6 * fix: update additionalProperties to true * fix: remove additionalProperties field in schema files Co-authored-by: Sajarin <sajarindider@gmail.com>
What
Only some of the Tempo APIs support pagination. The ones that don't (
accounts
andcustomers
) return all their results and ignore any parameters to request the results to be paginated. Users that haveDEFAULT_ITEMS_PER_PAGE
or more results will enter an infinite loop and the sync operation never completes.How
The existing code repeatedly calls the API with different pagination parameters (
limit
andoffset
) until less thanDEFAULT_ITEMS_PER_PAGE
(normally 100) items are returned. At this point it terminates. This change adds a parameterpaginated
to each API inENTITIES_MAP
to indicate whether it is paginated or not. Where the API is paginated, behaviour is as before. However, if not paginated then only one call is made, avoiding entering an infinite loop.Recommended reading order
All code changes are in
client.py
, with a version bump inDockerfile
.🚨 User Impact 🚨
No user impact.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changesTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.