
🐛 Source Zendesk Support: sync rate improvement #9062

Closed
octavia-squidington-iii opened this issue Dec 22, 2021 · 3 comments · Fixed by #9456

Comments

@octavia-squidington-iii
Collaborator

Is this your first time deploying Airbyte: Yes
OS Version / Instance: MBP / Catalina
Deployment: docker-compose
Airbyte Version: 0.34.1-alpha
Source name/version: Zendesk Support 0.1.8
Destination name/version: Snowflake
Description: I am testing out a Zendesk->Snowflake sync on my laptop before deploying to an EC2 instance. Everything is working, but the rate at which the data is syncing is very slow. As you can see in the logs, it is syncing at a rate of 1K rows every minute or so. We have >7M tickets in Zendesk, so the total number of table rows is some multiple of that, let's say 50M rows, which means that at 1,000 rows/minute it will take over a month to backfill our data.

I emailed with Zendesk Support about this, and they said that given our API rate limit of 700 requests per minute and the # of rows that can be pulled per request, we should be able to backfill all of our data in less than 3 days. I just checked our Zendesk API status page, which says we are at 163/700 requests per minute at this very moment (with Airbyte chugging away).
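(As a rough sanity check, assuming Zendesk's incremental export returns up to ~1,000 rows per request: 50,000,000 rows ÷ 1,000 rows/request ≈ 50,000 requests, and even at a fraction of the 700 requests/minute limit that is a matter of hours of pure API time, so the sub-3-day estimate is plausible.)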

Since this is my first time using Airbyte, I’m guessing/hoping that I missed a setting somewhere. Thanks in advance for your help!

https://airbytehq.slack.com/archives/C01MFR03D5W/p1639967338010700?thread_ts=1639967338.010700&cid=C01MFR03D5W

@marcosmarxm marcosmarxm added the area/connectors Connector related issues label Dec 22, 2021
@marcosmarxm marcosmarxm changed the title from "Created this issue from slack" to "Source Zendesk Support: sync rate improvement" Dec 22, 2021
@marcosmarxm
Member

Logs from user: Untitled (2).txt

@alafanechere alafanechere changed the title from "Source Zendesk Support: sync rate improvement" to ":bug Source Zendesk Support: sync rate improvement" Dec 23, 2021
@alafanechere alafanechere changed the title from ":bug Source Zendesk Support: sync rate improvement" to "🐛 Source Zendesk Support: sync rate improvement" Dec 23, 2021
@sherifnada sherifnada added this to the Connectors Jan 14 2022 milestone Dec 24, 2021
@htrueman htrueman self-assigned this Dec 27, 2021
@htrueman
Contributor

Scoping Report

  • There are several base classes inherited by the end streams: IncrementalUnsortedCursorStream, IncrementalSortedCursorStream, IncrementalExportStream, FullRefreshStream, IncrementalUnsortedPageStream.
    This makes the code hard to follow, and the inheritance hierarchy itself is confusing (e.g. FullRefreshStream inherits from IncrementalUnsortedPageStream). This needs to be refactored.

  • source-zendesk-support does not use the full available API rate limit, so there is definitely scope to boost the sync speed.

  • We may reduce the page size and make more (smaller) API calls. This may help, but it definitely needs to be tested.

  • According to the usage reports, we can parallelize the stream sync across up to 4 processes (with the current page sizes). We could take other sources (such as source-facebook-marketing or source-s3) as an example and create some kind of process pool to execute requests in parallel; see the sketch after this list.

  • To do so, we can track the API activity (see https://support.zendesk.com/hc/en-us/articles/4408836402074-Using-the-API-dashboard). For development purposes there is a graphical admin interface: https://support.zendesk.com/hc/en-us/articles/4408838272410. In the codebase we can compare the Core API activity over the last 24 hours against the rate limit.
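
A minimal sketch of the pooled approach, for illustration only. Everything here (the URL, page size, auth, and the fetch_page/fetch_all helpers) is a hypothetical placeholder rather than the connector's actual code, and a thread pool stands in for the process pool mentioned above since the work is I/O-bound:

```python
# Sketch: fetch pages concurrently with a bounded pool of workers.
# BASE_URL, PAGE_SIZE, and the credentials are placeholders; the real
# connector would derive these from its configured spec.
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

BASE_URL = "https://example.zendesk.com/api/v2/tickets.json"  # placeholder
PAGE_SIZE = 100   # placeholder page size
MAX_WORKERS = 4   # up to 4 parallel workers, per the usage reports above


def fetch_page(page: int) -> list:
    """Fetch one page of records from the (placeholder) endpoint."""
    response = requests.get(
        BASE_URL,
        params={"page": page, "per_page": PAGE_SIZE},
        auth=("user@example.com/token", "<api_token>"),  # placeholder auth
        timeout=60,
    )
    response.raise_for_status()
    return response.json().get("tickets", [])


def fetch_all(page_count: int) -> list:
    """Fetch page_count pages with up to MAX_WORKERS requests in flight."""
    records = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = [pool.submit(fetch_page, page) for page in range(1, page_count + 1)]
        for future in as_completed(futures):
            records.extend(future.result())
    return records
```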

@htrueman
Contributor

htrueman commented Feb 8, 2022

  • Working on rewriting the existing connector to use future (asynchronous) requests. streams.py needs to be rewritten to collect a deque of future requests and then process them as soon as they complete.
  • To do this we need to pre-calculate the number of items per endpoint, then split them into n pages (a count endpoint and offset pagination must be available).
  • We also need to catch rate-limit exceptions, re-add the future request to the deque one more time, and resend it after the backoff time; see the sketch after this list.
    The incremental sync config may need to change in some cases, since we have to switch the stream endpoint if it does not support offset pagination.
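
A minimal sketch of that flow, assuming the requests-futures package (FuturesSession); the URL shape, PAGE_SIZE, BACKOFF_TIME, and the response parsing are placeholders, not the final implementation:

```python
# Sketch: pre-calculate pages, issue them all as future requests, and
# re-queue any rate-limited request to be retried after a backoff.
import time
from collections import deque

from requests_futures.sessions import FuturesSession

PAGE_SIZE = 100      # placeholder page size
BACKOFF_TIME = 60    # placeholder fallback backoff, in seconds


def sync_stream(session: FuturesSession, base_url: str, record_count: int) -> list:
    """Split record_count into pages, fetch them as futures, retry on 429."""
    page_count = -(-record_count // PAGE_SIZE)  # ceiling division
    pending = deque(
        session.get(f"{base_url}?page={page}&per_page={PAGE_SIZE}")
        for page in range(1, page_count + 1)
    )
    records = []
    while pending:
        response = pending.popleft().result()
        if response.status_code == 429:
            # Rate limited: wait out the backoff, then re-add the request
            # to the deque to be sent one more time.
            retry_after = int(response.headers.get("Retry-After", BACKOFF_TIME))
            time.sleep(retry_after)
            pending.append(session.get(response.request.url))
        else:
            response.raise_for_status()
            records.extend(response.json().get("tickets", []))
    return records
```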
