
🐛 Source Zendesk Support: sync rate improvement #9062

Closed
octavia-squidington-iii opened this issue Dec 22, 2021 · 3 comments · Fixed by #9456

Comments

@octavia-squidington-iii
Collaborator

Is this your first time deploying Airbyte: Yes
OS Version / Instance: MBP / Catalina
Deployment: docker-compose
Airbyte Version: 0.34.1-alpha
Source name/version: Zendesk Support 0.1.8
Destination name/version: Snowflake
Description: I am testing out a Zendesk->Snowflake sync on my laptop before deploying to an EC2 instance. Everything is working, but the rate at which the data is syncing is very slow. As you can see in the logs, it is syncing at a rate of 1K rows every minute or so. We have >7M tickets in Zendesk, so the total number of table rows is some multiple of that, let's say 50M rows, which means that at 1,000 rows/minute it will take over a month to backfill our data.

I emailed with Zendesk Support about this, and they said that given our API rate limit of 700 requests per minute and the # of rows that can be pulled per request, we should be able to backfill all of our data in less than 3 days. I just checked our Zendesk API status page, which says we are at 163/700 requests per minute at this very moment (with Airbyte chugging away).
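(As a rough sanity check, assuming Zendesk's incremental export returns up to ~1,000 rows per request: 50,000,000 rows ÷ 1,000 rows/request ≈ 50,000 requests, and even at a fraction of the 700 requests/minute limit that is a matter of hours of pure API time, so the sub-3-day estimate is plausible.)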

Since this is my first time using Airbyte, I’m guessing/hoping that I missed a setting somewhere. Thanks in advance for your help!

https://airbytehq.slack.com/archives/C01MFR03D5W/p1639967338010700?thread_ts=1639967338.010700&cid=C01MFR03D5W

@marcosmarxm marcosmarxm added the area/connectors Connector related issues label Dec 22, 2021
@marcosmarxm marcosmarxm changed the title from "Created this issue from slack" to "Source Zendesk Support: sync rate improvement" Dec 22, 2021
@marcosmarxm
Member

Logs from user: Untitled (2).txt

@alafanechere alafanechere changed the title from "Source Zendesk Support: sync rate improvement" to ":bug Source Zendesk Support: sync rate improvement" Dec 23, 2021
@alafanechere alafanechere changed the title from ":bug Source Zendesk Support: sync rate improvement" to "🐛 Source Zendesk Support: sync rate improvement" Dec 23, 2021
@sherifnada sherifnada added this to the Connectors Jan 14 2022 milestone Dec 24, 2021
@htrueman htrueman self-assigned this Dec 27, 2021
@htrueman
Contributor

Scoping Report

  • There are several base classes inherited by the end streams: IncrementalUnsortedCursorStream, IncrementalSortedCursorStream, IncrementalExportStream, FullRefreshStream, IncrementalUnsortedPageStream.
    This makes the code hard to follow, and the inheritance hierarchy itself is confusing (e.g. FullRefreshStream inherits from IncrementalUnsortedPageStream). This needs to be refactored.

  • source-zendesk-support does not use the full available API rate limit, so there is definitely scope to boost the sync speed.

  • We may reduce the page size and make more (smaller) API calls. This may help, but it definitely needs to be tested.

  • According to the usage reports, we can parallelize the stream sync across up to 4 processes (with the current page sizes). We could take other sources (such as source-facebook-marketing or source-s3) as an example and create some kind of process pool to execute requests in parallel; see the sketch after this list.

  • To do so, we can track the API activity (see https://support.zendesk.com/hc/en-us/articles/4408836402074-Using-the-API-dashboard). For development purposes there is a graphical admin interface: https://support.zendesk.com/hc/en-us/articles/4408838272410. In the codebase we can compare the Core API activity over the last 24 hours against the rate limit.
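
A minimal sketch of the pooled approach, for illustration only. Everything here (the URL, page size, auth, and the fetch_page/fetch_all helpers) is a hypothetical placeholder rather than the connector's actual code, and a thread pool stands in for the process pool mentioned above since the work is I/O-bound:

```python
# Sketch: fetch pages concurrently with a bounded pool of workers.
# BASE_URL, PAGE_SIZE, and the credentials are placeholders; the real
# connector would derive these from its configured spec.
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

BASE_URL = "https://example.zendesk.com/api/v2/tickets.json"  # placeholder
PAGE_SIZE = 100   # placeholder page size
MAX_WORKERS = 4   # up to 4 parallel workers, per the usage reports above


def fetch_page(page: int) -> list:
    """Fetch one page of records from the (placeholder) endpoint."""
    response = requests.get(
        BASE_URL,
        params={"page": page, "per_page": PAGE_SIZE},
        auth=("user@example.com/token", "<api_token>"),  # placeholder auth
        timeout=60,
    )
    response.raise_for_status()
    return response.json().get("tickets", [])


def fetch_all(page_count: int) -> list:
    """Fetch page_count pages with up to MAX_WORKERS requests in flight."""
    records = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = [pool.submit(fetch_page, page) for page in range(1, page_count + 1)]
        for future in as_completed(futures):
            records.extend(future.result())
    return records
```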

@htrueman
Contributor

htrueman commented Feb 8, 2022

  • Working on rewriting the existing connector to use future (asynchronous) requests. streams.py needs to be rewritten to collect a deque of future requests and then process them as soon as they complete.
  • To do this we need to pre-calculate the number of items per endpoint, then split them into n pages (a count endpoint and offset pagination must be available).
  • We also need to catch rate-limit exceptions, re-add the future request to the deque one more time, and resend it after the backoff time; see the sketch after this list.
    The incremental sync config may need to change in some cases, since we have to switch the stream endpoint if it does not support offset pagination.
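
A minimal sketch of that flow, assuming the requests-futures package (FuturesSession); the URL shape, PAGE_SIZE, BACKOFF_TIME, and the response parsing are placeholders, not the final implementation:

```python
# Sketch: pre-calculate pages, issue them all as future requests, and
# re-queue any rate-limited request to be retried after a backoff.
import time
from collections import deque

from requests_futures.sessions import FuturesSession

PAGE_SIZE = 100      # placeholder page size
BACKOFF_TIME = 60    # placeholder fallback backoff, in seconds


def sync_stream(session: FuturesSession, base_url: str, record_count: int) -> list:
    """Split record_count into pages, fetch them as futures, retry on 429."""
    page_count = -(-record_count // PAGE_SIZE)  # ceiling division
    pending = deque(
        session.get(f"{base_url}?page={page}&per_page={PAGE_SIZE}")
        for page in range(1, page_count + 1)
    )
    records = []
    while pending:
        response = pending.popleft().result()
        if response.status_code == 429:
            # Rate limited: wait out the backoff, then re-add the request
            # to the deque to be sent one more time.
            retry_after = int(response.headers.get("Retry-After", BACKOFF_TIME))
            time.sleep(retry_after)
            pending.append(session.get(response.request.url))
        else:
            response.raise_for_status()
            records.extend(response.json().get("tickets", []))
    return records
```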
