-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TDL-24687: Enhance tap performance #150
Conversation
@@ -8,7 +9,8 @@ | |||
|
|||
|
|||
LOGGER = singer.get_logger() | |||
|
|||
DEFAULT_WAIT = 60 # Default wait time for backoff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated in zendesk documentation
tap_zendesk/http.py
Outdated
try: | ||
response_json = await response.json() | ||
except Exception: # pylint: disable=broad-except | ||
response_json = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The except Exception block is too broad. It would be better to catch specific exceptions. Also in case of exception we should log the warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a separate function raise_for_error_for_async
for the response validation.
tap_zendesk/http.py
Outdated
@backoff.on_exception(backoff.expo, | ||
(ConnectionError, ConnectionResetError, Timeout, ChunkedEncodingError, ProtocolError),#As ConnectionError error and timeout error does not have attribute status_code, | ||
max_tries=5, # here we added another backoff expression. | ||
factor=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it
tap_zendesk/http.py
Outdated
""" | ||
Perform an asynchronous GET request | ||
""" | ||
while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using while True:
loop might lead into infinite loop if for some reason we don't receive 200
status code. I think we should have some max. retry limits applied here. Also can't we raise custom exceptions and handle these retries in backoff logic itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed while loop and added backoff mechanism to retry 5 times.
|
||
from tap_zendesk import http, streams | ||
|
||
class TestASyncTicketAudits(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test does not cover scenarios where the paginate_ticket_audits
function might raise exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added scenario to validate exception
async def mock_get_objects(session, ticket_id): | ||
return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}] | ||
|
||
|
||
|
||
instance = streams.TicketAudits(None, {}) | ||
instance.stream = 'ticket_audits' | ||
|
||
|
||
# Run the sync method | ||
async def run_test(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async def mock_get_objects(session, ticket_id): | |
return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}] | |
instance = streams.TicketAudits(None, {}) | |
instance.stream = 'ticket_audits' | |
# Run the sync method | |
async def run_test(): | |
async def mock_get_objects(session, ticket_id): | |
return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}] | |
instance = streams.TicketAudits(None, {}) | |
instance.stream = 'ticket_audits' | |
# Run the sync method | |
async def run_test(): |
|
||
@aioresponses() | ||
@patch('asyncio.sleep', return_value=None) | ||
def test_call_api_async_conflict(self, mocked, mock_sleep): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added docstring in all the functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requested changes inline.
Addressed all the requested changes |
…-io/tap-zendesk into TDL-24687/enhance-tap-performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please check the comment added
Description of change
In the current setup:
We first fetch tickets.
For each ticket, we make three separate API calls:
One for ticket_comments
One for ticket_metrics
One for ticket_audits
With 10,000 tickets, this results in over 30,000 API calls, which takes a long extraction time. So, we have added following potential fix to boos the tap performance,
Reduce API calls by side-loading
When fetching a ticket, it is possible to also fetch the ticket_metrics in a single call as a side load, eliminating the need for a separate API call for ticket_metrics.
Similarly, when fetching ticket_audits, we can also fetch ticket_comments as a side load, removing the need for an additional API call to retrieve ticket_comments.
Make API calls asynchronously
More details can be found here in the ticket.
Manual QA steps
Risks
Rollback steps
AI generated code
https://internal.qlik.dev/general/ways-of-working/code-reviews/#guidelines-for-ai-generated-code