TDL-24687: Enhance tap performance #150

Merged
merged 15 commits into from
Nov 7, 2024

Conversation

prijendev
Contributor

@prijendev prijendev commented Oct 28, 2024

Description of change

In the current setup:

We first fetch tickets.
For each ticket, we make three separate API calls:

  • One for ticket_comments

  • One for ticket_metrics

  • One for ticket_audits
With 10,000 tickets, this results in over 30,000 API calls and a long extraction time. To boost tap performance, this PR makes the following changes:

  • Reduce API calls by side-loading

    • Ticket Metrics Side Load:
      When fetching a ticket, it is possible to also fetch the ticket_metrics in a single call as a side load, eliminating the need for a separate API call for ticket_metrics.
    • Ticket Audits and Comments Side Load:
      Similarly, when fetching ticket_audits, we can also fetch ticket_comments as a side load, removing the need for an additional API call to retrieve ticket_comments.
    • By combining these side loads, the total number of API calls drops from over 30,000 to roughly 10,000 for the same 10,000 tickets (one ticket_audits call per ticket).
  • Make API calls asynchronously

    • The total time required for the tap to complete in sync mode has been reduced by 90% compared to the current version.
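
A rough sketch of how the two changes above could fit together. It does not reproduce the tap's real code: `fetch_audits` is a hypothetical stand-in for the HTTP request, `per_ticket_calls`, `comments_from_audits`, `sync_tickets` and the `concurrency` parameter are illustrative names, and the assumption that comments appear as `Comment` events inside each audit is inferred from the mocked payload used in this PR's tests.

```python
import asyncio


def per_ticket_calls(n_tickets, side_load):
    """Per-ticket API calls, not counting the paginated ticket listing."""
    # Without side-loading: comments + metrics + audits = 3 calls per ticket.
    # With side-loading: metrics arrive with the tickets and comments are
    # derived from the audits, leaving 1 audits call per ticket.
    return n_tickets * (1 if side_load else 3)


def comments_from_audits(audits):
    """Pull comment events out of audit records (assumed payload shape)."""
    return [
        event
        for audit in audits
        for event in audit.get("events", [])
        if event.get("type") == "Comment"
    ]


async def fetch_audits(ticket_id):
    """Hypothetical stand-in for the real HTTP request."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return [{
        "id": ticket_id,
        "events": [
            {"type": "Comment", "id": f"comment_{ticket_id}"},
            {"type": "Change", "id": f"change_{ticket_id}"},
        ],
    }]


async def sync_tickets(ticket_ids, concurrency=10):
    """Fetch audits for many tickets concurrently, capping in-flight calls."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(ticket_id):
        async with semaphore:
            audits = await fetch_audits(ticket_id)
        return ticket_id, comments_from_audits(audits)

    # gather() returns results in the same order as ticket_ids
    return await asyncio.gather(*(worker(t) for t in ticket_ids))
```

Calling `asyncio.run(sync_tickets([1, 2, 3]))` yields one `(ticket_id, comments)` pair per ticket. The semaphore is one possible way to keep concurrency within an API rate limit; the limit of 10 is arbitrary.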

More details can be found here in the ticket.

Manual QA steps

  • Verify that the discover mode is working as expected.
  • Verify that sync mode is working as expected.
  • Verify the number of records, the state, and the schema for each sync.
  • Verify that the tap also works as expected when a state is provided.

Risks

Rollback steps

  • revert this branch

AI generated code

https://internal.qlik.dev/general/ways-of-working/code-reviews/#guidelines-for-ai-generated-code

  • this PR has been written with the help of GitHub Copilot or another generative AI tool

@prijendev prijendev changed the title Tdl 24687/enhance tap performance TDL-24687: Enhance tap performance Oct 28, 2024
@@ -8,7 +9,8 @@


LOGGER = singer.get_logger()

DEFAULT_WAIT = 60 # Default wait time for backoff
Contributor Author

As stated in the Zendesk documentation.

try:
    response_json = await response.json()
except Exception:  # pylint: disable=broad-except
    response_json = {}
Contributor

The except Exception block is too broad. It would be better to catch specific exceptions. Also, when an exception occurs, we should log a warning.

Contributor Author

I have created a separate function raise_for_error_for_async for the response validation.
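
For illustration only, one way the reviewer's suggestion (a narrow exception plus a logged warning) could look. This is not the `raise_for_error_for_async` function mentioned above, whose body is not shown in this conversation. `parse_json_body` and `FakeResponse` are hypothetical names; the sketch assumes the response object exposes an awaitable `text()` method, and it uses the standard `logging` module rather than `singer.get_logger()` to stay self-contained.

```python
import asyncio
import json
import logging

LOGGER = logging.getLogger(__name__)


async def parse_json_body(response):
    """Return the decoded JSON body, or {} (with a warning) if it is not JSON."""
    text = await response.text()
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Catch only the decoding failure and leave a trace in the logs
        LOGGER.warning("Response body is not valid JSON (%s); using {}", err)
        return {}


class FakeResponse:
    """Minimal stand-in for an HTTP response, for demonstration only."""

    def __init__(self, body):
        self._body = body

    async def text(self):
        return self._body
```

With this shape, `asyncio.run(parse_json_body(FakeResponse("<html>")))` falls back to an empty dict and logs a warning, while any unrelated error still propagates.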

Comment on lines 213 to 216
@backoff.on_exception(backoff.expo,
(ConnectionError, ConnectionResetError, Timeout, ChunkedEncodingError, ProtocolError),#As ConnectionError error and timeout error does not have attribute status_code,
max_tries=5, # here we added another backoff expression.
factor=2)
Contributor

@RushiT0122 RushiT0122 Oct 28, 2024

Fix the indentation.

Contributor Author

Fixed it

"""
Perform an asynchronous GET request
"""
while True:
Contributor

Using a while True: loop might lead to an infinite loop if, for some reason, we never receive a 200 status code. I think we should apply a maximum retry limit here. Also, can't we raise custom exceptions and handle these retries in the backoff logic itself?

Contributor Author

Removed the while loop and added a backoff mechanism to retry up to 5 times.
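
A minimal sketch of the idea discussed in this thread: bounded retries driven by a custom exception instead of an open-ended `while True:` loop. It is a hand-rolled, standard-library stand-in, not the `backoff`-decorator code merged in this PR. `RetryableAPIError`, `get_with_retries`, and the `base_delay` parameter are hypothetical names chosen for illustration; `max_tries=5` and `factor=2` mirror the values quoted earlier in the review.

```python
import asyncio


class RetryableAPIError(Exception):
    """Hypothetical exception for responses that are worth retrying."""


async def get_with_retries(fetch, max_tries=5, base_delay=1.0, factor=2):
    """Await fetch() up to max_tries times, sleeping longer after each failure."""
    for attempt in range(1, max_tries + 1):
        try:
            return await fetch()
        except RetryableAPIError:
            if attempt == max_tries:
                raise  # give up: the loop is bounded, so it cannot spin forever
            # Exponential delay: base_delay, base_delay * factor, ...
            await asyncio.sleep(base_delay * factor ** (attempt - 1))
```

Because the final failure is re-raised, the caller sees the error after the last attempt rather than hanging. Non-retryable exceptions are not caught and surface immediately.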

Resolved, outdated review threads: one on tap_zendesk/http.py and four on tap_zendesk/streams.py.

from tap_zendesk import http, streams

class TestASyncTicketAudits(unittest.TestCase):
Contributor

The test does not cover scenarios where the paginate_ticket_audits function might raise exceptions.

Contributor Author

Added scenario to validate exception
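
As a sketch of what such an exception scenario might look like (the actual test added to the PR is not shown here): `collect_audits` is a hypothetical stand-in for the code under test, not the real `paginate_ticket_audits`, and the test uses `unittest.mock.AsyncMock` to simulate a failing fetch.

```python
import asyncio
import unittest
from unittest.mock import AsyncMock


async def collect_audits(get_objects, ticket_ids):
    """Hypothetical code under test: gather audits for each ticket in turn."""
    results = []
    for ticket_id in ticket_ids:
        results.extend(await get_objects(ticket_id))
    return results


class TestCollectAuditsErrors(unittest.TestCase):
    def test_exception_propagates(self):
        """A failure in the fetch should surface to the caller, not be swallowed."""
        failing_fetch = AsyncMock(side_effect=RuntimeError("API unavailable"))
        with self.assertRaises(RuntimeError):
            asyncio.run(collect_audits(failing_fetch, [1, 2]))
        # The first ticket fails, so the second is never requested
        self.assertEqual(failing_fetch.await_count, 1)
```

Running the case with `python -m unittest` exercises the error path without any network access.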

Comment on lines 21 to 31
async def mock_get_objects(session, ticket_id):
    return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}]



instance = streams.TicketAudits(None, {})
instance.stream = 'ticket_audits'


# Run the sync method
async def run_test():
Contributor

@RushiT0122 RushiT0122 Oct 28, 2024

Suggested change
- async def mock_get_objects(session, ticket_id):
-     return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}]
- instance = streams.TicketAudits(None, {})
- instance.stream = 'ticket_audits'
- # Run the sync method
- async def run_test():
+ async def mock_get_objects(session, ticket_id):
+     return [{'id': ticket_id, 'events': [{'type': 'Comment', 'id': f'comment_{ticket_id}'}], 'created_at': '2023-01-01T00:00:00Z', 'via': 'web', 'metadata': {}}]
+ instance = streams.TicketAudits(None, {})
+ instance.stream = 'ticket_audits'
+ # Run the sync method
+ async def run_test():


@aioresponses()
@patch('asyncio.sleep', return_value=None)
def test_call_api_async_conflict(self, mocked, mock_sleep):
Contributor

Docstring is missing.

Contributor Author

Added docstrings to all the functions

Contributor

@RushiT0122 RushiT0122 left a comment

Requested changes inline.

@prijendev
Contributor Author

Requested changes inline.

Addressed all the requested changes

@prijendev prijendev requested a review from RushiT0122 October 28, 2024 11:34
Resolved, outdated review threads: two on tap_zendesk/http.py.
@RushiT0122 RushiT0122 self-requested a review October 30, 2024 08:27
Member

@somethingmorerelevant somethingmorerelevant left a comment

please check the comment added

@prijendev prijendev merged commit f75a006 into master Nov 7, 2024
5 checks passed
prijendev added a commit that referenced this pull request Nov 8, 2024
prijendev added a commit that referenced this pull request Nov 8, 2024

3 participants