-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Source Mixpanel: add page size to configuration to increase sync speed #41976
✨ Source Mixpanel: add page size to configuration to increase sync speed #41976
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
@@ -291,7 +291,7 @@ def next_page_token(self, response, last_records: List[Mapping[str, Any]]) -> Op | |||
if total: | |||
self._total = total | |||
|
|||
if self._total and page_number is not None and self._total > self.page_size * (page_number + 1): | |||
if self._total and page_number is not None and self._total > self._page_size * (page_number + 1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get the real page size from PageIncrement and not '{{ config.page_size }}'
airbyte-integrations/connectors/source-mixpanel/source_mixpanel/manifest.yaml
Outdated
Show resolved
Hide resolved
@@ -150,7 +153,7 @@ definitions: | |||
type: CustomPaginationStrategy | |||
class_name: "source_mixpanel.components.EngagePaginationStrategy" | |||
start_from_page: 1 | |||
page_size: 1000 | |||
page_size: "{{ config.page_size }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow the configuration of the page size
@@ -72,7 +72,7 @@ def validate_date(name: str, date_str: str, default: pendulum.date) -> pendulum. | |||
|
|||
@adapt_validate_if_testing | |||
def _validate_and_transform(self, config: MutableMapping[str, Any]): | |||
project_timezone, start_date, end_date, attribution_window, select_properties_by_default, region, date_window_size, project_id = ( | |||
project_timezone, start_date, end_date, attribution_window, select_properties_by_default, region, date_window_size, project_id, page_size = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is needed
@@ -119,6 +119,14 @@ | |||
"type": "integer", | |||
"minimum": 1, | |||
"default": 30 | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the page size ton the user defined configuration
@@ -94,37 +94,37 @@ def test_streams_string_date(requests_mock, config_raw): | |||
"config, success, expected_error_message", | |||
( | |||
( | |||
{"credentials": {"api_secret": "secret"}, "project_timezone": "Miami"}, | |||
{"credentials": {"api_secret": "secret"}, "project_timezone": "Miami", "page_size": 1000}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed otherwise the tests fail
@natikgadzhi @ChristoGrab |
bc0c587
to
3823c9b
Compare
airbyte-integrations/connectors/source-mixpanel/source_mixpanel/spec.json
Outdated
Show resolved
Hide resolved
Co-authored-by: Christo Grabowski <108154848+ChristoGrab@users.noreply.github.com>
Running CI Tests to validate changes. |
@ChristoGrab @marcosmarxm Tests seems failing but I cannot see the logs |
|
Thanks, will investigate that |
@marcosmarxm I fixed the issue, could you run the tests again ? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@descampsk Thanks for yet another solid contribution! Regression tests just came back green so we're all good to merge, nicely done 🙌
What
The engage stream retrieves 1000 records per request and has a 60s delay between two requests.
To synchronise 1 Million records, we need 16.6 hours.
If we increase the page size to 40 000, we then need only 25 minutes.
How
This PR only adds the possibility to set the page size to a user defined value to be able to set higher page size to increase the sync speed.
Review guide
See comments.
User Impact
Great performance increase.
Can this PR be safely reverted and rolled back?