✨ Source Mixpanel: add page size to configuration to increase sync speed #41976

Merged
@@ -11,7 +11,7 @@ data:
 connectorSubtype: api
 connectorType: source
 definitionId: 12928b32-bf0a-4f1e-964f-07e12e37153a
-dockerImageTag: 3.2.4
+dockerImageTag: 3.3.0
 dockerRepository: airbyte/source-mixpanel
 documentationUrl: https://docs.airbyte.com/integrations/sources/mixpanel
 githubIssueLabel: source-mixpanel
@@ -3,7 +3,7 @@ requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"

 [tool.poetry]
-version = "3.2.4"
+version = "3.3.0"
 name = "source-mixpanel"
 description = "Source implementation for Mixpanel."
 authors = ["Airbyte <contact@airbyte.io>"]
@@ -291,7 +291,7 @@ def next_page_token(self, response, last_records: List[Mapping[str, Any]]) -> Op
 if total:
     self._total = total

-if self._total and page_number is not None and self._total > self.page_size * (page_number + 1):
+if self._total and page_number is not None and self._total > self._page_size * (page_number + 1):

Review comment (Contributor, Author): Use `_page_size` to get the real page size from PageIncrement, not the literal '{{ config.page_size }}' string.

     return {"session_id": decoded_response.get("session_id"), "page": page_number + 1}
 else:
     self._total = None
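
A minimal sketch of the stop condition in the hunk above, written as a standalone function. The name `next_page` is hypothetical and not part of the connector. The sketch also assumes that page numbers in the response start at 0, which the `page_number + 1` term suggests but the diff does not state.

```python
from typing import Optional


def next_page(total: Optional[int], page_size: int, page_number: Optional[int]) -> Optional[int]:
    """Return the next page number, or None when every record has been seen.

    Hypothetical helper: after reading page `page_number` (assumed 0-based),
    page_size * (page_number + 1) records have been fetched so far.
    """
    if total and page_number is not None and total > page_size * (page_number + 1):
        return page_number + 1
    return None


if __name__ == "__main__":
    # 2500 records at 1000 per page: request pages 1 and 2, then stop.
    for page in (0, 1, 2):
        print(page, "->", next_page(2500, 1000, page))
```

The comparison only works when `page_size` is an integer. Multiplying a template string such as '{{ config.page_size }}' by an integer yields a repeated string, and comparing an integer with a string raises `TypeError` in Python 3.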
@@ -44,6 +44,9 @@ definitions:
 - error_message_contains: "Query rate limit exceeded"
   action: RETRY
   error_message: Query rate limit exceeded.
+- error_message_contains: "unknown error"
+  action: RETRY
+  error_message: An unknown error occurred
 - error_message_contains: "to_date cannot be later than today"
   action: FAIL
   error_message: Your project timezone must be misconfigured. Please set it to the one defined in your Mixpanel project settings.
@@ -150,7 +153,7 @@ definitions:
   type: CustomPaginationStrategy
   class_name: "source_mixpanel.components.EngagePaginationStrategy"
   start_from_page: 1
-  page_size: 1000
+  page_size: "{{ config.page_size }}"

Review comment (Contributor, Author): Makes the page size configurable.

 page_token_option:
   type: RequestOption
   inject_into: request_parameter
@@ -72,7 +72,17 @@ def validate_date(name: str, date_str: str, default: pendulum.date) -> pendulum.

 @adapt_validate_if_testing
 def _validate_and_transform(self, config: MutableMapping[str, Any]):
-    project_timezone, start_date, end_date, attribution_window, select_properties_by_default, region, date_window_size, project_id = (
+    (
+        project_timezone,
+        start_date,
+        end_date,
+        attribution_window,
+        select_properties_by_default,
+        region,
+        date_window_size,
+        project_id,
+        page_size,
+    ) = (
         config.get("project_timezone", "US/Pacific"),
         config.get("start_date"),
         config.get("end_date"),
@@ -81,6 +91,7 @@ def _validate_and_transform(self, config: MutableMapping[str, Any]):
         config.get("region", "US"),
         config.get("date_window_size", 30),
         config.get("credentials", dict()).get("project_id"),
+        config.get("page_size", 1000),
     )
     try:
         project_timezone = pendulum.timezone(project_timezone)
@@ -113,5 +124,6 @@ def _validate_and_transform(self, config: MutableMapping[str, Any]):
     config["region"] = region
     config["date_window_size"] = date_window_size
     config["project_id"] = project_id
+    config["page_size"] = page_size

     return config
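
The two additions to `_validate_and_transform` amount to a read-with-default followed by a write-back, presumably so that the `{{ config.page_size }}` template in the manifest always finds a value. A minimal sketch of that pattern, using a hypothetical standalone helper named `apply_page_size_default` that is not part of the connector:

```python
from typing import Any, MutableMapping


def apply_page_size_default(config: MutableMapping[str, Any], default: int = 1000) -> MutableMapping[str, Any]:
    """Ensure config["page_size"] is set, keeping any user-supplied value."""
    # Hypothetical helper: same read-with-default, then write-back, as the diff.
    config["page_size"] = config.get("page_size", default)
    return config


if __name__ == "__main__":
    print(apply_page_size_default({"region": "US"}))
    print(apply_page_size_default({"region": "US", "page_size": 250}))
```

Writing the value back mutates the mapping in place, which matches how the other keys (`region`, `date_window_size`, `project_id`) are handled in the same function.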
@@ -119,6 +119,14 @@
       "type": "integer",
       "minimum": 1,
       "default": 30
+    },

Review comment (Contributor, Author): Add the page size to the user-defined configuration.

+    "page_size": {
+      "order": 9,
+      "title": "Page Size",
+      "description": "The number of records to fetch per request. Default is 1000.",
+      "type": "integer",
+      "minimum": 1,
+      "default": 1000
     }
   }
 }
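
The new spec entry constrains `page_size` to an integer of at least 1. A simplified sketch of that check in plain Python follows. `is_valid_page_size` is a hypothetical name, and real validation would normally go through a JSON Schema validator rather than a hand-written test like this one.

```python
def is_valid_page_size(value: object, minimum: int = 1) -> bool:
    """Return True when value is an integer that is at least `minimum`."""
    # bool is a subclass of int in Python, but JSON Schema does not treat
    # true/false as integers, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value >= minimum


if __name__ == "__main__":
    for candidate in (1000, 1, 0, "1000", True):
        print(repr(candidate), is_valid_page_size(candidate))
```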
@@ -23,6 +23,7 @@ def config(start_date):
     "start_date": start_date,
     "end_date": start_date.add(days=31),
     "region": "US",
+    "page_size": 1000,
 }


@@ -94,37 +94,37 @@ def test_streams_string_date(requests_mock, config_raw):
 "config, success, expected_error_message",
 (
     (
-        {"credentials": {"api_secret": "secret"}, "project_timezone": "Miami"},
+        {"credentials": {"api_secret": "secret"}, "project_timezone": "Miami", "page_size": 1000},

Review comment (Contributor, Author): Needed; otherwise the tests fail.

         False,
         "Could not parse time zone: Miami, please enter a valid timezone.",
     ),
     (
-        {"credentials": {"api_secret": "secret"}, "start_date": "20 Jan 2021"},
+        {"credentials": {"api_secret": "secret"}, "start_date": "20 Jan 2021", "page_size": 1000},
         False,
         "Could not parse start date: 20 Jan 2021. Please enter a valid start date.",
     ),
     (
-        {"credentials": {"api_secret": "secret"}, "end_date": "20 Jan 2021"},
+        {"credentials": {"api_secret": "secret"}, "end_date": "20 Jan 2021", "page_size": 1000},
         False,
         "Could not parse end date: 20 Jan 2021. Please enter a valid end date.",
     ),
     (
-        {"credentials": {"api_secret": "secret"}, "attribution_window": "20 days"},
+        {"credentials": {"api_secret": "secret"}, "attribution_window": "20 days", "page_size": 1000},
         False,
         "Please provide a valid integer for the `Attribution window` parameter.",
     ),
     (
-        {"credentials": {"api_secret": "secret"}, "select_properties_by_default": "Yes"},
+        {"credentials": {"api_secret": "secret"}, "select_properties_by_default": "Yes", "page_size": 1000},
         False,
         "Please provide a valid True/False value for the `Select properties by default` parameter.",
     ),
-    ({"credentials": {"api_secret": "secret"}, "region": "UK"}, False, "Region must be either EU or US."),
+    ({"credentials": {"api_secret": "secret"}, "region": "UK", "page_size": 1000}, False, "Region must be either EU or US."),
     (
-        {"credentials": {"username": "user", "secret": "secret"}},
+        {"credentials": {"username": "user", "secret": "secret"}, "page_size": 1000},
         False,
         "Required parameter 'project_id' missing or malformed. Please provide a valid project ID.",
     ),
-    ({"credentials": {"api_secret": "secret"}, "region": "EU", "start_date": "2021-02-01T00:00:00Z"}, True, None),
+    ({"credentials": {"api_secret": "secret"}, "region": "EU", "start_date": "2021-02-01T00:00:00Z", "page_size": 1000}, True, None),
     (
         {
             "credentials": {"username": "user", "secret": "secret", "project_id": 2397709},
@@ -135,6 +135,7 @@ def test_streams_string_date(requests_mock, config_raw):
             "select_properties_by_default": True,
             "region": "EU",
             "date_window_size": 10,
+            "page_size": 1000
         },
         True,
         None,
1 change: 1 addition & 0 deletions docs/integrations/sources/mixpanel.md
@@ -58,6 +58,7 @@ Syncing huge date windows may take longer due to Mixpanel's low API rate-limits

| Version | Date | Pull Request | Subject |
|:--------|:-----------|:---------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| 3.3.0 | 2024-07-15 | [41976](https://github.com/airbytehq/airbyte/pull/41976) | Add engage page size to configuration |
| 3.2.4 | 2024-07-13 | [41754](https://github.com/airbytehq/airbyte/pull/41754) | Update dependencies |
| 3.2.3 | 2024-07-10 | [41420](https://github.com/airbytehq/airbyte/pull/41420) | Update dependencies |
| 3.2.2 | 2024-07-09 | [41289](https://github.com/airbytehq/airbyte/pull/41289) | Update dependencies |