Source Mailchimp: email activity stream missing data #14673

Closed
marcosmarxm opened this issue Jul 13, 2022 · 25 comments

Comments

@marcosmarxm
Member

This GitHub issue is synchronized with Zendesk:

Ticket ID: #1553
Priority: normal
Group: User Success Engineer
Assignee: Nataly Merezhuk

Original ticket description:

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: you can use something like 4Gb / 1 Tb
  • Deployment: Kubernetes
  • Airbyte Version: 0.39.25
  • Source name/version: Mailchimp
  • Destination name/version: Json destination
  • Step: Run the Mailchimp source with the Json destination with more than 700 email events
  • Description:
    When I run the Mailchimp source to the JSON destination and compare the output with what the Mailchimp API returns, the output JSON is missing a lot of data. I spot-checked one campaign and found 61 events in the JSON output, whereas the Mailchimp email_activity endpoint returns around 707.

[Discourse post]

@marcosmarxm
Member Author

Comment made from Zendesk by Nataly Merezhuk on 2022-07-12 at 22:36:

Hello, @murph! Could you please show me the Airbyte/server logs so I can see if you are getting any errors?

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2022-07-13 at 00:44:

We aren’t getting any errors, but we were able to debug this a bit. It looks like every time you paginate the email activity endpoint you’re also incrementing the since param via the cursor_field, which is the timestamp of the newest record. See airbyte/streams.py at bfa54aca50115770530ca6fdff24d4125541d23b (airbytehq/airbyte on GitHub).

That means that when we do an incremental sync we lose a lot of records. I don’t think this is the intended behavior?
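
The failure mode described above can be reproduced with a toy model. This is only a sketch, not the connector's actual code: `fake_email_activity`, `EVENTS`, and `buggy_sync` are hypothetical names, timestamps are plain integers, and the assumption that `since` is a strict lower bound is made purely for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for the email activity endpoint. Records come back
# in arbitrary (unsorted) order, as the thread reports for the real API.
EVENTS: List[Dict[str, int]] = [
    {"id": i, "timestamp": ts}
    for i, ts in enumerate([5, 1, 9, 3, 7, 2, 8, 4, 6, 10])
]


def fake_email_activity(since: int, offset: int, count: int) -> List[Dict[str, int]]:
    """Filter by `since` (assumed strict), then slice by offset/count."""
    matching = [e for e in EVENTS if e["timestamp"] > since]
    return matching[offset : offset + count]


def buggy_sync(page_size: int = 3) -> List[Dict[str, int]]:
    """Move `since` forward after every page while also advancing the offset."""
    since, offset = 0, 0
    out: List[Dict[str, int]] = []
    while True:
        page = fake_email_activity(since, offset, page_size)
        if not page:
            break
        out.extend(page)
        # The suspected bug: the cursor moves in the middle of pagination,
        # so the next request filters out older records not yet fetched.
        since = max(since, max(e["timestamp"] for e in page))
        offset += page_size
    return out


print(len(buggy_sync()), "of", len(EVENTS))  # 3 of 10
```

Because the first page happens to contain a late timestamp, the second request asks only for records newer than it and finds almost nothing, which matches the kind of gap reported here (61 events instead of roughly 707).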

[Discourse post]

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2022-07-13 at 00:57:

Looks like you cannot sort what’s returned from the email activity endpoint, so this kind of checkpointing won’t work: https://mailchimp.com/developer/marketing/api/email-activity-reports/list-email-activity/

[Discourse post]

@marcosmarxm
Member Author

Comment made from Zendesk by Nataly Merezhuk on 2022-07-13 at 13:56:

Thanks for digging into this. You are right, this is definitely not the intended behavior. I've opened an issue on GitHub, and I or another team member will start work on this soon!

@murphpdx

Hi, thanks for filing this issue for us. I just wanted to check in to see if we know when this issue will be prioritized?

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2022-07-19 at 20:23:

Thank you for creating that issue! I just wanted to check in to see when the issue will be prioritized?

[Discourse post]

@marcosmarxm
Member Author

Comment made from Zendesk by Nataly Merezhuk on 2022-07-21 at 10:46:

@murph sorry for the wait, we have a few team members out this week. I asked one of my colleagues to set aside some time for this issue, so you'll be hearing something soon!

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2022-08-25 at 05:13:

Hi, just checking in on this. Has there been any movement?

[Discourse post]

@natalyjazzviolin removed team/tse Technical Support Engineers autoteam labels Aug 29, 2022
@natalyjazzviolin changed the title from "Missing Mailchimp email activity data" to "Source-Mailchimp: email activity stream missing data" Aug 29, 2022

@marcosmarxm
Member Author

Comment made from Zendesk by Nataly Merezhuk on 2022-09-01 at 21:14:

Hi, Amanda! Thank you for your patience. No movement on this yet but I have a few debugging ideas.

Could you possibly update Airbyte to the latest version and try the sync once more? I have tried to replicate the issue on my end, but from what I can see the connector is working correctly: all records emitted by Mailchimp are being committed to JSON. 

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2022-09-13 at 19:20:

Did you use an incremental sync?

[Discourse post]

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2022-09-13 at 19:21:

I’m not sure why you need to debug more. If you look at the linked code, it shows that you’re treating the data as if it were sorted, but the API does not return sorted results. You’re also paginating in multiple ways at the same time.

[Discourse post]

@marcosmarxm changed the title from "Source-Mailchimp: email activity stream missing data" to "Source Mailchimp: email activity stream missing data" Nov 30, 2022

@davydov-d
Collaborator

Hey @marcosmarxm, could you please verify with the affected users whether this is still an issue after upgrading the connector to the latest version?

@davydov-d
Collaborator

The problem described in https://discuss.airbyte.io/t/missing-mailchimp-email-activity-data/1830 must have been fixed in #20765

@davydov-d self-assigned this Feb 21, 2023

@murphpdx

murphpdx commented Feb 22, 2023

"The problem described in https://discuss.airbyte.io/t/missing-mailchimp-email-activity-data/1830 must have been fixed in #20765" (quoting @davydov-d)

I'm not sure why this was closed. It looks like you still have the bug. It seems like you're still setting the since field to the timestamp: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mailchimp/source_mailchimp/streams.py#L154

It looks like cursor_field is set to timestamp. I know you have a sort_field of create_time, but Mailchimp does not allow you to change the sorting, so you're going to lose a lot of records. The since param should stay the same for the whole sync; the offset is what should be used to paginate. Set the offset to offset = offset + pagesize on each request and leave since unchanged.
https://mailchimp.com/developer/marketing/docs/methods-parameters/#pagination
As you can see from the list-email-activity docs, there is no sort_field. I believe your sort variable is just getting ignored.
https://mailchimp.com/developer/marketing/api/email-activity-reports/list-email-activity/
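
A minimal sketch of the pagination proposed here, assuming a page-fetching callable that takes `since`, `offset`, and `count` (the parameter names match the Mailchimp docs linked above; `sync_with_fixed_since`, `make_fake_api`, the integer timestamps, and the strict `since` filter are hypothetical and only for illustration). `since` is held constant for the whole sync, only `offset` advances, and the new cursor is computed once all pages have been read.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]
# (since, offset, count) -> one page of records, in arbitrary order
FetchPage = Callable[[int, int, int], List[Record]]


def sync_with_fixed_since(
    fetch_page: FetchPage, since: int, page_size: int
) -> Tuple[List[Record], int]:
    """Read every page with the same `since`; advance only the offset."""
    records: List[Record] = []
    offset = 0
    while True:
        page = fetch_page(since, offset, page_size)
        records.extend(page)
        if len(page) < page_size:  # short or empty page: nothing left
            break
        offset += page_size  # offset = offset + pagesize
    # Checkpoint only after the full pass, so unsorted pages cannot hide records.
    new_cursor = max((r["timestamp"] for r in records), default=since)
    return records, new_cursor


def make_fake_api(events: List[Record]) -> FetchPage:
    """Hypothetical in-memory endpoint returning unsorted records."""
    def fetch(since: int, offset: int, count: int) -> List[Record]:
        matching = [e for e in events if e["timestamp"] > since]
        return matching[offset : offset + count]
    return fetch


events = [{"timestamp": ts} for ts in [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]]
records, cursor = sync_with_fixed_since(make_fake_api(events), since=0, page_size=3)
print(len(records), cursor)  # 10 10
```

A follow-up incremental sync would pass the returned cursor as `since`. Whether the real endpoint treats `since` as inclusive or exclusive is not established in this thread, so a real implementation may need to deduplicate records at the boundary.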

@marcosmarxm
Member Author

Comment made from Zendesk by Marcos Marx on 2023-04-03 at 23:11:

Closed due to no response from requester.
