Source Amazon Ads: Incremental Deduped + History creates duplicates #18905

Closed
natalyjazzviolin opened this issue Nov 3, 2022 · 3 comments

Comments

@natalyjazzviolin
Contributor

Environment

  • Airbyte version: 0.40.9
  • OS Version / Instance: Debian 10 Buster
  • Deployment: Docker
  • Source Connector and version: Amazon Ads 0.1.22
  • Destination Connector and version: BigQuery 1.2.4
  • Step where error happened: During/After Sync.

Current Behavior

Escalated from this discourse thread:
https://discuss.airbyte.io/t/source-amazon-ads-incremental-deduped-history-sync-duplication/2860

For the stream named “sponsored_products_report_stream”, I have set the sync mode to “Incremental Deduped + history” with a daily sync schedule. However, the output shows duplication occurring in the _airbyte_raw destination table. The _airbyte_data field within each record of the _airbyte_raw table has the following data structure:

{
    "reportDate": STRING,
    "profileId": NUMBER,
    "recordType": STRING,
    "updatedAt": STRING,
    "metric": OBJECT
}

I have set up a materialized view within BigQuery to normalize this object and the metric property into a single table with individual columns for each field. The query used for this normalization has been attached:
normalization.txt (6.4 KB)

Querying this materialized view for a specific record type on a specific date (e.g. report_date = “2022-10-09” AND record_type = “campaigns”) returns two rows for each “campaign”, each synced on a different date.
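
The check described above can be sketched in Python, assuming the raw _airbyte_data objects have been loaded as dictionaries with the structure shown earlier. The `campaignId` field inside `metric` is a hypothetical identifier used only for illustration; the actual field name depends on the report.

```python
from collections import Counter


def count_duplicates(records, entity_key="campaignId"):
    """Return logical keys that appear more than once in the raw records.

    `entity_key` is a hypothetical field inside `metric` that identifies
    the campaign; replace it with whatever the report actually contains.
    """
    counts = Counter(
        (r["reportDate"], r["recordType"], r["profileId"], r["metric"].get(entity_key))
        for r in records
    )
    # Any key counted more than once was written by more than one sync.
    return {key: n for key, n in counts.items() if n > 1}
```

A non-empty result for a given report date would match the behaviour described here: one extra row per additional sync.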


By my understanding, the “Incremental Deduped + history” sync mode should update the original records and then update the “updatedAt” field within _airbyte_data. Instead, it seems to add duplicate records on each subsequent sync without touching the old records. The duplication accumulates: for today’s date I see 1 record (correct, since only one sync has covered today’s date), for yesterday there are 2 records (2 syncs), for 2 days ago there are 3 records (3 syncs), and so on.
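
The expected outcome can be sketched as follows: keep one record per logical key, choosing the one with the latest “updatedAt”. This is only an illustration of the result described here, not the connector's or the normalization's actual implementation. It assumes `updatedAt` is an ISO 8601 string (so string comparison orders it chronologically), and `campaignId` is again a hypothetical field inside `metric`.

```python
def dedupe_latest(records, entity_key="campaignId"):
    """Keep only the most recently updated record for each logical key.

    Assumes `updatedAt` is an ISO 8601 string, so plain string comparison
    matches chronological order. `entity_key` is a hypothetical field.
    """
    latest = {}
    for r in records:
        key = (r["reportDate"], r["recordType"], r["profileId"],
               r["metric"].get(entity_key))
        current = latest.get(key)
        # Replace the stored record only when this one is newer.
        if current is None or r["updatedAt"] > current["updatedAt"]:
            latest[key] = r
    return list(latest.values())
```

With a key like this, three syncs covering the same report date would leave a single row rather than three.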

I have only tested this on “sponsored_products_report_stream”, however, I imagine the same is occurring across all report streams for this source since they all follow the same data structure.

Expected Behavior

No duplicate records are synced.

Logs


Steps to Reproduce

  1. Set up an Amazon Ads source
  2. Sync a report stream in “Incremental Deduped + history” mode more than once
  3. See duplicate records in the destination
@marcosmarxm
Member

Zendesk ticket #2672 has been linked to this issue.

@marcosmarxm
Member

Comment made from Zendesk by Nataly Merezhuk on 2022-11-03 at 12:29:

Hi! I apologize for the delay - we had a lot of inquiries for Hacktoberfest, but now should be getting back to the normal rhythm of things! I've made a GitHub issue for your ticket here:
#18905

It's been triaged to the correct team and I'll inquire to see who can follow up on it!

@roman-yermilov-gl
Contributor

The platform version and the source/destination connector versions are outdated, so this issue may no longer be relevant. It appears to be related to a primary key problem, which was also solved in this PR: #21677
