Source Amazon Ads: Incremental Deduped + History creates duplicates #18905
Labels
connectors/source/amazon-ads
lang/python
needs-triage
team/connectors-python
type/bug
zendesk
Environment
Current Behavior
Escalated from this discourse thread:
https://discuss.airbyte.io/t/source-amazon-ads-incremental-deduped-history-sync-duplication/2860
For the stream named “sponsored_products_report_stream”, I have set the sync mode to “Incremental Deduped + history” with a daily sync schedule. However, the output shows duplication in the _airbyte_raw destination table. The _airbyte_data field within each record of the _airbyte_raw table has the following data structure:
I have set up a materialized view within BigQuery to normalize this object and the metric property into a single table with individual columns for each field. The query used for this normalization has been attached:
normalization.txt (6.4 KB)
Querying this materialized view for a specific record type on a specific date (e.g. report_date = “2022-10-09” AND record_type = “campaigns”) returns two rows for each “campaign”, each synced on a different date.
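To make the symptom concrete, here is a minimal Python sketch of the same check run over rows exported from the materialized view. The `report_date` and `record_type` fields come from the query above. The `record_id` and `synced_at` fields and the sample rows are hypothetical stand-ins for whatever identifies a campaign and its sync date in the real view.

```python
from collections import Counter

# Hypothetical key: report_date and record_type are from the view above,
# record_id is an assumed per-campaign identifier.
KEY_FIELDS = ("report_date", "record_type", "record_id")


def count_duplicates(rows, key_fields=KEY_FIELDS):
    """Return {key: count} for every key that appears more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative rows only, not real data from the affected connection.
rows = [
    {"report_date": "2022-10-09", "record_type": "campaigns",
     "record_id": "c1", "synced_at": "2022-10-09"},
    {"report_date": "2022-10-09", "record_type": "campaigns",
     "record_id": "c1", "synced_at": "2022-10-10"},
    {"report_date": "2022-10-10", "record_type": "campaigns",
     "record_id": "c1", "synced_at": "2022-10-10"},
]

duplicates = count_duplicates(rows)
```

Any key left in `duplicates` is a campaign/date pair that was written by more than one sync.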
By my understanding, the “Incremental Deduped + history” sync mode should update the original records and then update the “updatedAt” field within _airbyte_data. Instead, it seems to add duplicate records on each subsequent sync without touching the old ones. The duplication accumulates: for today’s date I see 1 record (correct, since only one sync has occurred for today’s date), for yesterday there are 2 records (2 syncs), for 2 days ago there are 3 records (3 syncs), and so on.
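As a rough illustration of the result I expected, the sketch below keeps only the most recently synced row per key. This is not the connector's or Airbyte's implementation. The key fields (`record_id` in particular), the `synced_at` cursor, and the sample rows are assumptions made for the example.

```python
def dedupe_latest(rows, key_fields, cursor_field):
    """Keep one row per key: the one with the greatest cursor value."""
    latest = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in latest or row[cursor_field] > latest[key][cursor_field]:
            latest[key] = row
    return list(latest.values())


# Illustrative rows only: the same campaign/date written by three daily syncs.
rows = [
    {"report_date": "2022-10-08", "record_type": "campaigns",
     "record_id": "c1", "synced_at": "2022-10-08"},
    {"report_date": "2022-10-08", "record_type": "campaigns",
     "record_id": "c1", "synced_at": "2022-10-09"},
    {"report_date": "2022-10-08", "record_type": "campaigns",
     "record_id": "c1", "synced_at": "2022-10-10"},
]

deduped = dedupe_latest(
    rows,
    key_fields=("report_date", "record_type", "record_id"),  # assumed key
    cursor_field="synced_at",  # assumed cursor
)
```

With these three rows, `deduped` holds a single record, the one synced on 2022-10-10, which is the single-row-per-campaign outcome I expected to see.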
I have only tested this on “sponsored_products_report_stream”, but I imagine the same is occurring across all report streams for this source, since they all follow the same data structure.
Expected Behavior
No duplicate records are synced.
Logs
Steps to Reproduce