
Replication key query produces duplicate values across runs #557

Closed
@cnatsis

Description


When running a flow in INCREMENTAL mode with a DATE column as the replication key, subsequent runs produce duplicate values, because the replication-key comparison uses the `>=` operator.

https://github.com/MeltanoLabs/tap-postgres/blob/main/tap_postgres/client.py#L242

The correct operator to use is `>`, regardless of the replication key type.
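To illustrate the difference between the two operators, here is a minimal sketch using the standard-library `sqlite3` module rather than Postgres or the tap's own query builder. The table `t`, the helper `incremental_rows`, and the bookmark value are hypothetical and only mirror the example below. Dates are stored as ISO-8601 text, which compares in date order.

```python
import sqlite3

# Hypothetical table mirroring the example: key_col holds ISO-8601 date strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col_a TEXT, col_b TEXT, key_col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a1", "b1", "2024-01-01"),
        ("a2", "b2", "2024-01-02"),
        ("a3", "b3", "2024-12-01"),
    ],
)

# Bookmark saved after the first run (the max key_col seen so far).
state = "2024-01-02"


def incremental_rows(op):
    """Return col_a for rows selected by the replication-key filter `key_col <op> state`."""
    # Only the two operators under discussion are allowed into the SQL text.
    assert op in (">=", ">")
    sql = f"SELECT col_a FROM t WHERE key_col {op} ? ORDER BY key_col"
    return [row[0] for row in conn.execute(sql, (state,))]


inclusive = incremental_rows(">=")  # current behaviour: re-selects the bookmarked row a2
exclusive = incremental_rows(">")   # proposed behaviour: only rows strictly after the bookmark
print(inclusive)
print(exclusive)
```

With `>=`, the row whose `key_col` equals the stored state (`a2`) is selected again; with `>`, only `a3` is returned.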

Example steps

State column: key_col

  1. First run

No state exists, so the full table is loaded. Output data:

| col_a | col_b | key_col |
| --- | --- | --- |
| a1 | b1 | 2024-01-01 |
| a2 | b2 | 2024-01-02 |

  2. Insert a new row in the database:

| col_a | col_b | key_col |
| --- | --- | --- |
| a3 | b3 | 2024-12-01 |

  3. Second run

State value: 2024-01-02
⚠ Row a2 is duplicated in the output:

| col_a | col_b | key_col |
| --- | --- | --- |
| a2 | b2 | 2024-01-02 |
| a3 | b3 | 2024-12-01 |
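The three steps above can be reproduced with a small in-memory simulation. This is a simplified sketch, not tap-postgres's implementation: the `sync` and `run_scenario` helpers and the flat `{"key_col": ...}` state dict are hypothetical stand-ins for the tap's real bookmark handling. After each run, the state is assumed to be set to the maximum replication-key value emitted.

```python
from datetime import date


def sync(table, state, op):
    """Hypothetical incremental sync: returns (emitted rows, new state)."""
    bookmark = state.get("key_col")
    if bookmark is None:
        rows = list(table)  # no state yet: full table load
    elif op == ">=":
        rows = [r for r in table if r["key_col"] >= bookmark]
    elif op == ">":
        rows = [r for r in table if r["key_col"] > bookmark]
    else:
        raise ValueError(f"unsupported operator: {op}")
    new_state = dict(state)
    if rows:
        # Advance the bookmark to the largest replication-key value emitted.
        new_state["key_col"] = max(r["key_col"] for r in rows)
    return rows, new_state


def run_scenario(op):
    table = [
        {"col_a": "a1", "col_b": "b1", "key_col": date(2024, 1, 1)},
        {"col_a": "a2", "col_b": "b2", "key_col": date(2024, 1, 2)},
    ]
    first, state = sync(table, {}, op)  # step 1: first run, no state
    table.append({"col_a": "a3", "col_b": "b3", "key_col": date(2024, 12, 1)})  # step 2
    second, state = sync(table, state, op)  # step 3: second run with state 2024-01-02
    return [r["col_a"] for r in first], [r["col_a"] for r in second], state


print(run_scenario(">="))
print(run_scenario(">"))
```

With `>=` the second run emits `a2` and `a3`, reproducing the duplicate; with `>` it emits only `a3`. One trade-off worth keeping in mind: with a strict `>`, a row added later with the same `key_col` value as the stored state (for example another `2024-01-02` row) would not be picked up, whereas an inclusive `>=` avoids that at the cost of re-emitting boundary rows.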
