Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logs warning if deduplication state is large #1877

Merged
merged 5 commits into from
Nov 13, 2024

Conversation

willi-mueller
Copy link
Collaborator

Description

As a first step towards resolving this issue: warns if the deduplication state grows over a large count of hashes of the primary key.

Related Issues

#1131

Copy link

netlify bot commented Sep 26, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit ca3471c
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6733ba5613a0660008dca050

@willi-mueller willi-mueller force-pushed the feat/1131-warn-large-deduplication-state branch from cf80f0d to aeca74a Compare September 26, 2024 08:53
@willi-mueller willi-mueller changed the title logs warning if deduplication state is large logs warning if deduplication state is large (feat/1331) Sep 28, 2024
@willi-mueller willi-mueller changed the title logs warning if deduplication state is large (feat/1331) logs warning if deduplication state is large Sep 28, 2024
@willi-mueller willi-mueller self-assigned this Sep 28, 2024
dlt/extract/incremental/__init__.py Outdated Show resolved Hide resolved
dlt/extract/incremental/__init__.py Outdated Show resolved Hide resolved
tests/extract/test_incremental.py Outdated Show resolved Hide resolved
@burnash burnash force-pushed the feat/1131-warn-large-deduplication-state branch from 4e59fb7 to dd66d42 Compare November 10, 2024 13:18
@burnash burnash requested a review from rudolfix November 10, 2024 13:22
@burnash burnash assigned burnash and unassigned willi-mueller Nov 10, 2024
burnash added a commit that referenced this pull request Nov 10, 2024
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to simplify the code a little

@@ -118,6 +118,8 @@ class Incremental(ItemTransform[TDataItem], BaseConfiguration, Generic[TCursorVa
EMPTY: ClassVar["Incremental[Any]"] = None
placement_affinity: ClassVar[float] = 1 # stick to end

DEFAULT_DUPLICATE_CURSOR_WARNING_THRESHOLD: ClassVar[int] = 200
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental is also a dataclass. you should declare uplicate_cursor_warning_threshold under lag and this is a good place to attach default value so DEFAULT_DUPLICATE_CURSOR_WARNING_THRESHOLD is not longer needed.

tbh. my preference would be to not create a new field at all and just keep DEFAULT_DUPLICATE_CURSOR_WARNING_THRESHOLD if someone want to change that, one can do it for all instances by changing the classvar. which is IMO pretty cool. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed in ca3471c

@burnash burnash force-pushed the feat/1131-warn-large-deduplication-state branch from dd66d42 to ca3471c Compare November 12, 2024 20:28
@rudolfix rudolfix merged commit 592c797 into devel Nov 13, 2024
58 of 59 checks passed
@rudolfix rudolfix deleted the feat/1131-warn-large-deduplication-state branch November 13, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

disable rows deduplication if Incremental is attached to a resource with merge write disposition
3 participants