Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental table hints and incremental in resource decorator #2033

Merged
merged 17 commits into from
Nov 30, 2024

Conversation

steinitzu
Copy link
Collaborator

  • Extract incremental settings to a dict in table schema
  • Support passing incremental settings to @resource decorator

Description

  • Add incremental hint to table schema, containing all incremental settings used in extract
  • Accept incremental in @resource decorator. Either as dict (IncrementalArgs typeddict or incremental instance)
  • Decorator incremental is injected to resource func if argument is in signature
  • Explicit argument overrides decorator incremental

Related Issues

Additional Context

@steinitzu steinitzu requested a review from rudolfix November 7, 2024 05:02
Copy link

netlify bot commented Nov 7, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 3a8e993
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/67489fd4dac1410009499c18

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steinitzu pls take a look at:

  • IMO it is way better if we set Incremental on the column level in Schema and ignore all non-fitting cases
  • I think the code that injects Incremental should use existing mechanism (apply_hints) OR I do not understand some edge cases you had

dlt/extract/extract.py Outdated Show resolved Hide resolved
dlt/extract/hints.py Show resolved Hide resolved
dlt/extract/incremental/__init__.py Outdated Show resolved Hide resolved
@@ -102,6 +103,7 @@ def __init__(
section: str = None,
args_bound: bool = False,
SPEC: Type[BaseConfiguration] = None,
incremental: Optional[TIncrementalConfig] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please pass it in hint. Resource hints may hold incremental and you can extend make_hints to set it up:

class TResourceHints(TResourceHintsBase, total=False):
    name: TTableHintTemplate[str]
    # description: TTableHintTemplate[str]
    # table_sealed: Optional[bool]
    columns: TTableHintTemplate[TTableSchemaColumns]
    incremental: Incremental[Any]

@@ -642,7 +648,9 @@ def make_resource(_name: str, _section: str, _data: Any) -> TDltResourceImpl:
selected,
cast(DltResource, data_from),
True,
incremental=incremental,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should extend make_hints to accept incremental config. then we'll be able to use existing mechanism that injects incremental into functions

dlt/extract/incremental/__init__.py Show resolved Hide resolved
dlt/extract/resource.py Outdated Show resolved Hide resolved
dlt/extract/resource.py Outdated Show resolved Hide resolved
dlt/extract/resource.py Outdated Show resolved Hide resolved
dlt/extract/resource.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. let's save just incremental: True into the schema. see my comment why
  2. I added a test that makes sure we can add incremental during extraction (but limited...)
  3. we should also add "primary_key" to IncrementalArgs. mind that empty key ([] or ()) is also a valid hint. (this key is used for boundary deduplication)

tests:
Do we detect nested incremental columns ie $.item.id will map into "item__id"? if you allow for that then please test it

@@ -86,6 +86,7 @@ def make_hints(
table_format: TTableHintTemplate[TTableFormat] = None,
file_format: TTableHintTemplate[TFileFormat] = None,
references: TTableHintTemplate[TTableReferenceParam] = None,
incremental: Incremental[Any] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we allow TIncrementalConfig here as well?

@@ -221,7 +228,7 @@ def apply_hints(
columns: TTableHintTemplate[TAnySchemaColumns] = None,
primary_key: TTableHintTemplate[TColumnNames] = None,
merge_key: TTableHintTemplate[TColumnNames] = None,
incremental: Incremental[Any] = None,
incremental: Union[Incremental[Any], IncrementalArgs] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is TIncrementalConfig

E.g. the jsonpath `$.foo` or `foo` would return `foo`.
The jsonpath `$.foo.items[*]` would return None.
"""
path = compile_path(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @burnash added very similar function. let's unify our code and make sure this is tested

def _merge_incremental_column_hint(dict_: Dict[str, Any]) -> None:
incremental = dict_.pop("incremental")
if isinstance(incremental, Incremental):
incremental_hint = incremental.to_table_hint()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ticket asked to put the full incremental info into table now when I think of it - it will be confusing. it describes the behavior of the data source, not the data in the actual table. my take is to just set incremental to True if it is present on a column,

it is sufficient for what we plan upstream. we could add row order later if we really need it (using x- annotation)

I do not thing this impacts this PR too much

@@ -166,6 +178,17 @@ def __init__(
self._bound_pipe: SupportsPipe = None
"""Bound pipe"""

def to_table_hint(self) -> Optional[IncrementalArgs]:
"""Table hint is only returned when all properties are serializable"""
if self.last_value_func not in (min, max):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we we do not need to care. we just return True if we can localize incremental to a column

@steinitzu steinitzu force-pushed the incremental-table-hints branch from 8b45dfb to 8a0c020 Compare November 28, 2024 02:07
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rudolfix rudolfix merged commit 61c2ed9 into devel Nov 30, 2024
58 of 59 checks passed
@rudolfix rudolfix deleted the incremental-table-hints branch November 30, 2024 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add incremental value info to table computed in the DltResource
2 participants