Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python source cdk does not support incremental sync with a cursor field configured by user #7390

Closed
Tracked by #19239
marc-marketparts opened this issue Oct 26, 2021 · 2 comments
Labels
area/connectors Connector related issues CDK Connector Development Kit community frozen Not being actively worked on Icebox team/connectors-python type/bug Something isn't working

Comments

@marc-marketparts
Copy link

Enviroment

  • Airbyte version: v0.30.16-alpha
  • OS Version / Instance: Ubuntu 20.04.2 LTS on Windows 10 WSL 2
  • Deployment: n/a
  • Source Connector and version: python connector in development (Python source generator)
  • Destination Connector and version: n/a
  • Severity: Medium
  • Step where error happened: Incremental sync with cursor field configured by the end user

Current Behavior

Incremental sync with a cursor field configured by the end user does not work with the current implementation of the python cdk.

There seems to be some flows in cursor field implementation in AbstractSource and Stream classes:

  1. The method Stream::supports_incremental() rely on a source defined cursor_field (self.cursor_field) which is unknown in case of user configuration (and not set at configuration time). So the method will always return False in the latter because self.cursor_field will be left empty.

    @property
    def supports_incremental(self) -> bool:
    """
    :return: True if this stream supports incrementally reading data
    """
    return len(self._wrapped_cursor_field()) > 0
    def _wrapped_cursor_field(self) -> List[str]:
    return [self.cursor_field] if isinstance(self.cursor_field, str) else self.cursor_field

  2. If we override supports_incremental to return True, another issue occurs in the Stream::get_update_state() call. When a user select a cursor field in the UI, the configured cursor field is never passed to this method and therefore not accessible in it.
    When the initial state is empty, we cannot know which cursor field has been chosen, and so cannot generate the correct new state.

    records = stream_instance.read_records(
    sync_mode=SyncMode.incremental,
    stream_slice=slice,
    stream_state=stream_state,
    cursor_field=configured_stream.cursor_field or None,
    )
    for record_data in records:
    record_counter += 1
    yield self._as_airbyte_record(stream_name, record_data)
    stream_state = stream_instance.get_updated_state(stream_state, record_data)

This behavior works for source defined cursor only, as the cursor_field is accessible by referencing the property self.cursor_field defined in the source code of the stream.

Expected Behavior

The implementation of incremental sync must be out of the box, without any need to override python cdk core methods.

Logs

n/a

Steps to Reproduce

  1. Implement a python source which inherits from AbstractSource
  2. Implement an incremental stream with a user defined cursor field

Are you willing to submit a PR?

No

Here are some suggestions to correct this behaviour:

  1. Add a condition on source_defined_cursor in Stream::supports_incremental():
    --> if False, we return True,
    --> if not, we return the result of _wrapped_cursor_field()

def supports_incremental(self) -> bool:
"""
:return: True if this stream supports incrementally reading data
"""
return len(self._wrapped_cursor_field()) > 0

NB: The two latter are not backward compatible.

@marc-marketparts marc-marketparts added the type/bug Something isn't working label Oct 26, 2021
@marcosmarxm
Copy link
Member

@Phlair can you give your opinion here?

@sherifnada sherifnada added area/connectors Connector related issues CDK Connector Development Kit labels Oct 29, 2021
@Phlair
Copy link
Contributor

Phlair commented Nov 1, 2021

This is a good spot. My observation aligns with the Current Behaviour in the issue description. Thanks for creating this issue @marc-marketparts!

@bleonard bleonard added autoteam team/tse Technical Support Engineers labels Apr 26, 2022
@marcosmarxm marcosmarxm added team/extensibility and removed autoteam team/tse Technical Support Engineers labels Jun 14, 2022
@sherifnada sherifnada mentioned this issue Nov 9, 2022
10 tasks
@maxi297 maxi297 added the technical-debt issues to fix code smell label Apr 25, 2023
@bleonard bleonard added the frozen Not being actively worked on label Mar 22, 2024
@girarda girarda added Icebox and removed technical-debt issues to fix code smell labels Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues CDK Connector Development Kit community frozen Not being actively worked on Icebox team/connectors-python type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

9 participants