🐛 CDK: fix bug with limit parameter for incremental stream #5833
Conversation
Force-pushed from 08dfd98 to b6578ae (compare)
record_counter = 0
stream_name = configured_stream.stream.name
logger.info(f"Syncing stream: {stream_name} ")
for record in record_iterator:
    if record.type == MessageType.RECORD:
        if internal_config.limit and record_counter >= internal_config.limit:
I think we still need this because we might have limit > size(slice)
Got your idea, but I don't think it's good to put this condition back; check out my proposal (updated PR).
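To illustrate the point with a toy sketch (hypothetical names and numbers, not CDK code): if the counter resets per slice, a limit larger than any single slice never triggers, so the check has to run against a counter carried across slices, roughly:

# Toy illustration: limit=10, five slices of 4 records each.
# A per-slice counter never reaches 10; the check needs a counter
# that survives across slices.
from typing import Iterator, List

def read_with_limit(slices: List[List[dict]], limit: int) -> Iterator[dict]:
    total_records_counter = 0  # carried across all slices
    for slice_records in slices:
        for record in slice_records:
            yield record
            total_records_counter += 1
            if limit and total_records_counter >= limit:
                return  # stops the whole read, not just the current slice

records = list(read_with_limit([[{"n": i} for i in range(4)]] * 5, limit=10))
assert len(records) == 10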
@@ -184,17 +182,23 @@ def _read_incremental(
stream_state = stream_instance.get_updated_state(stream_state, record_data)
if checkpoint_interval and record_counter % checkpoint_interval == 0:
    yield self._checkpoint_state(stream_name, stream_state, connector_state, logger)
if internal_config.limit and record_counter >= internal_config.limit:
So we don't need it here because of the comment above.
We still need it; check out the updated PR.
see comments
yield self._as_airbyte_record(configured_stream.stream.name, record)
if internal_config.limit and count + 1 >= internal_config.limit:
how about this one?
Updated
records = stream_instance.read_records(
    sync_mode=SyncMode.incremental,
    stream_slice=slice,
    stream_state=stream_state,
    cursor_field=configured_stream.cursor_field or None,
)
- for record_data in records:
-     record_counter += 1
+ for record_counter, record_data in enumerate(records):
you know you can do start=1, right?
No, but now yes :)
records = stream_instance.read_records(
    sync_mode=SyncMode.incremental,
    stream_slice=slice,
    stream_state=stream_state,
    cursor_field=configured_stream.cursor_field or None,
)
- for record_data in records:
-     record_counter += 1
+ for record_counter, record_data in enumerate(records, 1):
Suggested change:
- for record_counter, record_data in enumerate(records, 1):
+ for record_counter, record_data in enumerate(records, start=1):
updated
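For reference, this is plain Python behavior, nothing CDK-specific: enumerate accepts a start value, which removes the manual counter increment entirely:

# enumerate's start argument sets the first counter value.
records = ["a", "b", "c"]
for record_counter, record_data in enumerate(records, start=1):
    print(record_counter, record_data)  # prints: 1 a, 2 b, 3 c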
yield self._as_airbyte_record(stream_name, record_data)
stream_state = stream_instance.get_updated_state(stream_state, record_data)
if checkpoint_interval and record_counter % checkpoint_interval == 0:
    yield self._checkpoint_state(stream_name, stream_state, connector_state, logger)
if internal_config.limit:
It feels like a code smell to duplicate this logic. Can't we put it in the calling method?
Updated
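A minimal sketch of how that shared check might be extracted, consistent with the self._limit_reached(internal_config, total_records_counter) call sites in the later diffs; the InternalConfig stand-in below is simplified, not the CDK's real model:

from dataclasses import dataclass
from typing import Optional

@dataclass
class InternalConfig:  # simplified stand-in for the CDK's internal config model
    limit: Optional[int] = None

def _limit_reached(internal_config: InternalConfig, records_counter: int) -> bool:
    """True when a record limit is configured and the counter has reached it."""
    return bool(internal_config.limit and records_counter >= internal_config.limit)

assert _limit_reached(InternalConfig(limit=2), 2)
assert not _limit_reached(InternalConfig(), 100)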
yield self._as_airbyte_record(stream_name, record_data)
stream_state = stream_instance.get_updated_state(stream_state, record_data)
if checkpoint_interval and record_counter % checkpoint_interval == 0:
    yield self._checkpoint_state(stream_name, stream_state, connector_state, logger)
total_records_counter += 1
if self._limit_reached(internal_config, total_records_counter):
    break
So we're still going to read all slices, right?
updated
for slice in slices:
    records = stream_instance.read_records(
        stream_slice=slice, sync_mode=SyncMode.full_refresh, cursor_field=configured_stream.cursor_field
    )
    for record in records:
        yield self._as_airbyte_record(configured_stream.stream.name, record)
        total_records_counter += 1
        if self._limit_reached(internal_config, total_records_counter):
            break
same here
updated
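A toy sketch of the concern and one possible fix (hypothetical names, not CDK code): break alone only exits the inner record loop, so every remaining slice would still be fetched; repeating the limit check at the slice level stops the outer loop before the next slice is read:

def read_all_slices(slices, limit):
    total_records_counter = 0
    for slice_records in slices:
        if limit and total_records_counter >= limit:
            break  # stop before fetching the next slice
        for record in slice_records:
            yield record
            total_records_counter += 1
            if limit and total_records_counter >= limit:
                break  # exits only this inner loop

assert list(read_all_slices([[1, 2], [3, 4], [5, 6]], limit=3)) == [1, 2, 3]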
see comments
…-fix-limit-for-incremental
What
Fix #5832
How
Describe the solution
Recommended reading order
x.java
y.python
Pre-merge Checklist
Expand the relevant checklist and delete the others.

New Connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests pass: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Documentation updated:
  - Connector's README.md
  - docs/SUMMARY.md
  - docs/integrations/<source or destination>/<name>.md, including changelog. See changelog example
  - docs/integrations/README.md
  - airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- New connector version released by running the /publish command described here

Updating a connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests pass: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Documentation updated:
  - Connector's README.md
  - docs/integrations/<source or destination>/<name>.md, including changelog. See changelog example
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- New connector version released by running the /publish command described here

Connector Generator
- The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates, then checking in your changes