Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connector builder: support for test read with message grouping per slices #23925

Merged
merged 81 commits into from
Mar 16, 2023

Conversation

girarda
Copy link
Contributor

@girarda girarda commented Mar 10, 2023

What

  • Implement the test read method

Sample command
python main.py read --config secrets/config.json --catalog secrets/configured_catalog.json

Test read

The test read method will run if the "__command" field is set to "test_read". The source will read data from stream defined in the configured catalog. Only one stream at a time is currently supported.

The test_read command can be configured with the optional "__test_read_config" field in the config
The "__test_read_config" should have the following fields

  • max_pages_per_slice
  • max_slices
  • max_records

sample config for a test read:

{
  "api_key": "",
  "__command": "test_read"
  "__test_read_config": {
    "stream_name": "Accounts",
    "max_pages_per_slice": 5,
    "max_slices": 5,
    "max_records": 10
  },
  "__injected_declarative_manifest": {
}

configured catalog:

{
  "streams": [
    {
      "stream": {
        "name": "Accounts",
        "json_schema": {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {}
        },
        "supported_sync_modes": ["full_refresh"],
        "source_defined_cursor": false
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}

sample configured catalog for resolving the manifest:

{
  "streams": [
    {
      "stream": {
        "name": "resolve_manifest",
        "json_schema": {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {}
        },
        "supported_sync_modes": ["full_refresh"],
        "source_defined_cursor": false
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}

How

Interface change

  • Implement the read_stream method, which will be called if the config's command == "test_read"

Test read

Recommended reading order

  1. airbyte-cdk/python/source_declarative_manifest/main.py
  2. airbyte-cdk/python/connector_builder/connector_builder_handler.py
  3. airbyte-cdk/python/connector_builder/message_grouper.py
  4. airbyte-cdk/python/connector_builder/models.py
  5. airbyte-cdk/python/airbyte_cdk/sources/source.py
  6. airbyte-cdk/python/unit_tests/connector_builder/test_message_grouper.py
  7. airbyte-cdk/python/airbyte_cdk/sources/source.py
  8. airbyte-cdk/python/unit_tests/connector_builder/utils.py

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Mar 10, 2023
@@ -0,0 +1,503 @@
#
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def _emitted_at():
return int(datetime.now().timestamp()) * 1000

def _create_configure_catalog(stream_name: str) -> ConfiguredAirbyteCatalog:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating the catalog on the Java side would allow us to keep the same read interface as connectors



def create_source(config: Mapping[str, Any]) -> ManifestDeclarativeSource:
manifest = config.get("__injected_declarative_manifest")
return ManifestDeclarativeSource(manifest)
return ManifestDeclarativeSource(manifest, True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set debug to True so the source returns raw requests and responses

@girarda girarda marked this pull request as ready for review March 14, 2023 19:04
@girarda girarda requested a review from a team as a code owner March 14, 2023 19:04
@girarda girarda requested review from clnoll and maxi297 March 14, 2023 19:04
@girarda girarda changed the title Alex/test read Connector builder: support for test read with message grouping per slices Mar 15, 2023
Copy link
Contributor

@clnoll clnoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @girarda! A couple of questions but no blockers.


def read_stream(source: DeclarativeSource, config: Mapping[str, Any], configured_catalog: ConfiguredAirbyteCatalog) -> AirbyteMessage:
try:
if "__test_read_config" not in config:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Feels like this should be optional since we're not requiring any of the keys to be present, and are handling the case where it isn't set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. fixed

class MessageGrouper:
logger = logging.getLogger("airbyte.connector-builder")

def __init__(self, max_pages_per_slice: int, max_slices: int, max_record_limit: int = 1000):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (also I know this may just be copy/pasted): Would it be preferable to enforce the maximums in this class, instead of allowing them to be fully configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main reason why I didn't enforce maximums here is that we'll need to update the value in two places if we decide to increase it. I think the risk of not enforcing a limit here is negligible because we control the caller

@girarda girarda merged commit bb5741a into master Mar 16, 2023
@girarda girarda deleted the alex/test_read branch March 16, 2023 00:12
adriennevermorel pushed a commit to adriennevermorel/airbyte that referenced this pull request Mar 17, 2023
…ices (airbytehq#23925)

* New connector_builder module for handling requests from the Connector Builder.

Also implements `resolve_manifest` handler

* Automated Commit - Formatting Changes

* Rename ConnectorBuilderSource to ConnectorBuilderHandler

* Update source_declarative_manifest README

* Reorganize

* read records

* paste unit tests from connector builder server

* compiles but tests fail

* first test passes

* Second test passes

* 3rd test passes

* one more test

* another test

* one more test

* test

* return StreamRead

* test

* test

* rename

* test

* test

* test

* main seems to work

* Update

* Update

* Update

* Update

* update

* error message

* rename

* update

* Update

* CR improvements

* fix test_source_declarative_manifest

* fix tests

* Update

* Update

* Update

* Update

* rename

* rename

* rename

* format

* Give connector_builder its own main.py

* Update

* reset

* delete dead code

* remove debug print

* update test

* Update

* set right stream

* Add --catalog argument

* Remove unneeded preparse

* Update README

* handle error

* tests pass

* more explicit test

* reset

* format

* fix merge

* raise exception

* fix

* black format

* raise with config

* update

* fix flake

* __test_read_config is optional

* fix

* Automated Commit - Formatting Changes

* fix

* exclude_unset

---------

Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: clnoll <clnoll@users.noreply.github.com>
Co-authored-by: girarda <girarda@users.noreply.github.com>
erohmensing pushed a commit that referenced this pull request Mar 22, 2023
…ices (#23925)

* New connector_builder module for handling requests from the Connector Builder.

Also implements `resolve_manifest` handler

* Automated Commit - Formatting Changes

* Rename ConnectorBuilderSource to ConnectorBuilderHandler

* Update source_declarative_manifest README

* Reorganize

* read records

* paste unit tests from connector builder server

* compiles but tests fail

* first test passes

* Second test passes

* 3rd test passes

* one more test

* another test

* one more test

* test

* return StreamRead

* test

* test

* rename

* test

* test

* test

* main seems to work

* Update

* Update

* Update

* Update

* update

* error message

* rename

* update

* Update

* CR improvements

* fix test_source_declarative_manifest

* fix tests

* Update

* Update

* Update

* Update

* rename

* rename

* rename

* format

* Give connector_builder its own main.py

* Update

* reset

* delete dead code

* remove debug print

* update test

* Update

* set right stream

* Add --catalog argument

* Remove unneeded preparse

* Update README

* handle error

* tests pass

* more explicit test

* reset

* format

* fix merge

* raise exception

* fix

* black format

* raise with config

* update

* fix flake

* __test_read_config is optional

* fix

* Automated Commit - Formatting Changes

* fix

* exclude_unset

---------

Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: clnoll <clnoll@users.noreply.github.com>
Co-authored-by: girarda <girarda@users.noreply.github.com>
erohmensing pushed a commit that referenced this pull request Mar 22, 2023
…ices (#23925)

* New connector_builder module for handling requests from the Connector Builder.

Also implements `resolve_manifest` handler

* Automated Commit - Formatting Changes

* Rename ConnectorBuilderSource to ConnectorBuilderHandler

* Update source_declarative_manifest README

* Reorganize

* read records

* paste unit tests from connector builder server

* compiles but tests fail

* first test passes

* Second test passes

* 3rd test passes

* one more test

* another test

* one more test

* test

* return StreamRead

* test

* test

* rename

* test

* test

* test

* main seems to work

* Update

* Update

* Update

* Update

* update

* error message

* rename

* update

* Update

* CR improvements

* fix test_source_declarative_manifest

* fix tests

* Update

* Update

* Update

* Update

* rename

* rename

* rename

* format

* Give connector_builder its own main.py

* Update

* reset

* delete dead code

* remove debug print

* update test

* Update

* set right stream

* Add --catalog argument

* Remove unneeded preparse

* Update README

* handle error

* tests pass

* more explicit test

* reset

* format

* fix merge

* raise exception

* fix

* black format

* raise with config

* update

* fix flake

* __test_read_config is optional

* fix

* Automated Commit - Formatting Changes

* fix

* exclude_unset

---------

Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: clnoll <clnoll@users.noreply.github.com>
Co-authored-by: girarda <girarda@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants