✨ [file based cdk] S3 legacy config adapter #29145

Merged
merged 5 commits into master from brian/s3_file_based_adapter
Aug 9, 2023

Conversation

brianjlai
Contributor

@brianjlai brianjlai commented Aug 7, 2023

Closes #28131

What

Adds a transformer that lets the new S3 connector, built with the file-based CDK, accept and process configs in the legacy format. The transformed config is not persisted in the DB; the change only affects how the connector handles an operation.

How

The CDK allows a source to override the read_config() method, which is responsible for reading the config JSON file. By overriding it, we still load the JSON from the file, but we then parse it into an instance of the legacy format and construct a new dictionary representing the new format. This way, the downstream flow can assume it is always working with the new config.
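The flow described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the function names, the legacy keys `dataset` and `provider.bucket`, and the new-format keys `bucket`, `streams`, `name`, and `globs` are hypothetical. Only `path_pattern` and `provider.start_date` are taken from this conversation.

```python
import json
from typing import Any, Dict, Mapping


def transform_legacy_config(legacy: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a new-format dict from a legacy-format config.

    The key names on both sides are assumptions made for illustration.
    """
    provider = legacy.get("provider", {})
    transformed: Dict[str, Any] = {
        "bucket": provider.get("bucket"),
        "streams": [
            {
                "name": legacy.get("dataset"),
                "globs": [legacy.get("path_pattern", "")],
            }
        ],
    }
    # Optional values are copied only when they are set in the legacy config.
    if provider.get("start_date"):
        transformed["start_date"] = provider["start_date"]
    return transformed


def read_config_with_legacy_support(config_path: str) -> Dict[str, Any]:
    """Read the config JSON file, then return it converted to the new format."""
    with open(config_path) as config_file:
        legacy = json.load(config_file)
    return transform_legacy_config(legacy)
```

Because the conversion happens at read time, nothing downstream needs to know that the file on disk was in the legacy format.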

Recommended reading order

  1. legacy_config_transformer.py
  2. s3_source.py

@github-actions
Contributor

github-actions bot commented Aug 7, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.


@classmethod
def create_globs(cls, path_pattern: str, path_prefix: str) -> List[str]:
    if path_prefix:
Contributor Author

We could also trim the prefix/pattern to avoid an extra "/", but I didn't see any of that in the existing version, so I kept it simpler.
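For reference, the trimming mentioned here might look like the sketch below. It is not what this PR does. It also assumes that the prefix and pattern are meant to be joined with a single "/", which this conversation does not confirm. The function is written as a plain function rather than a classmethod to keep the example self-contained.

```python
from typing import List


def create_globs(path_pattern: str, path_prefix: str) -> List[str]:
    """Combine an optional prefix with a pattern without doubling the '/'."""
    if path_prefix:
        # Strip the slashes on both sides of the join, then insert exactly one.
        return [path_prefix.rstrip("/") + "/" + path_pattern.lstrip("/")]
    return [path_pattern]
```

With this version, a prefix of "data/" and a pattern of "/*.csv" produce "data/*.csv" rather than "data//*.csv".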

@brianjlai brianjlai changed the title from "[file based cdk] S3 legacy config adapter" to "✨ [file based cdk] S3 legacy config adapter" Aug 7, 2023
@brianjlai brianjlai requested review from girarda, maxi297 and clnoll August 7, 2023 17:23
@brianjlai brianjlai marked this pull request as ready for review August 7, 2023 17:23
@octavia-squidington-iii
Collaborator

source-s3 test report (commit ac88970854) - ❌

⏲️ Total pipeline duration: 22mn01s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

}

if legacy_config.provider.start_date:
    transformed_config["start_date"] = legacy_config.provider.start_date
Contributor

do we need to add the extra precision? eg convert 2021-01-01T00:00:00Z to 2021-01-01T00:00:00.000000Z
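If the extra precision does turn out to be needed, the conversion in the example could be done along these lines. This is a sketch using only the standard library. The helper name `add_microsecond_precision` is hypothetical, and it assumes the legacy value always has the second-precision form shown above.

```python
from datetime import datetime


def add_microsecond_precision(timestamp: str) -> str:
    """Re-render a second-precision UTC timestamp with six fractional digits."""
    # Assumes input of the form 2021-01-01T00:00:00Z; other shapes raise ValueError.
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```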

from typing import Mapping, Any, List


class LegacyConfigTransformer:
Contributor

nit: can you add a comment in the spec that this file needs to be updated if the spec changes? I'm not sure how long we'll need this transformer and there are PRs in the pipeline (eg #17334)

Contributor Author

done!

return transformed_config

@classmethod
def create_globs(cls, path_pattern: str, path_prefix: str) -> List[str]:
Contributor

nit: should we make this private?

Contributor

@clnoll clnoll left a comment

Nice @brianjlai!

@octavia-squidington-iii
Collaborator

source-s3 test report (commit a6159c1cbe) - ❌

⏲️ Total pipeline duration: 21mn56s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@octavia-squidington-iii
Collaborator

source-s3 test report (commit 5ef32c6e2d) - ❌

⏲️ Total pipeline duration: 22mn19s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@octavia-squidington-iii
Collaborator

source-s3 test report (commit 391c1ee09d) - ❌

⏲️ Total pipeline duration: 22mn00s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@brianjlai brianjlai merged commit 0543099 into master Aug 9, 2023
@brianjlai brianjlai deleted the brian/s3_file_based_adapter branch August 9, 2023 23:09
harrytou pushed a commit to KYVENetwork/airbyte that referenced this pull request Sep 1, 2023
* s3 adapter

* pr feedback and updates after rebasing master

* add comment

* formatting
Successfully merging this pull request may close these issues.

File CDK: S3 config adapter (top-level config options)
5 participants