Source s3 cursor with history adapter #29028

Merged
clnoll merged 1 commit into master from source-s3-cursor-with-history-adapter on Aug 11, 2023

clnoll
Contributor

@clnoll clnoll commented Aug 3, 2023

Creates a Cursor object for source S3 that adapts state messages in the old format to the new format.

Depends on #29027.

@clnoll clnoll requested a review from a team as a code owner August 3, 2023 03:08
@octavia-squidington-iii octavia-squidington-iii added the area/connectors (Connector related issues), CDK (Connector Development Kit), and connectors/source/s3 labels Aug 3, 2023
@github-actions
Contributor

github-actions bot commented Aug 3, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@clnoll clnoll requested review from girarda, maxi297 and brianjlai August 3, 2023 03:08
@octavia-squidington-iii
Collaborator

source-s3 test report (commit e462779b5e) - ❌

⏲️ Total pipeline duration: 17mn43s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-s3 test

@clnoll clnoll force-pushed the source-s3-cursor-with-history-adapter branch from e462779 to 5dd8d06 Compare August 5, 2023 19:53
@octavia-squidington-iii octavia-squidington-iii removed the CDK (Connector Development Kit) label Aug 5, 2023
@octavia-squidington-iii
Collaborator

source-s3 test report (commit 5dd8d066f5) - ❌

⏲️ Total pipeline duration: 21mn53s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

@octavia-squidington-iii
Collaborator

source-s3 test report (commit ebe3cd975a) - ❌

⏲️ Total pipeline duration: 01mn39s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64

@clnoll clnoll force-pushed the source-s3-cursor-with-history-adapter branch from ebe3cd9 to 7a6a6b9 Compare August 6, 2023 02:53
@octavia-squidington-iii octavia-squidington-iii added the area/documentation (Improvements or additions to documentation) label Aug 6, 2023
@octavia-squidington-iii
Collaborator

source-s3 test report (commit 7a6a6b973d) - ❌

⏲️ Total pipeline duration: 20mn35s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

@clnoll clnoll force-pushed the source-s3-cursor-with-history-adapter branch from efd77d9 to 10ce5fe Compare August 6, 2023 10:05
@octavia-squidington-iii
Collaborator

source-s3 test report (commit 10ce5fe70d) - ❌

⏲️ Total pipeline duration: 21mn24s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

@clnoll clnoll force-pushed the source-s3-cursor-with-history-adapter branch from 10ce5fe to 0a23e96 Compare August 6, 2023 20:41
@octavia-squidington-iii
Collaborator

source-s3 test report (commit 0a23e96e08) - ❌

⏲️ Total pipeline duration: 21mn32s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

@clnoll clnoll force-pushed the source-s3-cursor-with-history-adapter branch from 0a23e96 to fb0cf97 Compare August 6, 2023 23:45
@octavia-squidington-iii
Collaborator

source-s3 test report (commit fb0cf97942) - ✅

⏲️ Total pipeline duration: 22mn09s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

Contributor

@maxi297 maxi297 left a comment

I feel like test coverage does not guarantee the behavior we would like. Can we test public methods instead?

return True
else:
return False
if cursor := stream_state.get(self.cursor_field):
Contributor

This seems like an edge case, but if the cursor_field changes, the state will be considered V3. Is that OK?

Contributor Author

Good call-out, and I'm okay with that since it's an edge case, and we will hopefully be deprecating v3 before too long.
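
The edge case discussed above can be sketched roughly as follows. This is a simplified illustration, not the connector's actual logic: the function name `is_new_format_state`, the cursor field name `last_modified`, and the assumption that the new-format cursor is a bare timestamp are all hypothetical. The point is that a state whose cursor value is stored under a different field name falls through to the "legacy" branch.

```python
from datetime import datetime
from typing import Any, Mapping

# Hypothetical names and formats, chosen for illustration only.
CURSOR_FIELD = "last_modified"
NEW_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def is_new_format_state(stream_state: Mapping[str, Any], cursor_field: str = CURSOR_FIELD) -> bool:
    """Return True only if a cursor value is present and parses in the new format."""
    if cursor := stream_state.get(cursor_field):
        try:
            datetime.strptime(cursor, NEW_FORMAT)
            return True
        except ValueError:
            return False
    # Nothing stored under the configured cursor field: treated as legacy.
    return False


state = {CURSOR_FIELD: "2023-08-01T00:00:00.000000Z", "history": {}}
print(is_new_format_state(state))                        # cursor field matches
print(is_new_format_state(state, cursor_field="other"))  # cursor field was renamed
```

With a renamed cursor field the second call reports the state as not new-format, which is the behavior accepted in the reply above.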

    ],
)
def test_convert_dates(input_history: MutableMapping[str, Any], expected_output: MutableMapping[str, Any]) -> None:
    assert Cursor._convert_legacy_history(input_history) == expected_output
Contributor

It seems like this test does not guarantee that the public interface is working as expected i.e. set_initial_state could do something completely unexpected and we won't know with this coverage. Should we test set_initial_state instead?

Contributor Author

That's a good call, updated tests to use that.

Contributor Author

Btw this change caught a bug. Updated code in airbyte-integrations/connectors/source-s3/source_s3/v4/cursor.py if you want to take another look.
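
A minimal sketch of what testing through the public interface might look like. `ToyCursor` is a hypothetical stand-in, not the connector's `Cursor`; the legacy history shape (`{"YYYY-MM-DD": [filenames]}`) and the `get_state` accessor are assumptions inferred from the excerpts in this thread. The test only calls `set_initial_state` and inspects the resulting state, so a bug anywhere between the entry point and the private helper would surface.

```python
from datetime import datetime
from typing import Any, Dict, List, Mapping

DATE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # format string quoted in the diff excerpts


class ToyCursor:
    """Hypothetical stand-in for the connector's Cursor, for illustration only."""

    def __init__(self) -> None:
        self._history: Dict[str, str] = {}

    @staticmethod
    def _convert_legacy_history(legacy: Mapping[str, List[str]]) -> Dict[str, str]:
        # Assumed legacy shape: {"YYYY-MM-DD": [filename, ...]}.
        converted: Dict[str, str] = {}
        for date_str, filenames in legacy.items():
            date_obj = datetime.strptime(date_str, "%Y-%m-%d")
            for filename in filenames:
                previous = converted.get(filename)
                # Keep the most recent date seen for each file.
                if previous is None or date_obj > datetime.strptime(previous, DATE_TIME_FORMAT):
                    converted[filename] = date_obj.strftime(DATE_TIME_FORMAT)
        return converted

    def set_initial_state(self, state: Mapping[str, Any]) -> None:
        self._history = self._convert_legacy_history(state.get("history", {}))

    def get_state(self) -> Dict[str, Any]:
        return {"history": dict(self._history)}


def test_set_initial_state_converts_legacy_history() -> None:
    cursor = ToyCursor()
    cursor.set_initial_state({"history": {"2023-08-01": ["a.csv"], "2023-08-02": ["a.csv", "b.csv"]}})
    assert cursor.get_state() == {
        "history": {
            "a.csv": "2023-08-02T00:00:00.000000Z",
            "b.csv": "2023-08-02T00:00:00.000000Z",
        }
    }


test_set_initial_state_converts_legacy_history()
```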

timestamp_millis = stream_state[self.cursor_field].split("_")[0]
converted_state[self.cursor_field] = self._get_ts_from_millis_ts(timestamp_millis, "%Y-%m-%dT%H:%M:%SZ")
if "history" in stream_state:
    converted_state["history"] = converted_history
Contributor

nit: I think we can always set converted_state["history"] = converted_history. It'll just be an empty object if the stream_state doesn't have a history field
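
The nit above can be illustrated with a small sketch. The helper name `convert_state` and the pass-through "conversion" are placeholders rather than the connector's code; the only point is that assigning the key unconditionally yields an empty mapping when the input has no history.

```python
from typing import Any, Dict, Mapping


def convert_state(stream_state: Mapping[str, Any]) -> Dict[str, Any]:
    # Placeholder for the real history conversion: copy whatever is there,
    # defaulting to an empty mapping when "history" is absent.
    converted_history: Dict[str, Any] = dict(stream_state.get("history", {}))

    converted_state: Dict[str, Any] = {}
    # Assigned unconditionally, with no `if "history" in stream_state` guard.
    converted_state["history"] = converted_history
    return converted_state


print(convert_state({}))
print(convert_state({"history": {"a.csv": "2023-08-01T00:00:00.000000Z"}}))
```

One difference worth noting: with the guard removed, the output always carries a `history` key, even for input that had none.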

def get_source(args: List[str]):
    catalog_path = AirbyteEntrypoint.extract_catalog(args)
    try:
        return FileBasedSource(SourceS3StreamReader(), Config, catalog_path, cursor_cls=Cursor)
Contributor

Can you elaborate on why, or in what scenario, we need this additional try/catch as opposed to just initializing it the way we had been doing before?

Contributor Author

Yes, I added this to make the CATs (connector acceptance tests) pass. When check errors, they expect the error to be output as an AirbyteMessage, but that won't happen if we initialize FileBasedSource outside of the try.
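
The pattern described here can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in the PR: `get_source_or_report` and `failing_factory` are hypothetical names, and the printed dictionary is a simplified stand-in for an Airbyte error message rather than the protocol's model classes.

```python
import json
import time
import traceback
from typing import Any, Callable, Optional


def get_source_or_report(factory: Callable[[], Any]) -> Optional[Any]:
    """Build the source; on failure, print a structured error message and return None."""
    try:
        return factory()
    except Exception as exc:
        # Simplified stand-in for an error message emitted on stdout.
        message = {
            "type": "TRACE",
            "trace": {
                "type": "ERROR",
                "emitted_at": int(time.time() * 1000),
                "error": {"message": str(exc), "stack_trace": traceback.format_exc()},
            },
        }
        print(json.dumps(message))
        return None


def failing_factory() -> Any:
    # Hypothetical stand-in for a constructor that rejects a bad catalog or config.
    raise ValueError("invalid catalog")


source = get_source_or_report(failing_factory)
```

If the constructor ran outside the `try`, the exception would propagate as a plain traceback instead of a structured message, which is the situation the reply above describes.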

@octavia-squidington-iii octavia-squidington-iii added the CDK (Connector Development Kit) label Aug 8, 2023
@octavia-squidington-iii
Collaborator

source-s3 test report (commit 0127a8f6db) - ✅

⏲️ Total pipeline duration: 22mn12s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

@clnoll clnoll requested review from girarda, brianjlai and maxi297 August 9, 2023 21:48
Contributor

@girarda girarda left a comment

🚢 🚢 🚢

@clnoll clnoll force-pushed the source-s3-cursor-with-history-adapter branch from 0127a8f to 78dcbe8 Compare August 9, 2023 22:18
@octavia-squidington-iii
Collaborator

source-s3 test report (commit 78dcbe85eb) - ❌

⏲️ Total pipeline duration: 24mn22s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

Contributor

@brianjlai brianjlai left a comment

just small nits, nothing major so looks good!

if not timestamp:
    return timestamp
try:
    timestamp_millis = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
Contributor

nit: since this is used in more than one place in the file (also on line 460), can we make this a constant?

Contributor Author

👍 done.


for filename in filenames:
    if filename in converted_history:
        if date_obj > datetime.strptime(converted_history[filename], "%Y-%m-%dT%H:%M:%S.%fZ"):
Contributor

Same here, let's use a constant since it's used 3 times.

Contributor Author

👍 done.
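
A rough sketch of the refactor requested in the two nits above: the format string is defined once and reused by every call site. The constant name `DATE_TIME_FORMAT` and the helper names are hypothetical; the thread does not show what the final constant was called.

```python
from datetime import datetime

# Hypothetical constant name; the format string is the one quoted in the excerpts.
DATE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(timestamp: str) -> datetime:
    """Parse a timestamp using the shared format constant."""
    return datetime.strptime(timestamp, DATE_TIME_FORMAT)


def is_newer(candidate: datetime, recorded: str) -> bool:
    """Compare a datetime against a recorded timestamp string in the shared format."""
    return candidate > parse_timestamp(recorded)


recorded = "2023-08-01T12:00:00.000000Z"
print(is_newer(datetime(2023, 8, 2), recorded))
```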

@clnoll clnoll force-pushed the source-s3-cursor-with-history-adapter branch from 78dcbe8 to c14d044 Compare August 10, 2023 23:21
@octavia-squidington-iii
Collaborator

source-s3 test report (commit c14d044e66) - ✅

⏲️ Total pipeline duration: 24mn28s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

@clnoll clnoll force-pushed the source-s3-cursor-with-history-adapter branch from c14d044 to 5e32482 Compare August 11, 2023 00:05
@octavia-squidington-iii
Collaborator

source-s3 test report (commit 5e32482768) - ✅

⏲️ Total pipeline duration: 24mn34s

Step Result
Validate airbyte-integrations/connectors/source-s3/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-s3 docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests

@clnoll clnoll merged commit 6946052 into master Aug 11, 2023
@clnoll clnoll deleted the source-s3-cursor-with-history-adapter branch August 11, 2023 15:38
harrytou pushed a commit to KYVENetwork/airbyte that referenced this pull request Sep 1, 2023
Labels
area/connectors (Connector related issues), area/documentation (Improvements or additions to documentation), CDK (Connector Development Kit), connectors/source/s3
5 participants