Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination snowflake 1s1t: it works now #29174

Merged
merged 28 commits into from
Aug 9, 2023
Merged

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Aug 7, 2023

#29188 needs to merge first, to fix the incrementalDedupDefaultNamespace test. All other tests are passing.

T+D expected records have non-UTC timezone in airbyte_extracted_at because of #29199. This also happens with old, non-1s1t syncs.

This PR is a grabbag of bugfixes:

  • snowflake schema change detection is now implemented correctly (and has tests in BaseSqlGeneratorIntegrationTest)
  • Added some more CDC tests + fixed a bug in snowflake's deleted_at null check
  • GeneralStagingFunctions correctly finds the relevant StreamConfig
  • check returns failure for 1s1t + gcs/s3 staging
  • in 1s1t mode, wrap a bunch of raw table stuff in quotes to force lowercase table/stage/schema names
  • Properly implement AbstractSnowflakeTypingDedupingTest#dumpFinalTable
  • Fix timestamp formats in snowflake expected record files
  • add support for +/-08 and +/-0800 timezone offsets in snowflake
  • add support for raw schema overrides (along with GSM secret for the new test)

I ran a faker -> snowflake sync locally. Try out select * from "edgao_1s1t"."users" limit 10; and select * from "edgao_default_raw_schema"."edgao_1s1t_raw__stream_users" limit 10; on our test instance!

@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Aug 7, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Aug 7, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Aug 7, 2023
@edgao edgao marked this pull request as ready for review August 7, 2023 22:33
@edgao edgao requested review from a team as code owners August 7, 2023 22:33
@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

Comment on lines +389 to +399
"use_1s1t_format": {
"type": "boolean",
"description": "(Beta) Use <a href=\"https://github.com/airbytehq/airbyte/issues/26028\" target=\"_blank\">Destinations V2</a>. Contact Airbyte Support to participate in the beta program.",
"title": "Use Destinations V2 (Early Access)",
"order": 10
},
"raw_data_schema": {
"type": "string",
"description": "(Beta) The schema to write raw tables into",
"title": "Destinations V2 Raw Table Schema (Early Access)",
"order": 11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @Hesperide - we are going right into early-access for all users, no private beta thi time around

@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

return nameTransformer.applyDefaultCase(String.join(".",
nameTransformer.convertStreamName(namespace),
nameTransformer.convertStreamName(streamName)));
if (use1s1t) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is something that only applies to 1s1t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. this prevents snowflake from uppercasing the schema/table name, which is a breaking change relative to current behavior.

// TODO
return new StreamId(namespace, name, rawNamespaceOverride, StreamId.concatenateRawTableName(namespace, name), namespace, name);
}

@Override
public ColumnId buildColumnId(String name) {
public ColumnId buildColumnId(final String name) {
// TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing 🤦 I'm sure we'll get some beta users reporting weird character handling, but in my current testing this seems to work fine. I'll remove the comments.

@@ -174,14 +174,28 @@ private String extractAndCast(final ColumnId column, final AirbyteType airbyteTy
return new StringSubstitutor(Map.of("column_name", column.originalName())).replace(
"""
CASE
WHEN NOT ("_airbyte_data":"${column_name}"::TEXT REGEXP '\\\\d{1,2}:\\\\d{2}:\\\\d{2}(\\\\.\\\\d+)?(Z|[+\\\\-]\\\\d{1,2}:\\\\d{2})')
WHEN NOT ("_airbyte_data":"${column_name}"::TEXT REGEXP '\\\\d{1,2}:\\\\d{2}:\\\\d{2}(\\\\.\\\\d+)?(Z|[+\\\\-]\\\\d{1,2}(:?\\\\d{2})?)')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: whenever I have a gnarly regex like this I like to add a comment with an example or two of what it is supposed to match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, added comments for this

@edgao edgao marked this pull request as draft August 8, 2023 18:45
@edgao edgao marked this pull request as ready for review August 8, 2023 18:55
@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

@octavia-squidington-iii

This comment was marked as outdated.

@edgao edgao enabled auto-merge (squash) August 9, 2023 14:51
@octavia-squidington-iii
Copy link
Collaborator

destination-teradata test report (commit 3e515efadf) - ✅

⏲️ Total pipeline duration: 15mn21s

Step Result
Validate airbyte-integrations/connectors/destination-teradata/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build destination-teradata docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:destination-teradata:integrationTest

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-teradata test

@octavia-squidington-iii
Copy link
Collaborator

destination-snowflake test report (commit 3e515efadf) - ✅

⏲️ Total pipeline duration: 40mn32s

Step Result
Validate airbyte-integrations/connectors/destination-snowflake/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build destination-snowflake docker image for platform linux/x86_64
Build airbyte/normalization-snowflake:dev
./gradlew :airbyte-integrations:connectors:destination-snowflake:integrationTest

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-snowflake test

@octavia-squidington-iii
Copy link
Collaborator

destination-bigquery test report (commit 3e515efadf) - ✅

⏲️ Total pipeline duration: 32mn03s

Step Result
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build destination-bigquery docker image for platform linux/x86_64
Build airbyte/normalization:dev
./gradlew :airbyte-integrations:connectors:destination-bigquery:integrationTest

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-bigquery test

@edgao edgao merged commit e9f1a7a into master Aug 9, 2023
@edgao edgao deleted the edgao/1s1t/snowflake_impl2 branch August 9, 2023 16:21
harrytou pushed a commit to KYVENetwork/airbyte that referenced this pull request Sep 1, 2023
* disable 1s1t in gcs/s3 mode

* derp

* quote things in many places

* fix timestamp format...?

* delete unused tests

* more expectedrecord timestamp fixes

* implement dumpFinalTable

* fix bug

* bugfix in schema change detection

* add schema change detection tests

* fix schema change detection

* add spec options

* logistics

* add timestamp format test

* and snowflake

* accept raw schema override

* fix test handling

* fix unit test

* Automated Commit - Format and Process Resources Changes

* typo

* forgot to fix this

* ... I had uncommitted changes

* resolve TODOs

* add regex examples

* correctly drop check table in check connection

* also bump teradata >.>

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants