Shrodingers/destination databricks dbt (#1)
* octavia-cli: fix workspace not having anonymous_data_collection property (#13869)

* Update connection update calls to use central utility to ensure connection update has all data (#13564)

* Update connection updates with build update utility
* Add buildConnectionUpdate utility
* Update components that update the connection to use utility when necessary

* Use connection name when saving connection from replication view to prevent override from refreshed catalog

* Improve connection check on ReplicationView onSubmit function

* Display connection state in connection setting page (#13394)

* Display Connection State in Setting page

* memoize callback

* rendering and confirmation

* setState API

* Input validation

* remove JSON step

* rename apiMethod to `updateState`

* test and adjust route

* skip if sync is running

* prevent state update when sync is running

* code editor component

* errors fixed

* scss style

* make linter happy

* Back to monaco editor

* Remove ability to edit state

* Adjust FE code

* Fix CSS problem

* Update airbyte-webapp/src/locales/en.json

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* just use PRE to render state for now

Co-authored-by: Tim Roes <tim@airbyte.io>
Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* update api for per stream (#13835)

* Update airbyte-protocol.md (#13892)

* Update airbyte-protocol.md

* Fix typo

* Fix prose

* Add protocol reviewers for protocol documentation

* Remove duplicate

* Edited Amplitude, Mailchimp, and Zendesk Support docs (#13897)

* deleting SUMMARY.md since we don't need it for docusaurus builds (#13901)

* Do not hide unexpected errors in the check connection (#13903)

* Do not hide unexpected errors in the check connection

* Fix test

* Common code to deserialize a state message in the new format (#13772)

* Common code to deserialize a state message in the new format

* PR comments and type changed to typed

* Format

* Add StateType and StateWrapper objects to the model

* Use state wrapper instead of Either

* Switch to optional

* PR comments

* Support array legacy state

* format

Co-authored-by: Jimmy Ma <jimmy@airbyte.io>

* 🐛 Source Amazon Seller Partner: handle start date for financial stream (#13633)

* start and end date for financial stream should not be more than 180 days apart

* improve unit tests

* make changes to start date for finance stream

* update tests

* lint changes

* update version to 0.2.22 for source-amazon-seller-partner
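
The 180-day limit above can be illustrated with a minimal sketch. It assumes the fix moves the start date forward rather than the end date back; the function name `clamp_start_date` and the constant `MAX_WINDOW` are hypothetical and are not taken from the connector.

```python
from datetime import datetime, timedelta

# Assumed limit, taken from the commit message: at most 180 days between dates.
MAX_WINDOW = timedelta(days=180)


def clamp_start_date(start: datetime, end: datetime,
                     max_window: timedelta = MAX_WINDOW) -> datetime:
    """Return a start date that is at most `max_window` before `end`."""
    earliest_allowed = end - max_window
    # Keep the requested start if it is already inside the window.
    return max(start, earliest_allowed)
```

A start date already inside the window is returned unchanged; an older one is moved to exactly 180 days before the end date.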

* Normalization: Fix incorrect jinja2 macro `json_extract_array` call (#13894)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Docs: fixed the broken links (#13915)

* 0.2.5 -> 0.2.6 (#13924)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* 13546 Fix integration tests source-postgres Mac OS (#13872)

* 13546 Fix integration tests source-postgres Mac OS

* 13548 Fixed integration tests source-tidb Mac OS (#13927)

* Source MsSql : incr ver to include changes #13854 (#13887)

* incr version

* put PR id

* docker ver

* connectors that published (#13932)

* Deprecate PART_SIZE_MB in connectors using S3/GCS storage (#13753)

* Removed part_size from connectors that use StreamTransferManager

* fixed S3DestinationConfigTest

* fixed S3JsonlFormatConfigTest

* update changelog and bump version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* update changelog and bump version for Redshift and Snowflake destinations

* auto-bump connector version

* fix GCS staging test

* fix GCS staging test

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Reverted changes in SshBastionContainer (#13934)

* 🎉 New Source Dockerhub (#13931)

* init

* implement working source + tests

* add docs

* add docs

* fix bad comments

* Update airbyte-integrations/connectors/source-dockerhub/acceptance-test-config.yml

* Update airbyte-integrations/connectors/source-dockerhub/Dockerfile

* Update airbyte-integrations/connectors/source-dockerhub/.dockerignore

* Apply suggestions from code review

* Update docs/integrations/sources/dockerhub.md

* Update airbyte-integrations/connectors/source-dockerhub/integration_tests/acceptance.py

Co-authored-by: George Claireaux <george@airbyte.io>

* address @Phlair's feedback

* address @Phlair's feedback

* each record is now a Docker image rather than response page

* format

* fix unit tests

* fix acceptance tests

* add icon, definition and generate seed spec

* add requests to requirements

Co-authored-by: sw-yx <shawnthe1@gmail.com>

* commented out non-relevant tests (#13940)

* Bump Airbyte version from 0.39.20-alpha to 0.39.21-alpha (#13938)

Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>

* newaction (#13942)

* remove test action (#13944)

* 🎉Source-mysql: aligned datatype test (#13945)

* [13607] source-mysql: aligned datatype tests for regular and CDC ways + added CHAR fix to CDC processing

* #13958 Source Stripe: fix configured catalogs (#13959)

* 🐛 Source: Typeform - Update schema for Responses stream (#13935)

* Upd responses schema

* Upd docs

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* :window: Updated email invitation flow that enables invited users to set name and create password (#12788)

* First pass accepting email link invitation
* Update Auth service with signInWithEmailLink calls
* Add AcceptEmailInvite component
* Update FirebaseActionRoute to handle sign in mode
* Rename ResetPasswordAction to FirebaseActionRoute

* Add create password step to AcceptEmailInvite component

* Remove continueURL from invite fetch

* Update accept email invite for user to enter both email and password together

* Set name during email link signup

* Update AcceptEmailInvite to send name
* Add updateName to UserService
* Update AuthService to set name during sign up

* Remove steps from AcceptEmailInvite component
Remove setPassword from AuthService

* Add header and title to accept invite page

* Move invite error messages to en file

* For invite link pages, show login link instead of sign up

* Disable name update on sign in via email link

* Resend email invite when the invite link is expired

* Fix status message in accept email invite page

* Re-enable set user's name during sign up email invite

* Update signUpWithEmailLink so that sign up is successful even if we fail to update the user's name

* Update comments on GoogleAuthService signInWithEmailLink

* Add newsletter and accept terms checkboxes to accept email invite component
* Extract signup form from signup page
* Extract fields from signup form
* Update accept email invite component to use field components from signup form
* Ensure that sign up button is disabled until form is valid and security checkbox is checked

* Make error status text color in accept email link red

* Update workspace check in DefaultView so that user lands in workspace selector when there are no workspaces

* Add comment around continueUrl param usage in UserService

* Remove useless default case in GoogleAuthService

* Source Marketo: process fail during creation of an export job (#13930)

* #9322 source Marketo: process fail during creation of an export job

* #9322 source marketo: upd changelog

* #9322 source marketo: fix unit test

* #9322 source marketo: fix SATs

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* :window: :wrench: Add eslint rules for CSS modules (#13952)

* add eslint-plugin-css-modules rules

* Fixes:
- turn on eslint css modules rule as error
- remove unused styles

* add warning message if styled components is used

* Revert "add warning message if styled components is used"

This reverts commit 4e92b8b2110142bb679f15aeb034e377e0dcc69c.

* replace rule severity with words

* Update salesforce.md

Fixed broken link

* :window: 🔧 Add auto-fixable linting rules to webapp (#13462)

* Add new eslint rules that fit with our code style and downgrade rules to warn

* allowExpressions in fragment eslint rule

* Enable function-component-definition in eslint and fix styles

* Cleanup lint file

* Fix react/function-component-definition warnings manually

* Add more auto-fixable rules and fix

* Fix functions that require useless returns

* Update array-type rule to array-simple

* Fix eslint errors manually
disable assignmentExpression for arrays in prefer-destructuring rule

* Auto fix new linting issues after rebase

* Enhance /publish to allow for multiple connectors and parallel execution (#13864)

* start

* revert

* azblob

* bq

* bq denorm

* megapublish baaaabyyyy

* fix needs

* matrix connectors

* auto-bump connector version

* dont failfast and max parallel 5

* multi runno

* minor

* testing matrix agents

* name

* testing multi agents

* tmp fix

* new multi agents

* multi test

* tryy

* let's do this

* magico

* fix

* label test

* couple more connector bumps

* temp

* things

* check this

* lets gooo

* more connectors

* Delete TEMP-testing-command.yml

* auto-bump connector version

* added comment describing bash part

* running single thread

* catch sentry cli

* auto-bump connector version

* destinations

* + snowflake

* saved

* auto-bump connector version

* auto-bump connector version

* java source bumps

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* remove twice-defined methods

* label things

* revert action

* using the new test action

* point at action

* wrong tag on action

* update pool label

* update to use new ec2-github-runner fork

* this needs to be more generic than publisher

* change publish to run on pool

* add comment about runner-pool usage

* updated publish command docs for multi & parallel connector runs

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* unbump failed publish versions

* missed dockerfiles

* remove failed docs

* mssql fix

* overhauled the git comment output

* bumping a test connector that should work

* slight order switcheroo

* output connectors properly in first message

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Bump Airbyte version from 0.39.21-alpha to 0.39.22-alpha (#13979)

Co-authored-by: Phlair <Phlair@users.noreply.github.com>

* Parker/temporal cloud (#13243)

* switch to temporal cloud client for now

* format

* use client cert/key env secret instead of path to secret

* add TODO comments

* format

* add logging to debug timeout issue

* add more logging

* change workflow task timeout

* PR feedback: consolidate as much as possible, add missing javadoc

* fix acceptance test, needs to specify localhost

* add internal-use only comments

* format

* refactor to clean up TemporalClient and prepare it for future dependency injection framework

* remove extraneous log statements

* PR feedback

* fix test

* return isInitialized true in test

* 📄  Postgres source: fix CDC setup order in docs (#13949)

* postgres source: fix CDC setup order docs

* Update docs/integrations/sources/postgres.md

Co-authored-by: Liren Tu <tuliren@gmail.com>

* Per-stream state support for Postgres source (#13609)

* WIP Per-stream state support for Postgres source

* Fix failing test

* Improve code coverage

* Make global the default state manager

* Add legacy adapter state manager

* Formatting

* Include legacy state for backwards compatibility

* Add global state manager

* Implement Global/CDC state handling

* Fix test issues

* Fix issue with updated method signature

* Handle empty state case in global state manager

* Adjust to protocol changes

* Fix failing acceptance tests

* Fix failing test

* Fix unmodifiable list issue

* Fix unmodifiable exception

* PR feedback

* Abstract global state manager selection

* Handle conversion between different state types

* Handle invalid conversion

* Rename parameter

* Refactor state manager creation

* Fix failing tests

* Fix failing integration tests

* Add CDC test

* Fix failing integration test

* Revert change

* Fix failing integration test

* Use per-stream for postgres tests

* Formatting

* Correct stream descriptor validation

* Correct permalink

* PR feedback

* Bump Airbyte version from 0.39.22-alpha to 0.39.23-alpha (#13984)

Co-authored-by: pmossman <pmossman@users.noreply.github.com>

* Adds test for new workflow (#13986)

* Adds test for new workflow

* Adds airbyte repo

* remove testing line

* Add new InterpolatedRequestOptionsProvider that encapsulates all variations of request arguments (#13472)

* write out new request options provider and refactor components and parts of the YAML config

* fix formatting

* pr feedback to consolidate body_data_provider to simplify the code

* pr feedback get rid of extraneous optional

* publish oss for cloud (#13978)

workflow to publish oss artifacts that cloud needs to build against
use docker buildx to create arm images for local development

* skip debezium engine startup in case no table is in INCREMENTAL mode (#13870)

* 🎉 Source Github: break point added for workflows_runs stream (#13926)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* 6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#13866)

* 6339: debug info

* 6339: not using 'USE' on Azure SQL servers

* 6339: cleanup

* 6339: cleanup2

* 6339: cleanup3

* 6339: versions/changelogs updated

* 6339: merge from master (consolidation issue)

* 6339: dev connector version (for testing in airbyte cloud)

* 6339: code review implementation

* 6339: apply formatting

* in case runners fail to spin up, this needs to run on github-hosted (#13996)

* 12708: Add an option to use encryption with staging in Redshift Destination (#13675)

* 12708: Add an option to use encryption with staging in Redshift Destination

* 12708: docs/docker configs updated

* 12708: merge with master

* 12708: merge fix

* 12708: code review implementation

* 12708: fix for older configs

* 12708: fix for older configs in check

* 12708: merge from master (consolidation issue)

* 12708: versions updated

* :tada: New Source: Webflow (#13617)

* Added webflow code

* Updated readme

* Updated README

* Added webflow to source_definitions.yaml

* Enhanced documentation for the Webflow source connector

* Improved webflow source connector instructions

* Moved Site ID to before API token in Spec.yaml (for presentation in the UI)

* Addressed comments in PR.

* Changes to address requests in PR review

* Removed version from config

* Minor update to spec.yaml for clarity

* Updated to pass the accept-version as a constant rather than parameter

* Updated check_connection to hit the collections API that requires both site id and the authentication token.

* Fixed the test_check_connection to use the new check_connection function

* Added a streams test for generate_streams

* Re-named "autentication" object to "auth" to be more consistent with the way it is created by the CDK

* Added in an explicit line to instantiate an "auth" object from WebflowTokenAuthenticator, to make it easier to describe in the blog

* Fixed a typo in a comment

* Renamed some classes to be more intuitive

* Renamed class to be more intuitive

* Minor change to an internal method name

* Made _get_collection_name_to_id_dict staticmethod

* Fixed a unit-test error that only appeared when running " python -m pytest -s unit_tests".
This was caused by Mocked settings from test_source.py leaking into test_streams.py

* format: add double quotes and remove unused import

* readme: remove semantic version naming of connector in build commands

* Updated spec.yaml

* auto-bump connector version

* format files

* add changelog

* update dockerfile

* auto-bump connector version

Co-authored-by: sajarin <sajarindider@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>

* Source-oracle: fixed tests + checkstyle (#13997)

* Source-oracle: fixed tests + checkstyle

* 🐛Destination-mysql: fixed integration test and build process (#13302)

* [13180] destination-mysql: fixed integration test

* update changelog to include debezium version upgrade (#13844)

* make table headers look less like successes (#13999)

* source-twilio: implement lookback windows (#13896)

* Revert "12708: Add an option to use encryption with staging in Redshift Destination (#13675)" (#14010)

This reverts commit aa28d448d820df9d79c2c0d06b38978d1108fb2c.

* Revert "6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#13866)" (#14011)

This reverts commit 0d870bd37bc3b5cd798b92115d73bcc45a42d8f7.

* [low-code connectors] BasicHttpAuthenticator (#13733)

* implement basichttpauthenticator

* add optional refresh access token authenticator

* remove prints

* type hints

* Fix and unit test

* missing test

* Add class to __init__ file

* Add comment

* migrate JsonSchemas to use basic path instead of JSONPath (#13917)

* scaffold for catalog diff, needs fixing on type handling and tests (#13786)

* Prepare release of JDBC connectors (#13987)

* Prepare release of JDBC connectors

* Update source definitions manually

* use built in check for if path is definite (#13834)

* 13535 Fixed bastion network for integration tests (#14007)

* doc: add error troubleshooting `docker-compose up` (#13765)

* fix: duplicate resource allocations in `airbyte-temporal` deployment (#13816)

* helm-chart: Fix worker deployment format error (#13839)

* add catalog diff connection read (#13918)

* doc: fix small typo on Shopify documentation (#13992)

* add streams to reset to job info (#13919)

* Generate api for changes in #13370 and make code compatible (#14014)

* Generate api for per-stream updates #13835 (#14021)

* Revert "Prepare release of JDBC connectors (#13987)" (#14029)

This reverts commit df759b30778082508e2872513800fac34d98ff7c.

* Fix per stream state protocol backward compatibility (#14032)

* rename state type field to fix backwards compatibility issue

* replace usages of stateType with type

* support semi incremental by adding extractor record filter (#13520)

* support semi incremental by adding extractor record filter

* refactor extractor into a record_selector that supports extraction and filtering of response records

* Remove pydantic spec from amazon ads and use YAML spec (#13988)

* add EdDSA support in SSH tunnel (#9494)

* add EdDSA support

* verify EdDSA support works correctly

Co-authored-by: Yurii Bidiuk <yura.bidyuk@gmail.com>

* 🎉New source connector: source-metabase (#13752)

* Add docs

* Close metabase session when sync finishes

* Close session in check_connection

* Add source definition to seed

* Add icon

* improve cdc check for connectors (#14005)

* improve should use cdc check

* Revert "improve should use cdc check"

This reverts commit 7d01727279d21d33a6c18ed3227ee94432636120.

* improve should use cdc check

* add unit test

* Update webflow.md

* Update webflow.md

* Update webflow.md

* Remove legacy sentry code from cdk (#14016)

* rip sentry out of cdk

* remove sentry dsn from gsc

* Update webflow.md

* Update webflow.md

* Fixed broken links (#14071)

* 🪟Persist unsaved changes on schema refresh (#13895)

* add form values tracker context

* add clarifying comment

* add same functionality to create connection

* Update airbyte-webapp/src/components/CreateConnectionContent/CreateConnectionContent.tsx

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* Fixes broken links so we can deploy again (#14075)

also adds better error message for when this happens to others

* Adds symmary.md to gitignore (#14078)

* Added webflow icon (#14069)

* Added webflow icon

* Added icon

* Build create connection form build failure (#14081)

* Fix CDK obfuscation of nested secrets (#14035)

* Added Buy Credits section to Managing Airbyte Cloud (#13905)

* Added Buy Credits section to Managing Airbyte Cloud

* Made some style changes

* Made edits based on Natalie's suggestions

* Deleted link

* Deleted line

* Edited email address

* Updated reaching out to sales sentence

* disable es-lit to fix build (#14087)

* Release source connectors (#14077)

* Release source connectors

* Fix issue with database connection in test

* Fix failing tests due to authentication

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Bump Airbyte version from 0.39.23-alpha to 0.39.24-alpha (#14094)

Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com>

* Emit the state to remove in the airbyte empty source (#13725)

What
This PR updates the EmptyAirbyteSource in order to perform a partial update and handle the new state message format.

How
The empty source will now emit different messages based on the type of state being provided:

Per stream: it will emit one message per stream that has been reset
Global: it will emit one global message that contains null for the streams that have been reset, including the shared state
Co-authored-by: Jimmy Ma <jimmy@airbyte.io>
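
The per-stream and global behaviour described above could look roughly like the following sketch. It is an illustration, not the EmptyAirbyteSource implementation (which is Java). The helper `build_reset_state_messages` is hypothetical, the dictionary keys are only modelled on the state message format, and setting the shared state to null is one reading of the description.

```python
from typing import Any, Dict, List


def build_reset_state_messages(state_type: str,
                               reset_streams: List[str],
                               existing: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build the state messages emitted after a reset (illustrative only)."""
    to_reset = set(reset_streams)
    if state_type == "STREAM":
        # Per stream: one message with a null state for every reset stream.
        return [
            {"type": "STREAM",
             "stream": {"stream_descriptor": {"name": name}, "stream_state": None}}
            for name in reset_streams
        ]
    if state_type == "GLOBAL":
        # Global: a single message; reset streams get null, others keep their state.
        stream_states = [
            {"stream_descriptor": {"name": name},
             "stream_state": None if name in to_reset else state}
            for name, state in existing.items()
        ]
        return [{"type": "GLOBAL",
                 "global": {"shared_state": None, "stream_states": stream_states}}]
    raise ValueError(f"unsupported state type: {state_type}")
```

With two reset streams the per-stream branch returns two messages, while the global branch always returns exactly one.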

* Add StatePersistence object (#13900)

Add a StatePersistence object that supports Read/Writes of States to the DB with StreamDescriptor fields

The only migrations that are supported are
* moving from LEGACY to GLOBAL
* moving from LEGACY to STREAM

All other state type migrations are expected to go through an explicit reset beforehand.
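
The migration rule above can be expressed as a small check. This is a sketch under assumptions: the names `SUPPORTED_MIGRATIONS` and `is_supported_migration` are hypothetical, and treating an unchanged state type as allowed is an assumption the commit message does not state.

```python
# The two migrations listed in the commit message.
SUPPORTED_MIGRATIONS = {("LEGACY", "GLOBAL"), ("LEGACY", "STREAM")}


def is_supported_migration(current_type: str, new_type: str) -> bool:
    """Return True if a state can move from current_type to new_type without a reset."""
    if current_type == new_type:
        # Assumption: writing the same state type is not a migration at all.
        return True
    return (current_type, new_type) in SUPPORTED_MIGRATIONS
```

Any other pair, for example GLOBAL to STREAM, returns False and would need an explicit reset first.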

* secret-persistence: Hashicorp Vault Secret Store (#13616)

Co-authored-by: Amanda Murphy <amanda.murphy@heapanalytics.com>
Co-authored-by: Benoit Moriceau <benoit@airbyte.io>

* 🐛 Source Hubspot: remove `AirbyteSentry` dependency (#14102)

* fixed

* updated changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* fix: format VaultSecretPersistenceTest.java (#14110)

* Source Hubspot: extend error logging (#14054)

* #291 incall - source Hubspot: extend error logging

* hubspot: upd changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Update webflow.md (#14083)

* Update webflow.md

Removed a description that is only applicable to people that are writing connector code, not to _users_ of the connector.

* Update webflow.md

* Update webflow.md

* Update webflow.md

* Update webflow.md

* 12708: Add an option to use encryption with staging in Redshift Desti… (#14013)

* 12708: Add an option to use encryption with staging in Redshift Destination (#13675)

* 12708: Add an option to use encryption with staging in Redshift Destination

* 12708: docs/docker configs updated

* 12708: merge with master

* 12708: merge fix

* 12708: code review implementation

* 12708: fix for older configs

* 12708: fix for older configs in check

* 12708: merge from master (consolidation issue)

* 12708: versions updated

* 12708: specs updated

* 12708: specs updated

* 12708: removing autogenerated files from PR

* 12708: changelog updated

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Source PayPal Transaction: Update Transaction Schema (#13682)

* Update transaction schema.
* Transform money values from strings to floats or integers.

Co-authored-by: nataly <nataly@airbyte.io>
Co-authored-by: Augustin <augustin.lafanechere@gmail.com>
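
The money-value transformation above might be sketched as follows. The commit message does not say which fields are converted or how integers are told apart from floats, so the helper `to_number` and its rule (integer if the string parses as one, float otherwise) are assumptions.

```python
from typing import Union


def to_number(value: str) -> Union[int, float]:
    """Convert a numeric string to int when possible, otherwise to float."""
    try:
        return int(value)
    except ValueError:
        # Strings such as "10.50" are not valid integers, so parse as float.
        return float(value)
```

A string that is not numeric at all still raises `ValueError`, which keeps bad input visible instead of silently passing it through.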

* fix(jsonSchemas): raise error when items property not provided (#14018)

* fix stream name in stream transformation update (#14044)

* 🐛 Destination Redshift: Improved discovery for redshift-destination not SUPER streams (#13690)

airbyte-12843: Improved discovery for redshift-destination not SUPER tables, excluded views from discovery.

* Remove skiptests option (#14100)

* update sentry release script (#14123)

* Remove "additionalProperties": false from specs for connectors with staging (#14114)

* Remove "additionalProperties": false from spec for connectors with staging

* Remove "additionalProperties": false from spec for Redshift destination

* bump versions

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* [14003] source-oracle: added custom jdbc field (#14092)

* [14003] source-oracle: added custom jdbc field

* Add JobErrorReporter for sending sync job connector failures to Sentry (#13899)

* skeleton for reporting connector errors to sentry

* report on job failures instead of attempt failures

* report sync job failures with relevant metadata using JobErrorReporter

* send stack traces from python connectors to sentry

* test JobCreationAndStatusUpdate and JobErrorReporter

* logs

* refactor into helper, initial tests

* using sentry

* run format

* load reporting client from env

* load sentry dsn from env

* send java stack traces to sentry

* test sentryclient, refactor to use Hub instance

* ErrorReportingClient.report -> .reportJobFailureReason

* inject exception helper, test stack trace parse error tagging

* rm logs

* more stack trace tests

* remove logs

* fix failing tests

* rename ErrorReportingClient to JobErrorReportingClient

* rename vars in docker-compose

* Return an Optional instead of null when parsing stack traces

* dont remove airbyte prefix when setting release name

* from_trace_message static

* remove failureSummary from jobfailure input, get from Job

* send stacktrace string if we weren't able to parse

* set deployment mode tag

* update .env

* just log if something goes wrong

* Use StateMessageHelper in source (#14125)

* Use StateMessageHelper in source

* PR feedback and formatting

* More PR feedback

* Revert change

* Revert changes

* Bump Airbyte version from 0.39.24-alpha to 0.39.25-alpha (#14124)

Co-authored-by: brianjlai <brianjlai@users.noreply.github.com>

* Refactor acceptance tests and utils (#13950)

* Refactor Basic acceptance tests and utils

* Refactor Advanced acceptance tests and utils

* Remove unused code

* Clear destination db data during cleanup

* Cleanup comments

* cleanup init code

* test creating new destination db for each test

* cleanup destination db init

* Allow to edit api client

* pull in temporal cloud changes

* Rename helper to harness; set some funcs to private; turn init into constructor

* add func to set env vars instead of using static vars and move some functionality out of init into acceptance tests

* update javadoc

Co-authored-by: Davin Chia <davinchia@gmail.com>

* fix javadoc formatting

* fix var naming

Co-authored-by: Davin Chia <davinchia@gmail.com>

* Bump Airbyte version from 0.39.25-alpha to 0.39.26-alpha (#14141)

Co-authored-by: terencecho <terencecho@users.noreply.github.com>

* 🎉 octavia-cli: Add ability to get existing resources (#13254)

* 13541 Fixed integration tests source-db2 Mac OS (#14133)

* 13523 Fix integration tests destination-cassandra Mac OS (#14134)

* 🐛 Source Hubspot: fixed SAT test, commented out expected_records (#14140)

* :bug: Source Intercom: extend `Contacts` schema with new properties (#14099)

* Source Twilio: adopt best practices (#14000)

* #1946 Source twilio: adopt best practices - tune tests

* #1946 add expected_records to acceptance-test-config.yml

* #1946 source twilio - upd schema and changelog

* #1946 fix expected_records

* #1946 source twilio: rm alerts from expected records as they expire in 30 days

* #1946 source twilio: bump version

* 🎉 Source BingAds:  expose hourly/daily/weekly/monthly options from configuration (#13801)

* #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings

removed:  config settings for hourly/daily/weekly/monthly reports
added:    default value for all periodic reports to True

* #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings

removed:  unused class variables, if-statement

* #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings

removed:  unused variables from config

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* remove VersionMismatchServer (#14076)

* remove VersionMismatchServer

* remove VersionMismatchServerTest

* revert intended changes

* Increase instance termination time limit to 3 hours to accommodate connector builds. (#14181)

* Use correct bash comment symbol. (#14183)

* 🎉 New Source: Orbit.love (#13390)

* source-orbit: add definition and specs (#14189)

* 🎉 Base Normalization: clean-up Redshift `tmp_schemas` after SAT (#14015)

Now, after the `base-normalization` SAT, test leftovers are automatically cleaned up in Destination Redshift. Other destinations are not covered yet.

* Source Salesforce: fix customIntegrationTest for SAT (#14172)

* Source Amazon Ads: increase timeout for SAT (#14167)

* 🎉  Introduce Google Analytics Data API source (#12701)

* Introduce Google Analytics Data API source

https://developers.google.com/analytics/devguides/reporting/data/v1

* Add Google Analytics Data API source PR link

* Add `client` class for Google Analytics Data API

* Move dimensions and metrics extraction to the `client` class

In the Google Analytics Data API

* Change the copyright date to 2022 in Google Analytics Data API

* fix: removing incremental syncs

* fix: change project_id to string

* fix: flake check is failing

* chore: added it to source definitions

* chore: update seed file

Co-authored-by: Harshith Mullapudi <harshithmullapudi@gmail.com>

* 🐛 Destination Redshift: use s3 bucket path for s3 staging operations (#13916)

* Publish acceptance test utils maven artifact (#14142)

* Fix StatePersistence Legacy read/write (#14129)

StatePersistence will wrap/unwrap legacy state on write/read to ensure
compatibility with the old behavior/data.

* 🎉 Destination connectors: Improved "SecondSync" checks in Standard Destination Acceptance tests (#14184)

* [11731] Improved "SecondSync" checks in Standard Destination Acceptance tests

* 🐛 Source Zendesk Support: fixed "Retry-After" non integer value (#14112)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
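
Handling a non-integer "Retry-After" value could look like this sketch. It is not the connector's code: `parse_retry_after` and the 60-second fallback are hypothetical, and the HTTP-date form of the header is deliberately not handled.

```python
import math
from typing import Optional


def parse_retry_after(value: Optional[str], default: float = 60.0) -> float:
    """Parse a Retry-After header into seconds, accepting values like "1.5"."""
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        # Missing header or a non-numeric value: fall back to the default.
        return default
    if not math.isfinite(seconds):
        return default
    # Never return a negative wait time.
    return max(seconds, 0.0)
```

Parsing with `float` instead of `int` is what lets a value such as "1.5" through without raising.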

* Source Tiktok Marketing: Videometrics (#13650)

* added video metrics in streams.py

* common metrics list updated.

* updated streams.py with extended metrics required.

* updated stream_test

* updated configured_catalog

* video metrics required list updated.

* chore: formatting

* chore: bump version in source definitions

* chore: update seed file

Co-authored-by: Harshith Mullapudi <harshithmullapudi@gmail.com>

* 🎉 Source Github: secondary rate limits has to retry (#13955)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Harshith/test pr 13118 (#14192)

* Firebolt destination

* feat: Write method dropdown

* feat: Use future-proof Auth in SDK

* refactor: Move writer instantiation

* fix: tests are failing

* fix: tests are failing

* fix: tests are failing

* chore: added connector to definitions

* fix: formatting and spec

* fix: formatting for orbit

Co-authored-by: ptiurin <petro.tiurin@firebolt.io>

* 🪟 :art: Show credit usage on chart's specific day (#13503)

* add tooltip to chart

* Fixes:
- update main chart color;
- change onHover background color

* change chart color palette to grey 500

* update color reference

* remove opacity from UsageCell

* 🐛 destination-redshift: use s3 bucket path for s3 cleanup (#14190)

* Improve documentation for Postgres Source (#13830)

* Improve documentation for Postgres Source
 * add information about additional JDBC params
 * add anchors for doc sections
 * fix link to CDC on Bare Metal
 * add more details about parsing date/time values
 * add doc link to SSH fields

* Handle null reset source config (#14202)

* handle null reset source config

* format

* Wait indefinitely if connection is not active (#14200)

* also wait indefinitely if connection is deleted

* fix test

* Bump Airbyte version from 0.39.26-alpha to 0.39.27-alpha (#14204)

Co-authored-by: lmossman <lmossman@users.noreply.github.com>

* Bmoric/feature flag for state deserialization (#14127)

* Add Feature flag

* Add default feature flag value

* Update test

* remove unused

* tmp

* Update tests

* rm unwanted change

* PR comments

* [low-code connectors] default types and default values (#14004)

* default types and default values

* cleanup

* fixes so read works

* remove prints and trycatch

* comment

* remove unused param

* split file

* extract method

* extract methods

* comment

* optional

* fix test

* cleanup

* delete interpolated request header provider

* simplify next page url paginator interface

* comment

* format

* add state type endpoint (#14111)

* Bump Airbyte version from 0.39.27-alpha to 0.39.28-alpha (#14210)

Co-authored-by: sherifnada <sherifnada@users.noreply.github.com>

* 🐛 source-orbit: remove workspace_old.json (#14208)

* Fix: Docs plural login redirecting to wrong URL (#14207)

* [docs] fix numbering and incorrect filename in CDK docs (#13045)

* [docs] fix numbering in CDK docs

* Update 5-declare-schema.md

* Update 5-declare-schema.md

* Update 6-read-data.md

* Update 8-test-your-connector.md

* Remove the old scheduler from HelmCharts helper (#14187)

* Remove the old scheduler from HelmCharts helper

The old scheduler was removed as part of https://github.com/airbytehq/airbyte/pull/13400

* Remove legacy `scheduler` comment in HelmCharts

* Source Gitlab: add GroupIssueBoards stream (#13252)

* GitLab Source: add GroupIssueBoards stream

* Address stream schema comments

* Address comments

* Bump version

* Add as empty stream

* run seed file source (#14215)

* fix 'cannot reach server' error on demo instance (#10020)

* Update CODEOWNERS (#14209)

* 🎉 Source Github: use GraphQL for `reviews` stream (#13989)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* workflow for publishing artifacts for cloud (#14199)

* fix sentry org slug change (#14218)

* Source File: correct spec json to match json format (#13738)

* Upgrade spotless version and remove jvmargs workaround (#13705)

* Source Zendesk Chat: Process large amount of data in batches for incremental  (#14214)

* increased the limit of items in request

* Configuration for max api pages on requests

* included api_pagination_limit in sample

* included api_pagination_limit in invalid_config

* creating new table for chat_session

* reverted api_pagination_limit approach

* removed api_pagination_limit from TimeIncrementalStream

* correct chat json

* bump connector version

* add changelog

* run format

* auto-bump connector version

Co-authored-by: Roberto Bonnet <robertojuarezwp@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Remove all @ts-ignore (#14221)

* Bump hadoop to use version 3.3.3 (#14182)

* Change the persistence activity to use the new persistence layer (#14205)

* Change the persistence activity to use the new persistence layer

* Use lombok

* format

* Use new State message helper

* Fix build (#14225)

* Fix build

* Fix test

* Use new state persistence for state reads (#14126)

* Inject StatePersistence into DefaultJobCreator
* Read the state from StatePersistence instead of ConfigRepository
* Add a conversion helper to convert StateWrapper to State
* Remove unused ConfigRepository.getConnectionState

* Temporal per stream resets (#13990)

* remove reset flags from workflow state + refactor

* bring back cancelledForReset, since we need to distinguish between that case and a normal cancel

* delete reset job streams on cancel or success

* extract isResetJob to method

* merge with master

* set sync modes on streams in reset job correctly

* format

* Add test for getAllStreamsForConnection

* fix tests

* update more tests

* add StreamResetActivityTests

* fix tests for default job creator

* remove outdated comment

* remove debug lines

* remove unused enum value

* fix tests

* fix constant equals ordering

* make job mock not static

* DRY and add comments

* add comment about deleted streams

* Remove io.airbyte.config.StreamDescriptor

* register stream reset activity impl

* refetch connection workflow when checking job id, since it may have been restarted

* only cancel if workflow is running, to allow reset signal to always succeed even if batched with a workflow start

* fix reset signal to use new doneWaiting workflow state prop

* try to fix tests

* fix reset cancel case

* add acceptance test for resetting while sync is running

* format

* fix new acceptance test

* lower sleep on test

* raise sleep

* increase sleep and timeout, and remove repeated test

* use CatalogHelpers to extract stream descriptors

* raise sleep and timeout to prevent transient failures

* format

Co-authored-by: alovew <anne@airbyte.io>

* fix PostgresJdbcSourceAcceptanceTest by activating the feature flag (#14240)

* fix PostgresJdbcSourceAcceptanceTest by activating the feature flag

* fix AbstractJdbcSourceAcceptanceTest as well

* fix expected_spec for strict encrypt

* [13539] Fix integration tests source-clickhouse Mac OS (#14201)

* [13539] Fix integration tests source-clickhouse Mac OS
fixed unit tests

* [13524] Fix integration tests destination-clickhouse Mac OS
fixed unit tests

* 6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#14121)

* 6339: implementation

* 6339: changelog updated

* 6339: definitions updated

* 6339: definitions reverted

* 6339: still struggling with publishing

* auto-bump connector version

* 6339: definitions reverted - correct

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 🪟 🎨 Update favicon and table row image styles (#14020)

* style changes to favicon and imageblock

* fix import

* revert component and props names to block

* Update airbyte-webapp/src/components/ImageBlock/ImageBlock.tsx

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* Update airbyte-webapp/src/components/ImageBlock/ImageBlock.module.scss

Co-authored-by: Vladimir <volodymyr.s.petrov@globallogic.com>

* Update airbyte-webapp/src/components/ImageBlock/ImageBlock.tsx

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* Update airbyte-webapp/src/components/ImageBlock/ImageBlock.module.scss

Co-authored-by: Vladimir <volodymyr.s.petrov@globallogic.com>

* add storybook

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>
Co-authored-by: Vladimir <volodymyr.s.petrov@globallogic.com>

* upgrade postgresql version to fix default timestamp handling (#14211)

* implement logic to trigger snapshot of new tables via debezium (#13994)

* implement logic to trigger snapshot of new tables via debezium

* format

* improve test condition

* fix build

* BigQuery Denormalized "airbyte_type": "big_integer" to INT64 (#14079)

* BigQuery Denormalized "airbyte_type": "big_integer" to INT64

* updated changelog

* added unit test

* removed star import

* fixed checkstyle

* bump version

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Add Metrics section to Scaling Airbyte doc (#14224)

* Added metrics section to scaling airbyte doc

* Updated URL in doc

* Deleted link

* Added link

* Added backslashes before brackets that aren't links

* Edited note about tagged metrics

* Changed list

* Changed spacing

* Changed spacing

* Changed spacing

* Deleted period

* Fixed broken firebolt link

* Added tables

* Cleaned up wording in tables

* Add ability to provide source/destination connector docker image (#14266)

* Add ability to provide source/destination connector docker image

* Make constant public

* Bump Airbyte version from 0.39.28-alpha to 0.39.29-alpha (#14232)

* disable flaky cmw test temporarily (#14269)

* release new postgres source connector version 0.4.29 (#14265)

* release new postgres source connector version 0.4.29

* add changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* :tada: Source Tiktok marketing - remove granularity config option (#13890)

* Removed granularity config option from spec, added corresponding streams for each supported granularity (hourly, daily, lifetime), updated unit tests, SAT

* auto-formatting

* auto-formatting

* removed AdvertisersIds stream from list of exposed streams, updated docs

* expose new style streams since 0.1.13, expose old streams for config for older version

* update spec

* fixed path to catalog

* increased timeout

* source bing-ads to ga (#13679)

* Source Tiktok marketing - increase connector version (#14272)

* increased connector version

* increased connector version in seed

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Fix flaky connection manager workflow test (#14271)

* try thread sleep instead of test env, and run 100 times

* replace testEnv.sleep with Thread.sleep in several tests

* replace RepeatedTest with Test

* replace testEnv.sleep with Thread.sleep after signals are executed

* run each test 100 times to see if any are flaky

* add log

* change repetitions to 100 to avoid out of memory

* format

* swap repeated test for normal test

* 13532 Fixed integration tests destination-mssql Mac OS (#14252)

* 13532 Fixed integration tests destination-mssql Mac OS

* Source Google Analytics: Specify integer for dimension ga:dateHourMinute (#14298)

* Specify integer for dimension ga:dateHourMinute
* Update changelog

* 🎉 Source Github: rename field `mergeable` to `is_mergeable` (#14274)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Update Airbyte Client (#14270)

* #12668 #13198 enable full refresh, disable incremental and expected_records (#14191)

* 🎉 Destination S3: update INSTANCE_PROFILE to use AWSDefaultProfileCredential (#14231)

Co-authored-by: Mike Balmer <remlabm@users.noreply.github.com>

* Source Zendesk Support: pagination group membership (#14304)

* add next_page_token and request

* correct group_membership pagination

* update doc

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 🪟 🐛 Fix OAuth validation not allowing to create source or destination (#14197)

* Enable "Set up source/destination" button only if the form is valid

* Update how ServiceForm initial values are patched so that it correctly patches the configuration with default values

* Update initial values patching in service form to use initialValues to preserve already set values
Update useOAuthFlowAdapter to correctly merge the values from the oauth response

* Remove unused values var from ServiceForm

* Add acceptance tests for per-stream state updates (#14263)

* Add acceptance tests for per-stream state updates

* PR feedback

* Formatting

* More PR feedback

* PR feedback

* Remove unused constant

* Make sure that the feature flag is transferred to the container (#14314)

* Make sure that the feature flag is transferred to the container

* propagate the feature flags

* Avoid propagating the feature flags

* Fix tests

* Source Postgres : use more simple and comprehensive query to get selectable tables (#14251)

* use more simple and comprehensive query to get selectable tables

* cover case when schema is not specified

* add test to check discover with different ways of grants

* format

* incr ver

* incr ver

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Fixed broken link

* Fix for deleting stream resets (#14322)

* Fix for deleting stream resets

* Fix build by updating var (#14321)

* Edited formatting (#14275)

* Avoid error when creating dupl stream reset (#14328)

* Bump Airbyte version from 0.39.29-alpha to 0.39.30-alpha (#14329)

Co-authored-by: lmossman <lmossman@users.noreply.github.com>

* Release new postgres strict encrypt version (#14331)

* Bump postgres strict encrypt version

* Update changelogs

* Update doc

* Release new destination s3 version to pick up latest change (#14332)

* Bump s3 version

* Update pr id

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 13538 Fix integration tests destination-scylla Mac OS (#14308)

* 13538 Fix integration tests destination-scylla Mac OS

* Update cdk-speedrun.md (#14258)

Added a link at the bottom of the article so the user can find the more in-depth tutorial about building a real-world connector.

* Update README.md (#14303)

Added a link to https://airbyte.com/tutorials/extract-data-from-the-webflow-api in Webflow's README.md

* Update building-a-python-source.md (#14262)

* Update webflow.md (#14254)

Added a link to the new blog - https://airbyte.com/tutorials/extract-data-from-the-webflow-api

Co-authored-by: Simon Späti <simu@sspaeti.com>

* Alex/declarative stream incremental fix (#14268)

* checkout files from test branch

* read_incremental works

* reset to master

* remove dead code

* comment

* fix

* Add test

* comments

* utc

* format

* small fix

* Add test with rfc3339

* remove unused param

* fix test

* 🐛 SingerSource: Fix incompatibilities and typing issues (#14148)

* Use logging.Logger in SingerSource

* Fix SingerSource ConfigContainer

This fixes typing issues with `ConfigContainer` and makes it compatible
with `split_config`. Fixes #8710.

* Fix SingerSource state and catalog typer issues

* Rename SingerSource method args to match parent classes

* Remove old comment about excluding Singer

Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

* Update source postgres release stage to beta (#14326)

* fix NPE (#14353)

* fix NPE

* Add test

* Fix trailing

* 🎉 octavia-cli: Add ability to import existing resources (#14137)

* helm chart: Add Image Pull Secrets Param  (#14031)

* fix format (#14354)

* Bump Airbyte version from 0.39.30-alpha to 0.39.31-alpha (#14355)

Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com>

* tiktok to ga (#14358)

* Update state.state type (#14360)

* Run some DATs as part of base-normalization tests (#14312)

* Revert "🎉 Source Github: rename field `mergeable` to `is_mergeable` (#14274)" (#14338)

* Revert "🎉 Source Github: rename field `mergeable` to `is_mergeable` (#14274)"

* Properly update the hasEmitted state (#14367)

* Bmoric/state aggregator (#14364)

* Update state.state type

* Add state aggregator

* Test and format

* PR comments

* Move to its own package

* Update airbyte-workers/src/test/java/io/airbyte/workers/internal/state_aggregator/StateAggregatorTest.java

Co-authored-by: Lake Mossman <lake@airbyte.io>

* format

* Update airbyte-workers/src/main/java/io/airbyte/workers/internal/state_aggregator/DefaultStateAggregator.java

Co-authored-by: Lake Mossman <lake@airbyte.io>

* format

Co-authored-by: Lake Mossman <lake@airbyte.io>

* Bump Airbyte version from 0.39.31-alpha to 0.39.32-alpha (#14383)

Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>

* 🐛 Source Mixpanel: fix SAT tests (#14349)

* Call the new revoke_user_session endpoint from the FE (#13165)

* Source Instagram: change releaseStage to GA (#14162)

* Source Google Analytics: Change releaseStage to GA (#13957)

* source-outreach: fix record parsing and cursor field access (#14386)

* Kustomize: Use `resources` since `bases` is deprecated (#14037)

* fix: clone api doesn't take update configurations (#13592)

* fix: clone api doesn't take update configurations

* fix: you will be able to create clone in different workspace

* fix: added description to source/destination body

* cdk: Attach namespace to stream  in catalog (#13923)

* Source TiDB: correct jdbc string builder (#14243)

* add icon for tidb-connector

* Fix TiDB source connector

* bump connector version

* auto-bump connector version

Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Source Google Ads: use Docusaurus feature for warn/note and update doc (#14392)

* use Docusaurus feature for warn/note and update doc

* update description in supported streams

* Source Facebook Marketing: allow configuration of MAX_BATCH_SIZE (#14267)

* Add max batch size config

* Bump semver

* add changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 🎉 Source Github: add Retry for GraphQL API Resource limitations (#14376)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Add more metadata to the JobErrorReporter (#14395)

* add workspace_id and connector_repository as tags

* add tag for connection url

* fix urls for job notifier

* format

* fix failing test

* beta -> generally_available (#14315)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* helm chart: Fix/double printing of extra volume mounts (#14091)

* SentryJobErrorReporter: better handling of multiline chained java exceptions (#14398)

* Docs: deploy on gcp use docusaurus tabs (#14401)

* Revert "Kustomize: Use `resources` since `bases` is deprecated (#14037)" (#14415)

This reverts commit 5c9a6a5fc655a9e597f755be8fc8ccf805a2537a.

* Use Debezium Postgres image for CDC tests (#14318)

* Use Debezium Postgres image for CDC tests

* Formatting

* 🎉 octavia-cli: Add ability to import all resources (#14374)

* Bump Airbyte version from 0.39.32-alpha to 0.39.33-alpha (#14419)

Co-authored-by: pedroslopez <pedroslopez@users.noreply.github.com>

* 📝 MySql source: clarify tinyint to number conversion when size > 1 (#14424)

* 🪟 🐛 Fix Setup Source Button on OAuth Sources (#14413)

* don't disable setup button

* make eslint happy

* one more cleanup

* use the spec to decide how to create config object

* Bump Airbyte version from 0.39.33-alpha to 0.39.34-alpha (#14428)

Co-authored-by: timroes <timroes@users.noreply.github.com>

* [low-code cdk] Enable configurable state checkpointing (#14317)

* checkout files from test branch

* read_incremental works

* reset to master

* remove dead code

* comment

* fix

* Add test

* comments

* utc

* format

* small fix

* Add test with rfc3339

* remove unused param

* fix test

* configurable state checkpointing

* update test

* fix type hints (#14352)

* normalization: Do not return NULL for MySQL column values > 512 chars  (#11694)

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>
Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>
Co-authored-by: Evan Tahler <evan@airbyte.io>
Co-authored-by: Tim Roes <tim@airbyte.io>
Co-authored-by: Charles <charles@airbyte.io>
Co-authored-by: Jonathan Pearlin <jonathan@airbyte.io>
Co-authored-by: Amruta Ranade <11484018+Amruta-Ranade@users.noreply.github.com>
Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
Co-authored-by: Jimmy Ma <jimmy@airbyte.io>
Co-authored-by: Ganpat Agarwal <gagarwal@artica.com>
Co-authored-by: Serhii Chvaliuk <grubberr@gmail.com>
Co-authored-by: Rajakavitha Kodhandapani <krajakavitha@gmail.com>
Co-authored-by: Yevhen Sukhomud <suhomud@gmail.com>
Co-authored-by: Andrii Leonets <30464745+DoNotPanicUA@users.noreply.github.com>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: sw-yx <shawnthe1@gmail.com>
Co-authored-by: Baz <oleksandr.bazarnov@globallogic.com>
Co-authored-by: Octavia Squidington III <90398440+octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>
Co-authored-by: Eugene <etsybaev@gmail.com>
Co-authored-by: Denis Davydov <davydov.den18@gmail.com>
Co-authored-by: Anna Lvova <37615075+annalvova05@users.noreply.github.com>
Co-authored-by: Vladimir <volodymyr.s.petrov@globallogic.com>
Co-authored-by: Phlair <Phlair@users.noreply.github.com>
Co-authored-by: Parker Mossman <parker@airbyte.io>
Co-authored-by: Adam <adam-bloom@users.noreply.github.com>
Co-authored-by: Liren Tu <tuliren@gmail.com>
Co-authored-by: pmossman <pmossman@users.noreply.github.com>
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
Co-authored-by: Peter Hu <peter@airbyte.io>
Co-authored-by: Subodh Kant Chaturvedi <subodh1810@gmail.com>
Co-authored-by: Tuhai Maksym <kimerinn@gmail.com>
Co-authored-by: Alexander Marquardt <alexander.marquardt@gmail.com>
Co-authored-by: sajarin <sajarindider@gmail.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
Co-authored-by: steve withington <steve@digitalmine.com>
Co-authored-by: Leo Sussan <leosussan@gmail.com>
Co-authored-by: cenegd <cenegd@live.com>
Co-authored-by: Tomas Perez Alvarez <72174660+Tomperez98@users.noreply.github.com>
Co-authored-by: Lake Mossman <lake@airbyte.io>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Co-authored-by: Yurii Bidiuk <yura.bidyuk@gmail.com>
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
Co-authored-by: Teal Larson <LARSON.TEAL@GMAIL.COM>
Co-authored-by: Sophia Wiley <106352739+sophia-wiley@users.noreply.github.com>
Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com>
Co-authored-by: Jimmy Ma <gosusnp@users.noreply.github.com>
Co-authored-by: Stella Chung <schung507@gmail.com>
Co-authored-by: Amanda Murphy <amanda.murphy@heapanalytics.com>
Co-authored-by: Mohamed Magdy <mohamed.magdy@canary.is>
Co-authored-by: nataly <nataly@airbyte.io>
Co-authored-by: Tyler Russell <tylerrussell85@gmail.com>
Co-authored-by: Alexander Tsukanov <alexander.tsukanovvv@gmail.com>
Co-authored-by: Pedro S. Lopez <pedroslopez@me.com>
Co-authored-by: brianjlai <brianjlai@users.noreply.github.com>
Co-authored-by: terencecho <terence@airbyte.io>
Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: terencecho <terencecho@users.noreply.github.com>
Co-authored-by: Daniel Diamond <33811744+danieldiamond@users.noreply.github.com>
Co-authored-by: drrest <dr.rest@gmail.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: Abhi Vaidyanatha <abhi@airbyte.io>
Co-authored-by: Harshith Mullapudi <harshithmullapudi@gmail.com>
Co-authored-by: Zawar Khan <zawar.khan@getmercury.io>
Co-authored-by: ptiurin <petro.tiurin@firebolt.io>
Co-authored-by: Greg Solovyev <grishick@users.noreply.github.com>
Co-authored-by: lmossman <lmossman@users.noreply.github.com>
Co-authored-by: sherifnada <sherifnada@users.noreply.github.com>
Co-authored-by: Sachin Jangid <sachinjangid832@gmail.com>
Co-authored-by: Chris Wu <chris@faros.ai>
Co-authored-by: Jared Rhizor <me@jaredrhizor.com>
Co-authored-by: tison <wander4096@gmail.com>
Co-authored-by: Roberto Bonnet <robertojuarezwp@gmail.com>
Co-authored-by: Malik Diarra <malik@airbyte.io>
Co-authored-by: alovew <anne@airbyte.io>
Co-authored-by: Oleksandr Sheheda <alexandr-shegeda@users.noreply.github.com>
Co-authored-by: midavadim <midavadim@yahoo.com>
Co-authored-by: Arsen Losenko <20901439+arsenlosenko@users.noreply.github.com>
Co-authored-by: Ryan Lewon <ryan@segv.net>
Co-authored-by: Mike Balmer <remlabm@users.noreply.github.com>
Co-authored-by: Anne <102554163+alovew@users.noreply.github.com>
Co-authored-by: Liren Tu <tuliren.git@outlook.com>
Co-authored-by: Simon Späti <simu@sspaeti.com>
Co-authored-by: Albin Skott <cstruct@users.noreply.github.com>
Co-authored-by: Caleb Fornari <calebfornari@gmail.com>
Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com>
Co-authored-by: Christian Martin <christian@ctmartin.me>
Co-authored-by: jordan-glitch <65691557+jordan-glitch@users.noreply.github.com>
Co-authored-by: Daemonxiao <35677990+Daemonxiao@users.noreply.github.com>
Co-authored-by: Keith Thompson <keithjoethompson@gmail.com>
Co-authored-by: Leo Sussan <leo@reach.vote>
Co-authored-by: pedroslopez <pedroslopez@users.noreply.github.com>
Co-authored-by: timroes <timroes@users.noreply.github.com>
Co-authored-by: Johannes Nicolai <jonico@planetscale.com>
Showing 109 changed files with 3,005 additions and 327 deletions.
@@ -267,6 +267,11 @@ public DataSource build() {
* will preserve existing behavior that tests for the connection on first use, not on creation.
*/
config.setInitializationFailTimeout(Integer.MIN_VALUE);
/*
* Default timeout is 30 sec, which is too short when you work with cloud data warehouses clusters
* that can take 4-5 min to start up. Set it to 30 min to be sure
*/
config.setConnectionTimeout(30 * 60 * 1000);

connectionProperties.forEach(config::addDataSourceProperty);
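
For reference, HikariCP's `setConnectionTimeout` takes its argument in milliseconds, so the new value works out to 1,800,000 ms against the 30,000 ms default named in the comment. A small Python sketch of that arithmetic follows; the helper name `to_millis` is made up for illustration and is not part of the commit.

```python
from datetime import timedelta

# The Java expression 30 * 60 * 1000 is evaluated with int arithmetic,
# so it is worth checking that it stays below the 32-bit limit.
INT32_MAX = 2**31 - 1


def to_millis(delta: timedelta) -> int:
    """Convert a timedelta to whole milliseconds."""
    return int(delta.total_seconds() * 1000)


default_timeout_ms = to_millis(timedelta(seconds=30))  # default stated in the diff comment
new_timeout_ms = to_millis(timedelta(minutes=30))      # value set by this change

print(default_timeout_ms, new_timeout_ms, new_timeout_ms <= INT32_MAX)
```

The product stays well below the 32-bit integer limit, so the literal in the diff does not overflow.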

@@ -10,4 +10,5 @@
!dbt-project-template-oracle
!dbt-project-template-clickhouse
!dbt-project-template-snowflake
!dbt-project-template-databricks
!dbt-project-template-redshift
6 changes: 6 additions & 0 deletions airbyte-integrations/bases/base-normalization/build.gradle
@@ -75,6 +75,10 @@ task airbyteDockerSnowflake(type: Exec, dependsOn: checkSshScriptCopy) {
configure buildAirbyteDocker('snowflake')
dependsOn assemble
}
task airbyteDockerDatabricks(type: Exec, dependsOn: checkSshScriptCopy) {
configure buildAirbyteDocker('databricks')
dependsOn assemble
}
task airbyteDockerRedshift(type: Exec, dependsOn: checkSshScriptCopy) {
configure buildAirbyteDocker('redshift')
dependsOn assemble
@@ -85,6 +89,7 @@ airbyteDocker.dependsOn(airbyteDockerMySql)
airbyteDocker.dependsOn(airbyteDockerOracle)
airbyteDocker.dependsOn(airbyteDockerClickhouse)
airbyteDocker.dependsOn(airbyteDockerSnowflake)
airbyteDocker.dependsOn(airbyteDockerDatabricks)
airbyteDocker.dependsOn(airbyteDockerRedshift)

task("customIntegrationTestPython", type: PythonTask, dependsOn: installTestReqs) {
@@ -100,6 +105,7 @@ task("customIntegrationTestPython", type: PythonTask, dependsOn: installTestReqs
dependsOn ':airbyte-integrations:connectors:destination-oracle:airbyteDocker'
dependsOn ':airbyte-integrations:connectors:destination-mssql:airbyteDocker'
dependsOn ':airbyte-integrations:connectors:destination-clickhouse:airbyteDocker'
dependsOn ':airbyte-integrations:connectors:destination-databricks:airbyteDocker'
}

// DATs have some additional tests that exercise normalization code paths,
@@ -0,0 +1,34 @@
FROM fishtownanalytics/dbt:1.0.0
COPY --from=airbyte/base-airbyte-protocol-python:0.1.1 /airbyte /airbyte

# Install SSH Tunneling dependencies
RUN apt-get update && apt-get install -y jq sshpass

WORKDIR /airbyte
COPY entrypoint.sh .
COPY build/sshtunneling.sh .

WORKDIR /airbyte/normalization_code
COPY normalization ./normalization
COPY setup.py .
COPY dbt-project-template/ ./dbt-template/
COPY dbt-project-template-databricks/* ./dbt-template/

# Install python dependencies
WORKDIR /airbyte/base_python_structs
RUN pip install .

WORKDIR /airbyte/normalization_code
RUN pip install .

WORKDIR /airbyte/normalization_code/dbt-template/
# Download external dbt dependencies
RUN pip install dbt-databricks==1.0.0
RUN dbt deps

WORKDIR /airbyte
ENV AIRBYTE_ENTRYPOINT "/airbyte/entrypoint.sh"
ENTRYPOINT ["/airbyte/entrypoint.sh"]

LABEL io.airbyte.version=0.1.73
LABEL io.airbyte.name=airbyte/normalization-databricks
@@ -0,0 +1,72 @@
# This file is necessary to install dbt-utils with dbt deps
# the content will be overwritten by the transform function

# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: "airbyte_utils"
version: "1.0"
config-version: 2

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: "normalize"

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
docs-paths: ["docs"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["data"]
macro-paths: ["macros"]

target-path: "../build" # directory which will store compiled SQL files
log-path: "../logs" # directory which will store DBT logs
packages-install-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies

clean-targets: # directories to be removed by `dbt clean`
- "build"
- "dbt_modules"

quoting:
database: true
# Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785)
# all schemas should be unquoted
schema: false
identifier: false

# You can define configurations for models in the `model-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!
models:
+transient: false
airbyte_utils:
+materialized: table
generated:
airbyte_ctes:
+tags: airbyte_internal_cte
+materialized: ephemeral
airbyte_incremental:
+tags: incremental_tables
+materialized: incremental
+incremental_strategy: merge
# schema change test is supported automatically by the merge operation
# need to be run against a cluster with spark.databricks.delta.schema.autoMerge.enabled = True
# schema merge being handled at the final step, if a schema changes in one of the primary keys
# that coalesce differently to string, unicity will be broken
+on_schema_change: "ignore"
+file_format: delta
+pre-hook: 'SET spark.databricks.delta.schema.autoMerge.enabled = True'
airbyte_tables:
+tags: normalized_tables
+materialized: table
+file_format: delta
airbyte_views:
+tags: airbyte_internal_views
+materialized: view

dispatch:
- macro_namespace: dbt_utils
search_order: ["airbyte_utils", "dbt_utils"]
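
As a rough mental model of what `+incremental_strategy: merge` combined with Delta schema auto-merge is expected to do for the `airbyte_incremental` models, the sketch below upserts a batch into a target on a unique key and lets new columns appear. It is plain Python, not dbt or Spark; the `merge_batch` helper, the `id` key and the sample rows are all hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_batch(target: List[Row], batch: List[Row], key: str) -> List[Row]:
    """Upsert `batch` into `target` on `key`, widening the schema when new columns appear."""
    merged: Dict[Any, Row] = {row[key]: dict(row) for row in target}
    for row in batch:
        # Matched keys are updated in place, unmatched keys are inserted.
        merged.setdefault(row[key], {}).update(row)

    # Schema evolution: every row exposes the union of all columns,
    # and columns a row never received are filled with None.
    columns: List[str] = []
    for row in merged.values():
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in merged.values()]


existing = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
incoming = [{"id": 2, "name": "B", "email": "b@x"}, {"id": 3, "name": "c", "email": "c@x"}]
for row in merge_batch(existing, incoming, key="id"):
    print(row)
```

Row 1 is untouched apart from gaining an empty `email` column, row 2 is updated, and row 3 is inserted. The comments in the YAML warn that this only holds while the primary key columns keep coalescing to the same string, which the sketch does not model.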
@@ -6,6 +6,7 @@
- postgres: unnest() -> https://www.postgresqltutorial.com/postgresql-array/
- MSSQL: openjson() -> https://docs.microsoft.com/en-us/sql/relational-databases/json/validate-query-and-change-json-data-with-built-in-functions-sql-server?view=sql-server-ver15
- ClickHouse: ARRAY JOIN -> https://clickhouse.com/docs/zh/sql-reference/statements/select/array-join/
- Databricks: LATERAL VIEW -> https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-qry-select-lateral-view.html
#}

{# cross_join_unnest ------------------------------------------------- #}
@@ -50,6 +51,10 @@
cross join table(flatten({{ array_col }})) as {{ array_col }}
{%- endmacro %}

{% macro databricks__cross_join_unnest(stream_name, array_col) -%}
lateral view outer explode(from_json({{ array_col }}, 'array<string>')) as _airbyte_nested_data
{%- endmacro %}
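
The row-level effect of `lateral view outer explode(from_json(...))` can be sketched in plain Python. This is only an illustration of the semantics, not part of the commit: `explode_outer` is a hypothetical helper, rows are modelled as dicts, and re-serializing non-string elements with `json.dumps` only approximates how Spark renders them under the `array<string>` schema.

```python
import json


def explode_outer(rows, array_col):
    """Repeat each row once per element of its JSON array column.

    Rows whose array is NULL, empty, or unparsable are kept once with a
    null `_airbyte_nested_data`, which is what the OUTER keyword provides.
    """
    out = []
    for row in rows:
        raw = row[array_col]
        try:
            parsed = json.loads(raw) if raw is not None else None
        except json.JSONDecodeError:
            parsed = None  # assumption: unparsable JSON is treated as NULL
        items = parsed if isinstance(parsed, list) and parsed else [None]
        for item in items:
            if item is None or isinstance(item, str):
                nested = item
            else:
                # approximation of the array<string> coercion
                nested = json.dumps(item, separators=(",", ":"))
            out.append({**row, "_airbyte_nested_data": nested})
    return out
```

A row with a two-element array yields two output rows, while rows with `[]` or NULL survive as a single row with a null nested value instead of being dropped.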

{% macro sqlserver__cross_join_unnest(stream_name, array_col) -%}
{# https://docs.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server?view=sql-server-ver15#option-1---openjson-with-the-default-output #}
CROSS APPLY (
@@ -87,6 +92,10 @@
_airbyte_nested_data
{%- endmacro %}

{% macro databricks__unnested_column_value(column_col) -%}
_airbyte_nested_data
{%- endmacro %}

{% macro oracle__unnested_column_value(column_col) -%}
{{ column_col }}
{%- endmacro %}
@@ -14,3 +14,31 @@
{% endcall %}

{% endmacro %}

{#
This changes the behaviour of the default adapter macro, since DBT defaults to 256 when there are no explicit varchar limits
(cf : https://github.com/dbt-labs/dbt-core/blob/3996a69861d5ba9a460092c93b7e08d8e2a63f88/core/dbt/adapters/base/column.py#L91)
Since normalization code uses varchar for string type (and not text) on postgres, we need to set the max length possible when using unlimited varchars
(cf : https://dba.stackexchange.com/questions/189876/size-limit-of-character-varying-postgresql)
#}

{% macro postgres__get_columns_in_relation(relation) -%}
{% call statement('get_columns_in_relation', fetch_result=True) %}
select
column_name,
data_type,
COALESCE(character_maximum_length, 10485760),
numeric_precision,
numeric_scale

from {{ relation.information_schema('columns') }}
where table_name = '{{ relation.identifier }}'
{% if relation.schema %}
and table_schema = '{{ relation.schema }}'
{% endif %}
order by ordinal_position

{% endcall %}
{% set table = load_result('get_columns_in_relation').table %}
{{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}
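
The effect of the `COALESCE(character_maximum_length, 10485760)` column can be sketched in Python. This is a minimal illustration, assuming the length arrives as `None` for an unlimited `varchar`; the function and constant names are hypothetical.

```python
# Maximum varchar length on Postgres, as used by the macro above.
POSTGRES_MAX_VARCHAR = 10485760


def effective_char_size(character_maximum_length):
    """Mirror COALESCE: keep an explicit limit, otherwise use the maximum."""
    if character_maximum_length is None:
        return POSTGRES_MAX_VARCHAR
    return character_maximum_length
```

With this substitution an unlimited `varchar` column reports the largest possible size instead of falling back to the 256 default mentioned in the comment.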
@@ -5,3 +5,7 @@
{% macro oracle__current_timestamp() %}
CURRENT_TIMESTAMP
{% endmacro %}

{% macro databricks__current_timestamp() %}
CURRENT_TIMESTAMP
{% endmacro %}
@@ -8,6 +8,10 @@
string
{% endmacro %}

{%- macro databricks__type_json() -%}
string
{%- endmacro -%}

{%- macro redshift__type_json() -%}
{%- if redshift_super_type() -%}
super
@@ -91,6 +95,10 @@
INT
{% endmacro %}

{% macro databricks__type_int() %}
INT
{% endmacro %}


{# bigint ------------------------------------------------- #}
{% macro mysql__type_bigint() %}
@@ -105,6 +113,10 @@
BIGINT
{% endmacro %}

{% macro databricks__type_bigint() %}
BIGINT
{% endmacro %}


{# numeric ------------------------------------------------- --#}
{% macro mysql__type_numeric() %}
@@ -115,6 +127,10 @@
Float64
{% endmacro %}

{% macro databricks__type_numeric() %}
FLOAT
{% endmacro %}


{# timestamp ------------------------------------------------- --#}
{% macro mysql__type_timestamp() %}
@@ -146,6 +162,12 @@
timestamp
{% endmacro %}

{#-- Spark timestamps are already 'point in time', even if converted / stored without the original tz info, relative to session tz --#}
{#-- cf: https://docs.databricks.com/spark/latest/dataframes-datasets/dates-timestamps.html --#}
{% macro databricks__type_timestamp_with_timezone() %}
timestamp
{% endmacro %}

{#-- MySQL doesn't allow cast operation to work with TIMESTAMP so we have to use char --#}
{%- macro mysql__type_timestamp_with_timezone() -%}
char
@@ -6,6 +6,7 @@
- Postgres: json_extract_path_text(<from_json>, 'path' [, 'path' [, ...]]) -> https://www.postgresql.org/docs/12/functions-json.html
- MySQL: JSON_EXTRACT(json_doc, 'path' [, 'path'] ...) -> https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html
- ClickHouse: JSONExtractString(json_doc, 'path' [, 'path'] ...) -> https://clickhouse.com/docs/en/sql-reference/functions/json-functions/
- Databricks: get_json_object(json_txt, 'path') -> https://spark.apache.org/docs/latest/api/sql/#get_json_object
#}

{# format_json_path -------------------------------------------------- #}
@@ -42,6 +43,15 @@
{{ "'$.\"" ~ json_path_list|join(".") ~ "\"'" }}
{%- endmacro %}

{% macro databricks__format_json_path(json_path_list) -%}
{# -- '$.x.y.z' #}
{%- set str_list = [] -%}
{%- for json_path in json_path_list -%}
{%- if str_list.append(json_path.replace("'", "\\'")) -%} {%- endif -%}
{%- endfor -%}
{{ "'$." ~ str_list|join(".") ~ "'" }}
{%- endmacro %}
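
What `databricks__format_json_path` renders can be sketched in Python. This is a minimal illustration of the same string handling (escape single quotes, join with dots, wrap in a quoted `$.` path); the Python function is a hypothetical stand-in, not code from the commit.

```python
def format_json_path(json_path_list):
    """Build a quoted JSON path literal such as '$.x.y.z'."""
    # Escape single quotes so the path stays a valid SQL string literal.
    escaped = [part.replace("'", "\\'") for part in json_path_list]
    return "'$." + ".".join(escaped) + "'"
```

For example, `format_json_path(["x", "y", "z"])` gives `'$.x.y.z'`, which is the form that `get_json_object` receives as its second argument in the macros of this file.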

{% macro redshift__format_json_path(json_path_list) -%}
{%- set quote = '"' if redshift_super_type() else "'" -%}
{%- set str_list = [] -%}
@@ -86,6 +96,14 @@
json_extract({{ from_table}}.{{ json_column }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro databricks__json_extract(from_table, json_column, json_path_list, normalized_json_path) -%}
{%- if from_table|string() == '' %}
get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }})
{% else %}
get_json_object({{ from_table }}.{{ json_column }}, {{ format_json_path(json_path_list) }})
{% endif -%}
{%- endmacro %}

{% macro oracle__json_extract(from_table, json_column, json_path_list, normalized_json_path) -%}
json_value({{ json_column }}, {{ format_json_path(normalized_json_path) }})
{%- endmacro %}
@@ -191,6 +209,10 @@
JSONExtractRaw(assumeNotNull({{ json_column }}), {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro databricks__json_extract_scalar(json_column, json_path_list, normalized_json_path) -%}
get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{# json_extract_array ------------------------------------------------- #}

{% macro json_extract_array(json_column, json_path_list, normalized_json_path) -%}
@@ -237,6 +259,10 @@
JSONExtractArrayRaw(assumeNotNull({{ json_column }}), {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro databricks__json_extract_array(json_column, json_path_list, normalized_json_path) -%}
get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{# json_extract_string_array ------------------------------------------------- #}

{% macro json_extract_string_array(json_column, json_path_list, normalized_json_path) -%}
@@ -4,12 +4,43 @@
- the column _airbyte_ab_id does not exist in the normalized tables, and makes sure it is well populated.
#}

{%- macro get_columns_in_relation_if_exist(target_table) -%}
{{ return(adapter.dispatch('get_columns_in_relation_if_exist')(target_table)) }}
{%- endmacro -%}

{%- macro default__get_columns_in_relation_if_exist(target_table) -%}
{{ return(adapter.get_columns_in_relation(target_table)) }}
{%- endmacro -%}

{%- macro databricks__get_columns_in_relation_if_exist(target_table) -%}
{%- if target_table.schema is none -%}
{%- set found_table = True %}
{%- else -%}
{% call statement('list_table_infos', fetch_result=True) -%}
show tables in {{ target_table.schema }} like '*'
{% endcall %}
{%- set existing_tables = load_result('list_table_infos').table -%}
{%- set found_table = [] %}
{%- for table in existing_tables -%}
{%- if table.tableName == target_table.identifier -%}
{% do found_table.append(table.tableName) %}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- if found_table -%}
{%- set cols = adapter.get_columns_in_relation(target_table) -%}
{{ return(cols) }}
{%- else -%}
{{ return ([]) }}
{%- endif -%}
{%- endmacro -%}
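
The control flow of `databricks__get_columns_in_relation_if_exist` can be sketched in Python. This is a sketch under stated assumptions: `list_tables` and `get_columns` are hypothetical callables standing in for the `show tables` statement and for `adapter.get_columns_in_relation`, and tables and columns are modelled as plain strings.

```python
from typing import Callable, List, Optional


def get_columns_if_exist(
    schema: Optional[str],
    identifier: str,
    list_tables: Callable[[str], List[str]],
    get_columns: Callable[[Optional[str], str], List[str]],
) -> List[str]:
    """Return the table's columns, or an empty list if the table is absent."""
    if schema is None:
        # No schema to list: assume the table exists, as the macro does.
        found = True
    else:
        found = identifier in list_tables(schema)
    # Only describe the relation once it is known (or assumed) to exist.
    return get_columns(schema, identifier) if found else []
```

Returning an empty list for a missing table lets `need_full_refresh` fall through to its "column not found" branch without ever asking for the columns of a relation that is not there.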

{%- macro need_full_refresh(col_ab_id, target_table=this) -%}
{%- if not execute -%}
{{ return(false) }}
{%- endif -%}
{%- set found_column = [] %}
- {%- set cols = adapter.get_columns_in_relation(target_table) -%}
+ {%- set cols = get_columns_in_relation_if_exist(target_table) -%}
{%- for col in cols -%}
{%- if col.column == col_ab_id -%}
{% do found_column.append(col.column) %}
@@ -18,7 +49,7 @@
{%- if found_column -%}
{{ return(false) }}
{%- else -%}
- {{ dbt_utils.log_info(target_table ~ "." ~ col_ab_id ~ " does not exist yet. The table will be created or rebuilt with dbt.full_refresh") }}
+ {{ dbt_utils.log_info(target_table ~ "." ~ col_ab_id ~ " does not exist. The table needs to be rebuilt in full_refresh") }}
{{ return(true) }}
{%- endif -%}
{%- endmacro -%}

0 comments on commit 0232182
