
🎉 Refactor Normalization docker images and upgrade to use dbt 0.21.0 #6959

Merged
ChristopheDuong merged 39 commits into master from chris/normalization-clean-up-split on Oct 14, 2021

@ChristopheDuong (Contributor) commented Oct 11, 2021

What

  • Decouple destination dependencies and produce multiple docker images for normalization (see the Context section below). This does not refactor the code per destination yet.
    • airbyte/normalization base image
    • airbyte/normalization-mssql
    • airbyte/normalization-mysql
    • airbyte/normalization-oracle
  • Tie normalization docker image versions to Airbyte core versions (and build the images as part of the docker composeBuild)
  • Improve the normalization integration test framework so it can run against selected destinations only (via an environment variable)
  • Improve integration tests by isolating schemas in CI (avoiding conflicts when multiple test runs execute at the same time)

How

Closes #2054
Closes #6872

Recommended reading order

Build related changes:

  1. airbyte-workers/src/main/java/io/airbyte/workers/normalization/NormalizationRunnerFactory.java
  2. airbyte-workers/src/main/java/io/airbyte/workers/normalization/DefaultNormalizationRunner.java
  3. airbyte-integrations/bases/base-normalization/build.gradle
  4. settings.gradle
  5. docker-compose.build.yaml
  6. airbyte-integrations/bases/base-normalization/Dockerfile
  7. the rest

DX with normalization tests:

  1. airbyte-integrations/bases/base-normalization/README.md
  2. airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py
  3. airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py
  4. airbyte-integrations/bases/base-normalization/integration_tests/test_ephemeral.py

Context

Docker images for normalization

We currently have to install additional packages inside the normalization docker image because of:

  • Oracle destination
  • MySql destination
  • MSSql destination

Some of these destinations have a hard dependency on an older dbt version (e.g. 0.19.0) and can't be upgraded to more recent releases.

However, we do want to support the latest dbt versions as they are released.
As the number of destinations supported by normalization grows, we may run into dbt version conflicts.

So this PR produces a dedicated docker image for each "custom" destination (outside of dbt core support).
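A minimal Python sketch of per-destination image selection, assuming the worker picks the normalization image from the destination connector's image name. The real logic lives in NormalizationRunnerFactory.java and is not reproduced here; the `CUSTOM_IMAGES` table and the `normalization_image_for` helper are hypothetical, and falling back to the base image for every other destination is a simplification.

```python
# Hypothetical lookup table: destinations covered by dbt core share the base
# image, while community adapters (mssql, mysql, oracle) get a dedicated image.
BASE_IMAGE = "airbyte/normalization"

CUSTOM_IMAGES = {
    "airbyte/destination-mssql": "airbyte/normalization-mssql",
    "airbyte/destination-mysql": "airbyte/normalization-mysql",
    "airbyte/destination-oracle": "airbyte/normalization-oracle",
}


def normalization_image_for(destination_image: str) -> str:
    """Return the normalization image name (without tag) for a destination image."""
    # Drop a ":tag" suffix so "airbyte/destination-mssql:0.1.0" still matches.
    # Assumes the image name has no registry host with a port (e.g. "localhost:5000/...").
    repository = destination_image.partition(":")[0]
    return CUSTOM_IMAGES.get(repository, BASE_IMAGE)
```

Keeping the mapping in one table means that supporting another community adapter would only require a new entry and a new image.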

Docker image version

Instead of publishing a new docker image with each normalization PR, we can publish normalization images when releasing Airbyte versions. (Additionally, we can now run normalization in dev mode without changing the codebase.)
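A small sketch of tying normalization image tags to the Airbyte version instead of a separately bumped normalization version. It assumes the running version is exposed through an `AIRBYTE_VERSION` environment variable and that a dev build reports `dev`, so locally built `:dev` images are used without code changes; the variable handling and the `normalization_image_ref` helper are illustrative assumptions.

```python
import os
from typing import Mapping, Optional


def normalization_image_ref(image: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return "<image>:<airbyte version>" for a normalization image."""
    env = os.environ if env is None else env
    # Assumption: AIRBYTE_VERSION holds the release tag, or "dev" for local builds;
    # a missing or empty variable is treated as a dev build here.
    version = env.get("AIRBYTE_VERSION", "dev").strip() or "dev"
    return f"{image}:{version}"
```

With no normalization-specific version constant, a release only has to build and push the normalization images under the same tag as the Airbyte release.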

Integration tests stability & usability

When running integration tests, we noticed flaky results, maybe due to concurrent runs by multiple people or CI jobs.
Running the tests in a randomly named schema should isolate them better.
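A minimal sketch of the random-schema idea: each test session gets its own schema name built from a fixed prefix and a random suffix. The `test_normalization` prefix, the suffix length and the `random_test_schema` helper are assumptions. Starting with a letter and using only lowercase letters, digits and underscores keeps the name short and friendly to unquoted identifiers on most databases.

```python
import random
import string
from typing import Optional


def random_test_schema(prefix: str = "test_normalization", length: int = 5,
                       rng: Optional[random.Random] = None) -> str:
    """Return a schema name such as 'test_normalization_ab12c' for one test run."""
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase + string.digits
    # Random suffix so concurrent runs (people or CI jobs) do not share a schema.
    suffix = "".join(rng.choice(alphabet) for _ in range(length))
    return f"{prefix}_{suffix}"
```

Passing a seeded `random.Random` makes a run reproducible when debugging a failure.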

It is now easier to select which normalization tests to run, per destination, via an environment variable.
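A sketch of how an environment variable could select the destinations to test. The variable name `NORMALIZATION_TEST_TARGET`, the comma-separated format and the `destinations_to_test` helper are assumptions for illustration (the base-normalization README is the place to check the actual convention); the destination names are a hypothetical subset.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical subset of destination names; the real list lives in DestinationType.
ALL_DESTINATIONS = ("bigquery", "mssql", "mysql", "oracle", "postgres", "snowflake")


def destinations_to_test(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the destinations selected via NORMALIZATION_TEST_TARGET (assumed name)."""
    env = os.environ if env is None else env
    raw = env.get("NORMALIZATION_TEST_TARGET", "")
    if not raw.strip():
        # No selection: fall back to every known destination.
        return list(ALL_DESTINATIONS)
    selected = [name.strip().lower() for name in raw.split(",") if name.strip()]
    # Fail fast on typos instead of silently running nothing.
    unknown = sorted(set(selected) - set(ALL_DESTINATIONS))
    if unknown:
        raise ValueError(f"Unknown destinations: {unknown}")
    return selected
```

Under these assumptions, setting NORMALIZATION_TEST_TARGET=postgres,mysql before running the integration tests would restrict them to those two destinations.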

@ChristopheDuong (Contributor Author):

I'm also trying to fix Oracle integration tests (they are currently disabled)...

@Phlair (Contributor) left a review

Awesome, lgtm! I think this makes a lot of sense given different dependencies between dests and the improved dx will be 👌

What's the reasoning behind the choices of mssql, mysql & oracle to be the ones to separate out? Is the intended goal to move all destinations toward this structure within normalization?

@@ -90,9 +93,11 @@ if(!System.getenv().containsKey("SUB_BUILD") || System.getenv().get("SUB_BUILD")
include ':airbyte-integrations:connectors:destination-oracle'
include ':airbyte-integrations:connectors:destination-mssql'

//Needed by destination-bugquery
Contributor:

I like to think this was on purpose 😆

Comment on lines +26 to +27
if DestinationType.POSTGRES.value not in destinations_to_test:
    destinations_to_test.append(DestinationType.POSTGRES.value)
Contributor:

what's the reason for enforcing postgres like this?

@ChristopheDuong (Contributor Author), Oct 14, 2021:

Some test functions in that class explicitly require the Postgres destination (and do not run on the other destinations).
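A sketch of the Postgres enforcement discussed in this thread, written as a pure function around the two lines quoted above. `DestinationType` here is a minimal stand-in for normalization/destination_type.py, not the real enum, and `with_required_postgres` is a hypothetical name.

```python
from enum import Enum
from typing import List


class DestinationType(Enum):
    """Minimal stand-in for normalization.destination_type.DestinationType."""
    POSTGRES = "postgres"
    MYSQL = "mysql"
    MSSQL = "mssql"
    ORACLE = "oracle"


def with_required_postgres(destinations_to_test: List[str]) -> List[str]:
    """Return a copy of the selection that always includes postgres.

    Some tests only run against postgres, so they need it even when the
    user selected other destinations.
    """
    selected = list(destinations_to_test)  # avoid mutating the caller's list
    if DestinationType.POSTGRES.value not in selected:
        selected.append(DestinationType.POSTGRES.value)
    return selected
```

Unlike the quoted snippet, which appends to the list in place, this version returns a copy and leaves the caller's selection untouched; both make sure the Postgres-only tests have a destination to run against.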

@ChristopheDuong (Contributor Author) commented Oct 14, 2021

Quoting @Phlair: "What's the reasoning behind the choices of mssql, mysql & oracle to be the ones to separate out? Is the intended goal to move all destinations toward this structure within normalization?"

Normalization for "regular" destinations is based on dbt core's "mono-repo", so we don't need to produce multiple docker images for them: the code base is the same, and the macros handle the slight differences in syntax.

Unfortunately, mssql, mysql and oracle rely on community-contributed dbt adapters, so each lives in a separate git repo with its own dbt version dependency, which may differ from the others and is not always in sync with dbt core...
That's why I am splitting them out, so we are not held back to a single dbt core version.

  • Latest dbt core is 0.21.0
  • MSSql is at 0.20.1
  • MySql is at 0.19.0
  • Oracle is at 0.19.1
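To make the conflict concrete, here is a small sketch that takes the dbt versions listed above and groups destinations by the dbt version their adapter requires; adapters in different groups cannot share one Python environment. Treating each adapter as an exact pin is a simplification (real adapters may declare version ranges), and `images_needed` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# dbt versions from the list above; "core" covers destinations built on dbt core.
DBT_VERSION_BY_ADAPTER = {
    "core": "0.21.0",
    "mssql": "0.20.1",
    "mysql": "0.19.0",
    "oracle": "0.19.1",
}


def images_needed(pins: Dict[str, str]) -> Dict[str, List[str]]:
    """Group adapters by the dbt version they pin; each group needs its own environment."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for adapter, version in sorted(pins.items()):
        groups[version].append(adapter)
    return dict(groups)


if __name__ == "__main__":
    for version, adapters in sorted(images_needed(DBT_VERSION_BY_ADAPTER).items()):
        print(f"dbt {version}: {', '.join(adapters)}")
```

Running it prints four lines, from `dbt 0.19.0: mysql` up to `dbt 0.21.0: core`: no single dbt version satisfies every pin, so in this case each adapter ends up needing its own image.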

@ChristopheDuong (Contributor Author) commented Oct 14, 2021

/test connector=connectors/destination-postgres

✅ connectors/destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1342057920
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     120      6    95%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 370    218    41%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     29    79%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1062    403    62%

@ChristopheDuong (Contributor Author) commented Oct 14, 2021

/test connector=bases/base-normalization

✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1342893212
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              200     94    53%
	 source_acceptance_test/tests/test_full_refresh.py       18     11    39%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  41     24    41%
	 source_acceptance_test/utils/compare.py                 47     20    57%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  860    419    51%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            83     59    29%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     19    42%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        660    393    40%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    12      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     12    92%
	 normalization/transform_catalog/destination_name_transformer.py     120      4    97%
	 normalization/transform_catalog/reserved_keywords.py                 11      0   100%
	 normalization/transform_catalog/stream_processor.py                 370     32    91%
	 normalization/transform_catalog/table_name_registry.py              174     51    71%
	 normalization/transform_catalog/transform.py                         45     30    33%
	 normalization/transform_catalog/utils.py                             33      0   100%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         140     45    68%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1062    180    83%

@ChristopheDuong ChristopheDuong merged commit c462055 into master Oct 14, 2021
@ChristopheDuong ChristopheDuong deleted the chris/normalization-clean-up-split branch October 14, 2021 18:29
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
…irbytehq#6959)

* Split normalization docker images for some connectors with specifics dependencies

* Regenerate (airbytehq#7003)
Labels: area/connectors (Connector related issues), area/worker (Related to worker), normalization
Development: successfully merging this pull request may close these issues:
  • Upgrade dbt version used by normalization
  • DX on normalization use dev images when Airbyte is running in dev
5 participants