
Enabling Normalization for DuckDB #22566

Closed
wants to merge 19 commits into from

Conversation

sspaeti
Contributor

@sspaeti sspaeti commented Feb 8, 2023

What

Re-enables normalization, which was temporarily disabled (see https://github.com/airbytehq/airbyte/pull/22528/files), and fixes the normalization integration tests (e.g. the missing setup_duck_db).

Can be tested with:

NORMALIZATION_TEST_TARGET=duckdb pytest integration_tests/test_normalization.py

How

Adds setup_duck_db and fixes the integration tests that were failing.

Recommended reading order

  1. destination_definitions.yaml
  2. dbt_integration_test.py

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

@sspaeti sspaeti requested a review from a team as a code owner February 8, 2023 15:51
@sspaeti sspaeti self-assigned this Feb 8, 2023
@sspaeti sspaeti marked this pull request as draft February 8, 2023 15:51
@sspaeti sspaeti temporarily deployed to more-secrets February 8, 2023 16:03 — with GitHub Actions Inactive
@sspaeti sspaeti temporarily deployed to more-secrets February 10, 2023 15:15 — with GitHub Actions Inactive
@sspaeti
Contributor Author

sspaeti commented Feb 13, 2023

This PR will be revisited after PR #22381 is merged.

I added the missing setup_duck_db and the tests ./gradlew :airbyte-integrations:bases:base-normalization:airbyteDockerDuckDB are green:

BUILD SUCCESSFUL in 17s
67 actionable tasks: 14 executed, 53 up-to-date
S3 cache 663ms wasted on misses, reads: 1, elapsed: 663ms

But running NORMALIZATION_TEST_TARGET=duckdb pytest integration_tests/test_normalization.py directly, I see these errors:

FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.DUCKDB-test_nested_streams] - AssertionError: assert False
FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.DUCKDB-test_simple_streams] - AssertionError: assert False

I'm unsure how to debug or fix these errors. Any help, @edgao or @ryankfu, would be appreciated when you have more air to breathe. What I noted, though, is that I get the same errors for TiDB:

NORMALIZATION_TEST_TARGET=tidb pytest integration_tests/test_normalization.py
...
FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.TIDB-test_nested_streams] - AssertionError: assert False
FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.TIDB-test_simple_streams] - AssertionError: assert False

So either it's a general error, or both have the same underlying issue to fix. Any pointer or hint is greatly appreciated, as the end-to-end test was successful.
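One way I can at least inspect what the destination actually wrote is to open the test database directly with the duckdb Python package (a sketch; the exact file path depends on the test config):

import duckdb

# Open the file the destination container wrote (read-only, so a concurrent
# test run isn't disturbed) and list what actually landed in it.
con = duckdb.connect("/tmp/airbyte_local/test.duckdb", read_only=True)
print(con.execute("SHOW TABLES").fetchall())
con.close()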

@edgao
Contributor

edgao commented Feb 17, 2023

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4205429374
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4205429374
🐛 https://gradle.com/s/akvhqehqn24go

Build Failed

Test summary info:

	 =========================== short test summary info ============================
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.ORACLE does not support incremental sync with schema change yet
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.TIDB does not support incremental sync with schema change yet
	 SKIPPED [3] integration_tests/test_ephemeral.py:102: ephemeral materialization isn't supported in ClickHouse yet
	 SKIPPED [1] integration_tests/test_ephemeral.py:59: Skipping test for column limit, because in MySQL, the max number of columns is limited by row size (8KB)
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.CLICKHOUSE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:149: DestinationType.CLICKHOUSE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:152: DestinationType.MSSQL is disabled as it doesnt fully support schema change in incremental yet
	 SKIPPED [2] integration_tests/test_normalization.py:135: DestinationType.MYSQL does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.ORACLE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:135: DestinationType.ORACLE does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:149: DestinationType.SNOWFLAKE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:149: DestinationType.TIDB is disabled as it doesnt support schema change in incremental yet (column type changes)
	 FAILED integration_tests/test_drop_scd_overwrite.py::test_reset_scd_on_overwrite[DestinationType.BIGQUERY]
	 FAILED integration_tests/test_drop_scd_overwrite.py::test_reset_scd_on_overwrite[DestinationType.DUCKDB]
	 FAILED integration_tests/test_ephemeral.py::test_destination_supported_limits[DestinationType.BIGQUERY-1000]
	 FAILED integration_tests/test_ephemeral.py::test_destination_supported_limits[DestinationType.DUCKDB-1000]
	 FAILED integration_tests/test_ephemeral.py::test_destination_failure_over_limits[BigQuery-3000-The view is too large.]
	 FAILED integration_tests/test_ephemeral.py::test_empty_streams[DestinationType.BIGQUERY]
	 FAILED integration_tests/test_ephemeral.py::test_empty_streams[DestinationType.DUCKDB]
	 FAILED integration_tests/test_ephemeral.py::test_stream_with_1_airbyte_column[DestinationType.BIGQUERY]
	 FAILED integration_tests/test_ephemeral.py::test_stream_with_1_airbyte_column[DestinationType.DUCKDB]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.BIGQUERY-test_nested_streams]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.BIGQUERY-test_simple_streams]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.DUCKDB-test_nested_streams]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.DUCKDB-test_simple_streams]
	 ============ 13 failed, 40 passed, 15 skipped in 3768.19s (1:02:48) ============

@edgao edgao temporarily deployed to more-secrets February 17, 2023 16:03 — with GitHub Actions Inactive
@github-actions
Contributor

github-actions bot commented Feb 17, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (0)

  • See "Actionable Items" below for how to resolve warnings and errors.

❌ Destinations (16)

Connector                               Version                    Changelog     Publish
destination-bigquery                    1.2.14
destination-bigquery-denormalized       1.2.14
destination-clickhouse                  0.2.2
destination-clickhouse-strict-encrypt   0.2.2                      🔵 (ignored)  🔵 (ignored)
destination-jdbc                        0.3.14                     🔵 (ignored)  🔵 (ignored)
destination-mssql                       0.1.22
destination-mssql-strict-encrypt        0.1.22                     🔵 (ignored)  🔵 (ignored)
destination-mysql                       0.1.20
destination-mysql-strict-encrypt        0.1.21 (mismatch: 0.1.20)  🔵 (ignored)  🔵 (ignored)
destination-oracle                      0.1.19
destination-oracle-strict-encrypt       0.1.19                     🔵 (ignored)  🔵 (ignored)
destination-postgres                    0.3.26
destination-postgres-strict-encrypt     0.3.26                     🔵 (ignored)  🔵 (ignored)
destination-redshift                    0.4.0
destination-snowflake                   0.4.49
destination-tidb                        0.1.0
  • See "Actionable Items" below for how to resolve warnings and errors.

👀 Other Modules (1)

  • base-normalization

Actionable Items

Category   Status             Actionable Item
Version    mismatch           The version of the connector is different from its normal variant. Please bump the version of the connector.
Version    doc not found      The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog  doc not found      The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog  changelog missing  There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish    not in seed        The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.
Publish    diff seed version  The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

Contributor

@edgao edgao left a comment

@ryankfu were there other items you wanted to bring up for duckdb? I think this PR looks reasonable, assuming tests pass (modulo comments), but I'm also pretty behind on this whole thing.

@ryankfu
Contributor

ryankfu commented Feb 17, 2023

@edgao Did not have any other remarks for getting DuckDB normalization back for the connector. Was mostly waiting on the fix for mitigating deeply nested objects and the freeze to be lifted before revisiting this PR, but agree that this PR makes sense to me and reverts the filter logic you had in place.

sspaeti and others added 2 commits February 20, 2023 18:40
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
@sspaeti sspaeti marked this pull request as ready for review February 20, 2023 17:48
@sspaeti
Contributor Author

sspaeti commented Feb 20, 2023

I removed the suggested integration tests, and everything is green.

Testing locally, I can't activate normalization yet. How would I clear existing configs (in case they only get overwritten initially)? Last time I tried deleting the local actor_definition table; does this still work?
(screenshot)

Minor logo question:

The logo looks fine when building locally, but didn't when merged into master. Will try again after merge.
This is how it looks building locally:
(screenshot)

and this from master:
(screenshot)

@sspaeti
Contributor Author

sspaeti commented Feb 21, 2023

Hey @ryankfu, thanks so much for your and @edgao's help.

Regarding the missing normalization option (image above) in the connection: if I rebuild the duckdb destination as shown below, should I see the option to choose normalization? Is something still missing to activate it, or is this more of a local database problem (although I deleted everything and restarted)?

docker-compose down -v
docker image rm airbyte/destination-duckdb:0.1.0
./gradlew :airbyte-integrations:connectors:destination-duckdb:build
docker tag airbyte/destination-duckdb:dev airbyte/destination-duckdb:0.1.0
docker image ls | grep duckdb
airbyte/destination-duckdb           0.1.0      502c829befb2   13 minutes ago   928MB
airbyte/destination-duckdb           dev        502c829befb2   13 minutes ago   928MB
SUB_BUILD=PLATFORM ./gradlew build
VERSION=dev docker-compose up

I also tried deleting from actor_definition, without luck. I guess these instructions no longer apply to master with the latest changes?

-- delete specs for destination: they get reinserted during start-up
DELETE FROM public.actor_definition
WHERE lower(name) LIKE '%duck%';

Any suggestions, maybe? Hopefully the same doesn't happen when this is merged into master.

@edgao
Contributor

edgao commented Feb 22, 2023

hmmmmmmmmmmmmmm

things are pretty different after the monorepo shift and I'm still learning the new world myself

asking the conn ops folks here https://airbytehq-team.slack.com/archives/C03VDJ4FMJB/p1677027104919629

@edgao
Contributor

edgao commented Feb 22, 2023

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4245230307

@edgao
Contributor

edgao commented Feb 23, 2023

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4248093657

@edgao
Contributor

edgao commented Feb 23, 2023

Something about this is being spooky on macOS. The destination connector is complaining that /local/test.duckdb doesn't exist - which is expected, I think, since we're relying on it to create that file.

Rerunning on CI in case this is specific to macOS, but I do think we should make the tests pass locally as well.

also fyi - you can run the normalization testcases locally. E.g. pytest -v 'integration_tests/test_normalization.py::test_normalization[DestinationType.DUCKDB-test_simple_streams]', or ./gradlew :airbyte-integrations:bases:base-normalization:integrationTest to run the full suite

logs are pretty confusing to parse, but this message (from the destination connector, i.e. before even running normalization) has the problem:

{"type": "TRACE", "trace": {"type": "ERROR", "emitted_at": 1677108815431.992, "error": {"message": "Something went wrong in the connector. See the logs for more details.", "internal_message": "IO Error: Cannot open file \"/local/test.duckdb\": No such file or directory", "stack_trace": "Traceback (most recent call last):\n  File \"/airbyte/integration_code/main.py\", line 11, in <module>\n    DestinationDuckdb().run(sys.argv[1:])\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py\", line 119, in run\n    for message in output_messages:\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py\", line 113, in run_cmd\n    yield from self._run_write(config=config, configured_catalog_path=parsed_args.catalog, input_stream=wrapped_stdin)\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py\", line 49, in _run_write\n    yield from self.write(config=config, configured_catalog=catalog, input_messages=input_messages)\n  File \"/airbyte/integration_code/destination_duckdb/destination.py\", line 64, in write\n    con = duckdb.connect(database=path, read_only=False)\nduckdb.IOException: IO Error: Cannot open file \"/local/test.duckdb\": No such file or directory\n", "failure_type": "system_error"}}}

and you can pull out the stacktrace:

Traceback (most recent call last):
  File "/airbyte/integration_code/main.py", line 11, in <module>
    DestinationDuckdb().run(sys.argv[1:])
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py", line 119, in run
    for message in output_messages:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py", line 113, in run_cmd
    yield from self._run_write(config=config, configured_catalog_path=parsed_args.catalog, input_stream=wrapped_stdin)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py", line 49, in _run_write
    yield from self.write(config=config, configured_catalog=catalog, input_messages=input_messages)
  File "/airbyte/integration_code/destination_duckdb/destination.py", line 64, in write
    con = duckdb.connect(database=path, read_only=False)
duckdb.IOException: IO Error: Cannot open file "/local/test.duckdb": No such file or directory

@edgao
Contributor

edgao commented Feb 23, 2023

ah, I forgot that the setup_duckdb method is supposed to create secrets/duckdb.json >.> which is why CI failed
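A minimal sketch of what that setup method might look like, assuming the destination's destination_path config option and the secrets/duckdb.json location mentioned above (the method name and exact paths in dbt_integration_test.py are assumptions):

import json
import pathlib

def setup_duckdb_db() -> dict:
    # DuckDB needs no external database container: the destination simply
    # writes a file under /local, which maps to /tmp/airbyte_local on the host.
    config = {"destination_path": "/local/test.duckdb"}
    # CI reads the connector config from secrets/duckdb.json, so write it here
    # to keep the integration tests self-contained.
    secrets_dir = pathlib.Path("secrets")
    secrets_dir.mkdir(exist_ok=True)
    (secrets_dir / "duckdb.json").write_text(json.dumps(config))
    return config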

@sspaeti
Contributor Author

sspaeti commented Feb 27, 2023

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4280929343

@sspaeti
Contributor Author

sspaeti commented Feb 27, 2023

logs are pretty confusing to parse, but this message (from the destination connector, i.e. before even running normalization) has the problem: Cannot open file \"/local/test.duckdb\"

@edgao Isn't this error expected if you run with pytest, since /local is only available inside Docker? Same as the sqlite destination, this test is expected to fail IMO, whereas ./gradlew :airbyte-integrations:bases:base-normalization:airbyteDockerDuckDB does pass all tests.

And I couldn't get normalization enabled locally since the monorepo split. Would it be easier to merge and check on master whether it works? (It would not break anything, except adding the option for normalization, which hopefully works.)

@sspaeti
Contributor Author

sspaeti commented Feb 27, 2023

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4283771249

@edgao
Contributor

edgao commented Mar 3, 2023

if you run with pytest

pytest will still run the destination-duckdb container; it doesn't run the destination code directly. ./gradlew :airbyte-integrations:bases:base-normalization:airbyteDockerDuckDB doesn't run the integration tests; it only runs the unit tests.
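For context, the harness runs the packaged destination image with the host's /tmp/airbyte_local directory mounted at /local, conceptually like this sketch (illustrative only; the mounts for the config and catalog files are elided):

import subprocess

# /local only exists inside the container because of this volume mount;
# running the connector code directly on the host would not have it.
subprocess.run(
    [
        "docker", "run", "--rm", "-i",
        "-v", "/tmp/airbyte_local:/local",
        "airbyte/destination-duckdb:dev",
        "write", "--config", "/secrets/config.json",
        "--catalog", "/data/catalog.json",
    ],
    check=True,
)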

I'll try and set this up locally tomorrow (mostly been dealing with oc stuff for the past few days...). If it works, then we should make a separate PR with just the destination_definitions.yaml change (i.e. enable duckdb normalization in OSS) - everything else is just to keep the normalization build green, so we shouldn't merge it until it succeeds.

@edgao
Contributor

edgao commented Mar 8, 2023

hm, I finally got a sync to run (followed instructions here to get the deployment working) but it didn't seem to quite work as expected. The source emitted nonzero records:

2023-03-08 19:41:34 source > Read 1341 records from payment_intents stream

but the raw table was empty (which is weird? b/c this is just the connector itself, we haven't even run normalization yet)

D select * from _airbyte_raw_payment_intents;
┌────────────────┬─────────────────────┬───────────────┐
│ _airbyte_ab_id │ _airbyte_emitted_at │ _airbyte_data │
│    varchar     │        json         │     json      │
├──────────────────────────────────────────────────────┤
│                        0 rows                        │
└──────────────────────────────────────────────────────┘

and then normalization failed:

2023-03-08 19:43:36 normalization > Completed with 1 error and 0 warnings:
2023-03-08 19:43:36 normalization > Runtime Error in model payment_intents (models/generated/airbyte_tables/main/payment_intents.sql)
2023-03-08 19:43:36 normalization >   Binder Error: Catalog "main" does not exist!

this was using destination-duckdb:0.1.0 + normalization-duckdb:0.2.26, i.e. what's available on the master branch.

full logs bda075de_9bc7_4495_a31a_60dff5d8bcf7_logs_53_txt.txt

also - it looks like duckdb has already released a new version with a different storage format; do we need to release a new version of destination-duckdb?

$ ./duckdb /tmp/airbyte_local/test.duckdb
Error: unable to open database "/tmp/airbyte_local/test.duckdb": IO Error: Trying to read a database file with version number 39, but we can only read version 43.
The database file was created with DuckDB version v0.6.0 or v0.6.1.

@sspaeti
Contributor Author

sspaeti commented Mar 9, 2023

Hi @edgao
Thanks so much for testing this.

Which source connector do you use? And what does your duckdb path look like - /local/test.duckdb or similar? And if so, was that file created as /tmp/airbyte_local/test.duckdb?

I normally use Faker, which worked with and without normalization at some point.

Thanks for the pointer; I tried to build this feature branch with these instructions. Unfortunately, they do not work for me (see Fig. 1).

Versioning

Regarding new versions of DuckDB, that's a good question I have asked myself as well. DuckDB updates quite often (every 3-4 weeks). The best would be if we somehow automatically built different Docker images for different versions, and the version could be specified in the destination settings (same as schema, path, etc.). Not sure if that is feasible; for now, we'd need to do it manually.

Fig. 1:
(screenshot)

@edgao
Contributor

edgao commented Mar 9, 2023

interesting. I was using just test.duckdb, which passed connection tests (and generated the /tmp/airbyte_local/test.duckdb file correctly). Switching to /local/test.duckdb fixed it. That feels like a footgun - at minimum we should document it, and ideally this would cause the check method to fail (or write should be able to handle this case).

confirmed that the destination is able to write to the raw table, using /local/test.duckdb.

but now I'm running into errors with the storage version in normalization :/ it looks like the connector is writing a version 0.6.x duckdb file, but normalization wants to read 0.7.x?

2023-03-09 16:16:22 normalization > Encountered an error:
Runtime Error
  IO Error: Trying to read a database file with version number 39, but we can only read version 43.
  The database file was created with DuckDB version v0.6.0 or v0.6.1.
  
  The storage of DuckDB is not yet stable; newer versions of DuckDB cannot read old database files and vice versa.
  The storage will be stabilized when version 1.0 releases.
  
  For now, we recommend that you load the database file in a supported version of DuckDB, and use the EXPORT DATABASE command followed by IMPORT DATABASE on the current version of DuckDB.
  
  See the storage page for more information: https://duckdb.org/internals/storage

unclear why this is happening - I'm using our official images, not even locally-built versions:

2023-03-09 16:13:32 INFO i.a.c.i.LineGobbler(voidCall):149 - Checking if airbyte/destination-duckdb:0.1.0 exists...
<snip...>
2023-03-09 16:15:09 INFO i.a.w.t.s.NormalizationActivityImpl(lambda$normalize$3):166 - Using normalization: airbyte/normalization-duckdb:0.2.26

@edgao
Contributor

edgao commented Mar 9, 2023

I think it's expected that the Gradle build will create new images? B/c it needs to pull in the new catalog JSON file? Not super familiar with how this process works :/

@sspaeti
Contributor Author

sspaeti commented Mar 9, 2023

That's a good catch. It handles the case where the path doesn't start with /local (see Fig. 1), but not the case where there is no path at all (I should add that, yes).
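Something along these lines in destination_duckdb/destination.py would cover both cases (a sketch; the helper name is hypothetical):

def _validate_destination_path(path: str) -> str:
    # The platform only mounts /tmp/airbyte_local into the container (as
    # /local), so an empty path or one outside that prefix cannot work.
    normalized = (path or "").strip()
    if not normalized.startswith("/local"):
        raise ValueError(
            f"destination_path '{path}' must start with /local; "
            "it is mapped to /tmp/airbyte_local on the host"
        )
    return normalized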

Version error

I can't test end to end, but I committed a change to fix the versions for airbyte/destination-duckdb and airbyte/normalization-duckdb.

I think it's expected that the gradle build will create new images?

Yes, that would need a new image built with Gradle (and pushed to Docker Hub?). When building with Gradle locally (see below), the versions look good on my end.

images built with Gradle:

./gradlew :airbyte-integrations:connectors:destination-duckdb:build
./gradlew :airbyte-integrations:bases:base-normalization:airbyteDockerDuckDB
docker image ls | grep duckdb
airbyte/normalization-duckdb         dev        c1299097b448   12 minutes ago   867MB
airbyte/destination-duckdb           dev        43cd23a0fa4e   6 hours ago      929MB

DuckDB version in destination image:

docker run -it --entrypoint /bin/bash airbyte/destination-duckdb:dev
root@27da9d164eda:/airbyte/integration_code# pip freeze | grep duckdb
destination-duckdb @ file:///airbyte/integration_code
duckdb==0.7.1

And DuckDB and dbt-duckdb versions in normalization image:

docker run -it --entrypoint /bin/bash airbyte/normalization-duckdb:dev
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@d58b42ca7926:/airbyte# pip freeze | grep duckdb
dbt-duckdb==1.4.0
duckdb==0.7.1

Catalog file

B/c it needs to pull in the new catalog json file? not super familiar with how this process works :/

The "new catalog json" is also where I'm struggling: I still don't see "Normalization" as an option in the DuckDB destination, even though we activated it (I followed the two pointers in Slack and the one from the docs above, no luck 😢).

Fig. 1:
(screenshot)

@edgao
Contributor

edgao commented Mar 11, 2023

hm, that commit is updating the normalization image? i.e. I'm guessing you meant to push a different commit for the connector? (and yeah, we'll need to /publish whichever one gets updated)

though... maybe it's more correct to pin both the connector and normalization to the same version? Because otherwise syncs could fail (if we publish one but not the other)

@edgao
Contributor

edgao commented Mar 11, 2023

also, no idea what's up with you running platform :( just to double-check, here's pretty much my exact process:

  1. cd ~/code/airbyte
  2. SUB_BUILD=ALL_CONNECTORS ./gradlew :airbyte-config:specs:generateOssConnectorCatalog
  3. cp airbyte-config/init/src/main/resources/seed/oss_catalog.json ~/code/airbyte-platform-internal/oss/airbyte-config/init/src/main/resources/seed/local_dev_catalog.json
    1. note the airbyte-platform-internal/oss - I think the docs page needs minor tweaking
  4. cd ~/code/airbyte-platform-internal/oss (again, this is in the oss subdir)
  5. ./gradlew build -x test
  6. VERSION=dev docker compose up

(which, I realize after typing it out, is different because I'm working off the internal copy of platform >.> )

@CLAassistant

CLAassistant commented Jul 11, 2023

CLA assistant check
All committers have signed the CLA.

@aaronsteers
Collaborator

aaronsteers commented Oct 17, 2024

Closing out this stale PR. While normalization is deprecated now, we have typing and deduping coming soon.
