Enabling Normalization for DuckDB #22566
Conversation
This PR will be revisited after PR #22381 is merged. I added the missing setup_duck_db, but running the integration tests still fails.

I'm unsure how to debug or fix these errors. Any help @edgao or @ryankfu would be appreciated when you have more air to breathe. What I noted, though, is that I get the same errors for TiDB:

So either it's a general error, or both have the same fix. Any pointer or hint is greatly appreciated, as the end-to-end test was successful.
/test connector=bases/base-normalization
Build Failed

Test summary info:

Affected Connector Report

NOTE
- See "Actionable Items" below for how to resolve warnings and errors.
❌ Destinations (16)

| Connector | Version | Changelog | Publish |
|---|---|---|---|
| destination-bigquery | 1.2.14 | ✅ | ✅ |
| destination-bigquery-denormalized | 1.2.14 | ✅ | ✅ |
| destination-clickhouse | 0.2.2 | ✅ | ✅ |
| destination-clickhouse-strict-encrypt | 0.2.2 | 🔵 (ignored) | 🔵 (ignored) |
| destination-jdbc | 0.3.14 | 🔵 (ignored) | 🔵 (ignored) |
| destination-mssql | 0.1.22 | ✅ | ✅ |
| destination-mssql-strict-encrypt | 0.1.22 | 🔵 (ignored) | 🔵 (ignored) |
| destination-mysql | 0.1.20 | ✅ | ✅ |
| destination-mysql-strict-encrypt | ❌ 0.1.21 (mismatch: 0.1.20) | 🔵 (ignored) | 🔵 (ignored) |
| destination-oracle | 0.1.19 | ✅ | ✅ |
| destination-oracle-strict-encrypt | 0.1.19 | 🔵 (ignored) | 🔵 (ignored) |
| destination-postgres | 0.3.26 | ✅ | ✅ |
| destination-postgres-strict-encrypt | 0.3.26 | 🔵 (ignored) | 🔵 (ignored) |
| destination-redshift | 0.4.0 | ✅ | ✅ |
| destination-snowflake | 0.4.49 | ✅ | ✅ |
| destination-tidb | 0.1.0 | ✅ | ✅ |
- See "Actionable Items" below for how to resolve warnings and errors.
👀 Other Modules (1)
- base-normalization
Actionable Items

| Category | Status | Actionable Item |
|---|---|---|
| Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
| Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
| Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
| Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
| Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
| Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |
@ryankfu were there other items you wanted to bring up for duckdb? I think this pr looks reasonable, assuming tests pass (modulo comments) but I'm also pretty behind on this whole thing
Resolved review threads on:
- airbyte-config/init/src/main/resources/seed/destination_definitions.yaml
- airbyte-integrations/bases/base-normalization/integration_tests/config.json
- airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py
@edgao I did not have any other remarks for getting DuckDB normalization back for the connector. I was mostly waiting on the fix for mitigating deeply nested objects and for the freeze to be lifted before revisiting this PR, but I agree that this PR makes sense and reverts the filter logic you had in place.
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Hey @ryankfu, thanks so much for your and @edgao's help. Regarding the missing normalization option (image above) in the connection: if I rebuild the duckdb destination as shown below, should I see the option to choose normalization? Is something still missing to activate it, or is that more of a local database problem (although I deleted everything and restarted)?

```bash
docker-compose down -v
docker image rm airbyte/destination-duckdb:0.1.0
./gradlew :airbyte-integrations:connectors:destination-duckdb:build
docker tag airbyte/destination-duckdb:dev airbyte/destination-duckdb:0.1.0
docker image ls | grep duckdb
```

```
airbyte/destination-duckdb   0.1.0   502c829befb2   13 minutes ago   928MB
airbyte/destination-duckdb   dev     502c829befb2   13 minutes ago   928MB
```

```bash
SUB_BUILD=PLATFORM ./gradlew build
VERSION=dev docker-compose up
```

I also tried to delete the specs for the destination; they get reinserted during start-up:

```sql
DELETE
FROM public.actor_definition
WHERE lower(name) LIKE '%duck%';
```

Any suggestions maybe? Note that the same happens when merged into master.
hmmmm, things are pretty different after the monorepo shift and I'm still learning the new world myself. Asking the conn ops folks here: https://airbytehq-team.slack.com/archives/C03VDJ4FMJB/p1677027104919629
/test connector=bases/base-normalization
/test connector=bases/base-normalization
something about this is being spooky on macOS. The destination connector is complaining that it cannot open the database file (see the TRACE message below). Rerunning on CI in case this is specific to macOS, but I do think we should make the tests pass locally as well.

also fyi - you can run the normalization test cases locally.

The logs are pretty confusing to parse, but this message (from the destination connector, i.e. before even running normalization) has the problem:

```json
{"type": "TRACE", "trace": {"type": "ERROR", "emitted_at": 1677108815431.992, "error": {"message": "Something went wrong in the connector. See the logs for more details.", "internal_message": "IO Error: Cannot open file \"/local/test.duckdb\": No such file or directory", "stack_trace": "Traceback (most recent call last):\n File \"/airbyte/integration_code/main.py\", line 11, in <module>\n DestinationDuckdb().run(sys.argv[1:])\n File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py\", line 119, in run\n for message in output_messages:\n File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py\", line 113, in run_cmd\n yield from self._run_write(config=config, configured_catalog_path=parsed_args.catalog, input_stream=wrapped_stdin)\n File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/destinations/destination.py\", line 49, in _run_write\n yield from self.write(config=config, configured_catalog=catalog, input_messages=input_messages)\n File \"/airbyte/integration_code/destination_duckdb/destination.py\", line 64, in write\n con = duckdb.connect(database=path, read_only=False)\nduckdb.IOException: IO Error: Cannot open file \"/local/test.duckdb\": No such file or directory\n", "failure_type": "system_error"}}}
```

and you can pull out the stacktrace:
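In case it helps, a small standard-library sketch for pulling that stack trace out of a raw sync log; this is a hypothetical helper, not part of this PR or the Airbyte tooling:

```python
import json
import sys

# Read a sync log on stdin and print the stack trace of any TRACE/ERROR
# messages, skipping lines that are not JSON-encoded Airbyte messages.
for line in sys.stdin:
    line = line.strip()
    if not line.startswith("{"):
        continue
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        continue
    if message.get("type") == "TRACE" and message.get("trace", {}).get("type") == "ERROR":
        error = message["trace"]["error"]
        print(error.get("internal_message", ""))
        print(error.get("stack_trace", ""))
```

Something like `python extract_trace.py < sync.log` would then print the internal message and traceback shown above.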
ah, I forgot that the setup_duckdb method is supposed to create that file.
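For reference, a minimal sketch of what such a setup step could look like; the helper name, default path, and placement are assumptions based on this thread, not the actual dbt_integration_test.py code:

```python
import os

import duckdb


def setup_duckdb(local_mount: str = "/tmp/airbyte_local", db_name: str = "test.duckdb") -> str:
    """Create the local mount directory and an empty DuckDB file so that the
    destination's duckdb.connect(database=path, read_only=False) call has
    something to open instead of failing with 'No such file or directory'."""
    os.makedirs(local_mount, exist_ok=True)
    db_path = os.path.join(local_mount, db_name)
    # Opening and immediately closing a connection materializes the file on disk.
    duckdb.connect(database=db_path, read_only=False).close()
    return db_path
```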
/test connector=bases/base-normalization
@edgao Isn't this error expected if you run with pytest locally? And I couldn't get normalization enabled since the monorepo split. Would it be easier to merge and check in master whether it works? (It would not break anything, except adding the option for normalization, which hopefully works.)
/test connector=bases/base-normalization
pytest will still run the setup steps, though. I'll try and set this up locally tomorrow (mostly been dealing with oc stuff for the past few days...). If it works, then we should make a separate PR with just the destination_definitions.yaml change (i.e. enable duckdb normalization in OSS) - everything else is just to keep the normalization build green, so we shouldn't merge it until it succeeds.
hm, I finally got a sync to run (followed instructions here to get the deployment working) but it didn't seem to quite work as expected. The source emitted nonzero records:
but the raw table was empty (which is weird? b/c this is just the connector itself, we haven't even run normalization yet)
and then normalization failed:
This was using the deployment described above; full logs: bda075de_9bc7_4495_a31a_60dff5d8bcf7_logs_53_txt.txt

also - it looks like duckdb has already released a new version, with a different storage format. Do we need to release a new version of destination-duckdb?
Hi @edgao Which source connector do you use? And what does your duckdb path look like? I normally use Faker, which worked with and without normalization at some point.

Thanks for the pointer. I tried to build this feature branch with these instructions; unfortunately, they do not work for me (see Fig. 1).

**Versioning**

Regarding new versions of DuckDB, that's a good question that I asked myself as well. DuckDB updates quite often (every 3-4 weeks). The best would be if we somehow automatically built different docker images for different versions, and the version could be specified inside the settings of the destination (same as schema, path, etc.). Not sure if that is feasible. For now, we'd need to do it manually.
interesting. I just confirmed that the destination is able to write to the raw table, but now I'm running into errors with the storage version in normalization :/ It looks like the connector is writing a version 0.6.x duckdb file, but normalization wants to read 0.7.x?

unclear why this is happening - I'm using our official images, not even locally-built versions:
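As an aside, a quick way to check which duckdb library a container actually bundles, and whether it can open the file the destination wrote, is plain Python run inside the image; the path is just the test path from the logs above, and the pip freeze output further down in the thread gets at the same information:

```python
import duckdb

# The library version also determines the storage format it writes and reads.
print("duckdb library version:", duckdb.__version__)

# Opening a database created by a different storage-format version raises an
# IOException naming both versions, matching the normalization error described above.
try:
    con = duckdb.connect("/local/test.duckdb", read_only=True)
    print("tables:", con.execute("SHOW TABLES").fetchall())
except Exception as exc:
    print("failed to open:", exc)
```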
I think it's expected that the gradle build will create new images? B/c it needs to pull in the new catalog json file? not super familiar with how this process works :/
…vert to fetch from setup.py
That's a good catch. It handles the case where the path doesn't start with /local.

**Version error**

I can't test end to end, but I committed a change to fix the versions for normalization. Yes, that would need a new image built with Gradle (and pushed to Docker Hub?). When building with Gradle locally (see below), versions look good on my end.

DuckDB version in the destination image:

```
docker run -it --entrypoint /bin/bash airbyte/destination-duckdb:dev
root@27da9d164eda:/airbyte/integration_code# pip freeze | grep duckdb
destination-duckdb @ file:///airbyte/integration_code
duckdb==0.7.1
```

DuckDB and dbt-duckdb versions in the normalization image:

```
docker run -it --entrypoint /bin/bash airbyte/normalization-duckdb:dev
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@d58b42ca7926:/airbyte# pip freeze | grep duckdb
dbt-duckdb==1.4.0
duckdb==0.7.1
```

**Catalog file**

Regarding the "new catalog json": this is also where I'm struggling. I still don't see "Normalization" as an option in the DuckDB destination, even though we activated it (I followed the two pointers in Slack, and the one from the docs above; no luck 😢).
hm, that commit is updating the normalization image? i.e. I'm guessing you meant to push a different commit for the connector? (And yeah, we'll need to, though... maybe it's more correct to pin both the connector and normalization to the same version? Because otherwise syncs could fail if we publish one but not the other.)
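To make the pinning idea concrete, it would look roughly like this in the connector's setup.py; the actual dependency list of destination-duckdb may differ, so treat this as a sketch rather than the real file:

```python
# setup.py (sketch, not the actual destination-duckdb file)
from setuptools import find_packages, setup

setup(
    name="destination_duckdb",
    packages=find_packages(),
    install_requires=[
        "airbyte-cdk",
        # Pin the engine to the same version the normalization image ships
        # (duckdb==0.7.1 above), so both sides read and write the same storage format.
        "duckdb==0.7.1",
    ],
)
```

The normalization image would then need matching duckdb/dbt-duckdb pins, so a bump on one side always ships with a bump on the other.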
also, no idea what's up with you running the platform :( just to double-check, here's pretty much my exact process:

(which, I realize after typing it out, is different because I'm working off the internal copy of platform >.>)
Closing out this stale PR. While normalization is deprecated now, we have typing and deduping coming soon: |
What

Enabling Normalization, which got disabled temporarily (see https://github.com/airbytehq/airbyte/pull/22528/files), and fixing the normalization integration tests; e.g. setup_duck_db was missing. Can be tested with:

How

Adding setup_duck_db and fixing the integration tests that failed.

Recommended reading order
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests are passing: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Documentation updated:
  - Connector's README.md
  - Connector's bootstrap.md (see description and examples)
  - docs/integrations/<source or destination>/<name>.md, including changelog (see changelog example)
  - docs/integrations/README.md
  - airbyte-integrations/builds.md

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

- /test connector=connectors/<name> command is passing
- New connector version released by running the /publish command described here

Updating a connector

Community member or Airbyter

- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests are passing: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Documentation updated:
  - Connector's README.md
  - Connector's bootstrap.md (see description and examples)
  - docs/integrations/<source or destination>/<name>.md, including changelog (see changelog example)

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

- /test connector=connectors/<name> command is passing
- New connector version released by running the /publish command described here

Connector Generator

- Generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates, then checking in your changes