Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genesis: Provide dbt adapter for CrateDB #2

Merged
merged 32 commits into from
Dec 21, 2024
Merged

Genesis: Provide dbt adapter for CrateDB #2

merged 32 commits into from
Dec 21, 2024

Conversation

amotl
Copy link
Member

@amotl amotl commented Nov 20, 2024

About

Provide dbt adapter for CrateDB, carved out of the canonical PostgreSQL adapter dbt-postgres, making a few test cases succeed.

Details

Unit Tests: 34 passed, 2 skipped
Integration Tests: 426 passed, 103 skipped

References

/cc @surister, @kneth

CrateDB does not understand:
- SQL's `CREATE SCHEMA` and `DROP SCHEMA` operations
- SQL's `EXCEPT` operation
- SQL's `CASCADE` modifier
- SQL's `CREATE TEMPORARY TABLE
- PostgreSQL's synthetic `pg_rewrite` table
- PostgreSQL's `pg_is_other_temp_schema()` function

dbt features disabled:
- Snapshots

CrateDB needs:
- Write synchronization using `REFRESH TABLE ...`
- Use dbt-core>=1.9.0b4
- Add configuration snippets for black, flake8, mypy, pytest
@amotl amotl force-pushed the genesis branch 2 times, most recently from 54f1401 to 6d37e3f Compare November 20, 2024 03:47
@amotl amotl force-pushed the genesis branch 5 times, most recently from 580a70e to b7146f7 Compare November 24, 2024 02:13
CrateDB does not support MATERIALIZED VIEW.
@amotl amotl requested a review from hammerhead November 26, 2024 23:07
Comment on lines 37 to +41
## Getting started

- [Install dbt](https://docs.getdbt.com/docs/installation)
- Read the [introduction](https://docs.getdbt.com/docs/introduction/) and [viewpoint](https://docs.getdbt.com/docs/about/viewpoint/)
- [Install dbt](https://docs.getdbt.com/docs/core/installation-overview)
- Read the [introduction](https://docs.getdbt.com/docs/introduction/) and
[viewpoint](https://docs.getdbt.com/community/resources/viewpoint)
Copy link
Member Author

@amotl amotl Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also link to the new canonical integration page at the CrateDB Guide here.

Copy link
Member Author

@amotl amotl Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been added per 0147433.

@amotl amotl marked this pull request as ready for review November 27, 2024 01:05
@amotl amotl force-pushed the genesis branch 3 times, most recently from 2c00602 to 915f3e0 Compare November 27, 2024 21:09
Comment on lines 11 to 60
rename_table_sql = """
{% set exists, table_source = get_or_create_relation(database=None, schema=target.schema, identifier="table_source", type="table") %}
{% set exists, table_target = get_or_create_relation(database=None, schema=target.schema, identifier="table_target", type="table") %}

{# Create table `source`. #}
{% set sql = get_create_table_as_sql(False, table_source, "SELECT 42 as value") %}
{% do run_query(sql) %}

{# Rename table to `target`. #}
{% do adapter.rename_relation(table_source, table_target) %}

-- Dummy statement.
SELECT TRUE AS id
"""


rename_view_sql = """
{% set exists, view_source = get_or_create_relation(database=None, schema=target.schema, identifier="view_source", type="view") %}
{% set exists, view_target = get_or_create_relation(database=None, schema=target.schema, identifier="view_target", type="view") %}

{# Create view `source`. #}
{% set sql = get_create_view_as_sql(view_source, "SELECT 42 as value") %}
{% do run_query(sql) %}

{# Rename view to `target`. #}
{% do adapter.rename_relation(view_source, view_target) %}

-- Dummy statement.
SELECT TRUE AS id
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @hlcianfagna,

you asked about the rename_relation macro over at crate/cratedb-examples#754 (review). Thank you very much.

Hereby, I am adding an integration test with CrateDB that is using vanilla dbt Core's / dbt-postgres' rename_relation macro, and it seems like it works well without much ado.

So, do you think the other rename_relation you provided per 1, originally taken from 2, will still be neeeded?

With kind regards,
Andreas.

Footnotes

  1. https://github.com/crate/cratedb-examples/blob/91c0bb9f74cf62b67a4aabe5a79f7c80b475d16b/framework/dbt/basic/macros/overrides.sql#L50-L86

  2. https://community.cratedb.com/t/using-dbt-with-cratedb/1566

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be removed now, thank you.

Comment on lines +84 to +125
def test_rename_relation_table(self, project):
"""
Validate the vanilla `rename_relation` macro from dbt-postgres on tables.
"""

result = run_dbt(["run", "--select", "rename_table"])
assert len(result) == 1

records = common.get_records(project, "table_target")
assert records == [(42,)]

def test_rename_relation_view(self, project):
"""
Validate the vanilla `rename_relation` macro from dbt-postgres on views.
"""

result = run_dbt(["run", "--select", "rename_view"])
assert len(result) == 1

records = common.get_records(project, "view_target")
assert records == [(42,)]
Copy link
Member Author

@amotl amotl Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to my last comment, just to outline the actual test cases which use both of the macro scripts above, to validate renaming both tables and views works well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, yes, starting from CrateDB 5.5 as long as it does an ALTER TABLE and not an ALTER VIEW it should work fine without a need for the override.

Copy link
Member Author

@amotl amotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. I just added a few comments about the proper slotting-in of the CrateDB macro override cratedb__reset_csv_table, with a humble request for feedback about your concerns, any improvements, or acknowledgements. Thanks!

Comment on lines +1 to +13
{# CrateDB: Use `DELETE FROM` instead of `TRUNCATE` #}
{% macro cratedb__reset_csv_table(model, full_refresh, old_relation, agate_table) %}
{% set sql = "" %}
{% if full_refresh %}
{{ adapter.drop_relation(old_relation) }}
{% set sql = create_csv_table(model, agate_table) %}
{% else %}
{{ adapter.truncate_relation(old_relation) }}
{% set sql = "delete from " ~ old_relation.render() %}
{% endif %}

{{ return(sql) }}
{% endmacro %}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've derived this from @hlcianfagna's contributions at Using dbt with CrateDB. However, it has not been activated yet, and I don't know yet what it is about / which code paths, subsystems, and test cases it could unlock.

This has been activated now, accompanied by a corresponding software test. See next comment.

Comment on lines +43 to +60
reset_csv_table = """
{# Create a random table. #}
{% set exists, random_table = get_or_create_relation(database=None, schema=target.schema, identifier="random_table", type="table") %}
{% set sql = get_create_table_as_sql(False, random_table, "SELECT 42 as value") %}
{% do run_query(sql) %}

{#
This is a little excerpt from dbt/include/global_project/macros/materializations/seeds/seed.sql
#}
{% set full_refresh = (should_full_refresh()) %}
{% set agate_table = None %}
{% set sql_ddl = reset_csv_table(model=model, full_refresh=full_refresh, old_relation=random_table, agate_table=agate_table) %}
{% set sql_dml = load_csv_rows(model, agate_table) %}
{% set sql = get_csv_sql(sql_ddl, sql_dml) %}
{% do run_query(sql) %}

SELECT TRUE
"""
Copy link
Member Author

@amotl amotl Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the test case body for validating CrateDB's macro override cratedb__reset_csv_table. It is being used by the upstream materialization macro materialization seed, default in macros/materializations/seeds/seed.sql.

The test case is a bit too verbose for my taste, but I haven't been able to come up with a better one, because of my lack of knowledge and fluency in dbt.

@hlcianfagna: Maybe you have a better and more concise dbt recipe at hand, which (maybe indirectly) tests whether CrateDB's custom override cratedb__reset_csv_table is properly in place?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was testing this manually, I had a tiny csv in the seeds folder and I was running dbt seed twice.

Copy link
Member Author

@amotl amotl Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I added a corresponding software test that invokes dbt seed twice per ad768c5, but this one apparently does not trigger reset_csv_table. So I've kept the previous software test in place for now. 🤷

Comment on lines +127 to +136
def test_reset_csv(self, project):
"""
CrateDB needs an override for the `reset_csv_table` macro. Make sure it is in place.
"""

result = run_dbt(["run", "--select", "reset_csv_table"])
assert len(result) == 1

records = common.get_records(project, "reset_csv_table")
assert records == [(True,)]
Copy link
Member Author

@amotl amotl Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: It's not much that is being validated here within the test case definition. Effectively, it is just about invoking the dbt run.

Copy link

@hlcianfagna hlcianfagna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amotl amotl merged commit fa164d4 into main Dec 21, 2024
5 checks passed
@amotl amotl deleted the genesis branch December 21, 2024 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants