-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Genesis: Provide dbt adapter for CrateDB #2
Conversation
CrateDB does not understand: - SQL's `CREATE SCHEMA` and `DROP SCHEMA` operations - SQL's `EXCEPT` operation - SQL's `CASCADE` modifier - SQL's `CREATE TEMPORARY TABLE - PostgreSQL's synthetic `pg_rewrite` table - PostgreSQL's `pg_is_other_temp_schema()` function dbt features disabled: - Snapshots CrateDB needs: - Write synchronization using `REFRESH TABLE ...`
- Use dbt-core>=1.9.0b4 - Add configuration snippets for black, flake8, mypy, pytest
54f1401
to
6d37e3f
Compare
580a70e
to
b7146f7
Compare
CrateDB does not support MATERIALIZED VIEW.
## Getting started | ||
|
||
- [Install dbt](https://docs.getdbt.com/docs/installation) | ||
- Read the [introduction](https://docs.getdbt.com/docs/introduction/) and [viewpoint](https://docs.getdbt.com/docs/about/viewpoint/) | ||
- [Install dbt](https://docs.getdbt.com/docs/core/installation-overview) | ||
- Read the [introduction](https://docs.getdbt.com/docs/introduction/) and | ||
[viewpoint](https://docs.getdbt.com/community/resources/viewpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also link to the new canonical integration page at the CrateDB Guide here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been added per 0147433.
2c00602
to
915f3e0
Compare
rename_table_sql = """ | ||
{% set exists, table_source = get_or_create_relation(database=None, schema=target.schema, identifier="table_source", type="table") %} | ||
{% set exists, table_target = get_or_create_relation(database=None, schema=target.schema, identifier="table_target", type="table") %} | ||
|
||
{# Create table `source`. #} | ||
{% set sql = get_create_table_as_sql(False, table_source, "SELECT 42 as value") %} | ||
{% do run_query(sql) %} | ||
|
||
{# Rename table to `target`. #} | ||
{% do adapter.rename_relation(table_source, table_target) %} | ||
|
||
-- Dummy statement. | ||
SELECT TRUE AS id | ||
""" | ||
|
||
|
||
rename_view_sql = """ | ||
{% set exists, view_source = get_or_create_relation(database=None, schema=target.schema, identifier="view_source", type="view") %} | ||
{% set exists, view_target = get_or_create_relation(database=None, schema=target.schema, identifier="view_target", type="view") %} | ||
|
||
{# Create view `source`. #} | ||
{% set sql = get_create_view_as_sql(view_source, "SELECT 42 as value") %} | ||
{% do run_query(sql) %} | ||
|
||
{# Rename view to `target`. #} | ||
{% do adapter.rename_relation(view_source, view_target) %} | ||
|
||
-- Dummy statement. | ||
SELECT TRUE AS id | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @hlcianfagna,
you asked about the rename_relation
macro over at crate/cratedb-examples#754 (review). Thank you very much.
Hereby, I am adding an integration test with CrateDB that is using vanilla dbt Core's / dbt-postgres' rename_relation
macro, and it seems like it works well without much ado.
So, do you think the other rename_relation
you provided per 1, originally taken from 2, will still be neeeded?
With kind regards,
Andreas.
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be removed now, thank you.
def test_rename_relation_table(self, project): | ||
""" | ||
Validate the vanilla `rename_relation` macro from dbt-postgres on tables. | ||
""" | ||
|
||
result = run_dbt(["run", "--select", "rename_table"]) | ||
assert len(result) == 1 | ||
|
||
records = common.get_records(project, "table_target") | ||
assert records == [(42,)] | ||
|
||
def test_rename_relation_view(self, project): | ||
""" | ||
Validate the vanilla `rename_relation` macro from dbt-postgres on views. | ||
""" | ||
|
||
result = run_dbt(["run", "--select", "rename_view"]) | ||
assert len(result) == 1 | ||
|
||
records = common.get_records(project, "view_target") | ||
assert records == [(42,)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to my last comment, just to outline the actual test cases which use both of the macro scripts above, to validate renaming both tables and views works well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, yes, starting from CrateDB 5.5 as long as it does an ALTER TABLE
and not an ALTER VIEW
it should work fine without a need for the override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. I just added a few comments about the proper slotting-in of the CrateDB macro override cratedb__reset_csv_table
, with a humble request for feedback about your concerns, any improvements, or acknowledgements. Thanks!
{# CrateDB: Use `DELETE FROM` instead of `TRUNCATE` #} | ||
{% macro cratedb__reset_csv_table(model, full_refresh, old_relation, agate_table) %} | ||
{% set sql = "" %} | ||
{% if full_refresh %} | ||
{{ adapter.drop_relation(old_relation) }} | ||
{% set sql = create_csv_table(model, agate_table) %} | ||
{% else %} | ||
{{ adapter.truncate_relation(old_relation) }} | ||
{% set sql = "delete from " ~ old_relation.render() %} | ||
{% endif %} | ||
|
||
{{ return(sql) }} | ||
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've derived this from @hlcianfagna's contributions at Using dbt with CrateDB. However, it has not been activated yet, and I don't know yet what it is about / which code paths, subsystems, and test cases it could unlock.
This has been activated now, accompanied by a corresponding software test. See next comment.
reset_csv_table = """ | ||
{# Create a random table. #} | ||
{% set exists, random_table = get_or_create_relation(database=None, schema=target.schema, identifier="random_table", type="table") %} | ||
{% set sql = get_create_table_as_sql(False, random_table, "SELECT 42 as value") %} | ||
{% do run_query(sql) %} | ||
|
||
{# | ||
This is a little excerpt from dbt/include/global_project/macros/materializations/seeds/seed.sql | ||
#} | ||
{% set full_refresh = (should_full_refresh()) %} | ||
{% set agate_table = None %} | ||
{% set sql_ddl = reset_csv_table(model=model, full_refresh=full_refresh, old_relation=random_table, agate_table=agate_table) %} | ||
{% set sql_dml = load_csv_rows(model, agate_table) %} | ||
{% set sql = get_csv_sql(sql_ddl, sql_dml) %} | ||
{% do run_query(sql) %} | ||
|
||
SELECT TRUE | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the test case body for validating CrateDB's macro override cratedb__reset_csv_table
. It is being used by the upstream materialization macro materialization seed, default
in macros/materializations/seeds/seed.sql.
The test case is a bit too verbose for my taste, but I haven't been able to come up with a better one, because of my lack of knowledge and fluency in dbt.
@hlcianfagna: Maybe you have a better and more concise dbt recipe at hand, which (maybe indirectly) tests whether CrateDB's custom override cratedb__reset_csv_table
is properly in place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was testing this manually, I had a tiny csv in the seeds
folder and I was running dbt seed
twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I added a corresponding software test that invokes dbt seed
twice per ad768c5, but this one apparently does not trigger reset_csv_table
. So I've kept the previous software test in place for now. 🤷
def test_reset_csv(self, project): | ||
""" | ||
CrateDB needs an override for the `reset_csv_table` macro. Make sure it is in place. | ||
""" | ||
|
||
result = run_dbt(["run", "--select", "reset_csv_table"]) | ||
assert len(result) == 1 | ||
|
||
records = common.get_records(project, "reset_csv_table") | ||
assert records == [(True,)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: It's not much that is being validated here within the test case definition. Effectively, it is just about invoking the dbt run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
About
Provide dbt adapter for CrateDB, carved out of the canonical PostgreSQL adapter dbt-postgres, making a few test cases succeed.
Details
Unit Tests: 34 passed, 2 skipped
Integration Tests: 426 passed, 103 skipped
References
/cc @surister, @kneth