Set default file format to parquet #422

jtcohen6 · 2022-08-09T13:32:14Z

resolves #363

Description

We already set this default in the adapter config (referenced below), but it isn't flowing through, because AdapterConfig defaults don't seem to be applied in dbt-core (dbt-labs/dbt-core#5236):

dbt-spark/dbt/adapters/spark/impl.py

Line 38 in 8744cf1

file_format: str = "parquet"

Let's set the default explicitly in the macro, so that if the user hasn't provided a file_format, dbt-spark uses Parquet. This prevents weird issues with query rewrite in Databricks SQL warehouses, too (#236).

Update: If the user is connecting via ODBC (= Databricks), let's use delta as the default file format instead. This helps out Databricks SQL warehouse connections, which can only write Delta-formatted tables.
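For reviewers, here is a rough Python sketch of the fallback rule described above. The actual change lives in a Jinja macro, not in Python, and the helper name `resolve_file_format` and its parameters are hypothetical, introduced only for illustration. It assumes the connection method is available as a plain string such as "odbc" or "thrift".

```python
from typing import Optional


def resolve_file_format(configured: Optional[str], connection_method: str) -> str:
    """Hypothetical helper illustrating the intended default, not dbt-spark's real code.

    configured: the file_format the user set on the model, or None if unset.
    connection_method: the connection method from the profile (assumed to be a string).
    """
    # An explicit user setting always wins.
    if configured:
        return configured
    # ODBC connections (Databricks) default to Delta, since Databricks SQL
    # warehouses can only write Delta-formatted tables.
    if connection_method == "odbc":
        return "delta"
    # Every other connection method falls back to Parquet.
    return "parquet"


if __name__ == "__main__":
    print(resolve_file_format(None, "thrift"))
    print(resolve_file_format(None, "odbc"))
    print(resolve_file_format("iceberg", "odbc"))
```

The point of resolving the default at the call site is that it no longer depends on the AdapterConfig default being picked up by dbt-core.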

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt-spark next" section.

github-actions · 2022-08-11T09:29:40Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-spark contributing guide.

jtcohen6 · 2022-08-17T10:09:36Z

Running into more issues on ODBC cluster + endpoint than I'm able to debug right now. I'm going to close this PR for now; someone else could pick up this work and try to carry it over the finish line.

jtcohen6 requested review from Fleid, lostmygithubaccount, a team and VersusFacit August 9, 2022 13:32

cla-bot bot added the cla:yes label Aug 9, 2022

jtcohen6 added 6 commits August 11, 2022 12:56

2586db6 Set default file format to parquet

c2431c5 Fix unit tests

ede7e5b Try alternative logic

f84d9e6 Delta if ODBC, otherwise Parquet

e22f1e1 Fix unit tests

62e3c61 Add changelog entry

jtcohen6 force-pushed the fix/default-file-format-parquet branch from e860c50 to 62e3c61 Compare August 11, 2022 11:02

jtcohen6 closed this Aug 17, 2022

jtcohen6 mentioned this pull request Jan 6, 2023

[CT-1757] [Feature] - Add support for other file formats like parquet in Python model using Databricks #583

Closed

3 tasks

mikealfare deleted the fix/default-file-format-parquet branch March 1, 2023 00:48
