Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fewer adapters will need to re-implement basic_load_csv_rows #3623

Merged
merged 11 commits into from
Aug 31, 2021

Conversation

dataders
Copy link
Contributor

@dataders dataders commented Jul 24, 2021

resolves #3622

Description

Implements a get_binding_char macro, which allows for easy substitution. This will allow dbt-sqlserver, dbt-oracle, and other adapters to only implement get_binding_char and not overload the entire basic_load_csv_rows

Questions

  1. are we at the point yet of obviating basic_load_csv_rows in favor of default__load_csv_rows where get_batch_size() is another dispatched macro?
  2. related to above are dispatched macros the best way to store the param string and default_batch_size? perhaps it's cleaner to add this as a SQLAdapterClass property?
  3. I thought it a bit strange to that 3 different macros are defines, followed by their default__ implementations. I followed this pattern, but it feels weird to have get_binding_char so far from default__get_binding_char. any strong opinions?

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@dataders dataders temporarily deployed to Redshift July 24, 2021 00:29 Inactive
@cla-bot cla-bot bot added the cla:yes label Jul 24, 2021
@dataders dataders temporarily deployed to Redshift July 24, 2021 00:29 Inactive
@dataders dataders temporarily deployed to Bigquery July 24, 2021 00:29 Inactive
@dataders dataders temporarily deployed to Bigquery July 24, 2021 00:29 Inactive
@dataders dataders temporarily deployed to Snowflake July 24, 2021 00:29 Inactive
@dataders dataders temporarily deployed to Snowflake July 24, 2021 00:29 Inactive
@dataders dataders temporarily deployed to Redshift July 26, 2021 16:24 Inactive
@dataders dataders temporarily deployed to Redshift July 26, 2021 16:24 Inactive
@dataders dataders temporarily deployed to Bigquery July 26, 2021 16:24 Inactive
@dataders dataders temporarily deployed to Bigquery July 26, 2021 16:24 Inactive
@dataders dataders temporarily deployed to Snowflake July 26, 2021 16:24 Inactive
@dataders dataders temporarily deployed to Snowflake July 26, 2021 16:24 Inactive
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all about it—let's make these macros more modular, and make the lives of adapter plugin maintainers easier.

Answers to your questions:

  1. Agreed, it's a bit silly that the only difference between default__load_csv_rows and basic_load_csv_rows is setting the batch size to 10000. I'd be open to new dispatched macros for get_csv_batch_size() and get_csv_insert_into_sql().
  2. I gave this a bit of thought. Our general rule is: All explicit SQL should be wrapped in Jinja macros. All implicit or non-SQL behavior of databases should be set in python. There are adapter plugins that leverage multiple connection options—e.g. dbt-spark uses pyodbc for some, not others—which is an argument in favor of adding a SQLConnectionManager property/method. This one is tricky, but I think a SQL param is SQL, sort of, so I'm happy setting it in a Jinja macro.
  3. No strong preference here. In other places, I think we tend to define a dispatched macro, immediately followed by its default implementation, and I agree that's a bit easier to read.

setup.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@dataders dataders requested a review from jtcohen6 August 1, 2021 19:59
@dataders
Copy link
Contributor Author

dataders commented Aug 1, 2021

took another stab at refactoring. The diff is ugly (the pedant in my is toying with two separate PRs) so make this better.

I decided to keep the params for load_csv_rows() at two, so adapter maintainers don't have breaking changes. However, I'm not in love with the hacky workaround of introducingget_batch_size() with a var() call. On the same note, I'm not in love with the batch_size parameter name, as it isn't immediately clear from the name as to whether it means the maximum number of rows to include in each query or the max number of SQL params to include in each query. The truth is the former, so recently we merged dbt-msft/dbt-sqlserver#151 which will auto-calculate batch size.

@dataders
Copy link
Contributor Author

@jtcohen6, I responded to your feedback, but I'm not sure why some of these tests are failing

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swanderz Looking good! So sorry for the delay getting back to you on this one.

When you get a chance, could you pull the latest changes from develop? Both in terms of the changelog placement, and also because we just overhauled our CI testing suite. I'd be curious to see if the same tests are failing as the ones you saw a few weeks ago.

@dataders dataders requested a review from jtcohen6 August 30, 2021 22:52
@jtcohen6
Copy link
Contributor

Going to close and reopen just to trigger additional adapter tests

@jtcohen6 jtcohen6 closed this Aug 31, 2021
@jtcohen6 jtcohen6 reopened this Aug 31, 2021
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @swanderz!

@jtcohen6 jtcohen6 merged commit 464beca into dbt-labs:develop Aug 31, 2021
IS-Josh pushed a commit to IS-Josh/dbt that referenced this pull request Sep 4, 2021
…#3623)

* fewer adapters will need to re-implemnt basic_load_csv_rows

* hack version

* reordering per convention

* make redundant basic_load_csv_rows

* for next version

* Update core/dbt/include/global_project/macros/materializations/seed/seed.sql

Co-authored-by: Jeremy Cohen <jtcohen6@gmail.com>

* Move up changelog entry

Co-authored-by: Jeremy Cohen <jtcohen6@gmail.com>
Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>
TeddyCr pushed a commit to TeddyCr/dbt that referenced this pull request Sep 9, 2021
…#3623)

* fewer adapters will need to re-implemnt basic_load_csv_rows

* hack version

* reordering per convention

* make redundant basic_load_csv_rows

* for next version

* Update core/dbt/include/global_project/macros/materializations/seed/seed.sql

Co-authored-by: Jeremy Cohen <jtcohen6@gmail.com>

* Move up changelog entry

Co-authored-by: Jeremy Cohen <jtcohen6@gmail.com>
Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>
hovaesco added a commit to starburstdata/dbt-trino that referenced this pull request Oct 5, 2021
dbt 0.21.0 introduced changes which dropped basic_load_csv_rows dbt-labs/dbt-core#3623. Now it's able to use the default macro default__load_csv_rows
hovaesco added a commit to starburstdata/dbt-trino that referenced this pull request Oct 7, 2021
dbt 0.21.0 introduced changes which dropped basic_load_csv_rows dbt-labs/dbt-core#3623. Now it's able to use the default macro default__load_csv_rows
@dataders dataders changed the title fewer adapters will need to re-implemnt basic_load_csv_rows fewer adapters will need to re-implement basic_load_csv_rows Nov 11, 2021
EminUZUN pushed a commit to EminUZUN/dbt-trino that referenced this pull request Feb 14, 2023
dbt 0.21.0 introduced changes which dropped basic_load_csv_rows dbt-labs/dbt-core#3623. Now it's able to use the default macro default__load_csv_rows
damian3031 pushed a commit to damian3031/dbt-trino that referenced this pull request Sep 9, 2024
dbt 0.21.0 introduced changes which dropped basic_load_csv_rows dbt-labs/dbt-core#3623. Now it's able to use the default macro default__load_csv_rows
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow custom adapters to replace default param marker in seed's basic_load_csv_rows
2 participants