
Some dbt_utils macros do not work with Spark SQL #291

Closed
bdelamotte opened this issue Nov 5, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@bdelamotte

Describe the bug

It appears that many dbt_utils macros do not work with Spark SQL because they cast using the Postgres-style :: operator instead of the ANSI-standard cast() function.

Steps to reproduce

  1. Add dbt_utils to your packages.yml file:

     packages:
       - package: fishtown-analytics/dbt_utils
         version: 0.6.2

  2. Create a simple model with the following SQL:

     select {{ dbt_utils.current_timestamp() }}

Expected results

The model should build successfully.

Actual results

Runtime Error in model dbt_test (models/staging/dbt_test.sql)
  Database Error
    Error running query: org.apache.spark.sql.catalyst.parser.ParseException: 
    mismatched input ':' expecting <EOF>(line 6, pos 21)
    
    == SQL ==
    /* {"app": "dbt", "dbt_version": "0.18.1", "profile_name": "databricks", "target_name": "dev", "node_id": "model.dbt_databricks.dbt_test"} */
    create or replace view brian_dev_stg.dbt_test
      
      as
        select 
        current_timestamp::
    ---------------------^^^
        timestamp
    

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Screenshots and log output

[Screenshot: error output, 2020-11-04 5:38 PM]

System information

The contents of your packages.yml file:
packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.2

Which database are you using dbt with?

  • [ ] postgres
  • [ ] redshift
  • [ ] bigquery
  • [ ] snowflake
  • [x] other (specify: Spark SQL)

The output of dbt --version:

installed version: 0.18.1
   latest version: 0.18.1

Up to date!

Plugins:
  - bigquery: 0.18.1
  - snowflake: 0.18.1
  - redshift: 0.18.1
  - postgres: 0.18.1
  - spark: 0.18.0

The operating system you're using:
macOS Catalina Version 10.15.7
The output of python --version:
Python 3.7.6

Additional context

Here is one place where casting is done with :: instead of cast():
https://github.com/fishtown-analytics/dbt-utils/blob/9feaccc327a7409298a2bc362db53c2e597024fa/macros/cross_db_utils/current_timestamp.sql#L6
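For illustration, here is the compiled SQL from the error above next to a portable rewrite (the exact fix adopted in the package may differ):

```sql
-- Postgres-style shorthand; Spark SQL's parser rejects it:
select current_timestamp::timestamp

-- ANSI-standard equivalent, which Spark (and most engines) accept:
select cast(current_timestamp as timestamp)
```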

Are you interested in contributing the fix?

Yes! I am happy to change all the :: to cast() or provide another solution. I would love to contribute to dbt-utils!

@bdelamotte bdelamotte added bug Something isn't working triage labels Nov 5, 2020
@emilieschario
Contributor

@jtcohen6 I know that dbt-utils is outside of core, but do you have any thoughts on where the core-supported plugins start and end versus things like this? I was under the impression that dbt core has plugins, and if that's the case, dbt-utils would support those core plugins. But maybe that's not true?

Or maybe Spark is just behind so this is the first time this has been caught?

@clrcrl
Contributor

clrcrl commented Nov 5, 2020

I'm not a spark expert, but I do know that there's a separate spark-utils package that is likely useful here!

@jtcohen6
Contributor

jtcohen6 commented Nov 5, 2020

IMO "plugin packages" are definitely the answer here, and spark-utils is the proof point.

Thanks to adapter-based macro dispatching, introduced in v0.18.0, you can install a package that "shims" support for another package's macros. So far, I've created spark__ implementations of a subset of dbt-utils macros that don't natively work on Spark.
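A hypothetical shim of that shape, sketched for the current_timestamp case from this issue (the real spark-utils implementation may differ):

```sql
-- spark__current_timestamp: with dispatch configured, dbt resolves
-- dbt_utils.current_timestamp() to this macro on Spark, bypassing
-- the default implementation that uses the :: shorthand.
{% macro spark__current_timestamp() %}
    current_timestamp()
{% endmacro %}
```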

As long as you install both packages and define a var in dbt_project.yml like:

vars:
  dbt_utils_dispatch_list: ['spark_utils']

Throughout your project, you can call macros such as dbt_utils.dateadd, and dbt will actually use spark_utils.spark__dateadd behind the scenes.
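For completeness, "both packages" means something like this in packages.yml (versions here are illustrative, not a recommendation):

```yml
packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.2
  - package: fishtown-analytics/spark_utils
    version: 0.1.0
```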

@otosky

otosky commented Aug 26, 2021

Just wanted to update the note above with the v0.20.0 syntax, for anyone stumbling across this issue like I did:

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']

This is a super cool feature btw!

dispatch docs

@jtcohen6
Contributor

jtcohen6 commented Aug 4, 2022

And it works even more "out of the box" in v1.2: dbt-labs/dbt-spark#359

Closing this one, which has been open for a while :)

@jtcohen6 jtcohen6 closed this as completed Aug 4, 2022
@jtcohen6 jtcohen6 removed the triage label Aug 4, 2022
@jtcohen6 jtcohen6 removed their assignment Aug 4, 2022