Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure SQL Data Warehouse adapter #1625

Closed
ichavez opened this issue Jul 23, 2019 · 24 comments
Closed

Azure SQL Data Warehouse adapter #1625

ichavez opened this issue Jul 23, 2019 · 24 comments
Labels
adapter_plugins Issues relating to third-party adapter plugins

Comments

@ichavez
Copy link

ichavez commented Jul 23, 2019

Hi,

I'm working on an adapter for Azure SQL Data Warehouse. I've read through: https://docs.getdbt.com/docs/building-a-new-adapter

I think I'm all set, but opening the ticket to start convo. I plan on basing the implementation on the postgres/redshift adapters.

Let me know if you've got any other tips or things I should know.

@drewbanin
Copy link
Contributor

Hey @ichavez! That's great! There may be minor differences between that doc and reality - I think we changed a couple of tiny things in the adapter contract between 0.13.x and 0.14.0.

If you're looking for inspiration, check out https://github.com/fishtown-analytics/dbt-spark -- this is an 0.14.0 compatible plugin for Spark. Definitely also check out the dbt-postgres, dbt-snowflake, and dbt-bigquery folders over here: https://github.com/fishtown-analytics/dbt/tree/dev/0.14.1/plugins

Let me know if there's anything I can help out with as you get underway :)

@norton120
Copy link

I would be down to help with this as well. was about to start in tonight :)

@norton120
Copy link

A couple things we may need to consider early:

  • ADW does not support updates from CTEs
  • ADW does not allow joins in CTAS (ie SELECT ... INTO)
  • Like most DWs ADW does not support correlated subqueries

@drewbanin
Copy link
Contributor

@norton120 awesome! Do you know how compatible the sqlserver and ADW connection semantics / SQL dialects are? Check out this prior art! https://github.com/mikaelene/dbt-sqlserver

@ichavez
Copy link
Author

ichavez commented Aug 2, 2019

Hi all, sorry for the delay.

It's about 90% the same queries. However SQL Data Warehouse supports more "warehouse" features such as CTAS. See: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse?view=aps-pdw-2016-au7

And a few commands are slightly different: https://docs.microsoft.com/en-us/sql/t-sql/statements/rename-transact-sql?view=aps-pdw-2016-au7

In terms of DBT, one important feature of Data Warehouse is the "distribution", which determines whether your table is partitioned or replicated to all the nodes. See: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute

Unfortunately dbt-sqlserver uses the pymssql driver, AFAIK it does not support connecting to SQL Data Warehouse. Microsoft only endorses pyodbc, which supports DW connections. See: https://docs.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-2017

@jacobm001 started a dbt plugin using pyodbc at https://github.com/jacobm001/dbt-mssql .

And... generously added an Apache license to it.

I will be building our SQL Data Warehouse adapter on his, and expect to publish it by next week.

@drewbanin
Copy link
Contributor

Thanks for the update @ichavez!

One thing is becoming clear: we (the core dbt maintainers) need to do a better job of collecting and sharing information about which community-contributed plugins exist. I wrote up a quick issue over here #1655 - if you have any thoughts / opinions on the topic, I'd love to hear them!

@mikaelene
Copy link
Contributor

Hi! I can change my license for dbt-SQLServer if that’s a problem? Regarding pymsql vs pyodbc, it was just so much easier to connect to sql server with pymsql from a Mac. You probably just need to change the connections.py to pyodbc. All other will work the same. If someone have a azure sql dw to share, I can chip in on the development 😀

@mikaelene
Copy link
Contributor

One more thing. The CTAS feature of dw will probably make it much easier in the development of the adapter! Why isn’t that in standard SQLServer?

@norton120
Copy link

I never got the @ notification sorry. A couple other things worth noting, the Authentication option needs to be set for Azure AD or service discovery (I actually have never got it to work using pyodbc without the option set). There is also a weird hanging transaction that seems to cause the initial handshake to return an error with ADW; setting autocommit=true in the option avoids this behavior.
so this should probably have options for Authentication and Autocommit si?

@norton120
Copy link

@ichavez do you have a fork up I can contribute to? I would really like to help get this over the curb

@norton120
Copy link

I'm going to take a pass at this using pyodbc. I'll get a WIP up as soon as I can

@DVAlexHiggs
Copy link

Hi there, we're looking at the Azure SQL Data Warehouse platform and the dbt space but it doesn't have that functionality right now. We would be interested in contributing. Let me know! @ichavez

@norton120
Copy link

So the test environment is a little more complex than I expected - I'm wrapping together a docker container so dev easy and we can chip away. ADW seems sad about execute('BEGIN') at this very moment ... I'll try to get that WIP up in the next hour or so

@ichavez
Copy link
Author

ichavez commented Aug 23, 2019

Hi all @norton120 @DVAlexHiggs .
This is the initial commit: https://github.com/1A-Auto/dbt-azuredw

I just created a few models with table, view, incremental materializations, and all seem to be working. But, as you can see, it's pretty basic for now.

I'm sure performance can be improved over the default "incremental" implementation, but it works for now.

Also, the table materialization supports a "distribution" setting. It does not work while setting it as default on your dbt_project though.

@norton120
Copy link

norton120 commented Aug 23, 2019

Hey @ichavez , sorry I missed this - I put up a WIP repo here https://github.com/norton120/dbt-azuredatawarehouse just now. The docker container loads up jaffle_shop and the integration test repo as well if that helps? I found I also need to specify 'Authentication', not sure if that's due to our ADW security settings or what.

I looked at the connections file quick, it looks like you are overriding add_begin_query() and add_commit_query() - is DBT running outside a transaction then? I'm finding ADW has some weird transaction behavior.

@norton120
Copy link

I looked at the connections file quick, it looks like you are overriding add_begin_query() and add_commit_query() - is DBT running outside a transaction then? I'm finding ADW has some weird transaction behavior.

and now I'm realizing how limited ADW transaction support is :( so autocommit seems like it may be the best way to go here then.

@ichavez
Copy link
Author

ichavez commented Aug 23, 2019

About transactions... yes, we typically want to be able to completely recreate our models from source tables, so there's little use for transactions. Even with incremental or snapshot models, the best practice seems to implement them using CTAS, which is minimally logged operation.

As you probably know, DDL operations are not supported, so it would be hard to work around that in dbt. See: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-develop-transactions#limitations

@norton120
Copy link

@ichavez do you want to make yours a stand-alone repo so I can fork it and PR against it - since mssql server / ADW will end up being 2 different adapters? I'd like to move the docker env and the authentication stuff to yours

@ichavez
Copy link
Author

ichavez commented Aug 26, 2019

hi @norton120 I'm not sure what you mean by stand-alone? You can certainly go to https://github.com/1A-Auto/dbt-azuredw and fork it.

I would like to track Jacobs' dbt-mssql repo for a while. SQL Server and SQL Data Warehouse are close enough that features can almost certainly work for both. In prod though, CTAS would offer better performance in DW, so I would be implementing those cases.

I'm still not 100% sure how things would develop, but we're considering using an instance of SQL Server for dev/testing and then SQL Data Warehouse as prod.

In such scenario, it may not be a bad idea to make dbt-azuredw depend on dbt-mssql.

This is not new for dbt, I've been reading redshift's plugin as an example:

https://github.com/fishtown-analytics/dbt/blob/dev/0.14.1/plugins/redshift/dbt/include/redshift/macros/adapters.sql#L62

@jacobm001
Copy link

@ichavez Interesting use case. I don't have a use for Azure myself (at least not atm), but I'd be happy to collaborate on dbt-mssql to try and minimize the development needed for the azuredw plugin.

@DVAlexHiggs
Copy link

Hi thanks for the replies, apologies for the delay in my response!
How best can I contribute? Where is development needed most? Looking forward to working on this

@norton120
Copy link

@DVAlexHiggs I think the convention we are working with would be to log / work from issues here https://github.com/jacobm001/dbt-mssql, if it's an ADW-specific feature PR against https://github.com/1A-Auto/dbt-azuredw

@drewbanin
Copy link
Contributor

closing this one - out of scope for core

@mattberns
Copy link

We were also looking for this and ended up rolling our own with some help from the community re: macros. Most of the existing repos were abandoned ~ dbt 0.15 and we're now supported 0.18. We're committed to supported this since it is also required for our day-to-day business.

https://github.com/embold-health/dbt-azuresynapse

apologies in advance for the necropost!

@jtcohen6 jtcohen6 added the adapter_plugins Issues relating to third-party adapter plugins label Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adapter_plugins Issues relating to third-party adapter plugins
Projects
None yet
Development

No branches or pull requests

8 participants