-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Azure SQL Data Warehouse adapter #1625
Comments
Hey @ichavez! That's great! There may be minor differences between that doc and reality - I think we changed a couple of tiny things in the adapter contract between 0.13.x and 0.14.0. If you're looking for inspiration, check out https://github.com/fishtown-analytics/dbt-spark -- this is an 0.14.0 compatible plugin for Spark. Definitely also check out the dbt-postgres, dbt-snowflake, and dbt-bigquery folders over here: https://github.com/fishtown-analytics/dbt/tree/dev/0.14.1/plugins Let me know if there's anything I can help out with as you get underway :) |
I would be down to help with this as well. was about to start in tonight :) |
A couple things we may need to consider early:
|
@norton120 awesome! Do you know how compatible the sqlserver and ADW connection semantics / SQL dialects are? Check out this prior art! https://github.com/mikaelene/dbt-sqlserver |
Hi all, sorry for the delay. It's about 90% the same queries. However SQL Data Warehouse supports more "warehouse" features such as CTAS. See: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse?view=aps-pdw-2016-au7 And a few commands are slightly different: https://docs.microsoft.com/en-us/sql/t-sql/statements/rename-transact-sql?view=aps-pdw-2016-au7 In terms of DBT, one important feature of Data Warehouse is the "distribution", which determines whether your table is partitioned or replicated to all the nodes. See: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute Unfortunately dbt-sqlserver uses the pymssql driver, AFAIK it does not support connecting to SQL Data Warehouse. Microsoft only endorses pyodbc, which supports DW connections. See: https://docs.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-2017 @jacobm001 started a dbt plugin using pyodbc at https://github.com/jacobm001/dbt-mssql . And... generously added an Apache license to it. I will be building our SQL Data Warehouse adapter on his, and expect to publish it by next week. |
Thanks for the update @ichavez! One thing is becoming clear: we (the core dbt maintainers) need to do a better job of collecting and sharing information about which community-contributed plugins exist. I wrote up a quick issue over here #1655 - if you have any thoughts / opinions on the topic, I'd love to hear them! |
Hi! I can change my license for dbt-SQLServer if that’s a problem? Regarding pymsql vs pyodbc, it was just so much easier to connect to sql server with pymsql from a Mac. You probably just need to change the connections.py to pyodbc. All other will work the same. If someone have a azure sql dw to share, I can chip in on the development 😀 |
One more thing. The CTAS feature of dw will probably make it much easier in the development of the adapter! Why isn’t that in standard SQLServer? |
I never got the @ notification sorry. A couple other things worth noting, the |
@ichavez do you have a fork up I can contribute to? I would really like to help get this over the curb |
I'm going to take a pass at this using |
Hi there, we're looking at the Azure SQL Data Warehouse platform and the dbt space but it doesn't have that functionality right now. We would be interested in contributing. Let me know! @ichavez |
So the test environment is a little more complex than I expected - I'm wrapping together a docker container so dev easy and we can chip away. ADW seems sad about |
Hi all @norton120 @DVAlexHiggs . I just created a few models with table, view, incremental materializations, and all seem to be working. But, as you can see, it's pretty basic for now. I'm sure performance can be improved over the default "incremental" implementation, but it works for now. Also, the table materialization supports a "distribution" setting. It does not work while setting it as default on your dbt_project though. |
Hey @ichavez , sorry I missed this - I put up a WIP repo here https://github.com/norton120/dbt-azuredatawarehouse just now. The docker container loads up I looked at the connections file quick, it looks like you are overriding |
and now I'm realizing how limited ADW transaction support is :( so autocommit seems like it may be the best way to go here then. |
About transactions... yes, we typically want to be able to completely recreate our models from source tables, so there's little use for transactions. Even with incremental or snapshot models, the best practice seems to implement them using CTAS, which is minimally logged operation. As you probably know, DDL operations are not supported, so it would be hard to work around that in dbt. See: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-develop-transactions#limitations |
@ichavez do you want to make yours a stand-alone repo so I can fork it and PR against it - since mssql server / ADW will end up being 2 different adapters? I'd like to move the docker env and the authentication stuff to yours |
hi @norton120 I'm not sure what you mean by stand-alone? You can certainly go to https://github.com/1A-Auto/dbt-azuredw and fork it. I would like to track Jacobs' dbt-mssql repo for a while. SQL Server and SQL Data Warehouse are close enough that features can almost certainly work for both. In prod though, CTAS would offer better performance in DW, so I would be implementing those cases. I'm still not 100% sure how things would develop, but we're considering using an instance of SQL Server for dev/testing and then SQL Data Warehouse as prod. In such scenario, it may not be a bad idea to make dbt-azuredw depend on dbt-mssql. This is not new for dbt, I've been reading redshift's plugin as an example: |
@ichavez Interesting use case. I don't have a use for Azure myself (at least not atm), but I'd be happy to collaborate on dbt-mssql to try and minimize the development needed for the azuredw plugin. |
Hi thanks for the replies, apologies for the delay in my response! |
@DVAlexHiggs I think the convention we are working with would be to log / work from issues here https://github.com/jacobm001/dbt-mssql, if it's an ADW-specific feature PR against https://github.com/1A-Auto/dbt-azuredw |
closing this one - out of scope for core |
We were also looking for this and ended up rolling our own with some help from the community re: macros. Most of the existing repos were abandoned ~ dbt 0.15 and we're now supported 0.18. We're committed to supported this since it is also required for our day-to-day business. https://github.com/embold-health/dbt-azuresynapse apologies in advance for the necropost! |
Hi,
I'm working on an adapter for Azure SQL Data Warehouse. I've read through: https://docs.getdbt.com/docs/building-a-new-adapter
I think I'm all set, but opening the ticket to start convo. I plan on basing the implementation on the postgres/redshift adapters.
Let me know if you've got any other tips or things I should know.
The text was updated successfully, but these errors were encountered: