Overhaul of Adapter Development Contributing Docs #1727

Merged
merged 44 commits into from
Sep 13, 2022
Merged
Show file tree
Hide file tree
Changes from 31 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
- 6584608 overview table (dataders, Jul 14, 2022)
- 5744fd3 update source URLs for macros (dataders, Jul 14, 2022)
- 7d904d4 update for grants and and new doc pages (dataders, Jul 14, 2022)
- bef6519 Merge branch 'current' into outlining-and-monospacing-of-code (dataders, Jul 14, 2022)
- 326fdbd cordon into new location (dataders, Jul 15, 2022)
- bb5a186 initialize "what are adapters" page (dataders, Jul 15, 2022)
- fa60adb adding in that special @jasnonaz spice! (dataders, Jul 15, 2022)
- 5a22978 add diagram (dataders, Jul 15, 2022)
- 9d33209 now in next section (dataders, Jul 15, 2022)
- e2dfe28 wrong doc ids (dataders, Jul 15, 2022)
- a12e28a correctly reference static images (dataders, Jul 18, 2022)
- 0901dcd Update website/sidebars.js (dataders, Jul 18, 2022)
- 53a82dc Merge branch 'current' into cordoning-adapter-docs (dataders, Jul 18, 2022)
- 7833d3d Merge branch 'current' of github.com:dbt-labs/docs.getdbt.com into co… (dataders, Aug 31, 2022)
- 6576780 add adapter verification (dataders, Aug 31, 2022)
- 23d8d55 draft adapter overview (dataders, Sep 1, 2022)
- 9e87fe6 Merge branch 'current' into cordoning-adapter-docs (dataders, Sep 1, 2022)
- 289e5c1 Update website/sidebars.js (dataders, Sep 1, 2022)
- cbecb45 docusaurus ref instead of relative dir paths (dataders, Sep 6, 2022)
- a963e7c close tags (dataders, Sep 6, 2022)
- 361c48d Merge branch 'current' of github.com:dbt-labs/docs.getdbt.com into co… (dataders, Sep 6, 2022)
- 4d7c805 escape accidental HTML tag (dataders, Sep 6, 2022)
- 5f58414 url link fixes (dataders, Sep 6, 2022)
- cca9a19 restructure (dataders, Sep 6, 2022)
- 8267fce Merge branch 'outlining-and-monospacing-of-code' of github.com:dbt-la… (dataders, Sep 6, 2022)
- 4bc1965 Update website/docs/docs/contributing/adapter-development/1-what-are-… (dataders, Sep 7, 2022)
- caefa58 cleaner language (dataders, Sep 7, 2022)
- 15a102c typo cleanups h/t @sdurry (dataders, Sep 7, 2022)
- 3078de7 Merge branch 'cordoning-adapter-docs' of github.com:dbt-labs/docs.get… (dataders, Sep 7, 2022)
- f3c8882 code formatting (dataders, Sep 7, 2022)
- a7e4965 Merge branch 'current' of github.com:dbt-labs/docs.getdbt.com into co… (dataders, Sep 7, 2022)
- 00836b0 Update 1-what-are-adapters.md (matthewshaver, Sep 8, 2022)
- d46566b Update 5-promoting-a-new-adapter.md (matthewshaver, Sep 9, 2022)
- be8d755 Update 5-promoting-a-new-adapter.md (matthewshaver, Sep 9, 2022)
- edfb101 thanks @jtcohen6 (dataders, Sep 12, 2022)
- d68b154 Update website/docs/docs/contributing/adapter-development/6-verifying… (dataders, Sep 12, 2022)
- add9a08 it's got the juice! (dataders, Sep 12, 2022)
- 65554be Merge branch 'cordoning-adapter-docs' of github.com:dbt-labs/docs.get… (dataders, Sep 12, 2022)
- 9ef6795 more note responses (dataders, Sep 13, 2022)
- 4d138de index page and prereq page (dataders, Sep 13, 2022)
- 63b2b79 Merge branch 'current' into cordoning-adapter-docs (dataders, Sep 13, 2022)
- 08dcff3 Update website/docs/docs/contributing/adapter-development/6-promoting… (dataders, Sep 13, 2022)
- 2562f51 actually introduce topic (dataders, Sep 13, 2022)
- cccbdcc Merge branch 'cordoning-adapter-docs' of github.com:dbt-labs/docs.get… (dataders, Sep 13, 2022)
3 changes: 3 additions & 0 deletions _redirects
@@ -335,3 +335,6 @@ https://tutorial.getdbt.com/* https://docs.getdbt.com/:splat 301!
/docs/guides/getting-help /guides/legacy/getting-help 302
/docs/guides/migration-guide/* /guides/migration/versions/:splat 301!
/docs/guides/* /guides/legacy/:splat 301!
/docs/contributing/building-a-new-adapter /docs/contributing/adapter-development/2-building-a-new-adapter 302
/docs/contributing/testing-a-new-adapter /docs/contributing/adapter-development/3-testing-a-new-adapter 302
/docs/contributing/documenting-a-new-adapter /docs/contributing/adapter-development/4-documenting-a-new-adapter 302
2 changes: 1 addition & 1 deletion website/docs/docs/available-adapters.md
@@ -5,7 +5,7 @@ id: "available-adapters"

dbt connects to and runs SQL against your database, warehouse, platform, or query engine. It works by using a dedicated **adapter** for each technology. All the adapters listed below are open source and free to use, just like dbt.

If you have a new adapter, please add it to this list using a pull request! See [Documenting your adapter](/docs/contributing/documenting-a-new-adapter.md) for more information.
If you have a new adapter, please add it to this list using a pull request! See [Documenting your adapter](4-documenting-a-new-adapter) for more information.

### Installation

@@ -0,0 +1,97 @@
---
title: "What are adapters? Why do we need them?"
id: "1-what-are-adapters"
---

Here is a quick intro to why adapters need to exist and how they are currently constructed. If you have any questions, don't hesitate to ask in the [#adapter-ecosystem](https://getdbt.slack.com/archives/C030A0UF5LM) Slack channel; the community is very helpful and has likely run into an issue similar to yours.

## No one ever: "Aren't all databases the same?"

There's a huge amount of work that goes into creating a database. At a high level, here are the "layers" that go into a database (outermost inwards):
- SQL API
- Client Library / Driver
- Server Connection Manager
- Query parser
- Query optimizer
- Runtime
- Storage Access Layer
- Storage

There's a lot more there than just SQL as a language (no insult intended to Donald Chamberlin). Ultimately, databases (and later, data warehouses) are so popular because they let you offload a great deal of complexity from your brain to the database itself, which leaves you free to focus on the data.

Enter the radical notion that is dbt. By further abstracting and standardizing the outermost layers of a database (SQL API, client library, connection manager) into a framework, dbt both:
1. opens database technology to less technical users (webmaster -> web developer), and
2. enables more meaningful conversations about how data warehousing should be done.

Enter dbt adapters.

## What exactly needs to be adapted?

dbt "adapters" are responsible for _adapting_ dbt's "standard" functionality to a given database. For a variety of reasons, our prototypical database and adapter are PostgreSQL and dbt-postgres, and most of our adapters are loosely based on the functionality implemented in dbt-postgres.

If there's a new database you'd like dbt to work with, chances are you'll need to either build a new adapter or extend an existing one.
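In Python, extending an existing adapter typically means subclassing it and overriding only what differs. The minimal sketch below illustrates that idea with two hypothetical classes, `PostgresLikeAdapter` and `MyWarehouseAdapter`; they are not dbt's real adapter classes, and a real adapter overrides far more than one method.

```python
class PostgresLikeAdapter:
    """Hypothetical stand-in for a base adapter (not dbt's real class)."""

    def quote(self, identifier: str) -> str:
        # Postgres-style identifier quoting uses double quotes
        return f'"{identifier}"'

    def drop_table_sql(self, table: str) -> str:
        # Shared statement template; relies on self.quote() for dialect details
        return f"drop table if exists {self.quote(table)}"


class MyWarehouseAdapter(PostgresLikeAdapter):
    """Hypothetical adapter for a warehouse that quotes identifiers with backticks."""

    def quote(self, identifier: str) -> str:
        # Only the quoting rule differs; drop_table_sql() is inherited unchanged
        return f"`{identifier}`"


for adapter in (PostgresLikeAdapter(), MyWarehouseAdapter()):
    print(adapter.drop_table_sql("house_pts"))
```

Overriding the smallest possible hook keeps the inherited statement templates shared, which is the same motivation behind basing new adapters on dbt-postgres.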

The outermost layers of a database mentioned above map roughly to the areas in which the dbt adapter framework encapsulates inter-database differences:

### SQL API

Even amongst ANSI-compliant databases, there are virtually always differences in SQL grammar. Here are some categories and specific examples of SQL statements that can be constructed differently:


| category | specific area of differences | examples |
|----------------------------------------------|--------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| statement syntax | the grammar of using `IF EXISTS` | `IF <TABLE> EXISTS, DROP TABLE`<br></br>vs<br></br>`DROP TABLE IF EXISTS <TABLE>` |
| workflow definition & semantics | incremental updates | `MERGE` vs. `DELETE; INSERT` |
| relation and column attributes/configuration | database-specific materialization configs | `DISTRIBUTION = ROUND_ROBIN` (Synapse)<br></br>vs<br></br>`DISTSTYLE EVEN` (Redshift) |
| permissioning | grant statements that can only take one grantee at a time vs. those that accept lists of grantees | `grant SELECT on table hogwarts.house_pts to dumbledore`<br></br>`grant SELECT on table hogwarts.house_pts to snape`<br></br>vs<br></br>`grant SELECT on table hogwarts.house_pts to dumbledore, snape` |
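To make the permissioning row concrete, here is a minimal Python sketch of the branching an adapter has to encode when rendering grants. The function name `grant_statements` and the `multiple_grantees_per_statement` flag are hypothetical; in dbt this logic lives in Jinja macros rather than Python.

```python
from typing import List


def grant_statements(
    privilege: str,
    relation: str,
    grantees: List[str],
    multiple_grantees_per_statement: bool,
) -> List[str]:
    """Render grant SQL for one privilege on one relation.

    `multiple_grantees_per_statement` is a hypothetical flag describing
    whether the platform accepts a list of grantees in a single statement.
    """
    if not grantees:
        return []
    if multiple_grantees_per_statement:
        # One statement with a comma-separated grantee list
        return [f"grant {privilege} on table {relation} to {', '.join(grantees)}"]
    # Platforms that take one grantee at a time need one statement per grantee
    return [f"grant {privilege} on table {relation} to {grantee}" for grantee in grantees]


print(grant_statements("SELECT", "hogwarts.house_pts", ["dumbledore", "snape"], True))
print(grant_statements("SELECT", "hogwarts.house_pts", ["dumbledore", "snape"], False))
```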

### Python Client Library & Connection Manager

The other big category of inter-database differences comes from how the client connects to the database and executes queries over that connection. To integrate with dbt, a data platform needs a pre-existing Python client library or, at minimum, ODBC support, in which case a generic Python library such as pyodbc can be used.

| category | specific area of differences | examples |
|------------------------------|-------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| credentials & authentication | authentication | username & password<br></br>vs<br></br>MFA with `boto3` or Okta token |
| connection opening/closing | create a new connection to db | `psycopg2.connect(connection_string)`<br></br>vs<br></br>`google.cloud.bigquery.Client( ... )` |
| inserting local data | load seed `.csv` files into Python memory | `Adapter.upload_file()` (BigQuery)<br></br>vs<br></br>`INSERT INTO ... VALUES ...` prepared statement (all other databases) |
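The sketch below shows the shape of a connection manager built on a DB-API 2.0 driver, including seed loading through a parameterized `INSERT INTO ... VALUES ...` statement. It is a hypothetical illustration: the class and method names are not dbt's interface, and the standard-library `sqlite3` module stands in for a real platform driver such as psycopg2 or pyodbc.

```python
import csv
import io
import sqlite3
from typing import Any, List, Optional, Sequence, Tuple


class SQLiteConnectionManager:
    """Hypothetical connection manager; sqlite3 stands in for a real driver."""

    def __init__(self, database: str) -> None:
        self.database = database
        self.connection: Optional[sqlite3.Connection] = None

    def open(self) -> None:
        # Driver-specific: each platform's client library connects differently
        self.connection = sqlite3.connect(self.database)

    def _conn(self) -> sqlite3.Connection:
        if self.connection is None:
            raise RuntimeError("call open() first")
        return self.connection

    def execute(self, sql: str, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
        # Run one statement and return all result rows
        return self._conn().execute(sql, params).fetchall()

    def load_seed(self, table: str, csv_text: str) -> int:
        """Create `table` from CSV text and insert its rows; returns the row count."""
        conn = self._conn()
        reader = csv.reader(io.StringIO(csv_text))
        header = next(reader)
        rows = list(reader)
        columns = ", ".join(header)
        conn.execute(f"create table {table} ({columns})")
        # sqlite3 uses "?" placeholders; other drivers use other paramstyles
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"insert into {table} ({columns}) values ({placeholders})", rows)
        conn.commit()
        return len(rows)

    def close(self) -> None:
        if self.connection is not None:
            self.connection.close()
            self.connection = None


manager = SQLiteConnectionManager(":memory:")
manager.open()
manager.load_seed("house_pts", "house,points\ngryffindor,482\nslytherin,472\n")
print(manager.execute("select house, points from house_pts order by house"))
manager.close()
```

Even among DB-API drivers the placeholder style differs (for example, sqlite3 uses `?` while psycopg2 uses `%s`), which is one small example of why this layer has to be adapted per platform.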


## How does dbt encapsulate and abstract these differences?

Differences between databases are encoded into discrete areas:

| Components | Code Path | Function |
|------------------|---------------------------------------------------|-------------------------------------------------------------------------------|
| Python Classes | `adapters/<adapter_name>` | Configuration (see [Python Classes](#python-classes) below) |
| Macros | `include/<adapter_name>/macros/adapters/` | SQL API & statement syntax (e.g. how to create schema, how to get table info) |
| Materializations | `include/<adapter_name>/macros/materializations/` | table/view/snapshot workflow definitions |


### Python Classes

These classes implement all the methods responsible for
1. connecting to a database and issuing queries, and
2. providing dbt with database-specific configuration information

| Class | Description |
|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| AdapterClass | high-level configuration, type conversion, and any database-specific Python methods needed |
| AdapterCredentials | typed dictionary of possible profiles and associated methods |
| AdapterConnectionManager | all the methods responsible for connecting to a database and issuing queries |
| AdapterRelation | how relation names should be rendered, printed, and quoted; for example, whether relation names use all three parts, `database.schema.model_name` (three-part name), or only `schema.model_name` (two-part name) |
| AdapterColumn | how column names should be rendered, and database-specific column properties |
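As an illustration of the AdapterRelation row, here is a simplified Python model of rendering two-part vs. three-part relation names with platform-specific quoting. The `Relation` class and its `include_database` and `quote_char` fields are hypothetical; dbt's own relation class is more elaborate and tracks separate include and quote policies for each part of the name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical, simplified relation (not dbt's real relation class)."""

    database: Optional[str]
    schema: str
    identifier: str
    include_database: bool = True  # False for platforms that use two-part names
    quote_char: str = '"'  # e.g. "`" on platforms that quote with backticks

    def render(self) -> str:
        parts = [self.schema, self.identifier]
        if self.include_database and self.database:
            parts.insert(0, self.database)  # three-part name
        return ".".join(f"{self.quote_char}{part}{self.quote_char}" for part in parts)


print(Relation("analytics", "hogwarts", "house_pts").render())
print(Relation(None, "hogwarts", "house_pts", include_database=False, quote_char="`").render())
```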

### Macros

A set of *macros* responsible for generating SQL that is compliant with the target database.
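Which macro actually runs is decided through dbt's dispatch mechanism (`adapter.dispatch`): it looks for a macro prefixed with the current adapter's name, then with its parent adapters' names, and finally falls back to a `default__` implementation. The Python sketch below models only that lookup order; the `dispatch` function, the adapter chain, and the macro bodies are hypothetical illustrations, not dbt's implementation or its actual SQL.

```python
from typing import Callable, Dict, List


def dispatch(
    macro_name: str,
    adapter_chain: List[str],
    macros: Dict[str, Callable[[], str]],
) -> Callable[[], str]:
    """Return the most specific implementation of `macro_name`.

    `adapter_chain` lists the adapter type first, then its parents,
    e.g. ["redshift", "postgres"]; `default__` is always the last resort.
    """
    for prefix in adapter_chain + ["default"]:
        candidate = f"{prefix}__{macro_name}"
        if candidate in macros:
            return macros[candidate]
    raise LookupError(f"no implementation found for {macro_name!r}")


# Hypothetical macro registry; the SQL strings are illustrative only
macros: Dict[str, Callable[[], str]] = {
    "default__current_timestamp": lambda: "current_timestamp",
    "postgres__current_timestamp": lambda: "now()",
}

print(dispatch("current_timestamp", ["redshift", "postgres"], macros)())  # now()
print(dispatch("current_timestamp", ["mywarehouse"], macros)())  # current_timestamp
```

This fallback order is what lets an adapter that extends another one implement only the macros whose SQL actually differs.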

### Materializations

At the highest level: a set of *<Term id="materialization">materializations</Term>* and their corresponding helper macros, defined in dbt using Jinja and SQL. They codify for dbt how model files should be persisted into the database (i.e. materialized).
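To illustrate the "workflow definition" differences a materialization has to absorb, such as the `MERGE` vs. `DELETE; INSERT` split for incremental updates, here is a Python sketch that renders either strategy. The function `incremental_sql` and its `supports_merge` flag are hypothetical; real materializations are written in Jinja and SQL and handle many more cases.

```python
from typing import List


def incremental_sql(
    target: str, staging: str, unique_key: str, columns: List[str], supports_merge: bool
) -> List[str]:
    """Render SQL that applies new rows from `staging` to `target`.

    Assumes `columns` contains `unique_key` plus at least one other column.
    """
    col_list = ", ".join(columns)
    if supports_merge:
        # Single MERGE statement: update matched rows, insert the rest
        set_clause = ", ".join(f"{c} = src.{c}" for c in columns if c != unique_key)
        insert_values = ", ".join(f"src.{c}" for c in columns)
        return [
            f"merge into {target} as tgt using {staging} as src "
            f"on tgt.{unique_key} = src.{unique_key} "
            f"when matched then update set {set_clause} "
            f"when not matched then insert ({col_list}) values ({insert_values})"
        ]
    # Two-step fallback for platforms without MERGE
    return [
        f"delete from {target} where {unique_key} in (select {unique_key} from {staging})",
        f"insert into {target} ({col_list}) select {col_list} from {staging}",
    ]


for statement in incremental_sql("house_pts", "house_pts_staging", "house", ["house", "points"], False):
    print(statement)
```

Both branches produce the same end state for the target table; the adapter's job is to pick the form its platform supports.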

## Adapter Architecture


Below is a diagram of how dbt-postgres, the adapter at the center of dbt-core, works.

<Lightbox src="/img/adapter-guide/adapter architecture - postgres.png" title="adapter architecture diagram"/>