Create a simple plugin system for loading data from external sources #141
@@ -77,3 +77,4 @@ target/

.DS_Store
.idea/
.vscode/
@@ -0,0 +1,39 @@
import abc
import importlib
from typing import Any
from typing import Dict

from ..utils import SourceConfig
from dbt.dataclass_schema import dbtClassMixin


class PluginConfig(dbtClassMixin):
    """A helper class for defining the configuration settings a particular plugin uses."""

    pass


class Plugin(abc.ABC):
    WELL_KNOWN_PLUGINS = {
        "excel": "dbt.adapters.duckdb.plugins.excel.ExcelPlugin",
        "gsheet": "dbt.adapters.duckdb.plugins.gsheet.GSheetPlugin",
        "iceberg": "dbt.adapters.duckdb.plugins.iceberg.IcebergPlugin",
        "sqlalchemy": "dbt.adapters.duckdb.plugins.sqlalchemy.SQLAlchemyPlugin",
    }

    @classmethod
    def create(cls, impl: str, config: Dict[str, Any]) -> "Plugin":
        module_name, class_name = impl.rsplit(".", 1)
        module = importlib.import_module(module_name)
        Class = getattr(module, class_name)
        if not issubclass(Class, Plugin):
            raise TypeError(f"{impl} is not a subclass of Plugin")
        return Class(config)

    @abc.abstractmethod
    def __init__(self, plugin_config: Dict):
        pass

    def load(self, source_config: SourceConfig):
        """Load data from a source config and return it as a DataFrame-like object that DuckDB can read."""
        raise NotImplementedError
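The dotted-path lookup in `Plugin.create` above can be tried in isolation. The following is a minimal sketch of the same pattern using only the standard library; the helper name `load_class` and the choice of `collections.OrderedDict` as the target are illustrative assumptions, not part of this PR.

```python
import importlib


def load_class(impl: str, base: type) -> type:
    """Resolve a dotted path such as 'package.module.ClassName' to a class.

    Hypothetical helper mirroring the steps in Plugin.create.
    """
    # Split on the last dot only: the left part is the module path,
    # the right part is the attribute to fetch from that module.
    module_name, class_name = impl.rsplit(".", 1)
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    # Reject anything that does not implement the expected interface.
    if not (isinstance(cls, type) and issubclass(cls, base)):
        raise TypeError(f"{impl} is not a subclass of {base.__name__}")
    return cls


ordered_dict_cls = load_class("collections.OrderedDict", dict)
print(ordered_dict_cls.__name__)
```

Checking the subclass relationship before instantiating means a mistyped or unrelated class path fails with a clear `TypeError` rather than with an attribute error later, when `load` is first called.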
@@ -0,0 +1,19 @@
import pathlib
from typing import Dict

import pandas as pd

from . import Plugin
from ..utils import SourceConfig


class ExcelPlugin(Plugin):
    def __init__(self, config: Dict):
        self._config = config

    def load(self, source_config: SourceConfig):
        ext_location = source_config.meta["external_location"]
        ext_location = ext_location.format(**source_config.as_dict())

This allows dynamically inserting values from the config using a format string. Could you give an example of how this is intended to be used? Does this, for example, contain the seed directory?

Yeah, so the idea here is that you can define these meta properties in one of two places, either a) the ...and then there's an even more advanced version of the pattern here: #116

        source_location = pathlib.Path(ext_location.strip("'"))
        sheet_name = source_config.meta.get("sheet_name", 0)
        return pd.read_excel(source_location, sheet_name=sheet_name)
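To illustrate the `ext_location.format(**source_config.as_dict())` step in `load` above, here is a minimal sketch of how a format-string template expands. The dictionary keys and the template value are hypothetical stand-ins; the actual fields returned by `as_dict()` are not shown in this diff.

```python
import pathlib

# Hypothetical stand-in for source_config.as_dict(); the real keys depend on
# the dbt source definition and are not shown in this diff.
source_fields = {"name": "sales", "identifier": "q1_report"}

# Hypothetical meta value: a template whose placeholders name source fields.
external_location = "'data/{name}/{identifier}.xlsx'"

# str.format(**mapping) replaces each {key} with mapping[key];
# a placeholder with no matching key raises KeyError.
expanded = external_location.format(**source_fields)

# As in the plugin, strip surrounding single quotes before building the path.
source_location = pathlib.Path(expanded.strip("'"))
print(source_location.as_posix())
```

With this approach a single template can serve several tables of one source, since each table fills in its own field values.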
Could you explain what the buenavista class does?
So a common dev workflow problem that folks run into with dbt-duckdb is that they want to update and run their dbt models in an IDE while simultaneously querying the relations that the dbt run generates from a BI/DB query tool (DBeaver, Superset, etc.). DuckDB's execution model makes this pattern hard because if any process holds the write lock on the underlying DB file, no other process is allowed to read the file, so you end up awkwardly switching back and forth between dbt owning the file and the query tool owning the file.
Buena Vista started out as my attempt to solve this problem: it's a Python server that speaks the Postgres (and more recently, Presto) protocols and takes ownership of the DuckDB file so that multiple processes can operate on it at the same time. To make BV work with dbt-duckdb, I created a notion of environments, so that dbt-duckdb can distinguish between cases where it should execute everything in the local Python process (i.e., how things work normally) and cases where it should execute the code against a remote Python process that speaks the Postgres protocol (i.e., BV). I suspect that in the not too distant future there will be more types of remote server environments that folks will want to execute their DuckDB queries against (an early example: https://github.com/boilingdata/boilingdata-http-gw), and I want them to be able to use dbt-duckdb as their adapter without having to write their own. /cc @dforsber
Hey, sounds really good 👍🏻. We would be happy to integrate, create, or help write a plugin for BoilingData too.
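The local-versus-remote split described in the Buena Vista comment above can be pictured with a small sketch. Everything here is a hypothetical illustration: the class names, the `remote` settings key, and the stand-in executor functions are assumptions made for the example and are not dbt-duckdb's or Buena Vista's actual code.

```python
import abc
from typing import Callable, Dict, List, Optional

# A function that takes a SQL string and returns result rows.
Executor = Callable[[str], List[tuple]]


class Environment(abc.ABC):
    """Hypothetical interface answering: where does a statement run?"""

    @abc.abstractmethod
    def execute(self, sql: str) -> List[tuple]:
        ...


class LocalEnvironment(Environment):
    """Runs SQL in the current Python process (the normal case)."""

    def __init__(self, run_locally: Executor):
        self._run = run_locally

    def execute(self, sql: str) -> List[tuple]:
        return self._run(sql)


class RemoteEnvironment(Environment):
    """Forwards SQL to a server process that owns the database file."""

    def __init__(self, send_to_server: Executor):
        self._send = send_to_server

    def execute(self, sql: str) -> List[tuple]:
        return self._send(sql)


def create_environment(
    settings: Dict[str, Optional[str]], local: Executor, remote: Executor
) -> Environment:
    # Hypothetical rule: a configured "remote" entry selects the remote path.
    if settings.get("remote"):
        return RemoteEnvironment(remote)
    return LocalEnvironment(local)


# Stand-in executors that only record where the statement would run.
def fake_local(sql: str) -> List[tuple]:
    return [("local", sql)]


def fake_remote(sql: str) -> List[tuple]:
    return [("remote", sql)]


env = create_environment({"remote": "localhost:5433"}, fake_local, fake_remote)
print(env.execute("select 1"))
```

Because callers only see `execute`, a new kind of remote server could be supported by adding another `Environment` subclass without touching the code that issues queries.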