A dbt package for standardising data sets. You can use it to build a feature store in your data warehouse, without external libraries such as Spark's MLlib or Python's scikit-learn.
The package contains a set of macros that mirror the functionality of the scikit-learn preprocessing module. They were originally developed for the 2019 Medium article *Feature Engineering in Snowflake*.
The macros have been tested on Snowflake, Redshift and BigQuery. The test-case expectations were generated with scikit-learn (see the `*.py` files in `integration_tests/data/sql`), so you can expect behavioural parity with it.
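For example, min-max scaling can be expressed directly in warehouse SQL using window functions. This hand-written sketch (not the macro's actual generated output) shows the kind of SQL the `min_max_scaler` macro saves you from writing; the table and column names are placeholders:

```sql
select
    my_column,
    -- scale to [0, 1]: (x - min) / (max - min), guarding against a zero range
    (my_column - min(my_column) over ())
        / nullif(max(my_column) over () - min(my_column) over (), 0) as my_column_scaled
from my_source_table
```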
The macros are:
scikit-learn function | macro name | Snowflake | BigQuery | Redshift |
---|---|---|---|---|
KBinsDiscretizer | k_bins_discretizer | Y | Y | Y |
LabelEncoder | label_encoder | Y | Y | Y |
MaxAbsScaler | max_abs_scaler | Y | Y | Y |
MinMaxScaler | min_max_scaler | Y | Y | Y |
Normalizer | normalizer | Y | Y | Y |
OneHotEncoder | one_hot_encoder | Y | Y | Y |
QuantileTransformer | quantile_transformer | Y | Y | N |
RobustScaler | robust_scaler | Y | Y | Y |
StandardScaler | standard_scaler | Y | Y | Y |
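Once installed, each macro is called from within a dbt model and renders the preprocessing SQL for you. A minimal sketch, assuming the macro accepts a source relation and a column name (the table and column names here are placeholders — check the generated docs for each macro's exact signature):

```sql
{{ dbt_ml_preprocessing.one_hot_encoder( ref('my_source_table'), 'my_category_column' ) }}
```

The model that contains this call can then be materialised and queried like any other dbt model.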
To use this in your dbt project, create or modify `packages.yml` to include:

```yaml
packages:
  - package: "omnata-labs/dbt_ml_preprocessing"
    version: [">=1.0.0"]
```

(replace the version number with the latest release)

Then run:

```shell
dbt deps
```

to import the package.
To read the macro documentation and see examples, generate your dbt docs; the macro documentation will appear in the Projects tree under `dbt_ml_preprocessing`.