This repository has been archived by the owner on Oct 31, 2024. It is now read-only.

astrojuanlu/kedro-catalog-experiment

kedro-catalog


Prototype of a next-generation DataCatalog for Kedro.

Niceties:

  • Basic dataset loading
  • Basic factory resolution
  • Catalog items are lazily loaded (#2829)
  • Creating custom datasets is easier (#1936)
  • It gets trivially represented on REPLs (#1721)
  • Has public API to retrieve dataset objects (#1778)
  • ...which in turn have public properties (#3929)
  • No ABCs are needed because there's no shared logic, only Protocols (#4138)
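The Protocol-based approach in the last bullet can be illustrated with a minimal sketch. The names `Dataset` and `InMemoryTextDataset` and the `load`/`save` signatures below are assumptions made for illustration, not kedro-catalog's actual API. The point is that a custom dataset satisfies the interface structurally, just by having matching methods, without inheriting from an abstract base class.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Dataset(Protocol):
    """Hypothetical structural interface: anything with load() and save() qualifies."""

    def load(self) -> Any: ...

    def save(self, data: Any) -> None: ...


@dataclass
class InMemoryTextDataset:
    """Hypothetical custom dataset; note that it does NOT subclass Dataset."""

    name: str  # public attribute, readable directly by users
    _data: str = field(default="", repr=False)  # internal state, hidden from repr

    def load(self) -> str:
        return self._data

    def save(self, data: str) -> None:
        self._data = data


ds = InMemoryTextDataset(name="greeting")
ds.save("hello")
print(isinstance(ds, Dataset))  # True: structural match, no inheritance involved
print(ds.load())  # hello
print(ds)  # InMemoryTextDataset(name='greeting')
```

Note that `isinstance` checks against a `runtime_checkable` Protocol only verify that the methods exist, not their signatures, so static type checkers remain the main safety net.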

The codebase is lean and makes heavy use of @dataclass and Pydantic models. I'm no software engineer, so I'm not claiming it's well designed, but hopefully it's easy to understand (and therefore to criticise).
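As a rough illustration of how @dataclass yields a readable REPL representation for free, and how lazy loading can work, here is a stdlib-only sketch. The `LazyCatalog` class, its `get` method and the `factory` callable are hypothetical and are not kedro-catalog's implementation; the real project also uses Pydantic models, which this sketch leaves out to stay dependency-free.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class LazyCatalog:
    """Hypothetical catalog: stores configs eagerly, builds datasets on first access."""

    # Raw configuration per dataset name; cheap to hold, nothing is instantiated yet.
    configs: dict[str, dict[str, Any]]
    # Assumed interface: a callable that turns one config dict into a dataset object.
    factory: Callable[[dict[str, Any]], Any] = field(repr=False)
    # Cache of already-built datasets; hidden from the generated repr.
    _built: dict[str, Any] = field(default_factory=dict, repr=False)

    def get(self, name: str) -> Any:
        # Instantiate only when the dataset is first requested, then reuse it.
        if name not in self._built:
            self._built[name] = self.factory(self.configs[name])
        return self._built[name]


calls: list[str] = []


def make_dataset(config: dict[str, Any]) -> str:
    calls.append(config["filepath"])  # record each instantiation
    return f"dataset for {config['filepath']}"


catalog = LazyCatalog(configs={"ds1": {"filepath": "iris.csv"}}, factory=make_dataset)
print(catalog)  # LazyCatalog(configs={'ds1': {'filepath': 'iris.csv'}})
print(calls)  # [] -- nothing built yet
catalog.get("ds1")
catalog.get("ds1")
print(calls)  # ['iris.csv'] -- built exactly once
```

Marking fields with `repr=False` keeps the REPL output focused on what the user configured rather than on internal caches.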

Of course, it's tiny because it leaves lots of things off the table. Critically, it does not support:

Usage

In [1]: catalog_config = {
   ...:     "ds1": {
   ...:         "type": "polars.CSVDataset",
   ...:         "filepath": "iris.csv",
   ...:     },
   ...:     "ds_{name}": {
   ...:         "type": "polars.CSVDataset",
   ...:         "filepath": "{name}.csv",
   ...:     },
   ...: }

In [2]: from kedro_catalog import DataCatalog

In [3]: catalog = DataCatalog.from_config(catalog_config)
   ...: catalog
Out[3]: DataCatalog(_dataset_configs={...}, _resolver=FactoryResolver())

In [4]: catalog.load("ds1").head(1)
Out[4]:
shape: (1, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str     │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘

In [5]: catalog.load("ds_iris").head(1)
Out[5]:
shape: (1, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str     │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘
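The "ds_{name}" entry in the configuration is a dataset factory: a requested name such as ds_iris is matched against the pattern, and the captured name is substituted into the other config values (giving iris.csv). Below is a stdlib-only sketch of that idea. The `resolve` function and its regex-based matching are assumptions chosen for illustration; they are not the FactoryResolver implementation.

```python
import re
from typing import Any


def _pattern_to_regex(pattern: str) -> re.Pattern[str]:
    # Turn "ds_{name}" into r"^ds_(?P<name>.+)$", escaping the literal parts.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # re.split with a capture group alternates literal text and placeholder names.
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    return re.compile(f"^{regex}$")


def resolve(configs: dict[str, dict[str, Any]], name: str) -> dict[str, Any]:
    """Hypothetical resolver: exact names win, otherwise try factory patterns."""
    if name in configs:
        return configs[name]
    for pattern, config in configs.items():
        if "{" not in pattern:
            continue  # plain entry, not a factory pattern
        match = _pattern_to_regex(pattern).match(name)
        if match:
            # Fill placeholders in every string value of the matched config.
            return {
                key: value.format(**match.groupdict()) if isinstance(value, str) else value
                for key, value in config.items()
            }
    raise KeyError(name)


catalog_config = {
    "ds1": {"type": "polars.CSVDataset", "filepath": "iris.csv"},
    "ds_{name}": {"type": "polars.CSVDataset", "filepath": "{name}.csv"},
}
print(resolve(catalog_config, "ds_iris"))
# {'type': 'polars.CSVDataset', 'filepath': 'iris.csv'}
```

Checking exact names before patterns means an explicit entry always takes precedence over a factory that would also match it.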

Installation

To install, run:

$ uv pip install kedro-catalog

Development

To run style checks:

$ uv tool install pre-commit
$ pre-commit run -a
