Contributing to Feature Fabrica

⚙️ The Framework to Simplify and Scale Feature Engineering ⚙️

For data scientists, ML engineers, and AI researchers who want to simplify feature engineering, manage complex dependencies, and boost productivity.

Introduction

Feature Fabrica is an open-source Python library designed to improve engineering practices and transparency in feature engineering. It allows users to define features declaratively using YAML, manage dependencies between features, and apply complex transformations in a scalable and convenient manner.

By providing a structured approach to feature engineering, Feature Fabrica aims to save time, reduce errors, and enhance the transparency and reproducibility of your machine learning workflows. Whether you're working on small projects or managing large-scale pipelines, Feature Fabrica is designed to meet your needs.

Key Features

📝 Declarative Feature Definitions: Define features, data types, and dependencies using a simple YAML configuration.
🔄 Transformations: Apply custom transformations to raw features to derive new features.
🔗 Dependency Management: Automatically handle dependencies between features.
✔️ Pydantic Validation: Ensure data types and values conform to expected formats.
🛡️ Fail-Fast with Beartype: Catch type-related errors instantly during development, ensuring your transformations are robust.
🚀 Scalability: Designed to scale from small projects to large machine learning pipelines.
🔧 Hydra Integration: Leverage Hydra for configuration management, enabling flexible and dynamic configuration of transformations.

🛠️ Quick Start

Installation

To install Feature Fabrica, simply run:

pip install feature-fabrica

Defining Features in YAML

Features are defined in a YAML file. More examples are available in the examples/ folder. Here is a minimal one:

feature_a:
  description: "Raw feature A"
  data_type: "int32"
  group: "training"

feature_b:
  description: "Raw feature B"
  data_type: "float32"
  group: "training"
  transformation:
    scale_feature:
      _target_: ().scale(factor=2)

feature_c:
  description: "Derived feature C"
  data_type: "float32"
  group: "training_experiment"
  dependencies: ["feature_a", "feature_b"]
  transformation:
    solve:
      _target_: (feature_a + feature_b) / 2

feature_e:
  description: "Raw feature E"
  data_type: "int32"
  group: "draft"
  transformation:
    _target_: ().upper().lower().one_hot(categories=['apple', 'orange'])
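
Because feature_c lists feature_a and feature_b as dependencies, it can only be computed after both of them. The sketch below illustrates that ordering idea with the standard library's graphlib module. It is not Feature Fabrica's own resolver: the `dependencies` dict and the `evaluation_order` helper are hypothetical names introduced for illustration, with the dict hand-copied from the YAML above.

```python
from graphlib import TopologicalSorter

# Hypothetical mirror of the YAML above: each feature maps to the
# features it depends on (raw features have no dependencies).
dependencies = {
    "feature_a": [],
    "feature_b": [],
    "feature_c": ["feature_a", "feature_b"],
    "feature_e": [],
}


def evaluation_order(deps):
    """Return feature names so that each feature comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = evaluation_order(dependencies)
print(order)  # feature_c always appears after feature_a and feature_b
```

A circular dependency would make `static_order()` raise `graphlib.CycleError`, which is one way such configuration mistakes could be surfaced early.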

Creating and Using Transformations

You can define custom transformations by subclassing the Transformation class:

from feature_fabrica.transform import Transformation


class MyCustomTransform(Transformation):
    _name_ = "my_custom_transform"

    def execute(self, data):
        return data * 2

Once defined, the transformation can be referenced in the YAML configuration by its _name_:

feature_a:
  description: "Raw feature A"
  data_type: "int32"
  group: "training"
  transformation:
    _target_: ().my_custom_transform()
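
How the library maps the `_name_` string to a class is not described here. One common pattern for this kind of name-based lookup is a registry filled in by `__init_subclass__`, sketched below. `REGISTRY` and `SketchTransformation` are hypothetical stand-ins written for this example, not part of Feature Fabrica's API.

```python
# Hypothetical stand-in for a name-based registry; this is NOT
# Feature Fabrica's actual Transformation class.
REGISTRY = {}


class SketchTransformation:
    _name_ = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass that declares a name.
        if cls._name_:
            REGISTRY[cls._name_] = cls

    def execute(self, data):
        raise NotImplementedError


class MyCustomTransform(SketchTransformation):
    _name_ = "my_custom_transform"

    def execute(self, data):
        return data * 2


# Look the class up by the same name used in the YAML expression.
transform = REGISTRY["my_custom_transform"]()
print(transform.execute(21))  # 42
```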

Compiling and Executing Features

To compile and execute features:

import numpy as np
from feature_fabrica.core import FeatureManager

data = {
    "feature_a": np.array([10.0], dtype=np.float32),
    "feature_b": np.array([20.0], dtype=np.float32),
}
feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
results = feature_manager.compute_features(data)
print(results["feature_c"])  # 0.5 * (10 + 20) = 15.0
print(results.feature_c)  # 0.5 * (10 + 20) = 15.0
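
The two print calls above read the same value, once by key and once by attribute. The container type that `compute_features` returns is not specified here, so the following is only a sketch of how a mapping can support both access styles. `AttrDict` is a hypothetical class written for this example.

```python
class AttrDict(dict):
    """A dict whose keys can also be read as attributes (illustrative only)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            # Keep normal attribute semantics for missing names.
            raise AttributeError(name) from exc


results = AttrDict(feature_c=0.5 * (10 + 20))
print(results["feature_c"])  # 15.0
print(results.feature_c)     # 15.0
```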

Visualize Features and Dependencies

Track & trace Transformation Chains

import numpy as np
from feature_fabrica.core import FeatureManager

data = {
    "feature_a": np.array([10.0], dtype=np.float32),
    "feature_b": np.array([20.0], dtype=np.float32),
}
feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
results = feature_manager.compute_features(data)
print(feature_manager.features.feature_c.get_transformation_chain())
# Transformation Chain: (Transformation: sum_fn, Value: 30.0 Time taken: 9.5367431640625e-07 seconds) -> (Transformation: scale_feature, Value: 15.0, Time taken:  9.5367431640625e-07 seconds)
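
To make the shape of that output concrete, here is a small standalone sketch that records the name, result, and duration of each step in a chain. It reproduces the sum-then-scale values from the output above but does not use Feature Fabrica. `run_chain` and the step functions are hypothetical names for this example.

```python
import time


def run_chain(value, steps):
    """Apply (name, function) steps in order, recording result and duration."""
    records = []
    for name, fn in steps:
        start = time.perf_counter()
        value = fn(value)
        elapsed = time.perf_counter() - start
        records.append((name, value, elapsed))
    return value, records


steps = [
    ("sum_fn", lambda pair: pair[0] + pair[1]),    # 10.0 + 20.0
    ("scale_feature", lambda total: total * 0.5),  # halve the sum
]
final, records = run_chain((10.0, 20.0), steps)
print(" -> ".join(f"({name}: {val})" for name, val, _ in records))
# (sum_fn: 30.0) -> (scale_feature: 15.0)
```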

Visualize Dependencies

from feature_fabrica.core import FeatureManager

feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
feature_manager.get_visual_dependency_graph()
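
If you want to inspect a dependency graph outside the library, one option is to emit Graphviz DOT text yourself. The sketch below does that with plain Python. The `to_dot` helper and the hand-written dependency dict are assumptions made for this example and are unrelated to how `get_visual_dependency_graph` is implemented.

```python
def to_dot(deps):
    """Render a {feature: [dependencies]} mapping as Graphviz DOT text."""
    lines = ["digraph features {"]
    for feature in sorted(deps):
        lines.append(f'    "{feature}";')
        for parent in sorted(deps[feature]):
            # An edge points from a dependency to the feature that uses it.
            lines.append(f'    "{parent}" -> "{feature}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot({
    "feature_a": [],
    "feature_b": [],
    "feature_c": ["feature_a", "feature_b"],
})
print(dot)
```

The resulting text can be pasted into any Graphviz viewer to render the graph.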

Contributing to Feature Fabrica

First, thank you for taking the time to contribute! 🎉 Contributions are essential to making Feature Fabrica a better library, and we truly appreciate your involvement.

The following is a set of guidelines for contributing to Feature Fabrica, including reporting bugs, adding new features, and improving documentation.

Roadmap

NLP support
Embeddings support
Simplify UI
Better visualizations/reports

How to Contribute

Fork and Clone the Repo

Fork the repository to your own GitHub account by clicking the "Fork" button at the top of the page.

Clone your fork locally:

git clone https://github.com/your-username/feature-fabrica.git
cd feature-fabrica

Add the original repository as the upstream remote:

git remote add upstream https://github.com/cowana-ai/feature-fabrica.git

Before creating a new branch, make sure your local main branch is up to date with upstream:
```
git checkout main
git pull upstream main
```

Create a Branch

Create a new branch for your feature or bug fix:
```
git checkout -b feature/my-new-feature
```
Make your changes in this new branch.

Reporting Bugs

If you discover a bug in Feature Fabrica, please open an issue on GitHub. To avoid duplicates, first check whether an issue for it already exists. Include the following details in your report:

A clear and concise description of the bug.
Steps to reproduce the issue.
Expected behavior vs. actual behavior.
If applicable, screenshots or code snippets.
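
Version details often help when reproducing a bug. As a convenience, a short script like the one below can gather them for pasting into an issue. This is only a suggestion, not a project requirement. `environment_report` is a hypothetical helper, and the package name is taken from the pip install command in the Installation section.

```python
import platform
import sys
from importlib import metadata


def environment_report(package="feature-fabrica"):
    """Collect Python, OS, and package version details for a bug report."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        package: version,
    }


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```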

Suggesting Enhancements

We welcome suggestions to improve Feature Fabrica. Feel free to open an issue describing the enhancement. Please be as detailed as possible in describing:

The feature you'd like to see.
The reason it would be beneficial.
Any potential drawbacks.
