Merged

26 commits
f929c07
New project
htahir1 Aug 19, 2025
f05e785
New project
htahir1 Aug 19, 2025
3b32788
Remove Azure Kubernetes Service production configuration
htahir1 Aug 19, 2025
962a895
Create lightweight preview plot and visualizations bundle
htahir1 Aug 19, 2025
b524ed4
Remove unnecessary steps in training pipeline
htahir1 Aug 19, 2025
5141fae
Add FloraCast to projects, focusing on timeseries prediction
htahir1 Aug 19, 2025
02d617e
Increase training epochs to 20 and enable progress bar
htahir1 Aug 19, 2025
84f979b
New project
htahir1 Aug 19, 2025
c553c40
Model changes
htahir1 Aug 19, 2025
808c09a
Update pipeline cache settings and improve data visualization
htahir1 Aug 19, 2025
6026a53
Model changes
htahir1 Aug 19, 2025
b68b290
Refactor styling and colors for evaluation visualization
htahir1 Aug 19, 2025
9cc3bab
Add visualization and architecture details in README
htahir1 Aug 19, 2025
3381e7f
Delete forecasting excalidraw file
htahir1 Aug 19, 2025
159e677
Add pipelines section to README.md
htahir1 Aug 19, 2025
5d5c4d6
Add FloraCast architecture diagram to README.md
htahir1 Aug 19, 2025
9b2060b
Update architecture.png image file
htahir1 Aug 19, 2025
768bda2
PR review
htahir1 Aug 19, 2025
32ba967
Update training and inference configurations
htahir1 Aug 19, 2025
9e2690a
Refactor batch inference step function definition
htahir1 Aug 19, 2025
17be69e
Add new imports and reorganize existing ones
htahir1 Aug 19, 2025
0379052
Update floracast/README.md
htahir1 Aug 20, 2025
38b1ad7
Update floracast/README.md
htahir1 Aug 20, 2025
793685d
Formatting
htahir1 Aug 20, 2025
0a12011
Formatting
htahir1 Aug 20, 2025
ec03f95
Merge remote-tracking branch 'origin/main' into project/dartsexample
htahir1 Aug 20, 2025
1 change: 1 addition & 0 deletions README.md
@@ -69,6 +69,7 @@ etc.
| [Vertex Registry and Deployer](vertex-registry-and-deployer) | 🚀 MLOps | 📦 Model Registry, 🚀 Deployment | vertex, gcp, zenml |
| [Eurorate Predictor](eurorate-predictor) | 📊 Data | ⏱️ Time Series, 🧹 ETL | airflow, bigquery, xgboost |
| [RetailForecast](retail-forecast) | 📊 Data | ⏱️ Time Series, 📈 Forecasting, 🔄 Multi-Model | prophet, zenml, pandas |
| [FloraCast](floracast) | 📊 Data | ⏱️ Timeseries Prediction, 📈 Forecasting, 🔄 Batch Inference | darts, pytorch, zenml, pandas |
| [Bank Subscription Prediction](bank_subscription_prediction) | 📊 Data | 💼 Classification, ⚖️ Imbalanced Data, 🔍 Feature Selection | xgboost, plotly, zenml |
| [Credit Scorer](credit-scorer) | 📊 Data | 💰 Credit Risk, 📊 Explainability, 🇪🇺 EU AI Act | scikit-learn, fairlearn, zenml |

2 changes: 1 addition & 1 deletion credit-scorer/src/steps/deployment/post_run_annex.py
@@ -175,7 +175,7 @@ def generate_enhanced_annex_iv_html(
"name", "Credit Scoring Pipeline"
)
pipeline_version = metadata.get("pipeline", {}).get("version", "Unknown")
pipeline_run = metadata.get("pipeline_run", {})
_ = metadata.get("pipeline_run", {})
stack_info = metadata.get("stack", {})
git_info = metadata.get("git_info", {})

343 changes: 343 additions & 0 deletions floracast/README.md
@@ -0,0 +1,343 @@
# 🌸 FloraCast: Building a Forecasting Platform, Not Just Models

A production-ready MLOps pipeline for **timeseries prediction** and **forecasting** using ZenML and [Darts](https://unit8co.github.io/darts/index.html), designed for enterprise demand and sales forecasting across retail, e-commerce, and supply chain use cases.

## 🚀 Product Overview

FloraCast demonstrates how to build end-to-end MLOps workflows for **timeseries prediction** and **forecasting**. Built on ZenML's framework, it showcases enterprise-grade machine learning pipelines that can be deployed in both development and production environments.

### Key Features

- **End-to-End Timeseries Prediction & Forecasting Pipeline**: From data ingestion to batch inference on a schedule.
- **Darts Integration**: Support for advanced forecasting models like [TFT (Temporal Fusion Transformer)](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.tft_model.html)
- **Custom Materializers**: Production-ready artifact handling with visualizations
- **Model Versioning**: Track and compare different model versions
- **Flexible Configuration**: YAML-based configuration for different environments
- **Cloud Ready**: Built with EKS/GKE/AKS deployment in mind

## 💡 How It Works

### ✈️ Pipelines

FloraCast consists of two main pipelines:

#### 1. Training Pipeline

The training pipeline handles the complete ML workflow:

1. **Data Ingestion** - Loads ecommerce sales data (synthetic by default)
2. **Preprocessing** - Converts to Darts TimeSeries with train/validation split
3. **Model Training** - Trains TFT model with configurable parameters
4. **Evaluation** - Computes SMAPE metrics on the validation set

![Model Evaluation Results](assets/eval_vis.png)
*On the validation set, FloraCast reaches SMAPE scores under 13, with model forecasts closely tracking the ground truth.*
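
The four steps above might be wired together roughly as follows. This is a minimal sketch that assumes step functions named `ingest_data`, `preprocess_for_training`, `train_model`, and `evaluate` (names taken from the configs later in this README); the actual definition lives in `pipelines/train_forecast_pipeline.py`:

```python
# Minimal sketch of the training pipeline wiring (assumed step names and
# signatures; see pipelines/train_forecast_pipeline.py for the real code).
from zenml import pipeline

from steps.evaluate import evaluate
from steps.ingest import ingest_data
from steps.preprocess import preprocess_for_training
from steps.train import train_model


@pipeline
def train_forecast_pipeline():
    raw_df = ingest_data()                                       # 1. load raw sales data
    train_series, val_series = preprocess_for_training(raw_df)   # 2. to Darts TimeSeries + split
    model = train_model(train_series)                            # 3. fit TFT (or fallback model)
    evaluate(model=model, val_series=val_series)                 # 4. SMAPE on the validation set
```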

#### 2. Batch Inference Pipeline

The inference pipeline generates predictions using trained models:

1. **Data Ingestion** - Loads new data for predictions
2. **Preprocessing** - Applies the same transformations used during training
3. **Batch Inference** - Generates forecasts and saves them to CSV

![Batch Inference Visualization](assets/batch_inference_timeseries_viz.png)
*Automated batch inference generates future predictions on a schedule, enabling proactive business planning and inventory management.*
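
The inference side mirrors this wiring; again a sketch with assumed step names (the real definition lives in `pipelines/batch_inference_pipeline.py`):

```python
# Minimal sketch of the batch inference pipeline (assumed step names).
from zenml import pipeline

from steps.batch_infer import batch_inference_predict
from steps.ingest import ingest_data
from steps.preprocess import preprocess_for_inference


@pipeline
def batch_inference_pipeline():
    raw_df = ingest_data()                      # 1. load new data
    series = preprocess_for_inference(raw_df)   # 2. same transformations as training
    batch_inference_predict(series=series)      # 3. forecast and write the CSV
```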

### 🔧 Architecture

![FloraCast Architecture](assets/architecture.png)
*Complete system architecture showing data flow through ZenML pipelines, from raw data ingestion to model training, evaluation, and automated batch inference.*

![ZenML Model Control Plane](assets/mcp_floracast.png)
*FloraCast leverages ZenML's Model Control Plane for enterprise-grade model versioning, lineage tracking, and automated deployment workflows.*
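
For instance, batch inference can resolve whichever model version is currently promoted to `production` through the ZenML client. A sketch, assuming the model is registered as `floracast_tft` (as in `configs/inference.yaml`) and the trained Darts model is attached under an artifact name like `trained_model`:

```python
from zenml.client import Client

# Look up the model version currently tagged as "production"
model_version = Client().get_model_version("floracast_tft", "production")

# Load the trained forecasting model attached to that version
# ("trained_model" is an assumed artifact name).
model = model_version.get_artifact("trained_model").load()
forecast = model.predict(14)  # 14-step horizon, matching configs/inference.yaml
```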

## 📦 Installation

### Prerequisites

- Python 3.9+
- A [deployed ZenML server](https://docs.zenml.io/deploying-zenml/deploying-zenml)
- Virtual environment (recommended)

### Setup

1. **Clone the repository** (if part of zenml-projects):
```bash
cd zenml-projects/floracast
```

2. **Create virtual environment**:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. **Install dependencies**:
```bash
pip install -r requirements.txt
```

## ⚡ Quick Start

### Local Development

1. **Run training pipeline**:
```bash
python run.py --config configs/training.yaml --pipeline train
```

2. **Run inference pipeline**:
```bash
python run.py --config configs/inference.yaml --pipeline inference
```

3. **View results**:
- Check the predictions artifact (and the exported CSV) for the generated forecasts
- Use the ZenML dashboard to browse runs, artifacts, and metrics
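
Runs and their outputs can also be pulled up programmatically. A small sketch using the ZenML client; the pipeline name and the step/output names are assumptions based on this repository's layout:

```python
from zenml.client import Client

# Most recent run of the training pipeline (assumed pipeline name)
run = Client().get_pipeline("train_forecast_pipeline").last_run

# Read the metric produced by the evaluation step
# ("evaluate" and a single float output are assumed here).
smape_score = run.steps["evaluate"].output.load()
print(f"Validation SMAPE: {smape_score:.2f}")
```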

## ⚙️ Configuration Files

FloraCast uses semantically named configuration files for different deployment scenarios:

### Available Configurations

- **`configs/training.yaml`** - Local development and training pipeline configuration
- **`configs/inference.yaml`** - Batch inference pipeline configuration for production models

### Customization Options

Edit the appropriate config file to customize (an example excerpt follows the list):

- **Model parameters**: TFT hyperparameters, training epochs
- **Data settings**: Date columns, frequency, validation split
- **Evaluation**: Forecasting horizon, metrics
- **Output**: File paths and formats
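
For illustration, here is what the model block of `configs/training.yaml` might look like. The parameter names mirror Darts' `TFTModel` arguments; the exact keys and defaults in the shipped config may differ:

```yaml
# configs/training.yaml (illustrative excerpt, not the shipped file)
steps:
  train_model:
    parameters:
      model_name: "TFT"
      input_chunk_length: 30    # history window fed to the model
      output_chunk_length: 14   # forecast horizon learned during training
      hidden_size: 64
      batch_size: 32
      n_epochs: 20
      add_relative_index: true  # needed when no future covariates are provided
```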

### Project Structure

```
floracast/
├── README.md
├── requirements.txt
├── .env.example
├── configs/
│   ├── training.yaml               # Training pipeline config
│   └── inference.yaml              # Inference pipeline config
├── data/
│   └── ecommerce_daily.csv         # Example input data
├── pipelines/
│   ├── train_forecast_pipeline.py  # Training pipeline definition
│   └── batch_inference_pipeline.py # Batch inference pipeline definition
├── steps/
│   ├── ingest.py                   # Data ingestion step
│   ├── preprocess.py               # Preprocessing step (train/val split, scaling)
│   ├── train.py                    # Model training step
│   ├── evaluate.py                 # Model evaluation step
│   ├── promote.py                  # Model registration/promotion step
│   ├── batch_infer.py              # Batch inference step
│   └── load_model.py               # Model loading utilities
├── materializers/
│   ├── tft_materializer.py         # Custom TFTModel materializer
│   └── timeseries_materializer.py  # Custom TimeSeries materializer
├── utils/
│   └── metrics.py                  # Forecasting metrics (e.g., SMAPE)
└── run.py                          # CLI entry point for running pipelines
```

### Key Components

- **Custom Materializers**: Proper serialization for Darts TimeSeries, with visualizations (see the sketch below)
- **Flexible Models**: TFT primary, ExponentialSmoothing fallback
- **Comprehensive Logging**: Detailed pipeline execution tracking
- **Artifact Naming**: Clear, descriptive names for all pipeline outputs
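
Custom materializers are the most ZenML-specific piece here, so below is a stripped-down sketch of what a Darts `TimeSeries` materializer can look like; the real `materializers/timeseries_materializer.py` also produces visualizations and may serialize differently:

```python
import os

from darts import TimeSeries
from zenml.enums import ArtifactType
from zenml.io import fileio
from zenml.materializers.base_materializer import BaseMaterializer


class TimeSeriesMaterializer(BaseMaterializer):
    """Stores a Darts TimeSeries as JSON in the artifact store."""

    ASSOCIATED_TYPES = (TimeSeries,)
    ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

    def save(self, data: TimeSeries) -> None:
        # Darts TimeSeries can round-trip through JSON
        with fileio.open(os.path.join(self.uri, "series.json"), "w") as f:
            f.write(data.to_json())

    def load(self, data_type) -> TimeSeries:
        with fileio.open(os.path.join(self.uri, "series.json"), "r") as f:
            return TimeSeries.from_json(f.read())
```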

## 🚀 Production Deployment

### ZenML Azure Stack Setup

To run FloraCast on Azure with ZenML, set up a ZenML stack backed by Azure services:

- **Artifact Store**: Azure Blob Storage
- **Container Registry**: Azure Container Registry (ACR)
- **Orchestrator**: Kubernetes Orchestrator targeting AKS
- **Optional**: AzureML Step Operator for managed training; Azure Key Vault for secrets

Quick start (CLI):

```bash
# Artifact Store (Azure Blob)
zenml artifact-store register azure_store --flavor=azure \
--account_name=<AZURE_STORAGE_ACCOUNT> \
--container=<AZURE_STORAGE_CONTAINER>

# Container Registry (ACR)
zenml container-registry register azure_acr --flavor=azure \
--uri=<ACR_LOGIN_SERVER>

# Orchestrator (AKS via Kubernetes)
zenml orchestrator register aks_k8s --flavor=kubernetes \
--kubernetes_context=<AKS_KUBE_CONTEXT> \
--namespace=<NAMESPACE>

# (Optional) AzureML Step Operator
zenml step-operator register azureml_ops --flavor=azureml \
--subscription_id=<SUBSCRIPTION_ID> \
--resource_group=<RESOURCE_GROUP> \
--workspace_name=<AML_WORKSPACE>

# Compose the stack
zenml stack register azure_aks_stack \
-a azure_store -c azure_acr -o aks_k8s --set
```
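
With the stack registered (the `--set` flag also makes it active), the same entry point used locally runs on Azure: ZenML builds the Docker image, pushes it to ACR, and the Kubernetes orchestrator schedules the steps as pods on AKS:

```bash
# Ensure the Azure stack is active, then launch training on AKS
zenml stack set azure_aks_stack
python run.py --config configs/training.yaml --pipeline train
```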

Read more:

- **Set up an MLOps stack on Azure**: [ZenML Azure guide](https://docs.zenml.io/stacks/popular-stacks/azure-guide)
- **Kubernetes Orchestrator (AKS)**: [Docs](https://docs.zenml.io/stacks/stack-components/orchestrators/kubernetes)
- **Azure Blob Artifact Store**: [Docs](https://docs.zenml.io/stacks/stack-components/artifact-stores/azure)
- **Azure Container Registry**: [Docs](https://docs.zenml.io/stacks/stack-components/container-registries/azure)
- **AzureML Step Operator**: [Docs](https://docs.zenml.io/stacks/stack-components/step-operators/azureml)
- **Terraform stack recipe for Azure**: [Hashicorp Registry](https://registry.terraform.io/modules/zenml-io/zenml-stack/azure/latest)


## 🔬 Advanced Usage

### Custom Data Sources

Replace the default ecommerce data:

1. **Update configuration**:
```yaml
# configs/training.yaml
steps:
  ingest_data:
    parameters:
      data_source: "csv"  # or "ecommerce_default"
      data_path: "path/to/your/data.csv"
      datetime_col: "timestamp"
      target_col: "sales"
  preprocess_for_training:
    parameters:
      datetime_col: "timestamp"
      target_col: "sales"
      freq: "D"
      val_ratio: 0.2
```

```yaml
# configs/inference.yaml
steps:
  ingest_data:
    parameters:
      data_source: "csv"
      data_path: "path/to/your/data.csv"
      datetime_col: "timestamp"
      target_col: "sales"
  preprocess_for_inference:
    parameters:
      datetime_col: "timestamp"
      target_col: "sales"
      freq: "D"
```

2. **Ensure data format**:
- DateTime index column
- Numeric target variable
- Daily frequency (or update `freq` parameter)
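
A quick way to verify a custom CSV meets these requirements is to attempt the same conversion the preprocessing step relies on. A sketch, using the column names configured above:

```python
import pandas as pd
from darts import TimeSeries

df = pd.read_csv("path/to/your/data.csv", parse_dates=["timestamp"])

# Fails loudly if the datetime column is malformed, the target is not
# numeric, or the rows do not form a regular daily series.
series = TimeSeries.from_dataframe(
    df,
    time_col="timestamp",
    value_cols="sales",
    freq="D",
    fill_missing_dates=True,  # optionally fill calendar gaps with NaN
)
print(series)
```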

### Model Experimentation

Try different forecasting models by updating the config:

```yaml
# configs/training.yaml
steps:
  train_model:
    parameters:
      model_name: "ExponentialSmoothing"  # Switch from TFT to ES
      # Note: ExponentialSmoothing uses default params in code currently
```
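
Under the hood, the training step presumably switches on `model_name`. A hedged sketch of that branch with the two Darts models mentioned in this README (the shipped `steps/train.py` may expose more parameters):

```python
from darts.models import ExponentialSmoothing, TFTModel


def build_model(model_name: str):
    """Return an untrained Darts forecasting model for the given name."""
    if model_name == "TFT":
        return TFTModel(
            input_chunk_length=30,
            output_chunk_length=14,
            n_epochs=20,
            add_relative_index=True,  # lets TFT train without future covariates
        )
    if model_name == "ExponentialSmoothing":
        return ExponentialSmoothing()  # lightweight statistical fallback
    raise ValueError(f"Unknown model_name: {model_name}")


# Usage inside the training step: build_model(...).fit(train_series)
```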

### Custom Metrics

Extend `utils/metrics.py` to add additional forecasting metrics:

```python
from typing import Union
import numpy as np
from darts import TimeSeries

def mase(actual: Union[TimeSeries, np.ndarray], predicted: Union[TimeSeries, np.ndarray]) -> float:
    """Mean Absolute Scaled Error (simplified in-sample variant).

    Scales the forecast MAE by the MAE of a naive one-step-ahead forecast
    on the actual series; the textbook definition scales by naive errors
    computed on the training series instead.
    """
    y_true = actual.values().flatten() if isinstance(actual, TimeSeries) else np.asarray(actual, dtype=float).flatten()
    y_pred = predicted.values().flatten() if isinstance(predicted, TimeSeries) else np.asarray(predicted, dtype=float).flatten()
    mae = float(np.mean(np.abs(y_true - y_pred)))
    naive_mae = float(np.mean(np.abs(np.diff(y_true))))
    return mae / naive_mae if naive_mae > 0 else float("inf")
```

Update `steps/evaluate.py` to import and route to the new metric:

```python
from utils.metrics import smape, mase

# ... inside evaluate(...)
if metric == "smape":
    score = smape(actual, predictions_for_eval)
elif metric == "mase":
    score = mase(actual, predictions_for_eval)
else:
    raise ValueError(f"Unknown metric: {metric}")
```

Then configure the metric in your training config (after updating `evaluate`):

```yaml
# configs/training.yaml
steps:
  evaluate:
    parameters:
      metric: "mase"
```

## 🤝 Contributing

FloraCast follows ZenML best practices and is designed to be extended:

1. **Add New Models**: Extend `steps/train.py` with additional Darts models
2. **Custom Materializers**: Create materializers for new data types
3. **Additional Metrics**: Expand evaluation capabilities
4. **New Data Sources**: Add support for different input formats

## 📝 Next Steps

After running FloraCast successfully:

1. **Explore ZenML Dashboard**: View pipeline runs, artifacts, and metrics
2. **Experiment with Models**: Try different TFT configurations
3. **Add Real Data**: Replace synthetic data with your forecasting use case
4. **Deploy to Production**: Use the Azure/AKS stack setup above to run at scale

## 🆘 Troubleshooting

### Common Issues

**TFT Training Fails**:
- Check `add_relative_index: true` in configuration
- Verify sufficient data length (>30 points for input_chunk_length=30)

**Materializer Errors**:
- Ensure datetime columns are properly formatted
- Check that TimeSeries can be created from your data

**Memory Issues**:
- Reduce `batch_size` or `hidden_size` in model parameters
- Use ExponentialSmoothing for lighter resource usage

## 📚 Resources

- [ZenML Documentation](https://docs.zenml.io/)
- [Darts Documentation](https://unit8co.github.io/darts/)
- [Azure ML Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/)

---

Built with ❤️ using [ZenML](https://zenml.io) - *The MLOps Framework for Production AI*
3 changes: 3 additions & 0 deletions floracast/__init__.py
@@ -0,0 +1,3 @@
"""FloraCast - ZenML Forecasting Template for DFG."""

__version__ = "0.1.0"
Binary file added floracast/assets/architecture.png
Binary file added floracast/assets/batch_inference_timeseries_viz.png
Binary file added floracast/assets/eval_vis.png
Binary file added floracast/assets/mcp_floracast.png
26 changes: 26 additions & 0 deletions floracast/configs/inference.yaml
@@ -0,0 +1,26 @@
# FloraCast Inference Configuration
# This config is used for running batch inference with production models

settings:
  docker:
    requirements: requirements.txt
    python_package_installer: uv

model:
  name: floracast_tft
  version: production  # Use production model for inference

steps:
  ingest_data:
    parameters:
      data_source: "ecommerce_default"
      data_path: null
      datetime_col: "ds"
      target_col: "y"

  batch_inference_predict:
    parameters:
      datetime_col: "ds"
      target_col: "y"
      freq: "D"
      horizon: 14