
Commit 43aefb4

Merge pull request #241 from zenml-io/project/dartsexample
Add a new project focused on Darts library + timeseries forecasting
2 parents ad3b5d3 + ec03f95 commit 43aefb4

30 files changed: +3287 −4 lines

README.md

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ etc.
 | [Vertex Registry and Deployer](vertex-registry-and-deployer) | 🚀 MLOps | 📦 Model Registry, 🚀 Deployment | vertex, gcp, zenml |
 | [Eurorate Predictor](eurorate-predictor) | 📊 Data | ⏱️ Time Series, 🧹 ETL | airflow, bigquery, xgboost |
 | [RetailForecast](retail-forecast) | 📊 Data | ⏱️ Time Series, 📈 Forecasting, 🔄 Multi-Model | prophet, zenml, pandas |
+| [FloraCast](floracast) | 📊 Data | ⏱️ Timeseries Prediction, 📈 Forecasting, 🔄 Batch Inference | darts, pytorch, zenml, pandas |
 | [Bank Subscription Prediction](bank_subscription_prediction) | 📊 Data | 💼 Classification, ⚖️ Imbalanced Data, 🔍 Feature Selection | xgboost, plotly, zenml |
 | [Credit Scorer](credit-scorer) | 📊 Data | 💰 Credit Risk, 📊 Explainability, 🇪🇺 EU AI Act | scikit-learn, fairlearn, zenml |

credit-scorer/src/steps/deployment/post_run_annex.py

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ def generate_enhanced_annex_iv_html(
         "name", "Credit Scoring Pipeline"
     )
     pipeline_version = metadata.get("pipeline", {}).get("version", "Unknown")
-    pipeline_run = metadata.get("pipeline_run", {})
+    _ = metadata.get("pipeline_run", {})
     stack_info = metadata.get("stack", {})
     git_info = metadata.get("git_info", {})

floracast/README.md

Lines changed: 343 additions & 0 deletions
@@ -0,0 +1,343 @@
# 🌸 FloraCast: Building a Forecasting Platform, Not Just Models

A production-ready MLOps pipeline for **timeseries prediction** and **forecasting** using ZenML and [Darts](https://unit8co.github.io/darts/index.html), designed for enterprise demand and sales forecasting across retail, e-commerce, and supply chain use cases.

## 🚀 Product Overview

FloraCast demonstrates how to build end-to-end MLOps workflows for time series forecasting. Built on ZenML's framework, it showcases enterprise-grade machine learning pipelines that can be deployed in both development and production environments.

Focus: **Timeseries Prediction** and **Forecasting**.

### Key Features

- **End-to-End Timeseries Prediction & Forecasting Pipeline**: From data ingestion to batch inference on a schedule
- **Darts Integration**: Support for advanced forecasting models such as the [TFT (Temporal Fusion Transformer)](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.tft_model.html)
- **Custom Materializers**: Production-ready artifact handling with visualizations
- **Model Versioning**: Track and compare different model versions
- **Flexible Configuration**: YAML-based configuration for different environments
- **Cloud Ready**: Built with EKS/GKE/AKS deployment in mind

## 💡 How It Works

### ✈️ Pipelines

FloraCast consists of two main pipelines:

#### 1. Training Pipeline

The training pipeline handles the complete ML workflow:

1. **Data Ingestion** - Loads ecommerce sales data (synthetic by default)
2. **Preprocessing** - Converts the data to a Darts TimeSeries with a train/validation split
3. **Model Training** - Trains a TFT model with configurable parameters
4. **Evaluation** - Computes SMAPE on the validation set

![Model Evaluation Results](assets/eval_vis.png)
*FloraCast achieves strong forecasting performance, with SMAPE scores under 13 and predictions closely tracking the ground truth.*
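
To make the training flow concrete, here is a minimal, self-contained sketch of how these four steps could be wired together with ZenML and Darts. The step names mirror the config files shown later (`ingest_data`, `preprocess_for_training`, `train_model`, `evaluate`), but the synthetic data, hyperparameters, and signatures are simplified assumptions rather than the project's actual code.

```python
from typing import Tuple

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.metrics import smape
from darts.models import TFTModel
from zenml import pipeline, step


@step
def ingest_data() -> pd.DataFrame:
    """Synthetic daily demand series standing in for data/ecommerce_daily.csv."""
    dates = pd.date_range("2023-01-01", periods=365, freq="D")
    y = 100 + 10 * np.sin(np.arange(365) * 2 * np.pi / 7) + np.random.rand(365) * 5
    return pd.DataFrame({"ds": dates, "y": y})


@step
def preprocess_for_training(df: pd.DataFrame, val_ratio: float = 0.2) -> Tuple[TimeSeries, TimeSeries]:
    """Convert to a Darts TimeSeries and split into train/validation windows."""
    series = TimeSeries.from_dataframe(df, time_col="ds", value_cols="y", freq="D")
    return series.split_before(1.0 - val_ratio)


@step
def train_model(train: TimeSeries) -> TFTModel:
    """Fit a TFT model (hyperparameters heavily simplified)."""
    model = TFTModel(
        input_chunk_length=30,
        output_chunk_length=14,
        add_relative_index=True,  # lets TFT train without explicit future covariates
        n_epochs=5,
    )
    model.fit(train)
    return model


@step
def evaluate(model: TFTModel, val: TimeSeries) -> float:
    """Forecast the validation window and score it with SMAPE."""
    preds = model.predict(n=len(val))
    return float(smape(val, preds))


@pipeline
def train_forecast_pipeline():
    df = ingest_data()
    train, val = preprocess_for_training(df)
    model = train_model(train)
    evaluate(model, val)
```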

#### 2. Batch Inference Pipeline

The inference pipeline generates predictions using trained models:

1. **Data Ingestion** - Loads new data for predictions
2. **Preprocessing** - Applies the same transformations as training
3. **Batch Inference** - Generates forecasts and saves them to CSV

![Batch Inference Visualization](assets/batch_inference_timeseries_viz.png)
*Automated batch inference generates future predictions on a schedule, enabling proactive business planning and inventory management.*
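
A hedged sketch of the batch inference step in the spirit of `steps/batch_infer.py`: it forecasts a fixed horizon past the end of the series the model was fitted on and writes the result to CSV. The step name matches `configs/inference.yaml`; the output path and exact signature are assumptions.

```python
import pandas as pd
from darts.models.forecasting.forecasting_model import ForecastingModel
from zenml import step


@step
def batch_inference_predict(
    model: ForecastingModel,
    horizon: int = 14,
    output_path: str = "predictions.csv",
) -> pd.DataFrame:
    """Forecast `horizon` steps beyond the fitted series and persist to CSV."""
    forecast = model.predict(n=horizon)
    df = forecast.pd_dataframe().reset_index()
    df.columns = ["ds", "y"]  # column names follow the inference config
    df.to_csv(output_path, index=False)
    return df
```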

### 🔧 Architecture

![FloraCast Architecture](assets/architecture.png)
*Complete system architecture showing data flow through ZenML pipelines, from raw data ingestion to model training, evaluation, and automated batch inference.*

![ZenML Model Control Plane](assets/mcp_floracast.png)
*FloraCast leverages ZenML's Model Control Plane for enterprise-grade model versioning, lineage tracking, and automated deployment workflows.*
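
The `version: production` reference in `configs/inference.yaml` (shown at the end of this diff) is what a promotion step would satisfy. Below is a hedged sketch of how `steps/promote.py` could gate promotion on the evaluation score; the SMAPE threshold and the `set_stage` call are assumptions about how the project uses the Model Control Plane, not its actual logic.

```python
from zenml import get_step_context, step


@step
def promote_model(smape_score: float, max_smape: float = 15.0) -> bool:
    """Promote this run's model version to 'production' if it scores well enough."""
    model = get_step_context().model  # the model version attached to this pipeline run
    if smape_score <= max_smape:
        model.set_stage("production", force=True)  # replaces any existing production version
        return True
    return False
```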

## 📦 Installation

### Prerequisites

- Python 3.9+
- A [deployed ZenML server](https://docs.zenml.io/deploying-zenml/deploying-zenml)
- Virtual environment (recommended)

### Setup

1. **Clone the repository** (if part of zenml-projects):
   ```bash
   cd zenml-projects/floracast
   ```

2. **Create a virtual environment**:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

## ⚡ Quick Start

### Local Development

1. **Run the training pipeline**:
   ```bash
   python run.py --config configs/training.yaml --pipeline train
   ```

2. **Run the inference pipeline**:
   ```bash
   python run.py --config configs/inference.yaml --pipeline inference
   ```

3. **View results**:
   - Check the predictions artifact for the generated forecasts
   - Use the ZenML dashboard to browse artifacts and metrics
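
For reference, a minimal sketch of how `run.py` could map those two commands onto the pipelines. The `--config`/`--pipeline` flags match the usage above, while the argparse wiring and the `with_options(config_path=...)` call are assumptions about the actual entry point.

```python
import argparse

from pipelines.batch_inference_pipeline import batch_inference_pipeline
from pipelines.train_forecast_pipeline import train_forecast_pipeline


def main() -> None:
    parser = argparse.ArgumentParser(description="Run FloraCast pipelines.")
    parser.add_argument("--config", required=True, help="Path to a YAML config file")
    parser.add_argument("--pipeline", choices=["train", "inference"], default="train")
    args = parser.parse_args()

    if args.pipeline == "train":
        # with_options applies the YAML config (model, step parameters, Docker settings)
        train_forecast_pipeline.with_options(config_path=args.config)()
    else:
        batch_inference_pipeline.with_options(config_path=args.config)()


if __name__ == "__main__":
    main()
```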

## ⚙️ Configuration Files

FloraCast uses semantically named configuration files for different deployment scenarios:

### Available Configurations

- **`configs/training.yaml`** - Local development and training pipeline configuration
- **`configs/inference.yaml`** - Batch inference pipeline configuration for production models

### Customization Options

Edit the appropriate config file to customize the following (a sketch of a customized training config follows below):

- **Model parameters**: TFT hyperparameters, training epochs
- **Data settings**: Date columns, frequency, validation split
- **Evaluation**: Forecasting horizon, metrics
- **Output**: File paths and formats
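
As a hedged illustration of these knobs, a customized `configs/training.yaml` might look like the excerpt below. The parameter names echo the hyperparameters mentioned in the Troubleshooting section (`input_chunk_length`, `hidden_size`, `batch_size`, `add_relative_index`), but the exact keys the steps accept may differ.

```yaml
# configs/training.yaml (illustrative excerpt - key names are assumptions)
steps:
  train_model:
    parameters:
      model_name: "TFT"
      input_chunk_length: 30    # history window fed to the model
      output_chunk_length: 14   # forecast horizon per training sample
      hidden_size: 64           # reduce to lower memory usage
      batch_size: 32
      n_epochs: 20              # training epochs
      add_relative_index: true  # required when no future covariates are provided
  evaluate:
    parameters:
      metric: "smape"
      horizon: 14
```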

### Project Structure

```
floracast/
├── README.md
├── requirements.txt
├── .env.example
├── configs/
│   ├── training.yaml                # Training pipeline config
│   └── inference.yaml               # Inference pipeline config
├── data/
│   └── ecommerce_daily.csv          # Example input data
├── pipelines/
│   ├── train_forecast_pipeline.py   # Training pipeline definition
│   └── batch_inference_pipeline.py  # Batch inference pipeline definition
├── steps/
│   ├── ingest.py                    # Data ingestion step
│   ├── preprocess.py                # Preprocessing step (train/val split, scaling)
│   ├── train.py                     # Model training step
│   ├── evaluate.py                  # Model evaluation step
│   ├── promote.py                   # Model registration/promotion step
│   ├── batch_infer.py               # Batch inference step
│   └── load_model.py                # Model loading utilities
├── materializers/
│   ├── tft_materializer.py          # Custom TFTModel materializer
│   └── timeseries_materializer.py   # Custom TimeSeries materializer
├── utils/
│   └── metrics.py                   # Forecasting metrics (e.g., SMAPE)
└── run.py                           # CLI entry point for running pipelines
```

### Key Components

- **Custom Materializers**: Proper serialization for Darts TimeSeries with visualizations (see the sketch below)
- **Flexible Models**: TFT as the primary model, with ExponentialSmoothing as a lightweight fallback
- **Comprehensive Logging**: Detailed pipeline execution tracking
- **Artifact Naming**: Clear, descriptive names for all pipeline outputs
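
Below is a simplified sketch of what a custom TimeSeries materializer can look like, in the spirit of `materializers/timeseries_materializer.py`. The real implementation also produces visualizations, and the JSON round-trip shown here is an assumption about the serialization format.

```python
import os

from darts import TimeSeries
from zenml.enums import ArtifactType
from zenml.io import fileio
from zenml.materializers.base_materializer import BaseMaterializer


class TimeSeriesMaterializer(BaseMaterializer):
    """Persist Darts TimeSeries artifacts in the ZenML artifact store."""

    ASSOCIATED_TYPES = (TimeSeries,)
    ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

    def save(self, data: TimeSeries) -> None:
        # Serialize the series to JSON inside the artifact store URI.
        with fileio.open(os.path.join(self.uri, "series.json"), "w") as f:
            f.write(data.to_json())

    def load(self, data_type: type) -> TimeSeries:
        with fileio.open(os.path.join(self.uri, "series.json"), "r") as f:
            return TimeSeries.from_json(f.read())
```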

## 🚀 Production Deployment

### ZenML Azure Stack Setup

To run FloraCast on Azure with ZenML, set up a ZenML stack backed by Azure services:

- **Artifact Store**: Azure Blob Storage
- **Container Registry**: Azure Container Registry (ACR)
- **Orchestrator**: Kubernetes Orchestrator targeting AKS
- **Optional**: AzureML Step Operator for managed training; Azure Key Vault for secrets

Quick start (CLI):

```bash
# Artifact Store (Azure Blob)
zenml artifact-store register azure_store --flavor=azure \
  --account_name=<AZURE_STORAGE_ACCOUNT> \
  --container=<AZURE_STORAGE_CONTAINER>

# Container Registry (ACR)
zenml container-registry register azure_acr --flavor=azure \
  --uri=<ACR_LOGIN_SERVER>

# Orchestrator (AKS via Kubernetes)
zenml orchestrator register aks_k8s --flavor=kubernetes \
  --kubernetes_context=<AKS_KUBE_CONTEXT> \
  --namespace=<NAMESPACE>

# (Optional) AzureML Step Operator
zenml step-operator register azureml_ops --flavor=azureml \
  --subscription_id=<SUBSCRIPTION_ID> \
  --resource_group=<RESOURCE_GROUP> \
  --workspace_name=<AML_WORKSPACE>

# Compose the stack
zenml stack register azure_aks_stack \
  -a azure_store -c azure_acr -o aks_k8s --set
```

Read more:

- **Set up an MLOps stack on Azure**: [ZenML Azure guide](https://docs.zenml.io/stacks/popular-stacks/azure-guide)
- **Kubernetes Orchestrator (AKS)**: [Docs](https://docs.zenml.io/stacks/stack-components/orchestrators/kubernetes)
- **Azure Blob Artifact Store**: [Docs](https://docs.zenml.io/stacks/stack-components/artifact-stores/azure)
- **Azure Container Registry**: [Docs](https://docs.zenml.io/stacks/stack-components/container-registries/azure)
- **AzureML Step Operator**: [Docs](https://docs.zenml.io/stacks/stack-components/step-operators/azureml)
- **Terraform stack recipe for Azure**: [Hashicorp Registry](https://registry.terraform.io/modules/zenml-io/zenml-stack/azure/latest)

## 🔬 Advanced Usage

### Custom Data Sources

Replace the default ecommerce data:

1. **Update configuration**:
   ```yaml
   # configs/training.yaml
   steps:
     ingest_data:
       parameters:
         data_source: "csv"  # or "ecommerce_default"
         data_path: "path/to/your/data.csv"
         datetime_col: "timestamp"
         target_col: "sales"
     preprocess_for_training:
       parameters:
         datetime_col: "timestamp"
         target_col: "sales"
         freq: "D"
         val_ratio: 0.2
   ```

   ```yaml
   # configs/inference.yaml
   steps:
     ingest_data:
       parameters:
         data_source: "csv"
         data_path: "path/to/your/data.csv"
         datetime_col: "timestamp"
         target_col: "sales"
     preprocess_for_inference:
       parameters:
         datetime_col: "timestamp"
         target_col: "sales"
         freq: "D"
   ```

2. **Ensure data format** (a quick validation sketch follows below):
   - DateTime index column
   - Numeric target variable
   - Daily frequency (or update the `freq` parameter)
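
An illustrative way to check those three requirements before pointing the pipeline at a new CSV; the column names follow the config excerpt above.

```python
import pandas as pd
from darts import TimeSeries

df = pd.read_csv("path/to/your/data.csv", parse_dates=["timestamp"])
assert pd.api.types.is_numeric_dtype(df["sales"]), "target column must be numeric"

# fill_missing_dates=True inserts NaNs for missing days;
# drop it to fail fast on irregular timestamps instead.
series = TimeSeries.from_dataframe(
    df, time_col="timestamp", value_cols="sales", freq="D", fill_missing_dates=True
)
print(series)
```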

### Model Experimentation

Try different forecasting models by updating the config:

```yaml
# configs/training.yaml
steps:
  train_model:
    parameters:
      model_name: "ExponentialSmoothing"  # Switch from TFT to ES
      # Note: ExponentialSmoothing uses default params in code currently
```
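
A hedged sketch of how `steps/train.py` might route `model_name` to the corresponding Darts model; the real step exposes more hyperparameters and may structure the selection differently.

```python
from darts import TimeSeries
from darts.models import ExponentialSmoothing, TFTModel
from darts.models.forecasting.forecasting_model import ForecastingModel
from zenml import step


@step
def train_model(train: TimeSeries, model_name: str = "TFT") -> ForecastingModel:
    """Fit the configured forecasting model on the training series."""
    if model_name == "TFT":
        model = TFTModel(
            input_chunk_length=30,
            output_chunk_length=14,
            add_relative_index=True,
        )
    elif model_name == "ExponentialSmoothing":
        model = ExponentialSmoothing()  # default parameters, as noted in the config above
    else:
        raise ValueError(f"Unknown model_name: {model_name}")
    model.fit(train)
    return model
```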

### Custom Metrics

Extend `utils/metrics.py` to add additional forecasting metrics, for example a simple MASE implementation:

```python
from typing import Union

import numpy as np
from darts import TimeSeries


def _to_array(series: Union[TimeSeries, np.ndarray]) -> np.ndarray:
    return series.values().flatten() if isinstance(series, TimeSeries) else np.asarray(series).flatten()


def mase(actual: Union[TimeSeries, np.ndarray], predicted: Union[TimeSeries, np.ndarray]) -> float:
    """Mean Absolute Scaled Error (simplified: the naive lag-1 baseline is taken from the evaluation window)."""
    a, p = _to_array(actual), _to_array(predicted)
    naive_error = np.mean(np.abs(np.diff(a)))  # error of a "repeat the previous value" forecast
    return float(np.mean(np.abs(a - p)) / naive_error)
```

Update `steps/evaluate.py` to import and route to the new metric:

```python
from utils.metrics import smape, mase

# ... inside evaluate(...)
if metric == "smape":
    score = smape(actual, predictions_for_eval)
elif metric == "mase":
    score = mase(actual, predictions_for_eval)
else:
    raise ValueError(f"Unknown metric: {metric}")
```

Then configure the metric in your training config (after updating `evaluate`):

```yaml
# configs/training.yaml
steps:
  evaluate:
    parameters:
      metric: "mase"
```

## 🤝 Contributing

FloraCast follows ZenML best practices and is designed to be extended:

1. **Add New Models**: Extend `steps/train.py` with additional Darts models
2. **Custom Materializers**: Create materializers for new data types
3. **Additional Metrics**: Expand evaluation capabilities
4. **New Data Sources**: Add support for different input formats

## 📝 Next Steps

After running FloraCast successfully:

1. **Explore the ZenML Dashboard**: View pipeline runs, artifacts, and metrics
2. **Experiment with Models**: Try different TFT configurations
3. **Add Real Data**: Replace the synthetic data with your own forecasting use case
4. **Deploy to Production**: Use the AKS stack configuration to run at scale

## 🆘 Troubleshooting

### Common Issues

**TFT Training Fails**:
- Check that `add_relative_index: true` is set in the configuration
- Verify sufficient data length (more than 30 points when `input_chunk_length=30`)

**Materializer Errors**:
- Ensure datetime columns are properly formatted
- Check that a TimeSeries can be created from your data

**Memory Issues**:
- Reduce `batch_size` or `hidden_size` in the model parameters
- Use ExponentialSmoothing for lighter resource usage

## 📚 Resources

- [ZenML Documentation](https://docs.zenml.io/)
- [Darts Documentation](https://unit8co.github.io/darts/)
- [Azure ML Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/)

---

Built with ❤️ using [ZenML](https://zenml.io) - *The MLOps Framework for Production AI*

floracast/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
"""FloraCast - ZenML Forecasting Template for DFG."""

__version__ = "0.1.0"

floracast/assets/architecture.png

847 KB

floracast/assets/batch_inference_timeseries_viz.png

196 KB

floracast/assets/eval_vis.png

251 KB

floracast/assets/mcp_floracast.png

138 KB

floracast/configs/inference.yaml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# FloraCast Inference Configuration
# This config is used for running batch inference with production models

settings:
  docker:
    requirements: requirements.txt
    python_package_installer: uv

model:
  name: floracast_tft
  version: production  # Use the production model for inference

steps:
  ingest_data:
    parameters:
      data_source: "ecommerce_default"
      data_path: null
      datetime_col: "ds"
      target_col: "y"

  batch_inference_predict:
    parameters:
      datetime_col: "ds"
      target_col: "y"
      freq: "D"
      horizon: 14
