Add time series features analysis #13

Merged
merged 1 commit into from
Jan 30, 2025
27 changes: 27 additions & 0 deletions README.md
@@ -119,6 +119,33 @@ Submit your results to the leaderboard by creating a pull request that adds your

The final `all_results.csv` file should contain `98` lines (one for each dataset configuration) and `15` columns: `4` for dataset, model, domain and num_variates and `11` for the evaluation metrics.

## Time Series Features Analysis

Add `NUM_CPUS` to your `.env` file to run the analysis in parallel.

```sh
echo "NUM_CPUS={N}" >> .env
```

To replicate the time series feature analysis in the paper, run the following command:

```sh
python -m cli.analysis datasets=all_datasets
```
This will run the analysis for all the datasets in the benchmark and generate two folders under `outputs/analysis/test`:
1. `datasets`: This folder contains the individual features for each dataset, along with plots visualizing those features.
2. `all_datasets`: This folder contains the features aggregated across all the datasets, along with plots visualizing those features.

Note: the analysis can take a long time; we recommend running it on a large CPU cluster and setting the `NUM_CPUS` environment variable to the number of cores you have access to.
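
Sizing a worker pool from `NUM_CPUS` follows a standard pattern. Here is a minimal sketch using the stdlib `multiprocessing` pool (the project's actual parallel backend may differ, and `toy_features` is our stand-in for real per-series feature extraction):

```python
import os
from multiprocessing import Pool

def toy_features(series):
    # Stand-in for real per-series feature extraction.
    return {"length": len(series), "mean": sum(series) / len(series)}

def run_parallel(all_series):
    # Read the pool size from the environment, defaulting to a single worker.
    num_cpus = int(os.environ.get("NUM_CPUS", "1"))
    with Pool(processes=num_cpus) as pool:
        return pool.map(toy_features, all_series)

if __name__ == "__main__":
    print(run_parallel([[1.0, 2.0, 3.0], [4.0, 6.0]]))
```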

If you just want to try the analysis out, you can run it on a few datasets by creating a new config file in the `cli/conf/analysis/datasets` folder, following the [`sample`](cli/conf/analysis/datasets/sample.yaml) file provided.

```sh
python -m cli.analysis datasets=sample
```



## Citation
If you find this benchmark useful, please consider citing:

Empty file added cli/__init__.py
Empty file.
32 changes: 32 additions & 0 deletions cli/analysis.py
@@ -0,0 +1,32 @@

import hydra
from hydra.core.hydra_config import HydraConfig
from hydra.utils import instantiate
from omegaconf import DictConfig

from gift_eval.analysis.utils import plot_histogram


@hydra.main(version_base="1.3", config_path="conf/analysis", config_name="default")
def main(cfg: DictConfig):
    output_dir = HydraConfig.get().runtime.output_dir
    analyzer = instantiate(cfg.analyzer, _convert_="all")
    analyzer.print_datasets()

    print(analyzer.freq_distribution_by_dataset)
    print(analyzer.freq_distribution_by_ts)
    print(analyzer.freq_distribution_by_ts_length)
    print(analyzer.freq_distribution_by_window)

    # Plot a histogram of each of the four frequency distributions
    # and save it to output_dir.
    plot_histogram(analyzer.freq_distribution_by_dataset, "dataset", output_dir)
    plot_histogram(analyzer.freq_distribution_by_ts, "time series", output_dir)
    plot_histogram(analyzer.freq_distribution_by_ts_length, "ts length", output_dir)
    plot_histogram(analyzer.freq_distribution_by_window, "window", output_dir)

    analyzer.features_by_window(output_dir)


if __name__ == "__main__":
    main()
222 changes: 222 additions & 0 deletions cli/conf/analysis/datasets/all_datasets.yaml
@@ -0,0 +1,222 @@
name: all_datasets
datasets:
  - _target_: gift_eval.data.Dataset
    name: m4_yearly
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: m4_quarterly
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: m4_monthly
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: m4_weekly
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: m4_daily
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: m4_hourly
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: electricity/15T
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: electricity/H
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: electricity/D
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: electricity/W
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: solar/10T
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: solar/H
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: solar/D
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: solar/W
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: hospital
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: covid_deaths
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: us_births/D
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: us_births/M
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: us_births/W
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: saugeenday/D
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: saugeenday/M
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: saugeenday/W
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: temperature_rain_with_missing
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: kdd_cup_2018_with_missing/H
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: kdd_cup_2018_with_missing/D
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: car_parts_with_missing
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: restaurant
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: hierarchical_sales/D
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: hierarchical_sales/W
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: LOOP_SEATTLE/5T
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: LOOP_SEATTLE/H
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: LOOP_SEATTLE/D
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: SZ_TAXI/15T
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: SZ_TAXI/H
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: M_DENSE/H
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: M_DENSE/D
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: ett1/15T
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: ett1/H
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: ett1/D
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: ett1/W
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: ett2/15T
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: ett2/H
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: ett2/D
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: ett2/W
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: jena_weather/10T
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: jena_weather/H
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: jena_weather/D
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: bitbrains_fast_storage/5T
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: bitbrains_fast_storage/H
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: bitbrains_rnd/5T
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: bitbrains_rnd/H
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: bizitobs_application
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: bizitobs_service
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: bizitobs_l2c/5T
    term: short
    to_univariate: true
  - _target_: gift_eval.data.Dataset
    name: bizitobs_l2c/H
    term: short
    to_univariate: true
3 changes: 3 additions & 0 deletions cli/conf/analysis/datasets/default.yaml
@@ -0,0 +1,3 @@
name: default
datasets:

10 changes: 10 additions & 0 deletions cli/conf/analysis/datasets/sample.yaml
@@ -0,0 +1,10 @@
name: sample
datasets:
  - _target_: gift_eval.data.Dataset
    name: m4_weekly
    term: short
    to_univariate: false
  - _target_: gift_eval.data.Dataset
    name: m4_hourly
    term: short
    to_univariate: false
11 changes: 11 additions & 0 deletions cli/conf/analysis/default.yaml
@@ -0,0 +1,11 @@
defaults:
  - datasets: all_datasets
  - _self_

hydra:
  run:
    dir: outputs/${hydra:job.name}/${name}/${datasets.name}

analyzer:
  _target_: gift_eval.analysis.Analyzer
  datasets: ${datasets.datasets}
  name: "test"
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -16,7 +16,10 @@ dependencies = [
"hydra-core==1.3",
"datasets~=2.17.1",
"orjson",
"matplotlib~=3.9.2"
"matplotlib~=3.9.2",
"tsfeatures",
"ray",
"scipy~=1.11.3",
]
requires-python = ">=3.10"
authors = [
3 changes: 3 additions & 0 deletions src/gift_eval/analysis/__init__.py
@@ -0,0 +1,3 @@
from .analyzer import Analyzer

__all__ = ["Analyzer"]