This repository contains the official implementation of SAMformer, a transformer-based model for time series forecasting, from the paper:
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention. Romain Ilbert*, Ambroise Odonnat*, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko.
*Equal contribution.
Click here to access Romain Ilbert's ICML oral presentation on SAMformer.
SAMformer is a lightweight transformer architecture designed for time series forecasting. It uniquely integrates Sharpness-Aware Minimization (SAM) with a Channel-Wise Attention mechanism. This method provides state-of-the-art performance in multivariate long-term forecasting across various forecasting tasks. In particular, SAMformer surpasses TSMixer by
SAMformer takes as input a
💡 Shallow transformer encoder. The neural network at the core of SAMformer is a shallow encoder of a simplified Transformer. Channel-wise attention is applied to the input, followed by a residual connection. Instead of the usual feedforward block, a linear layer is directly applied on top of the residual connection to output the prediction.
💡 Channel-Wise Attention. Contrary to the usual temporal attention in
- Feature permutation invariance, eliminating the need for positional encoding, commonly applied before the attention layer;
- Reduced time and memory complexity as
$D \leq L$ in most of the real-world datasets.
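As an illustration of the two components above, here is a minimal NumPy sketch of a forward pass with channel-wise attention, a residual connection, and a final linear layer. It is not the repository's implementation: the function name `samformer_forward`, the input shape `(D, L)` (channels by time steps), the horizon `H`, the attention dimension `d_m`, and the merging of the value and output projections into a single matrix `w_v` are all assumptions made for this example.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def samformer_forward(x, w_q, w_k, w_v, w_out):
    """Hypothetical forward pass.

    x:     (D, L) input, D channels observed over L time steps
    w_q:   (L, d_m) query projection
    w_k:   (L, d_m) key projection
    w_v:   (L, L) value projection (value and output maps merged, an assumption)
    w_out: (L, H) final linear layer mapping to the horizon H
    """
    d_m = w_q.shape[1]
    queries = x @ w_q                      # (D, d_m)
    keys = x @ w_k                         # (D, d_m)
    values = x @ w_v                       # (D, L)
    # Attention scores compare channels with each other: a D x D matrix.
    attn = softmax(queries @ keys.T / np.sqrt(d_m), axis=-1)
    hidden = x + attn @ values             # residual connection
    return hidden @ w_out, attn            # (D, H) prediction, (D, D) attention

rng = np.random.default_rng(0)
D, L, H, d_m = 7, 96, 24, 16
x = rng.normal(size=(D, L))
w_q = 0.1 * rng.normal(size=(L, d_m))
w_k = 0.1 * rng.normal(size=(L, d_m))
w_v = 0.1 * rng.normal(size=(L, L))
w_out = 0.1 * rng.normal(size=(L, H))

pred, attn = samformer_forward(x, w_q, w_k, w_v, w_out)

# Reordering the channels reorders the predictions in the same way,
# without any positional encoding.
perm = rng.permutation(D)
pred_perm, _ = samformer_forward(x[perm], w_q, w_k, w_v, w_out)
```

In this sketch the attention matrix has shape `(D, D)` rather than `(L, L)`, which is where the savings come from when $D \leq L$.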
💡 Reversible Instance Normalization (RevIN). The resulting network is equipped with RevIN, a two-step normalization scheme to handle the shift between the training and testing time series.
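The two steps can be sketched as follows, assuming an input of shape `(D, L)` and per-channel statistics computed over the time axis. This is a simplified illustration: the function names are hypothetical, and the learnable affine parameters of the original RevIN method are left out.

```python
import numpy as np

def revin_normalize(x, eps=1e-5):
    """Step 1: standardize each channel of one instance, x of shape (D, L)."""
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std, (mean, std)

def revin_denormalize(y, stats):
    """Step 2: map the model output back to the original scale of the instance."""
    mean, std = stats
    return y * std + mean

rng = np.random.default_rng(0)
# A series with a large offset and scale, mimicking a distribution shift.
x = 50.0 + 10.0 * rng.normal(size=(3, 96))

x_norm, stats = revin_normalize(x)
# An identity "model" is used here only to check that the two steps are inverses.
x_back = revin_denormalize(x_norm, stats)
```

In practice the forecasting network sits between the two steps, so it only ever sees inputs with roughly zero mean and unit variance per channel.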
💡 Sharpness-Aware Minimization (SAM). As suggested by our empirical and theoretical analysis, we optimize the model with SAM to make it converge towards flatter minima, hence improving its generalization capacity.
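To make the idea concrete, here is a minimal sketch of one SAM update on a toy least-squares problem. It is not the optimizer shipped in this repository: the helper names, the learning rate `lr`, and the neighborhood radius `rho` are assumptions chosen for this example.

```python
import numpy as np

def loss_and_grad(w, x, y):
    """Mean squared error of a linear model and its gradient with respect to w."""
    residual = x @ w - y
    loss = float(np.mean(residual ** 2))
    grad = 2.0 * x.T @ residual / len(y)
    return loss, grad

def sam_step(w, x, y, lr=0.1, rho=0.05):
    """One SAM update: ascend to a nearby worst-case point, then descend from w."""
    _, grad = loss_and_grad(w, x, y)
    # Perturbation of norm rho in the direction of steepest ascent.
    eps = rho * grad / (np.linalg.norm(grad) + 1e-12)
    # Gradient evaluated at the perturbed weights.
    _, grad_perturbed = loss_and_grad(w + eps, x, y)
    # The update is applied to the original (unperturbed) weights.
    return w - lr * grad_perturbed

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = x @ w_true

w = np.zeros(3)
initial_loss, _ = loss_and_grad(w, x, y)
for _ in range(200):
    w = sam_step(w, x, y)
final_loss, _ = loss_and_grad(w, x, y)
```

Each step costs two gradient evaluations instead of one, which is the usual price of SAM compared with plain gradient descent.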
SAMformer uniquely combines all these components in a lightweight implementation with very few hyperparameters. We display below the resulting architecture.
We conduct our experiments on various multivariate time series forecasting benchmarks.
🥇 Improved performance. SAMformer outperforms its competitors in
🚀 Computational efficiency and versatility. Unlike most of its competitors, SAMformer has a lightweight implementation with few learnable parameters, leading to improved computational efficiency. SAMformer significantly outperforms the state of the art in multivariate time series forecasting despite having fewer parameters. In addition, the same architecture is used for all the datasets, while most of the other baselines require heavy hyperparameter tuning, which showcases the versatility of our approach.
📚 Qualitative benefits. We display in our paper the benefits of SAMformer in terms of smoothness of the loss landscape, robustness to the prediction horizons, and signal propagation in the attention layer.
To get started with SAMformer, clone this repository and install the required packages.
git clone https://github.com/romilbert/samformer.git
cd samformer
pip install -r requirements.txt
Make sure you have Python 3.8 or a newer version installed.
SAMformer consists of several key modules:
- models/: Contains the SAMformer architecture along with necessary components for normalization and optimization.
- utils/: Contains the utilities for data processing, training, callbacks, and saving the results.
- dataset/: Directory for storing the datasets used in experiments. For illustration purposes, this directory only contains the ETTh1 dataset in .csv format. You can download all the datasets used in our experiments (ETTh1, ETTh2, ETTm1, ETTm2, electricity, weather, traffic, exchange_rate) here.
To launch the training and evaluation process, use the run_script.sh script with the appropriate arguments:
sh run_script.sh -m [model_name] -d [dataset_name] -s [sequence_length] -u -a
- -m: Model name.
- -d: Dataset name.
- -s: Sequence length. The default is 512.
- -u: Activate Sharpness-Aware Minimization (SAM). Optional.
- -a: Activate additional results saving. Optional.
sh run_script.sh -m transformer -d ETTh1 -u -a
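If you prefer to launch runs from Python, for instance to loop over several datasets, the command line can be assembled programmatically. The helper below is a hypothetical convenience and not part of the repository; it only uses the flags documented above.

```python
import shlex

def build_command(model, dataset, seq_len=None, use_sam=False, save_extra=False):
    """Assemble the run_script.sh invocation from the documented flags."""
    cmd = ["sh", "run_script.sh", "-m", model, "-d", dataset]
    if seq_len is not None:
        cmd += ["-s", str(seq_len)]   # omitted: the script falls back to its default
    if use_sam:
        cmd.append("-u")              # activate SAM
    if save_extra:
        cmd.append("-a")              # activate additional results saving
    return cmd

cmd = build_command("transformer", "ETTh1", use_sam=True, save_extra=True)
print(shlex.join(cmd))  # sh run_script.sh -m transformer -d ETTh1 -u -a
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.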
Do not hesitate to contribute to this project. We would be happy to receive feedback and integrate your suggestions.
The code is distributed under the MIT license.
Romain Ilbert led the development of SAMformer, including the model's architecture, codebase, and experimental design, and co-led the writing process. Ambroise Odonnat contributed to the theoretical insights and co-led the writing process. Vasilii Feofanov provided the PyTorch implementation of the SAMformer model. All authors contributed to discussions and writing. Correspondence to romain.ilbert@hotmail.fr and ambroiseodonnattechnologie@gmail.com.
This work was conducted on the Jean Zay supercomputer at IDRIS, with access granted under the GENCI allocation AD011013858R1-ILBERT. We sincerely thank IDRIS and GENCI for providing the computational resources that made this research possible.
We would like to express our gratitude to all the researchers and developers whose work and open-source software have contributed to the development of SAMformer. Special thanks to the authors of SAM, TSMixer, RevIN and
If you find this work useful in your research, please cite:
@InProceedings{ilbert2024samformer,
title = {SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention},
author = {Ilbert, Romain and Odonnat, Ambroise and Feofanov, Vasilii and Virmaux, Aladin and Paolo, Giuseppe and Palpanas, Themis and Redko, Ievgen},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
year = {2024},
volume = {235},
publisher = {PMLR},
url = {https://proceedings.mlr.press/v235/ilbert24a.html},
}