This repository contains the code used for the paper DAEMA: Denoising Autoencoder with Mask Attention. The code documentation, generated with Sphinx, is available here.
Please cite as:
@article{tihon2021daema,
  title={DAEMA: Denoising Autoencoder with Mask Attention},
  author={Tihon, Simon and Javaid, Muhammad Usama and Fourure, Damien and Posocco, Nicolas and Peel, Thomas},
  journal={arXiv preprint arXiv:2106.16057},
  year={2021}
}
Create and activate a conda environment with Python 3.8.2:
conda create --name <env-name> python=3.8.2
conda activate <env-name>
Install the libraries listed in requirements.txt:
pip install -r requirements.txt
Run the code:
cd src
python run.py
The repository also contains a Dockerfile to run the code:
docker build -t <image_name>:<tag> .
docker run -t --name <container-name> <image_name> <experiment-to-run>
Example:
docker build -t daema:latest .
docker run -t --name daema_container daema:latest python run.py
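Any of the experiment commands listed below can be passed to the container as <experiment-to-run>; for instance, with an illustrative container name and the DAE command from the list that follows:
docker run -t --name daema_dae daema:latest python run.py --daema_attention_mode no --daema_ways 1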
You can test your installation by running:
PYTHONPATH=src/ pytest tests
To reproduce the experiments reported in the paper, use the following commands.
- DAEMA:
python run.py
- DAE:
python run.py --daema_attention_mode no --daema_ways 1
- AimNet:
python run.py --model Holoclean --batch_size 0 --lr 0.05 --metric_steps 18 19 20 21 22
- MIDA:
python run.py --model MIDA --batch_size -1 --metric_steps 492 494 496 498 500 --scaler MinMax
- MissForest:
python run.py --model MissForest --metric_steps 0 --scaler MinMax
- Mean:
python run.py --model Mean --metric_steps 0
- Real:
python run.py --model Real --metric_steps 0
- MNAR setting: same commands as above, with the additional argument:
--ms_setting mnar
- Different missingness proportions: same commands as above, with the additional argument (e.g. for 10% missingness):
--ms_prop 0.1
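For instance, the Mean baseline with 10% missingness combines the commands above as:
python run.py --model Mean --metric_steps 0 --ms_prop 0.1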
Ablation variants of DAEMA (attention mode, loss and artificial missingness):
- Full:
python run.py
- Classic:
python run.py --daema_attention_mode classic
- Sep.:
python run.py --daema_attention_mode sep
- DAEMA:
python run.py
- Reduced loss:
python run.py --daema_loss_type dropout_only
- Full loss:
python run.py --daema_loss_type full
- No art. miss.:
python run.py --daema_pre_drop 0
To test the code on a local dataset:
- put the dataset in files/data/<name>.csv;
- update the src/pipeline/datasets/DATASETS variable to add your dataset (see the sketch after this list);
- run the tests;
- use the --datasets argument to select it for the experiments (e.g. python run.py --datasets <name>).
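Purely as an illustration, assuming DATASETS is a dictionary mapping the name used with --datasets to a loading function that returns a pandas DataFrame (check src/pipeline/datasets for the actual structure and mirror an existing entry), a new entry could look roughly like this:
```python
# Hypothetical sketch only: mirror the existing entries of DATASETS in
# src/pipeline/datasets, which may use a different structure than shown here.
import pandas as pd

def load_my_dataset():
    # Load the CSV placed in files/data/<name>.csv (here, "my_dataset.csv").
    return pd.read_csv("files/data/my_dataset.csv")

DATASETS = {
    # ... existing entries ...
    "my_dataset": load_my_dataset,  # selected with: python run.py --datasets my_dataset
}
```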
To test the code on a custom model:
- implement the model following the expected interface (see src/models/baseline_imputations/Identity for the basic structure, and the sketch after this list);
- update the src/models/__init__/MODELS variable to add your model;
- run the tests;
- use the --model argument to select it for the experiments (e.g. python run.py --model <Name>).
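As a rough illustration only (the actual interface is defined by the existing models such as Identity; the method names and (data, mask) arguments below are assumptions to be adapted to the codebase), a minimal imputation model could look like this:
```python
# Hypothetical sketch: adapt the class to the interface actually expected by the
# pipeline (see src/models/baseline_imputations/Identity); the fit/transform
# methods and the (data, mask) arguments used here are assumptions.
import numpy as np

class MeanPerFeature:
    """Toy imputer that replaces missing values with per-feature means."""

    def fit(self, data, mask):
        # Assumed convention: mask is 1 where the value is observed, 0 where missing.
        observed = np.where(mask.astype(bool), data, np.nan)
        self.means = np.nanmean(observed, axis=0)
        return self

    def transform(self, data, mask):
        # Fill the missing entries with the means learned during fit.
        return np.where(mask.astype(bool), data, self.means)
```
Once added to MODELS (mirroring the existing entries), such a model would be selected with python run.py --model MeanPerFeature.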