Aggregating seemingly different latent spaces.
Set up the development environment:

```bash
git clone git@github.com:crisostomi/latent-aggregation.git
cd latent-aggregation
conda env create -f env.yaml
conda activate la
pre-commit install
```
Run the pre-commit checks over the whole codebase to verify the setup:

```bash
pre-commit run --all-files
```
We use HuggingFace Datasets throughout the project. Assuming you already have a HF account (create one if you don't), log in via

```bash
huggingface-cli login
```

which will prompt you to either create a new token or paste an existing one.
Re-install the project in editable mode:

```bash
pip install -e '.[dev]'
```
Each experiment `exp_name` in `part_shared_part_novel`, `same_classes_disj_samples`, `totally_disjoint` has three scripts (an end-to-end sketch for one experiment follows this list):

- `prepare_data_${exp_name}.py` divides the data into tasks according to what the experiment expects;
- `run_${exp_name}.py` trains the task-specific models and uses them to embed the data for each task;
- `analyze_${exp_name}.py` obtains the results for the experiment.
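The following is a minimal end-to-end sketch of that pipeline for a single experiment, here `totally_disjoint`; it only strings together the three scripts above, and the `part_shared_part_novel` walkthrough below covers each step in detail:

```bash
# Sketch: run one experiment end to end. The script paths follow the
# src/la/scripts/ pattern used in the walkthrough below.
exp_name=totally_disjoint

# 1. Split the data into tasks as the experiment expects.
python src/la/scripts/prepare_data_${exp_name}.py

# 2. Train the task-specific models and embed the data for each task.
python src/la/scripts/run_${exp_name}.py

# 3. Obtain the results for the experiment.
python src/la/scripts/analyze_${exp_name}.py
```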
Each script has a corresponding configuration file with the same name in `conf/`.
So, to run the `part_shared_part_novel` experiment, you first have to configure it in `conf/prepare_data_part_shared_part_novel.yaml`. In this case, you have to choose values for `num_shared_classes` and `num_novel_classes_per_task`.
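As a sketch, the relevant options might look like the excerpt below; the two key names come from this README, but their exact nesting and the surrounding fields in the real `conf/prepare_data_part_shared_part_novel.yaml` may differ:

```yaml
# Hypothetical excerpt of conf/prepare_data_part_shared_part_novel.yaml.
# Only the two option names are taken from this README; the values and
# their placement are illustrative assumptions.
num_shared_classes: 5
num_novel_classes_per_task: 2
```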
Now you will prepare the data via

```bash
python src/la/scripts/prepare_data_part_shared_part_novel.py
```
This will populate the `data/${dataset_name}/part_shared_part_novel/` folder. Then you'll embed the data by running

```bash
python src/la/scripts/run_part_shared_part_novel.py
```
so that you now have the encoded data in `data/${dataset_name}/part_shared_part_novel/S${num_shared_classes}_N${num_novel_classes_per_task}`.
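If you want to inspect these latent spaces by hand, here is a minimal sketch. It assumes the encoded data is stored with HuggingFace Datasets' `save_to_disk` under the folder above (an assumption consistent with the project's use of HF Datasets; adapt the path and loading to what the run script actually writes):

```python
from datasets import load_from_disk

# Hypothetical values; substitute your dataset and the settings you configured.
dataset_name = "cifar100"
num_shared_classes = 5
num_novel_classes_per_task = 2

# Path layout taken from this README.
root = (
    f"data/{dataset_name}/part_shared_part_novel/"
    f"S{num_shared_classes}_N{num_novel_classes_per_task}"
)

# Returns a Dataset or a DatasetDict, depending on how the data was saved.
encoded = load_from_disk(root)
print(encoded)
```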
Having all the latent spaces, you can now run the actual experiment and collect the results by running

```bash
python src/la/scripts/analyze_part_shared_part_novel.py
```
The results can now be found in `results/part_shared_part_novel`.