Multi-objective Representation Learning for Scientific Document Retrieval

Download data

To download the training data and the ICLR2022 benchmark from our S3 bucket, run download_data.sh:

source download_data.sh

Training

Single objective training

Command to train on the independent-cropping data:

python train.py \
    --save_dir logs/ \
    --data_dirs ./datasets/training-data/independent-cropping \
    --weights 100 \
    --batch_size 16 \
    --num_workers 2 \
    --steps 200000 \
    --grad_accum 2 \
    --val_check_interval 10000 \
    --pooling mean \
    --loss mnrl \
    --sampling mixed
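
The flags --pooling mean and --loss mnrl presumably select mean pooling over token embeddings and a multiple negatives ranking loss (MNRL). In that loss, each query's paired document is the positive and every other document in the batch serves as a negative. The pure-Python sketch below illustrates the technique under those assumptions. It is not the code in losses.py or models.py; the helper names and the similarity scale of 20 (the default in sentence-transformers' MultipleNegativesRankingLoss) are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnrl_loss(queries: List[Vector], docs: List[Vector], scale: float = 20.0) -> float:
    """Multiple negatives ranking loss: query i should score docs[i] above every
    other document in the batch, which act as in-batch negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, d) for d in docs]
        # Numerically stable log-sum-exp over all documents in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the paired document (index i) as the target class.
        total += log_z - logits[i]
    return total / len(queries)
```

Assuming --grad_accum 2 accumulates gradients over two batches, each optimizer step sees 2 × 16 = 32 examples. In a standard gradient-accumulation setup, a loss like this one is still computed per batch, so each query's in-batch negatives come from its batch of 16, not from all 32 examples.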

Multi-objective training

Command to train on independent-cropping and unarxiv-q2d using in-batch mixing with a 50-50 split:

python train.py \
    --save_dir logs/ \
    --data_dirs ./datasets/training-data/independent-cropping ./datasets/training-data/unarxiv-q2d \
    --weights 50 50 \
    --batch_size 16 \
    --num_workers 2 \
    --steps 200000 \
    --grad_accum 2 \
    --val_check_interval 10000 \
    --pooling mean \
    --loss mnrl \
    --sampling mixed
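
One possible reading of --sampling mixed with --weights 50 50 is that each batch of 16 is split across the datasets in proportion to the weights, giving 8 examples from each. The sketch below implements that deterministic proportional split. The function names are hypothetical, and the repository may instead draw each example's source at random according to the weights.

```python
import random
from typing import Dict, List, Sequence


def mixed_batch_sizes(weights: Sequence[int], batch_size: int) -> List[int]:
    """Split one batch across datasets in proportion to the weights,
    e.g. weights 50 50 with batch size 16 -> 8 + 8."""
    total = sum(weights)
    sizes = [batch_size * w // total for w in weights]
    # Give any rounding remainder to the first datasets so sizes sum to batch_size.
    for i in range(batch_size - sum(sizes)):
        sizes[i % len(sizes)] += 1
    return sizes


def mixed_batch(
    datasets: Dict[str, list], weights: Sequence[int], batch_size: int, rng: random.Random
) -> list:
    """Build one batch of (source_name, example) pairs drawn from every dataset."""
    sizes = mixed_batch_sizes(weights, batch_size)
    batch = []
    for (name, examples), k in zip(datasets.items(), sizes):
        # Sample k distinct examples from this source without replacement.
        batch.extend((name, ex) for ex in rng.sample(examples, k))
    # Interleave the sources within the batch.
    rng.shuffle(batch)
    return batch
```

Under this reading, both sources share every batch, so each query's in-batch negatives include documents from both datasets.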

Command to train on independent-cropping and unarxiv-q2d using alternating batches:

python train.py \
    --save_dir logs/ \
    --data_dirs ./datasets/training-data/independent-cropping ./datasets/training-data/unarxiv-q2d \
    --batch_size 16 \
    --num_workers 2 \
    --steps 200000 \
    --grad_accum 2 \
    --val_check_interval 10000 \
    --pooling mean \
    --loss mnrl \
    --sampling alternate
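
--sampling alternate presumably draws each batch from a single dataset and switches datasets from one batch to the next, which would explain why this command omits --weights. The sketch below shows round-robin, single-source batching under that assumption. alternate_batches is a hypothetical helper, and stopping once any source runs out is an illustrative choice; train.py appears to run for a fixed number of --steps instead.

```python
import itertools
from typing import Dict, Iterator, Tuple


def alternate_batches(datasets: Dict[str, list], batch_size: int) -> Iterator[Tuple[str, list]]:
    """Yield single-source batches, cycling through the datasets in turn:
    batch 0 from the first dataset, batch 1 from the second, and so on."""
    # Chunk each dataset into consecutive full batches (a trailing partial batch is dropped).
    per_source = {
        name: [
            examples[i:i + batch_size]
            for i in range(0, len(examples) - batch_size + 1, batch_size)
        ]
        for name, examples in datasets.items()
    }
    iters = {name: iter(batches) for name, batches in per_source.items()}
    for name in itertools.cycle(list(iters)):
        batch = next(iters[name], None)
        if batch is None:  # stop once any source is exhausted
            return
        yield name, batch
```

Under this scheme, all in-batch negatives for a query come from the same dataset as its positive.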

Evaluate

To run the evaluation on SciDocs, download the data by following the instructions at https://github.com/allenai/scidocs. The evaluation needs these three metadata files:

data/paper_metadata_mag_mesh.json
data/paper_metadata_view_cite_read.json
data/paper_metadata_recomm.json
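
Before running the evaluation, it can help to confirm that the three metadata files are in place. The sketch below assumes the paths listed above are relative to the directory where the SciDocs data was downloaded; missing_metadata is a hypothetical helper and not part of evaluate.py.

```python
from pathlib import Path
from typing import List

# The three SciDocs metadata files listed above, relative to the SciDocs download directory.
REQUIRED_METADATA = [
    "data/paper_metadata_mag_mesh.json",
    "data/paper_metadata_view_cite_read.json",
    "data/paper_metadata_recomm.json",
]


def missing_metadata(scidocs_root: str) -> List[str]:
    """Return the required metadata paths that are not regular files under scidocs_root."""
    root = Path(scidocs_root)
    return [rel for rel in REQUIRED_METADATA if not (root / rel).is_file()]
```

For example, missing_metadata("path/to/scidocs") returns an empty list once all three files are present, where "path/to/scidocs" stands in for your own download location.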

zetaalphavector/multi-obj-repr-learning
