
Quantifying-Intimacy-in-Language

Official GitHub repo for the EMNLP 2020 paper Quantifying Intimacy in Language by Jiaxin Pei and David Jurgens.

Data

Annotated question intimacy data:

data/annotated_question_intimacy_data
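
Each file pairs a question with its gold intimacy score. A minimal loading sketch; the tab-separated "question<TAB>score" layout is an assumption, so inspect the files before relying on it:

# Hypothetical loader: assumes each line is "question<TAB>score".
# Check the actual layout of data/annotated_question_intimacy_data first.
def load_intimacy_data(path):
    questions, scores = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            text, score = line.rstrip("\n").rsplit("\t", 1)
            questions.append(text)
            scores.append(float(score))
    return questions, scores

train_q, train_y = load_intimacy_data("data/annotated_question_intimacy_data/final_train.txt")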

Code

Python package for intimacy prediction

If pip is installed, question-intimacy can be installed directly via pip:

pip3 install question-intimacy
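
A minimal usage sketch; the import path, class name, and predict signature below are assumptions, so check the package's documentation for the exact interface:

# Assumed interface: verify against the question-intimacy package docs.
from question_intimacy.predict_intimacy import IntimacyEstimator

estimator = IntimacyEstimator()
# predict() is assumed to return the intimacy score(s) for the input text.
print(estimator.predict("What is your favorite movie?", type='list'))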

Pre-trained model

Our model is also available via Hugging Face Transformers:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model; both are downloaded automatically on first use
tokenizer = AutoTokenizer.from_pretrained("pedropei/question-intimacy")
model = AutoModelForSequenceClassification.from_pretrained("pedropei/question-intimacy")
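
With the model loaded, scoring a question is a standard forward pass. A minimal sketch, assuming the regression head emits a single logit that is the intimacy score:

import torch

question = "What is your favorite book?"
inputs = tokenizer(question, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Assumption: the single-logit regression head directly encodes the intimacy score.
print(outputs.logits.item())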

Code to train the intimacy regressor

To fine-tune the roberta-base model over our intimacy dataset:

python3 train_intimacy_model.py --mode=train \
--model_name=roberta-base \
--pre_trained_model_name_or_path=roberta-base \
--train_path=data/annotated_question_intimacy_data/final_train.txt \
--val_path=data/annotated_question_intimacy_data/final_val.txt \
--test_path=data/annotated_question_intimacy_data/final_test.txt \
--model_saving_path=outputs 

The best model will be saved under outputs/.

After training, to get the scores on our annotated test set and the out-of-domain set, run:

python3 train_intimacy_model.py --mode=internal-test \
--model_name=roberta-base \
--pre_trained_model_name_or_path=outputs \
--train_path=data/annotated_question_intimacy_data/final_train.txt \
--val_path=data/annotated_question_intimacy_data/final_val.txt \
--test_path=data/annotated_question_intimacy_data/final_test.txt \
--predict_data_path=data/annotated_question_intimacy_data/final_external.txt 

To run the fine-tuned model over your own data, prepare a file with one input text per line (like data/inference.txt) and run the following command:

python3 train_intimacy_model.py --mode=inference \
--model_name=roberta-base \
--pre_trained_model_name_or_path=outputs \
--predict_data_path=data/inference.txt \
--test_saving_path=ooo.txt
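
The predicted scores are written to the file given by --test_saving_path (ooo.txt above). A minimal sketch for pairing each input with its prediction, assuming the output holds one score per line in input order:

# Assumption: the output file has one score per line, aligned with the input file.
with open("data/inference.txt") as f_in, open("ooo.txt") as f_out:
    for question, score in zip(f_in, f_out):
        print(f"{float(score):.3f}\t{question.strip()}")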

If you want to do language-modeling fine-tuning of the roberta-base model, please check out the code from Hugging Face Transformers.
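
For example, the Transformers language-modeling example script can be invoked along these lines (the script name and flags vary across Transformers releases, and your_questions.txt is a hypothetical corpus file with one question per line):

python3 run_mlm.py --model_name_or_path=roberta-base \
--train_file=your_questions.txt \
--line_by_line \
--do_train \
--output_dir=saved_model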

To train the fine-tuned RoBERTa model over our intimacy dataset, put the model under saved_model/ and run the following command:

python3 train_intimacy_model.py --mode=train \
--model_name=roberta-ft \
--pre_trained_model_name_or_path=saved_model \
--train_path=data/annotated_question_intimacy_data/final_train.txt \
--val_path=data/annotated_question_intimacy_data/final_val.txt \
--test_path=data/annotated_question_intimacy_data/final_test.txt \
--model_saving_path=outputs 

Please email Jiaxin Pei (pedropei@umich.edu) to request the roberta-base model fine-tuned over 3M questions.

Contact

Jiaxin Pei (pedropei@umich.edu)
