This project is the result of the collaboration between Ines, Yoni, and Joachim for our CS224N Final Project (Winter 2023).
The goal of this project is to develop a simple and efficient architecture for multitask learning in natural language processing, built on a pretrained BERT model. By passing BERT's contextualized embeddings through a single additional layer per task, the Multitask BERT consistently achieves decent scores (within 20% of state-of-the-art performance) on three target language tasks: sentiment analysis, paraphrase detection, and semantic textual similarity.
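For illustration, here is a minimal sketch of what such a shared-encoder architecture can look like. It is not the exact code from this repository: the class and head names, the use of Hugging Face's `BertModel`, and the 5-class sentiment output are assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultitaskBERT(nn.Module):
    """Shared BERT encoder with one task-specific dense layer per task (illustrative sketch)."""

    def __init__(self, hidden_size=768, num_sentiment_classes=5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # One additional dense layer per task, all sharing the same encoder.
        self.sentiment_head = nn.Linear(hidden_size, num_sentiment_classes)
        self.paraphrase_head = nn.Linear(2 * hidden_size, 1)   # binary logit
        self.similarity_head = nn.Linear(2 * hidden_size, 1)   # similarity score

    def encode(self, input_ids, attention_mask):
        # Use the [CLS] embedding as the sentence representation.
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state[:, 0]

    def predict_sentiment(self, input_ids, attention_mask):
        return self.sentiment_head(self.encode(input_ids, attention_mask))

    def predict_paraphrase(self, ids_1, mask_1, ids_2, mask_2):
        pair = torch.cat([self.encode(ids_1, mask_1), self.encode(ids_2, mask_2)], dim=-1)
        return self.paraphrase_head(pair)

    def predict_similarity(self, ids_1, mask_1, ids_2, mask_2):
        pair = torch.cat([self.encode(ids_1, mask_1), self.encode(ids_2, mask_2)], dim=-1)
        return self.similarity_head(pair)
```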
During training, one can choose to freeze the BERT weights and only update the additional parameters, but we found that fine-tuning the BERT block to fit the underlying data distribution yielded better results.
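A sketch of how the two training modes can be toggled, assuming the `MultitaskBERT` sketch above; `set_bert_trainable` and the learning rate are illustrative, not the repository's actual API.

```python
import torch

model = MultitaskBERT()  # from the sketch above

def set_bert_trainable(multitask_model, fine_tune: bool):
    """Freeze the shared BERT encoder (feature-extractor mode) or unfreeze it (fine-tune mode)."""
    for param in multitask_model.bert.parameters():
        param.requires_grad = fine_tune

set_bert_trainable(model, fine_tune=True)  # fine-tuning the encoder worked better in our experiments
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```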
Among the ideas our model leverages are PCGrad, to eliminate conflicting gradients between tasks; gradient accumulation, to boost training; and additional datasets, to better capture diverse language styles.
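For readers unfamiliar with PCGrad (Yu et al., 2020), its core operation is a projection that removes the conflicting component of one task gradient with respect to another. The sketch below is illustrative and not taken from this repository; `pcgrad_project` and the toy gradients are hypothetical.

```python
import torch

def pcgrad_project(grad_i, grad_j):
    """PCGrad-style projection: if two task gradients conflict (negative dot
    product), project the first onto the normal plane of the second."""
    dot = torch.dot(grad_i, grad_j)
    if dot < 0:
        grad_i = grad_i - (dot / grad_j.norm() ** 2) * grad_j
    return grad_i

# Toy example with two conflicting 2-D gradients.
g_sentiment = torch.tensor([1.0, 1.0])
g_paraphrase = torch.tensor([-1.0, 0.5])
print(pcgrad_project(g_sentiment, g_paraphrase))  # tensor([0.6000, 1.2000]), conflict removed
```

Gradient accumulation is simpler: each mini-batch loss is scaled by 1/k and `backward()` is called k times before a single `optimizer.step()`, emulating a k-times larger effective batch size.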
Our main contribution is the simplicity of the model, which requires no more than a single task-specific dense layer per task, in contrast to many state-of-the-art approaches in the existing literature.
This architecture is therefore well suited to end users with limited resources who want to achieve acceptable baseline results on several downstream language tasks simultaneously.
The final report is available here: http://web.stanford.edu/class/cs224n/final-reports/final-report-169376110.pdf (CS224N website).
The BERT implementation part of the project was adapted from the "minbert" assignment developed at Carnegie Mellon University's CS11-711 Advanced NLP, created by Shuyan Zhou, Zhengbao Jiang, Ritam Dutt, Brendon Boldt, Aditya Veerubhotla, and Graham Neubig.
Parts of the code are from the transformers library (Apache License 2.0).