This repository contains the source code for our paper *DuluthNLP at SemEval-2023 Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset*. The paper describes TwiBERT, a pretrained language model (detailed below) trained from scratch on Twi, the predominant language of Ghana.
TwiBERT is a pretrained language model designed specifically for Twi, which is widely spoken in Ghana, West Africa. The model has 61 million parameters, with 6 layers, 6 attention heads, 768 hidden units, and a feed-forward size of 3072. TwiBERT was trained on a combination of the Asanti Twi Bible and a crowdsourced dataset.
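For reference, the architecture above corresponds to a standard BERT configuration. The sketch below, assuming the Hugging Face `transformers` library, shows how such a configuration could be expressed; the `vocab_size` is an illustrative assumption, since the actual TwiBERT vocabulary size is not stated here.

```python
from transformers import BertConfig, BertForMaskedLM

# Sketch of a BERT config matching the reported TwiBERT architecture.
# vocab_size is assumed for illustration; the real tokenizer's
# vocabulary (and hence the exact parameter count) may differ.
config = BertConfig(
    vocab_size=30000,        # assumption, not reported above
    num_hidden_layers=6,     # 6 layers
    num_attention_heads=6,   # 6 attention heads
    hidden_size=768,         # 768 hidden units
    intermediate_size=3072,  # feed-forward size
)
model = BertForMaskedLM(config)
print(f"{model.num_parameters():,} parameters")  # varies with vocab_size
```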
The model was trained on a relatively small dataset (approximately 5 MB), which may limit its ability to learn rich contextual embeddings and to generalize. In addition, because much of the training data comes from the Bible, the model's output may carry a strong religious bias.
You can use TwiBERT by fine-tuning it on a downstream task. The example below shows how to load the model and tokenizer for token classification:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the pretrained TwiBERT weights; the token-classification head
# is randomly initialized and must be fine-tuned before use.
tokenizer = AutoTokenizer.from_pretrained("sakrah/TwiBERT")
model = AutoModelForTokenClassification.from_pretrained("sakrah/TwiBERT")
```
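For the sentence-level sentiment task the paper targets, a sequence-classification head may be a more natural fit than token classification. The following is a minimal, hypothetical sketch, not the paper's exact setup: the three-way label set and the placeholder input text are assumptions, and the classification head is randomly initialized until fine-tuned.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("sakrah/TwiBERT")
# num_labels=3 assumes a negative/neutral/positive label set.
model = AutoModelForSequenceClassification.from_pretrained("sakrah/TwiBERT", num_labels=3)

inputs = tokenizer("<your Twi text here>", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 3]); fine-tune before trusting predictions
```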