World's first bi-directional brainrot translator.
Youwei Zhen 2024
Imagine collecting thousands of brainrotted messages on TikTok? Who wants that?? Data collection for Gen+ Translator is instead inspired by self-instruct.
Gen+ Translator can be considered a distilled model. The definitions of the slang words were taken from List of Generation Z slang - Wikipedia. Using mistral-nemo, an LLM run locally, exactly 6,000 translation examples were generated, covering the most common day-to-day topics (see topics.json). Each example also lists the slang words it uses.
These 6,000 examples were then split in half: 3,000 English-to-slang and 3,000 slang-to-English.
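To give a feel for the generation step, here is a hypothetical sketch of how one prompt might combine a topic with a few slang definitions. The function name and prompt wording are illustrative; the actual templates live in data_generation.py.

```python
import random

# Hypothetical helper; the real prompt templates are in data_generation.py.
def build_prompt(topic: str, slang_defs: dict[str, str]) -> str:
    """Ask the local LLM for one English/slang sentence pair about a topic."""
    picked = random.sample(sorted(slang_defs), k=min(3, len(slang_defs)))
    defs = "\n".join(f"- {word}: {slang_defs[word]}" for word in picked)
    return (
        f"Topic: {topic}\n"
        f"Slang words and definitions:\n{defs}\n"
        "Write one plain-English sentence about the topic, then rewrite it "
        "using the slang words above, and list the slang words you used."
    )

print(build_prompt("school", {"rizz": "charisma", "mid": "mediocre", "bet": "agreed"}))
```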
Ver. 1 Gen+ Translator is fine-tuned from gpt2-large and trained on 2x NVIDIA RTX 3090. The model was fine-tuned with PEFT and LoRA to cut compute and memory costs.
Ver. 2 (current ver.) Gen+ Translator is fine-tuned from a GPTQ-quantized Llama-2-7B and trained on 2x NVIDIA RTX 3090, again with PEFT and LoRA.
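For reference, attaching a LoRA adapter with PEFT looks roughly like this. This is a minimal sketch for the Ver. 1 base model; the hyperparameter values are illustrative, not the exact ones in finetune.py.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2-large")
lora = LoraConfig(
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor for the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the small LoRA weights train
```

Only the adapter weights receive gradients, which is what keeps the memory footprint small enough for the two RTX 3090s.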
- Create a Python virtual environment (optional):
python -m venv venv
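source venv/bin/activate <- activate it (on Windows: venv\Scripts\activate)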
- Install the required dependencies:
pip install -r requirements.txt
- Change parameters in the .env (yes, I know I committed the .env, because it is being used as a config); a sketch of how the scripts can read it follows the variable list:
API_ENDPOINT="http://localhost:11434/api/generate" <- Ollama is used to run the model locally
MODEL_NAME="mistral-nemo:latest"
FINETUNE_MODEL="gpt2-large"
DEVICE="cuda:0"
- Generate the data. Running data_generation.py will use the local LLM to generate the 6,000 examples with 10 threads; to change these parameters, edit the file. A sketch of a single request is shown after the command.
python data_generation.py
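A minimal sketch of what one worker request can look like against Ollama's /api/generate endpoint (the helper name and demo prompt are illustrative; the real script fans this out across 10 threads):

```python
import requests
from concurrent.futures import ThreadPoolExecutor

API_ENDPOINT = "http://localhost:11434/api/generate"  # from .env
MODEL_NAME = "mistral-nemo:latest"                    # from .env

def generate_one(prompt: str) -> str:
    """Send a single non-streaming generation request to Ollama."""
    resp = requests.post(
        API_ENDPOINT,
        json={"model": MODEL_NAME, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]  # Ollama returns the completion here

if __name__ == "__main__":
    prompts = ["Rewrite 'that party was great' using Gen Z slang."]  # demo
    with ThreadPoolExecutor(max_workers=10) as pool:
        print(list(pool.map(generate_one, prompts)))
```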
- Run finetune.py. If CUDA runs out of memory, adjust the training parameters inside the file; common memory-saving options are sketched below.
python finetune.py
The fine-tuned PEFT adapter will be saved inside ./adapter
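If you do hit out-of-memory errors, these are the usual transformers knobs to reach for. The argument names are standard TrainingArguments options; whether finetune.py exposes exactly these values is an assumption, so treat them as starting points.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./adapter",
    per_device_train_batch_size=1,  # smaller batches use less VRAM
    gradient_accumulation_steps=8,  # keeps the effective batch size at 8
    gradient_checkpointing=True,    # recompute activations to save memory
    fp16=True,                      # half-precision training
)
```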
- Load the model (a sketch of what these scripts do follows the commands):
python en-to-slang.py <- loads the English-to-slang translator
python slang-to-en.py <- loads the slang-to-English translator
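Under the hood, both loader scripts presumably amount to attaching the saved adapter to the base model and generating. A rough sketch; the prompt format and device are illustrative:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "gpt2-large"  # Ver. 2 uses the GPTQ-quantized Llama-2-7B instead
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, "./adapter").to("cuda:0")

prompt = "Translate to Gen Z slang: that party was amazing.\n"  # illustrative
inputs = tok(prompt, return_tensors="pt").to("cuda:0")
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```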