Maya Dataset Creation

This repository contains the dataset creation code used for training Maya: Multilingual Aya Model.

Prerequisites

If your machine has a GPU, make sure all the requirements are satisfied.
If you don't have a GPU, you can contribute by writing CPU-based code and raising a PR. 🤗
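
If you are unsure whether your machine exposes an NVIDIA GPU, the following is a minimal sketch of a check that uses only the Python standard library. It assumes the NVIDIA driver is installed and ships the `nvidia-smi` tool; the function name `gpu_available` is hypothetical and is not part of this repository.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tooling found on PATH.
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU detected" if gpu_available() else "No NVIDIA GPU detected")
```

This only reports whether a GPU is visible to the driver. It does not confirm that the packages in requirements.txt match your CUDA version.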

Setup

Note: All the steps below are for a Windows machine.

If you use a different operating system and are able to set up and run the translation script successfully, please contribute the steps to README.md. 🤗

Create and activate a virtual environment using the commands below.

python -m venv venv

venv\Scripts\activate

To install all the required packages, run the command below.
Note: Depending on your CUDA version, you may need to change the requirements.txt file.

pip install -r requirements.txt

Create a .env file and place it in the root folder of the repository.
Note: To get the Cohere API key, you can DM Surya Guthikonda or Karthik on Discord.

COHERE_API_KEY=<API_KEY>
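
To confirm that the key in the .env file is readable before running anything, here is a minimal sketch using only the standard library. The helper names `load_env_file` and `get_cohere_api_key` are hypothetical, and the repository's own scripts may load the file differently.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_cohere_api_key(path: str = ".env") -> str:
    """Return COHERE_API_KEY from the environment, else from the .env file."""
    key = os.environ.get("COHERE_API_KEY")
    if not key and Path(path).is_file():
        key = load_env_file(path).get("COHERE_API_KEY")
    if not key:
        raise RuntimeError("COHERE_API_KEY is not set; add it to the .env file")
    return key


if __name__ == "__main__":
    # Print only the length so the secret itself never reaches the console.
    print("Key found, length:", len(get_cohere_api_key()))
```

Run it from the root folder so that the relative path `.env` resolves correctly.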

Execution

Note: Currently, a sample dataset of 100 rows is provided in translation/data/blip_laion_cc_sbu_100.json. You can also use the original version.
Change your working directory from the root folder to the translation folder and run the script as follows:

cd translation\
python run.py
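
Before running the script, it can help to inspect the sample file. The sketch below assumes the JSON file holds a top-level array of objects; that layout is an assumption, not something this README states, and the helper names `load_samples` and `summarize` are hypothetical.

```python
import json
from pathlib import Path

# Path taken from the note above, relative to the root folder.
SAMPLE_PATH = Path("translation") / "data" / "blip_laion_cc_sbu_100.json"


def load_samples(path):
    """Load the dataset, assuming a top-level JSON array (an assumption)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def summarize(records):
    """Return the record count and the sorted set of keys seen in dict records."""
    keys = sorted({k for r in records if isinstance(r, dict) for k in r})
    return len(records), keys


if __name__ == "__main__":
    count, keys = summarize(load_samples(SAMPLE_PATH))
    print(f"{count} records; fields: {keys}")
```

Run it from the root folder. If the file is not a JSON array, `load_samples` raises a `ValueError` rather than guessing at the structure.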

Contribution Guidelines

Follow the discussion in the #maya-dataset-team Discord channel to understand the pending tasks.

View the Issues tab for information about the tasks.
Clone the repository and create a separate branch to work on the issue.
Once you have finished the task, raise a PR from your branch to main and tag the issue.
Either Surya Guthikonda or Karthik will review the PR and merge it into main.

Note: If you have any questions, feel free to ping us on Discord.

