This repository contains the dataset-creation code for training Maya: Multilingual Aya Model.
- If your machine has a GPU, make sure all the requirements are satisfied.
- If you don't have a GPU, you can contribute by writing CPU-based code and raising a PR. 🤗
Note: All the steps below are for a Windows machine.
If you use another operating system and manage to set up and run the translation script, please add the steps to README.md. 🤗
- Create and activate a virtual environment using the commands below.
python -m venv venv
venv\Scripts\activate
- To install all the required packages, run the command below:
Note: Depending on your CUDA version, you may need to change the requirements.txt file.
pip install -r requirements.txt
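After installing, it can help to confirm that the GPU is actually visible from Python. The sketch below assumes PyTorch is among the installed requirements, which this README does not state; `cuda_status` is a hypothetical helper and not part of the repository.

```python
def cuda_status():
    """Return a short description of GPU visibility (hypothetical helper)."""
    try:
        import torch  # assumption: PyTorch is installed via requirements.txt
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Report the name of the first visible CUDA device.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but no CUDA device is visible"


if __name__ == "__main__":
    print(cuda_status())
```

If this reports that no CUDA device is visible on a machine that has a GPU, the installed packages may not match your CUDA version.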
- Create a .env file and place it in your root folder.
Note: To get the Cohere API key, you can DM Surya Guthikonda or Karthik on Discord.
COHERE_API_KEY=<API_KEY>
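How run.py reads this file is not shown here. As a rough illustration, the standard-library sketch below parses a .env file and returns the key, preferring a value already set in the process environment. The helper names `load_env_file` and `get_cohere_api_key` are hypothetical and are not taken from the repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_cohere_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("COHERE_API_KEY") or load_env_file(path).get("COHERE_API_KEY")
    # Reject a missing key or the unedited placeholder from this README.
    if not key or key == "<API_KEY>":
        raise RuntimeError("COHERE_API_KEY is missing or still a placeholder")
    return key
```

Failing early on the `<API_KEY>` placeholder avoids sending an invalid key to the API partway through a translation run.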
Note: Currently, a sample dataset of 100 rows is provided in translation/data/blip_laion_cc_sbu_100.json. You can also use the original version.
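To inspect the sample before translating, something like the sketch below may be useful. It assumes the file is a JSON array with one element per row; the README does not describe the schema, so no field names are assumed. `load_rows` is a hypothetical helper.

```python
import json
from pathlib import Path

# Path from this README, relative to the repository root.
SAMPLE_PATH = Path("translation/data/blip_laion_cc_sbu_100.json")


def load_rows(path=SAMPLE_PATH):
    """Load the dataset file and return its rows as a list (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file is a JSON array with one element per row.
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array, got {type(data).__name__}")
    return data
```

Called from the repository root, `len(load_rows())` should match the 100 rows mentioned above if that assumption holds, and printing the first row shows which fields are available.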
Change your working directory from the root folder to the translation folder and run the script as follows:
cd translation\
python run.py
Follow the discussion in the #maya-dataset-team Discord channel to understand the pending tasks.
- View the Issues tab for task information.
- Clone the repository and create a separate branch to work on the issue.
- Once you have finished the task, raise a PR from your branch to main, tagging the issue.
- Either Surya Guthikonda or Karthik will review the PR and merge it into main.
Note: If you have any doubts, feel free to ping us on Discord.