This code was tested locally with Python 3.8 and is deployed in production with Python 3.10, so any Python version >= 3.8 should work.
You need a local Redis installation to run the application; follow the official Redis documentation to install it. To verify the installation, run `redis-cli ping`; it should return `PONG`.
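The same check can be scripted from Python when `redis-cli` is not on the path. The following is a minimal sketch using only the standard library; it assumes Redis listens on the default `localhost:6379` without a password, and the helper names `is_pong` and `redis_is_up` are hypothetical, not part of this project.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """Return True if `reply` is the simple-string answer Redis gives to PING."""
    return reply.startswith(b"+PONG")


def redis_is_up(host="localhost", port=6379, timeout=2.0):
    """Send an inline PING command and report whether the server answered PONG.

    Assumption: default host/port and no authentication configured.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return is_pong(conn.recv(64))
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("Redis reachable:", redis_is_up())
```

A server that requires authentication replies with an error instead of `PONG`, so this sketch reports it as not reachable.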
You need to run the following command to install the required packages:
pip install -r requirements.txt -r requirements_dev.txt
To run the server and use all endpoints, set two environment variables:
- `HF_SECRET_KEY`: Secret key for the HuggingFace APIs. You can get one by creating an account on huggingface.co and going to the Access Tokens page.
- `GH_SECRET_KEY`: Secret key for the GitHub APIs. You can get one by creating an account on github.com and going to the Personal Access Tokens page.
Then, run `flask run` to start the server. You can also set the environment variables within the same command, as follows:
HF_SECRET_KEY=<HuggingFace secret key here> GH_SECRET_KEY=<GitHub secret key here> flask run
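Since the endpoints depend on both keys, it can help to check for them before starting the server. This is a small sketch of such a check; the helper `missing_env_vars` is hypothetical and is not part of this repository.

```python
import os

# The two variables named in this README.
REQUIRED_ENV_VARS = ("HF_SECRET_KEY", "GH_SECRET_KEY")


def missing_env_vars(environ=os.environ):
    """Return the required variable names that are unset or empty in `environ`."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```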
To run the server with auto-reloading, set the `FLASK_DEBUG` environment variable to `1`. For example:
FLASK_DEBUG=1 flask run
Make sure to apply the `pre-commit` hooks before submitting a pull request by running `pre-commit run --all-files`. The hooks should run automatically when you commit your changes; just double-check that they ran before submitting the PR.
- Method: `GET`
- Description: Returns the list of available features for the datasets.
- Path Arguments: N/A
- Parameters: N/A
- Data: N/A
- Return Type: `JSON`
- Example Link: https://arbml.github.io/masader-webservice/datasets/schema
- Example Output:
  [
    "Name",
    "Subsets",
    "HF Link",
    ...
  ]
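A client can read this endpoint with nothing but the standard library. The sketch below assumes the base URL shown in the example links of this README (the deployed host may differ); `fetch_json` and `fetch_schema` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumption: base URL copied from the example links in this README.
BASE_URL = "https://arbml.github.io/masader-webservice"


def fetch_json(url, timeout=10):
    """Send a GET request to `url` and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_schema(base_url=BASE_URL):
    """Return the list of feature names served by /datasets/schema."""
    return fetch_json(f"{base_url}/datasets/schema")
```

Calling `fetch_schema()` performs a network request, so it is left out of the module body on purpose.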
- Method: `GET`
- Description: Returns the list of available datasets based on the passed `query` and the requested `features`.
- Path Arguments: N/A
- Parameters:
  - `query` (Optional): A filtration query that is applied to the datasets before the required features are selected and the output is returned (e.g. `query=Year>2003 and Year<2008 and Unit=='tokens'`). The query must follow the Pandas query language; for more information, see the Pandas documentation for `DataFrame.query`.
  - `features` (Optional): The list of required features to be returned for each dataset (e.g. `features=Name,Year,Unit`).
- Data: N/A
- Return Type: `JSON`
- Example Link: https://arbml.github.io/masader-webservice/datasets?features=Name,Year,Unit&query=Year>2003 and Year<2008 and Unit=='tokens'
- Example Output:
  [
    {
      "Name": "LC-STAR: Standard Arabic Phonetic lexicon",
      "Unit": "tokens",
      "Year": 2007
    },
    {
      "Name": "NEMLAR: Written Corpus",
      "Unit": "tokens",
      "Year": 2006
    },
    {
      "Name": "Arabic Treebank: Part 3",
      "Unit": "tokens",
      "Year": 2005
    },
    ...
  ]
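The example query contains spaces, comparison operators, and quotes, which should be percent-encoded when the URL is built programmatically. The following sketch shows one way to do that with `urllib.parse.urlencode`; the helper `datasets_url` is hypothetical, and the base URL is assumed from the example links in this README.

```python
from urllib.parse import urlencode

# Assumption: base URL copied from the example links in this README.
BASE_URL = "https://arbml.github.io/masader-webservice"


def datasets_url(features=None, query=None, base_url=BASE_URL):
    """Build a /datasets URL with optional, safely encoded parameters.

    `features` is a list of feature names; `query` is a Pandas-style query string.
    """
    params = {}
    if features:
        # The endpoint expects a comma-separated list of feature names.
        params["features"] = ",".join(features)
    if query:
        params["query"] = query
    url = f"{base_url}/datasets"
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    print(datasets_url(
        features=["Name", "Year", "Unit"],
        query="Year>2003 and Year<2008 and Unit=='tokens'",
    ))
```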
- Method: `GET`
- Description: Returns a specific dataset from the available datasets based on its `index`.
- Path Arguments:
  - `index`: The index of the required dataset. The `index` should be within the range `[1, maximum number of datasets in Masader]`.
- Parameters:
  - `features` (Optional): The list of required features to be returned for the dataset (e.g. `features=Name,Year`).
- Data: N/A
- Return Type: `JSON`
- Example Link: https://arbml.github.io/masader-webservice/datasets/1?features=Name,Year
- Example Output:
  {
    "Name": "Shami",
    "Year": 2018
  }
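Because `index` is 1-based and bounded by the number of datasets, a client may want to validate it before sending the request. This is a minimal sketch; `dataset_path` is a hypothetical helper, and `total_datasets` is assumed to be known to the caller (this README does not say how to obtain it).

```python
def dataset_path(index, total_datasets):
    """Return the path for one dataset, rejecting out-of-range indices.

    Assumption: `total_datasets` is the current number of datasets in Masader.
    """
    if not 1 <= index <= total_datasets:
        raise ValueError(
            f"index must be within [1, {total_datasets}], got {index}"
        )
    return f"/datasets/{index}"
```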
- Method: `GET`
- Description: Returns the unique values of the requested features.
- Path Arguments: N/A
- Parameters:
  - `features` (Optional): The list of features whose unique values should be returned (e.g. `features=Dialect,Year`).
- Data: N/A
- Return Type: `JSON`
- Example Link: https://arbml.github.io/masader-webservice/datasets/tags?features=Dialect,Year
- Example Output:
  {
    "Dialect": [
      "Algeria",
      "Bahrain",
      "Classic",
      ...
    ],
    "Year": [
      2001,
      2002,
      2003,
      ...
    ]
  }
- Method: `POST`
- Description: Creates a new GitHub issue related to the dataset associated with `index`.
- Path Arguments:
  - `index`: The index of the required dataset. The `index` should be within the range `[1, maximum number of datasets in Masader]`.
- Parameters: N/A
- Data:
  - `title`: The issue's title. This will be prefixed with the dataset name.
  - `body`: The issue's body.
- Return Type: `JSON`
- Example Link: https://arbml.github.io/masader-webservice/datasets/1/issues
- Example Output:
  {
    "issue_url": "https://github.com/ARBML/masader/issues/64"
  }
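A request for this endpoint could be prepared as in the sketch below. It assumes the `title` and `body` fields are sent as form-encoded data, which this README does not confirm; if the server expects JSON, the payload and a `Content-Type` header would need to change. The helper `build_issue_request` is hypothetical, and the base URL is taken from the example links.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: base URL copied from the example links in this README.
BASE_URL = "https://arbml.github.io/masader-webservice"


def build_issue_request(index, title, body, base_url=BASE_URL):
    """Prepare (but do not send) a POST request that opens an issue for a dataset.

    Assumption: the endpoint accepts form-encoded `title` and `body` fields.
    """
    payload = urlencode({"title": title, "body": body}).encode("utf-8")
    return Request(
        f"{base_url}/datasets/{index}/issues",
        data=payload,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response is expected to contain the `issue_url` field shown above.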
- Method: `GET`
- Description: Returns a short string highlighting the latest changes in Masader.
- Path Arguments: N/A
- Parameters: N/A
- Data: N/A
- Return Type: `JSON`
- Example Link: https://arbml.github.io/masader-webservice/highlights
- Example Output:
  {
    "highlights": "Masader is COOL!"
  }
- Method: `GET`
- Description: Refreshes the in-memory datasets and their tags, embeddings, and clusters.
- Path Arguments:
  - `password`: A simple authentication string that prevents anonymous actors from requesting this endpoint.
- Parameters: N/A
- Data: N/A
- Return Type: `JSON`
- Example Link: https://arbml.github.io/masader-webservice/refresh/123456
- Example Output:
"Datasets refresh process initiated successfully!"