We welcome contributions to the AI21 Tokenizer. Please read the following guidelines before submitting your pull request.
- Bug fixes
- Documentation improvements
- Additional tests
Include the following information in your post:
- Describe what you expected to happen.
- If possible, include a minimal reproducible example to help us identify the issue. This also helps check that the issue is not with your own code.
- Describe what actually happened. Include the full traceback if there was an exception.
- List your Python version. If possible, check if this issue is already fixed in the latest releases or the latest code in the repository.
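To make the version details above easy to paste into a report, a small helper along these lines can collect them. This is only a sketch: the function name `environment_report` is made up for illustration, and the distribution name `ai21-tokenizer` is an assumption based on the repository name.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "ai21-tokenizer") -> str:
    """Return a short, paste-able summary of the local environment.

    The default distribution name is an assumption; pass the name you
    actually installed if it differs.
    """
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"

    lines = [
        f"Python: {platform.python_version()} ({sys.implementation.name})",
        f"Platform: {platform.platform()}",
        f"{package}: {package_version}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running the script prints three lines that can be copied directly into the issue, alongside the traceback and the minimal reproducible example.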
Fork the AI21 Tokenizer repository and clone it to your local machine. Create a new branch for your changes:
git clone https://github.com/USERNAME/ai21-tokenizer.git
cd ai21-tokenizer
git checkout -b my-fix-branch main
We recommend running the provided init.sh script to install the required dependencies and set up the development environment. This script will install poetry if not already installed. To run the script, simply run:
./init.sh
If you prefer to set up the environment manually, we recommend using poetry to install the required dependencies. To install poetry, run:
pip install poetry
Then, to install the required dependencies, run:
poetry install
After that, install pre-commit and run:
pre-commit install --install-hooks -t pre-commit -t commit-msg
Installing the pre-commit hooks takes care of formatting and linting your code before each commit. Please make sure the hooks are installed before you commit your code.
We recommend creating your own venv using pyenv or virtualenv when working on this repository, so that the project's dependencies stay isolated from unrelated packages installed on your system.
Each commit should be a single logical change and should be aligned with the Conventional Commits specification. Since we are using a pre-commit hook to enforce this, any other commit message format will be rejected.
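As a rough illustration of what the Conventional Commits header format looks like, the sketch below checks the first line of a commit message against a simplified pattern. It is not the hook's actual implementation: the list of allowed types and the function name `is_conventional` are assumptions made for this example, and the real hook configuration may accept a different set.

```python
import re

# Assumed set of commit types; the hook's real configuration may differ.
ALLOWED_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# Header shape: type, optional (scope), optional "!" marking a breaking
# change, then a colon, a single space, and a non-empty description.
HEADER_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\r\n]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Check only the first line (the header) of a commit message."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return HEADER_PATTERN.match(header) is not None


if __name__ == "__main__":
    for candidate in ("fix: handle empty input", "Fixed a bug"):
        print(f"{candidate!r}: {is_conventional(candidate)}")
```

Messages such as `fix: handle empty input` or `feat(tokenizer)!: drop legacy API` match this pattern, while a free-form message like `Fixed a bug` does not.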
$ inv --list
Available tasks:

  clean       clean (remove) packages
  lint        python lint
  outdated    outdated packages
  test        Run unit tests
  update      update packages
  audit       run safety checks on project dependencies
  formatter   auto formats the modified files
We use pytest for testing. To run the tests, run:
inv test
When adding a new test, place it in the tests directory, under the same hierarchy as the file being tested.
Use pytest to write your tests rather than any other testing framework.
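For reference, a pytest-style test is a plain function whose name starts with `test_` and that uses bare `assert` statements. The sketch below is illustrative only: `normalize_whitespace` is a made-up stand-in defined inline so the example is self-contained, and the file paths in the comment are hypothetical. In a real test you would import the function under test from the package instead.

```python
# Hypothetical location: tests/utils/test_text.py, mirroring a hypothetical
# source module at ai21_tokenizer/utils/text.py.


def normalize_whitespace(text: str) -> str:
    """Stand-in for the function under test; a real test would import it."""
    return " ".join(text.split())


def test_collapses_repeated_spaces():
    assert normalize_whitespace("hello   world") == "hello world"


def test_strips_leading_and_trailing_whitespace():
    assert normalize_whitespace("  hello world\n") == "hello world"


def test_empty_string_stays_empty():
    assert normalize_whitespace("") == ""
```

pytest discovers these functions automatically when the file name starts with `test_`, so no test classes or runner boilerplate are needed.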
Push your branch to your forked repository and open a pull request against the main branch of the AI21 Tokenizer repository. Please make sure to include a description of your changes in the pull request.
The title of the pull request should follow the above-mentioned Conventional Commits specification.
If you have any questions or feedback, please feel free to reach out to us.
We appreciate and encourage any contributions to the AI21 Tokenizer. Please take the reviewer feedback positively and make the necessary changes to your pull request.