We appreciate any contribution to Datumaro, whether it's in the form of a Pull Request, a Feature Request, or general comments/issues that you find. For feature requests and issues, please feel free to create a GitHub Issue in this repository.
- Python (3.9+)
To set up your development environment, please follow the steps below.
- Because Datumaro has some C++ and Rust implementations to improve Python performance,
  you should install a C++ compiler (`apt-get install build-essential`) and a Rust
  toolchain on your system to build the binary extensions.

- Fork the repo.
- Clone the forked repo:

  ```bash
  git clone <forked_repo>
  ```
- Optionally, install a virtual environment (recommended):

  ```bash
  python -m pip install virtualenv
  python -m virtualenv venv
  . venv/bin/activate
  ```
- Install Datumaro with optional dependencies:

  ```bash
  cd /path/to/the/cloned/repo/
  pip install -e .[tf,tfds,torch,default]
  ```
- Install dev & test dependencies:

  ```bash
  pip install -r requirements-dev.txt
  pip install -r tests/requirements.txt
  ```
- Set up pre-commit hooks in the repo. See Code style.

  ```bash
  pre-commit install
  pre-commit run
  ```
- Create your branch based off the `develop` branch and make changes.
- Verify your code by running unit tests and integration tests. See Testing.

  ```bash
  pytest -v
  ```

  or

  ```bash
  python -m pytest -v
  ```

- Push your changes.
Now you are ready to create a PR (Pull Request) and get it reviewed.
Developers should install the following optional components to run our tests; a quick way to check which of them are importable is sketched after the list:
- OpenVINO
- Accuracy Checker
- TensorFlow
- PyTorch
- MxNet
- Caffe
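
For illustration, here is a quick way to check which of these optional backends are importable in your current environment. The module names below are the usual import names and are assumptions that may differ from the pip package names (Accuracy Checker is omitted because its import name varies):

```python
# Illustrative availability check for the optional test components; the
# module names are assumptions and may differ from the pip package names.
import importlib

for module in ("openvino", "tensorflow", "torch", "mxnet", "caffe"):
    try:
        importlib.import_module(module)
        print(f"{module}: available")
    except ImportError:
        print(f"{module}: missing")
```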
Datumaro can be invoked from the command line in any of the following ways:

```bash
datum --help
python -m datumaro --help
python datumaro/ --help
python datum.py --help
```
or used as a Python library:

```python
import datumaro
```
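
As a quick sanity check of the Python API, here is a minimal sketch; the dataset path and the `coco` format name are placeholders for whatever data you have locally:

```python
# Minimal, illustrative use of the Python API; the path and the format
# name are placeholders, not files from this repository.
import datumaro as dm

dataset = dm.Dataset.import_from("path/to/dataset", "coco")
print(len(dataset))
```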
Try to be readable and consistent with the existing codebase.
The project uses Black for code formatting and isort for sorting import statements.
You can find the corresponding configurations in `pyproject.toml` in the repository root.
No trailing whitespace; at most 100 characters per line.
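
For illustration only, this is how isort's default grouping arranges imports (standard library, then third-party, then first-party); the specific modules below are arbitrary examples:

```python
# Standard library imports come first,
import os.path as osp

# then third-party imports,
import numpy as np

# and finally first-party (local) imports.
from datumaro.components.dataset import Dataset
```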
Datumaro includes a Git pre-commit hook configuration, `.pre-commit-config.yaml`, that can help you follow the style requirements. To set it up, make sure isort and black are installed on your system, then run `pre-commit install`; after that you can check your changes with `pre-commit run`.
The recommended editor is VS Code with the Python language plugin.
It is expected that all Datumaro functionality is covered and checked by unit tests. Tests are placed in the `tests/unit/` directory. Additional pre-generated files for tests can be stored in the `tests/assets/` directory.

CLI tests are separated from the core tests; they are stored in the `tests/integration/cli/` directory.
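
For reference, the directories mentioned above form the following layout:

```
tests/
├── unit/               # unit tests for core functionality
├── integration/
│   └── cli/            # CLI integration tests
├── assets/             # pre-generated files used by tests
└── requirements.py     # requirement and bug identifiers (see below)
```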
Currently, we use `pytest` for testing.
To run tests use:

```bash
pytest -v
```

or

```bash
python -m pytest -v
```
For better integration with CI and requirements tracking, we use special annotations for tests.
A test needs to be linked with the requirement it relates to. To link a test, use:
```python
from unittest import TestCase

from .requirements import Requirements, mark_requirement


class MyTests(TestCase):
    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_my_requirement(self):
        ...  # do stuff
```
Such a marking will apply the markings from the specified requirement. They can be overridden for a specific test:
```python
import pytest


class MyTests(TestCase):
    @pytest.mark.priority_low
    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_my_requirement(self):
        ...  # do stuff
```
Requirements and other links need to be added to `tests/requirements.py`:
```python
DATUM_244 = "Add Snyk integration"
DATUM_BUG_219 = "Return format is not uniform"
```
```python
# Fully defined in GitHub issues:
@pytest.mark.reqids(Requirements.DATUM_244, Requirements.DATUM_333)

# And defined any other way:
@pytest.mark.reqids(Requirements.DATUM_GENERAL_REQ)
```
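
Since the decorators reference these identifiers as attributes of `Requirements`, new entries presumably go into that class; a minimal, hypothetical sketch of `tests/requirements.py` (the real file may organize things differently):

```python
# Hypothetical layout only; see the actual tests/requirements.py for the
# authoritative structure.
class Requirements:
    # Requirements fully defined in GitHub issues:
    DATUM_244 = "Add Snyk integration"

    # Bug reports:
    DATUM_BUG_219 = "Return format is not uniform"
```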
Markings are defined in `tests/conftest.py`.
A list of requirements and bugs:

```python
@pytest.mark.reqids(Requirements.DATUM_123)
@pytest.mark.bugs(Requirements.DATUM_BUG_456)
```
A priority:

```python
@pytest.mark.priority_low
@pytest.mark.priority_medium
@pytest.mark.priority_high
```
A component. This marking is used to indicate different system components:

```python
@pytest.mark.components(DatumaroComponent.Datumaro)
```
Skipping tests:

```python
@pytest.mark.skip(SkipMessages.NOT_IMPLEMENTED)
```
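
To show how these markers combine on a single test, here is a hypothetical example; the import location of `DatumaroComponent` is an assumption, so adjust it to wherever the symbol is actually defined in the repo:

```python
# Hypothetical test combining the markers described above; the import of
# DatumaroComponent is an assumption and may need adjusting.
from unittest import TestCase

import pytest

from .requirements import DatumaroComponent, Requirements, mark_requirement


class MyMarkedTests(TestCase):
    @pytest.mark.components(DatumaroComponent.Datumaro)
    @pytest.mark.priority_medium
    @pytest.mark.bugs(Requirements.DATUM_BUG_219)
    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_something(self):
        ...  # do stuff
```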
Parametrized runs

Parameters are used for running the same test with different parameters, e.g.:

```python
@pytest.mark.parametrize("numpy_array, batch_size", [
    (np.zeros([2]), 0),
    (np.zeros([2]), 1),
    (np.zeros([2]), 2),
    (np.zeros([2]), 5),
    (np.zeros([5]), 2),
])
```
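
For illustration, here is a hypothetical, self-contained test that such a decorator could be attached to; note that `pytest.mark.parametrize` does not work on `unittest.TestCase` methods, so a plain pytest-style function is assumed here:

```python
# Each (numpy_array, batch_size) pair runs as a separate test case.
import numpy as np
import pytest


@pytest.mark.parametrize("numpy_array, batch_size", [
    (np.zeros([2]), 0),
    (np.zeros([2]), 1),
    (np.zeros([5]), 2),
])
def test_each_parameter_combination_runs_separately(numpy_array, batch_size):
    assert numpy_array.dtype == np.float64
    assert batch_size >= 0
```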
Tests are documented with docstrings. Test descriptions must contain
the following sections: Description, Expected results, and Steps.
```python
def test_can_convert_polygons_to_mask(self):
    """
    <b>Description:</b>
    Ensure that the dataset polygon annotation can be properly converted
    into dataset segmentation mask.

    <b>Expected results:</b>
    Dataset segmentation mask converted from dataset polygon annotation
    is equal to an expected mask.

    <b>Steps:</b>
    1. Prepare dataset with polygon annotation
    2. Prepare dataset with expected mask segmentation mode
    3. Convert source dataset to target, with conversion of annotation
       from polygon to mask.
    4. Verify that resulting segmentation mask is equal to the expected mask.
    """
```