- TLDR;
- Setting up your development environment
- Definition of Done
- Automation
- Testing
- Benchmarks
- Updating Dependencies
- Code style
- Adding a new provider/model
- Create your own fork of the repo
- Make changes to the code in your fork
- Run unit tests and integration tests
- Check the code with linters
- Submit PR from your fork to main branch of the project repo
The development requires Python 3.11 due to significant improvement on performance, optimizations which benefit modern ML, AI, LLM, NL stacks, and improved asynchronous processing capabilities.
# clone your fork
git clone https://github.com/YOUR-GIT-PROFILE/service.git
# move into the directory
cd service
# setup your environment with pdm
pdm install
# now either do (with backticks)
`pdm venv activate`
# which will active the venv where you can further work, or prefix the rest of commands with `pdm run`, eg. `pdm run make test`
# run unit+integration tests (e2e test requires a running OLS instance)
make test-unit && make test-integration
# code formatting
# (this is also run automatically as part of pre-commit hook if configured)
make format
# code style and docstring style
# (this is also run automatically as part of pre-commit hook if configured)
make verify
# check type hints
# (this is also run automatically as part of pre-commit hook)
make check-types
Happy hacking!
- Code is complete, commented, and merged to the relevant release branch
- User facing documentation written (where relevant)
- Acceptance criteria in the related Jira ticket (where applicable) are verified and fulfilled
- Pull request title+commit includes Jira number
- Changes are covered by unit tests that run cleanly in the CI environment (where relevant)
- Changes are covered by integration tests that run cleanly in the CI environment (where relevant)
- Changes are covered by E2E tests that run cleanly in the CI environment (where relevant)
- All linters are running cleanly in the CI environment
- Code changes reviewed by at least one peer
- Code changes acked by at least one project owner
It is possible to run formatters and linters automatically for all commits. You just need
to copy file hooks/pre-commit
into subdirectory .git/hooks/
. It must be done manually
because the copied file is an executable script (so from GIT point of view it is unsafe
to enable it automatically).
During testing, code coverage is measured. If the coverage is below defined threshold (see pyproject.toml
settings for actual value stored in section [tool.coverage.report]
), tests will fail. We measured and checked code coverage in order to be able to develop software with high quality.
Code coverage reports are generated in JSON and also in format compatible with JUnit test automation framework. It is also possible to start make coverage-report
to generate code coverage reports in the form of interactive HTML pages. These pages are stored in the htmlcov
subdirectory. Just open the index page from this subdirectory in your web browser.
Overall code coverage measured for both unit tests and integration tests can be checked by following command:
make check-coverage
The threshold set for combined coverage (currently 94%) is larger than threshold for unit tests and integration tests because it is preferred to have most statements covered by at least one type of tests.
It is possible to check if type hints added into the code are correct and whether assignments, function calls etc. use values of the right type. This check is invoked by following command:
make check-types
Please note that type hints check might be very slow on the first run. Subsequent runs are much faster thanks to the cache that Mypy uses. This check is part of a CI job that verifies sources.
Ruff tool is used as a linter. There are a bunch of linter rules enabled for this repository. All of them are specified in pyproject.toml
in section [tool.ruff]
. Some specific rules can be disabled using ignore
parameter (empty now). List of all linters recognized by Ruff can be retrieved by:
ruff linter
Description of all rules are available on https://docs.astral.sh/ruff/rules/
Static security check is performed by Bandit tool. The check can be started by using:
make security-check
or within PDM-controlled environment:
pdm run security-check
Three groups of software tests are used in this repository, each group from the test suite having different granularity. These groups are designed to represent three layers:
- Unit Tests
- Integration Tests
- End to End tests (e2e)
Unit tests followed by integration tests can be started by using the following command:
make tests
It is also possible to run just one selected group of tests:
make test-unit Run the unit tests
make test-integration Run integration tests tests
make test-e2e Run end to end tests
All tests are based on the Pytest framework and code coverage is measured by the plugin pytest-cov. For mocking and patching, the unittest framework is used.
Currently code coverage threshold for integration tests is set to 60%. This value is specified directly in Makefile, because the coverage threshold is different from threshold required for unit tests.
As specified in Definition of Done, new changes need to be covered by tests.
For patching, for example introducing mock objects instead of real objects, it is
possible to use patch
imported from unittest.mock
. It is usually used as
decorator put before test implementation:
@patch("redis.StrictRedis", new=MockRedis)
def test_conversation_cache_in_redis(redis_cache_config):
...
...
...
new=
allow us to use different function or classreturn_value=
allow us to define return value (no mock will be called)
It is also possible to use it inside the test implementation as context manager:
def test_xyz():
ml = mock_llm_chain({"text": retval})
with patch("ols.src.query_helpers.question_validator.LLMChain", new=ml):
...
...
...
Sometimes it is needed to test whether some exception is thrown from a tested function or method. In this case pytest.raises
can be used:
def test_conversation_cache_wrong_cache(invalid_cache_type_config):
"""Check if wrong cache env.variable is detected properly."""
with pytest.raises(ValueError):
CacheFactory.conversation_cache(invalid_cache_type_config)
It is also possible to check if the exception is thrown with the expected message. The message (or its part) is written as regexp:
def test_constructor_no_provider():
"""Test that constructor checks for provider."""
# we use bare Exception in the code, so need to check
# message, at least
with pytest.raises(Exception, match="ERROR: Missing provider"):
load_llm(provider=None)
It is possible to capture stdout and stderr by using standard fixture capsys
:
def test_foobar(capsys):
"""Test the foobar function that prints to stdout."""
foobar("argument1", "argument2")
# check captured log output
captured_out = capsys.readouterr().out
assert captured_out == "Output printed by foobar function"
captured_err = capsys.readouterr().err
assert captured_err == ""
Capturing logs:
@patch.dict(os.environ, {"LOG_LEVEL": "INFO"})
def test_logger_show_message_flag(mock_load_dotenv, capsys):
"""Test logger set with show_message flag."""
logger = Logger(logger_name="foo", log_level=logging.INFO, show_message=True)
logger.logger.info("This is my debug message")
# check captured log output
# the log message should be captured
captured_out = capsys.readouterr().out
assert "This is my debug message" in captured_out
# error output should be empty
captured_err = capsys.readouterr().err
assert captured_err == ""
End to end tests can be run locally, but in this case, some tests that assume they run on a cluster need to be skipped.
Set the TEST_TAGS
environment variable before the tests are started:
export TEST_TAGS="not cluster"
make test-e2e
It is also possible to skip tests marked by different test markers. The
TEST_TAGS
need to contain a "pytest mark expression" in this case. For
example if it is needed to skip tests that assume that RAG is presented,
do the following:
export TEST_TAGS="not cluster and not rag"
make test-e2e
E2E tests expects app running on http://localhost:8080 or whatever is defined through the
OLS_URL
environment variable.
It is possible to start the service using the following command:
coverage run runner.py
The service can be used as expected (via REST API or via GradioUI).
When the service is stopped, it is possible to display which statements were executed and which was omitted:
coverage report -m
The output format is the same as coverage report for unit tests and integrations tests:
Name Stmts Miss Cover Missing
---------------------------------------------------------------------------
ols/__init__.py 0 0 100%
ols/app/__init__.py 0 0 100%
ols/app/endpoints/__init__.py 0 0 100%
ols/app/endpoints/authorized.py 13 2 85% 54-55
ols/app/endpoints/feedback.py 45 24 47% 37-39, 51, 62-74, 84-86, 127-142
...
...
...
Benchmarks for selected functions and methods can be run by using following command:
pdm run benchmarks
Benchmarks are based on the pytest-benchmark
package. To create a new benchmark, the regular test function needs
to be created and benchmark
parameter (fixture) should be passed to it. Then the benchmark can be initialized
by calling this fixture, passing a tested function/method to it, together with required parameters.
def test_format_attachment_plain_text_format(benchmark):
"""Benchmark the function to format one attachment in plain text format."""
attachment = Attachment(
attachment_type="log", content_type="text/plain", content="foo\nbar\nbaz"
)
benchmark(format_attachment, attachment)
def test_format_attachment_yaml(test_yaml, benchmark):
"""Benchmark the function to format one attachment in YAML format."""
attachment = Attachment(
attachment_type="log", content_type="application/yaml", content=test_yaml
)
benchmark(format_attachment, attachment)
---------------------------------------------------------------------------------------------------------------------- benchmark: 12 tests ----------------------------------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_format_attachment_empty_text_plain_format 222.5000 (1.0) 940.2000 (1.0) 254.4121 (1.0) 14.9043 (1.0) 253.8000 (1.0) 13.4500 (1.35) 13112;937 3,930,631.0144 (1.0) 104265 20
test_format_attachment_plain_text_format 248.3889 (1.12) 1,715.5556 (1.82) 271.9183 (1.07) 15.1047 (1.01) 270.4444 (1.07) 9.9999 (1.0) 8233;3279 3,677,575.2636 (0.94) 196387 18
test_format_attachment_empty_yaml_format 8,063.0007 (36.24) 36,815.9999 (39.16) 9,056.3143 (35.60) 654.7283 (43.93) 8,976.0006 (35.37) 320.2501 (32.03) 520;587 110,420.1957 (0.03) 14297 1
test_validate_question_short_question 38,743.0000 (174.13) 106,781.9994 (113.57) 59,001.3999 (231.91) 27,868.7079 (>1000.0) 50,208.0002 (197.83) 30,449.7503 (>1000.0) 1;0 16,948.7504 (0.00) 5 1
test_load_non_existent_config 41,184.0001 (185.10) 76,090.9998 (80.93) 44,220.5563 (173.81) 1,363.8963 (91.51) 43,990.0004 (173.33) 1,327.0001 (132.70) 1798;435 22,613.9172 (0.01) 8910 1
test_validate_question_medium_question 44,770.9999 (201.22) 120,769.9997 (128.45) 47,751.7365 (187.69) 2,178.0324 (146.13) 47,571.5001 (187.44) 1,565.0003 (156.50) 247;89 20,941.6468 (0.01) 3848 1
test_validate_question_long_question 141,497.0002 (635.94) 187,345.9996 (199.26) 149,891.7847 (589.17) 5,996.6999 (402.35) 147,861.9997 (582.59) 7,773.7500 (777.38) 1207;100 6,671.4797 (0.00) 4737 1
test_format_attachment_yaml 151,618.0000 (681.43) 246,112.0002 (261.77) 160,714.3633 (631.71) 4,419.8849 (296.55) 160,190.9998 (631.17) 5,349.4996 (534.95) 632;32 6,222.2192 (0.00) 2841 1
test_load_invalid_config_stream 1,621,458.9996 (>1000.0) 1,785,047.0003 (>1000.0) 1,688,481.4733 (>1000.0) 34,127.4243 (>1000.0) 1,677,702.0001 (>1000.0) 61,820.7500 (>1000.0) 215;0 592.2481 (0.00) 543 1
test_load_valid_config_stream 1,818,708.0004 (>1000.0) 2,007,125.9996 (>1000.0) 1,865,838.5534 (>1000.0) 29,096.4064 (>1000.0) 1,858,818.0001 (>1000.0) 29,326.9998 (>1000.0) 125;38 535.9521 (0.00) 459 1
test_load_valid_config_file 2,069,870.9995 (>1000.0) 4,357,346.0007 (>1000.0) 2,168,878.9340 (>1000.0) 141,905.1851 (>1000.0) 2,149,720.0005 (>1000.0) 69,473.2501 (>1000.0) 4;5 461.0677 (0.00) 379 1
test_load_invalid_config_file 2,092,171.0002 (>1000.0) 2,327,104.0000 (>1000.0) 2,184,864.3750 (>1000.0) 45,054.4108 (>1000.0) 2,182,544.4996 (>1000.0) 76,116.0004 (>1000.0) 150;0 457.6943 (0.00) 384 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
======================== 12 passed, 1 warning in 11.46s ========================
We are using the PDM tool to manage our dependencies.
To add a new dependency do:
pdm add mystery-package
- PDM will add it topyproject.toml
automatically- re-lock with
pdm lock
- this will update apdm.lock
with a new dependency
To update any package to higher version do:
- update the version in
pyproject.toml
- run
pdm update package_to_be_updated
- this will update apdm.lock
file
As we need to be Konflux-ready (https://redhat-appstudio.github.io/appstudio.docs.ui.io/), we need to have pinned versions in the pyproject.toml
. If you added a new dependency without an explicit version pin, the PDM tool resolves its version in the lock. You need to search for the dependency you've added in the lock file and whatever version you find there, use it to pin it in pyproject.toml
.
NOTE: don't directly edit pdm.lock
file. It contains hash that is checked, so any modification should be done via pdm
command only.
We are using Google's docstring style.
Here is simple example:
def function_with_pep484_type_annotations(param1: int, param2: str) -> bool:
"""Example function with PEP 484 type annotations.
Args:
param1: The first parameter.
param2: The second parameter.
Returns:
The return value. True for success, False otherwise.
Raises:
ValueError: If the first parameter does not contain proper model name
"""
For further guidance, see the rest of our codebase, or check sources online. There are many, eg. this one.
It is possible to create an interface for a new provider. That interface should be
added into ols/src/llms/providers/
Then the provider should be registered and
referenced from rcsconfig.yaml
file the same way as other existing providers.
To add a new provider, follow these steps:
- Create a new file in the
ols/src/llm/providers/
directory. - In this file, define a class that inherits from
LLMProvider
. - Register this class for use by decorating it with
@register_llm_provider_as("your_provider")
. You can refer to existing providers for examples. - Add your provider to the list of supported providers SUPPORTED_PROVIDER_TYPES.
Please note that your custom provider must adhere to the interface defined by AbstractLLMProvider
to ensure proper integration. Specifically, you must define a default_params
property and a load
method in your custom provider class.
You'll also need to modify your rcsconfig.yaml
file to include an appropriate entry for your new provider.
Once you've created and registered your new provider as described above, no further code modifications are necessary. Our discovery mechanism will automatically locate your provider. After that, you'll be able to use it as needed.