test: improve pytest infrastructure and vLLM backend testing#416

Merged
planetf1 merged 5 commits intogenerative-computing:mainfrom
planetf1:test/vllm-clean
Feb 11, 2026
@planetf1
Contributor

@planetf1 planetf1 commented Feb 5, 2026

Improve pytest infrastructure and vLLM backend testing

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Fixes #415

Enhances the pytest infrastructure with capability detection and process isolation for GPU tests. Process isolation now activates only on CUDA systems, avoiding unnecessary overhead on macOS and on systems without a GPU.
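
As a rough illustration of what "capability detection" can mean in a conftest, the sketch below probes the host once and derives whether isolation is needed. It is an assumption, not the PR's code: the `HostCapabilities` name, using `torch.cuda.is_available()` as the CUDA probe, and reading physical RAM via `os.sysconf` are all choices made here for illustration.

```python
import os
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class HostCapabilities:
    """Hypothetical snapshot of what the test host can run."""
    has_cuda: bool
    ram_gb: float
    is_macos: bool


def _detect_cuda() -> bool:
    # torch is optional here: if it is not installed, assume no CUDA device.
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def _detect_ram_gb() -> float:
    # os.sysconf is available on Linux and macOS; report 0.0 when unknown.
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return 0.0
    if pages < 0 or page_size < 0:
        return 0.0
    return pages * page_size / 1024**3


def detect_capabilities() -> HostCapabilities:
    return HostCapabilities(
        has_cuda=_detect_cuda(),
        ram_gb=_detect_ram_gb(),
        is_macos=sys.platform == "darwin",
    )


def needs_process_isolation(caps: HostCapabilities) -> bool:
    # Per the description above: isolate heavy GPU tests only on CUDA hosts.
    return caps.has_cuda
```

Computing this once per session keeps the probes (especially the torch import) out of individual tests.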

Key changes:

  • Add pytest skip mechanism with CLI options (--ignore-gpu-check, --ignore-ram-check, etc.)
  • Implement CUDA-specific process isolation for heavy GPU tests (vLLM, HuggingFace)
  • Fix vLLM structured output token limits (one test did not specify a max token limit and was failing)
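
A skip mechanism driven by CLI overrides can be sketched with standard pytest hooks (`pytest_addoption`, `pytest_configure`, `pytest_collection_modifyitems`). The option names come from the list above; the marker names `requires_gpu` / `requires_heavy_ram` and the 48 GB threshold are hypothetical placeholders, not necessarily what the PR uses.

```python
from typing import Iterable, Optional

# Hypothetical marker names; the PR's real markers may differ.
GPU_MARKER = "requires_gpu"
HEAVY_RAM_MARKER = "requires_heavy_ram"


def skip_reason(
    markers: Iterable[str],
    has_cuda: bool,
    ram_gb: float,
    *,
    ignore_gpu_check: bool = False,
    ignore_ram_check: bool = False,
    min_ram_gb: float = 48.0,  # hypothetical threshold
) -> Optional[str]:
    """Return why a test should be skipped on this host, or None to run it."""
    names = set(markers)
    if GPU_MARKER in names and not has_cuda and not ignore_gpu_check:
        return "needs a CUDA GPU (override with --ignore-gpu-check)"
    if HEAVY_RAM_MARKER in names and ram_gb < min_ram_gb and not ignore_ram_check:
        return f"needs >= {min_ram_gb:g} GB RAM (override with --ignore-ram-check)"
    return None


# conftest.py wiring. pytest is imported lazily inside the hook so the helper
# above can be exercised without pytest installed.
def pytest_addoption(parser):
    parser.addoption("--ignore-gpu-check", action="store_true", help="run GPU tests anyway")
    parser.addoption("--ignore-ram-check", action="store_true", help="run high-RAM tests anyway")


def pytest_configure(config):
    # Register the markers so pytest does not warn about unknown marks.
    config.addinivalue_line("markers", f"{GPU_MARKER}: test needs a CUDA GPU")
    config.addinivalue_line("markers", f"{HEAVY_RAM_MARKER}: test needs a lot of RAM")


def pytest_collection_modifyitems(config, items):
    import pytest

    has_cuda, ram_gb = False, 0.0  # placeholder: plug in real capability detection
    for item in items:
        reason = skip_reason(
            (m.name for m in item.iter_markers()),
            has_cuda,
            ram_gb,
            ignore_gpu_check=config.getoption("--ignore-gpu-check"),
            ignore_ram_check=config.getoption("--ignore-ram-check"),
        )
        if reason:
            item.add_marker(pytest.mark.skip(reason=reason))
```

Keeping the decision in a pure function makes the skip policy easy to unit-test separately from pytest's collection machinery.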

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Tested

  • Hugging Face-only tests on a large CUDA system
  • vLLM-only tests on CUDA
  • All tests (except Ollama) on CUDA
  • pytest locally on macOS (32GB, so some tests were skipped)
  • NOT tested on macOS with >32GB, or on Linux with a small GPU

Note that if any large tests are run on CUDA, pytest enforces additional isolation, effectively batching the tests into groups.
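
The grouping step might look roughly like the sketch below, which splits collected node IDs into tests that run in-process and per-module batches that each get a fresh pytest subprocess. The commit messages in this thread say this is done from a `pytest_collection_finish` hook; grouping by module file and the function names here are assumptions for illustration only.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_isolated_runs(
    items: Iterable[Tuple[str, bool]], cuda_available: bool
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split (nodeid, is_heavy_gpu) pairs into in-process tests and module groups.

    Without CUDA nothing is isolated, so macOS and CPU-only hosts pay no
    subprocess overhead.
    """
    in_process: List[str] = []
    groups: Dict[str, List[str]] = defaultdict(list)
    for nodeid, is_heavy in items:
        if cuda_available and is_heavy:
            module = nodeid.split("::", 1)[0]  # e.g. "test/backends/test_vllm.py"
            groups[module].append(nodeid)
        else:
            in_process.append(nodeid)
    return in_process, dict(groups)


def subprocess_commands(groups: Dict[str, List[str]]) -> List[List[str]]:
    """One fresh pytest process per module; process exit releases its GPU memory."""
    return [[sys.executable, "-m", "pytest", *nodeids] for nodeids in groups.values()]
```

Each command could then be launched with `subprocess.run(cmd)`. A nested pytest run like this would print its own `test session starts` banner, which appears consistent with the second banner in the run output posted later in this conversation.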

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@mergify

mergify bot commented Feb 5, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@planetf1 planetf1 force-pushed the test/vllm-clean branch 5 times, most recently from 5a0cdf5 to d25e4d7 on February 5, 2026 16:57
@planetf1 planetf1 marked this pull request as ready for review February 5, 2026 16:58
@planetf1 planetf1 enabled auto-merge (squash) February 5, 2026 18:03
Contributor

@jakelorocco jakelorocco left a comment

The changes make sense to me. Can you please provide example outputs of the example and regular tests? I'd like to see what the vllm / heavy_gpu tests running in isolation looks like.

@planetf1 planetf1 requested a review from nrfulton February 9, 2026 13:16
planetf1 added a commit to planetf1/mellea that referenced this pull request Feb 10, 2026
- Add cleanup_vllm_backend() to test/conftest.py
- Remove duplicated cleanup code from test_vllm.py and test_vllm_tools.py
- Addresses PR generative-computing#416 review feedback on code duplication
planetf1 added a commit to planetf1/mellea that referenced this pull request Feb 10, 2026
- Add cleanup_vllm_backend() to test/conftest.py to eliminate code duplication
- Remove duplicated cleanup code from test_vllm.py and test_vllm_tools.py
- Migrate from IBM aLoRA to PEFT 0.18.1 native aLoRA implementation
- Add --device flag for explicit device control in aLoRA training
- Add GPU memory check and better error handling
- Restrict MPS patching to macOS only for cross-platform safety
- Fix output dirname handling for relative paths
- Add comprehensive aLoRA training tests

Addresses PR generative-computing#416 review feedback on code duplication
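
The `--device` flag and the "MPS patching on macOS only" rule from this commit could be combined in a small resolver like the one below. This is a hypothetical sketch: the accepted values (`auto`, `cpu`, `cuda`, `mps`) and the function name are assumptions, not the actual CLI. In real use the probes would come from `torch.cuda.is_available()`, `torch.backends.mps.is_available()` and `sys.platform`.

```python
VALID_DEVICES = ("auto", "cpu", "cuda", "mps")  # hypothetical accepted values


def resolve_device(
    requested: str, cuda_available: bool, mps_available: bool, platform: str
) -> str:
    """Map a --device value to a concrete device string.

    MPS is only considered on macOS ("darwin"), so MPS-specific handling is
    never triggered on other platforms.
    """
    if requested not in VALID_DEVICES:
        raise ValueError(f"unknown device {requested!r}; expected one of {VALID_DEVICES}")
    mps_ok = platform == "darwin" and mps_available
    if requested == "auto":
        if cuda_available:
            return "cuda"
        return "mps" if mps_ok else "cpu"
    if requested == "cuda" and not cuda_available:
        raise RuntimeError("--device cuda requested but no CUDA device is available")
    if requested == "mps" and not mps_ok:
        raise RuntimeError("--device mps requested but MPS is only usable on macOS")
    return requested
```

Failing loudly on an unavailable explicit device, rather than silently falling back, makes a misconfigured training run obvious early.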
planetf1 added a commit to planetf1/mellea that referenced this pull request Feb 10, 2026
- Add cleanup_vllm_backend() to test/conftest.py with comprehensive documentation
- Remove duplicated cleanup code from test_vllm.py and test_vllm_tools.py
- Document that cleanup is best-effort within modules
- Note that cross-module GPU isolation requires process separation
- Addresses PR generative-computing#416 review feedback on code duplication

The shared function includes clear documentation explaining that while
this cleanup helps within a module, only process exit reliably releases
CUDA GPU memory. Cross-module isolation is handled by the existing
pytest_collection_finish hook that runs heavy GPU tests in separate
subprocesses.
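
A best-effort cleanup helper of the kind described in these commits might look like the sketch below. It is not the actual `cleanup_vllm_backend()`: the dict-based holder, the optional `shutdown()` method it looks for, and the name `cleanup_backend` are assumptions. As the commit message says, this only helps within a module; only process exit reliably releases CUDA memory.

```python
import gc
from typing import Any, MutableMapping


def cleanup_backend(holder: MutableMapping[str, Any], key: str = "backend") -> bool:
    """Best-effort release of a backend stored in holder[key].

    Returns False when there was nothing to clean up.
    """
    backend = holder.pop(key, None)
    if backend is None:
        return False
    shutdown = getattr(backend, "shutdown", None)  # hypothetical method name
    if callable(shutdown):
        shutdown()
    del backend, shutdown  # drop local references before collecting
    gc.collect()
    try:
        import torch
    except ImportError:
        return True
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    return True


# Typical use from a module-scoped fixture (make_backend is hypothetical):
#
# @pytest.fixture(scope="module")
# def backend_state():
#     state = {"backend": make_backend()}
#     yield state
#     cleanup_backend(state)
```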
@planetf1 planetf1 force-pushed the test/vllm-clean branch 4 times, most recently from 0542af2 to 8a35e19 on February 10, 2026 10:34
@planetf1
Contributor Author

> The changes make sense to me. Can you please provide example outputs of the example and regular tests? I'd like to see what the vllm / heavy_gpu tests running in isolation looks like.

Here's a run. The mechanism is working reliably in a CUDA environment.

However, the Boston weather test is flaky: Mistral sometimes returns a response about London or New York instead. This is a qualitative prompt/model issue rather than a problem with the mechanism; we may need to review or modify the test further as we use these tests in an automated formal verification step.

=== STDOUT ===
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /u/jonesn/.conda/envs/mellea/bin/python3
cachedir: .pytest_cache
rootdir: /proj/dmfexp/eiger/users/jonesn/mellea-b
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, asyncio-1.3.0, Faker-40.1.2, timeout-2.4.0, langsmith-0.6.6, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... ============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /u/jonesn/.conda/envs/mellea/bin/python3
cachedir: .pytest_cache
rootdir: /proj/dmfexp/eiger/users/jonesn/mellea-b
configfile: pyproject.toml
plugins: nbmake-1.5.5, asyncio-1.3.0, Faker-40.1.2, timeout-2.4.0, langsmith-0.6.6, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 8 items

test/backends/test_vllm.py::test_system_prompt PASSED                    [ 12%]
test/backends/test_vllm.py::test_instruct PASSED                         [ 25%]
test/backends/test_vllm.py::test_multiturn PASSED                        [ 37%]
test/backends/test_vllm.py::test_format PASSED                           [ 50%]
test/backends/test_vllm.py::test_generate_from_raw PASSED                [ 62%]
test/backends/test_vllm.py::test_generate_from_raw_with_format PASSED    [ 75%]
test/backends/test_vllm.py::test_async_parallel_requests PASSED          [ 87%]
test/backends/test_vllm.py::test_async_avalue PASSED                     [100%]

================================ tests coverage ================================
_______________ coverage: platform linux, python 3.12.12-final-0 _______________

Name                                                                                                                        Stmts   Miss Branch BrPart   Cover   Missing
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cli/__init__.py                                                                                                                 0      0      0      0 100.00%
cli/alora/__init__.py                                                                                                           0      0      0      0 100.00%
cli/alora/commands.py                                                                                                           4      4      0      0   0.00%   1-50
cli/alora/train.py                                                                                                             75     75     16      0   0.00%   1-173
cli/alora/upload.py                                                                                                            15     15      4      0   0.00%   1-45
cli/decompose/__init__.py                                                                                                       4      4      0      0   0.00%   1-12
cli/decompose/decompose.py                                                                                                     92     92     36      0   0.00%   1-333
cli/decompose/pipeline.py                                                                                                      56     56      8      0   0.00%   1-179
cli/decompose/prompt_modules/__init__.py                                                                                        6      6      0      0   0.00%   1-10
cli/decompose/prompt_modules/_prompt_modules.py                                                                                13     13      0      0   0.00%   1-37
cli/decompose/prompt_modules/constraint_extractor/__init__.py                                                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_constraint_extractor.py                                                     36     36      6      0   0.00%   1-136
cli/decompose/prompt_modules/constraint_extractor/_exceptions.py                                                               12     12      0      0   0.00%   1-18
cli/decompose/prompt_modules/constraint_extractor/_prompt/__init__.py                                                           2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/__init__.py                                             2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_1/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_1/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_2/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_2/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_3/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_3/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_4/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_4/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_5/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_5/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_6/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_6/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_icl_examples.py                                        8      8      0      0   0.00%   1-9
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_types.py                                               4      4      0      0   0.00%   1-6
cli/decompose/prompt_modules/constraint_extractor/_prompt/_prompt.py                                                           11     11      0      0   0.00%   1-24
cli/decompose/prompt_modules/general_instructions/__init__.py                                                                   2      2      0      0   0.00%   1-5
cli/decompose/prompt_modules/general_instructions/_exceptions.py                                                               12     12      0      0   0.00%   1-18
cli/decompose/prompt_modules/general_instructions/_general_instructions.py                                                     33     33      4      0   0.00%   1-76
cli/decompose/prompt_modules/general_instructions/_prompt/__init__.py                                                           2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/__init__.py                                             2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_1/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_1/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_2/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_2/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_3/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_3/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_icl_examples.py                                        5      5      0      0   0.00%   1-6
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_types.py                                               4      4      0      0   0.00%   1-6
cli/decompose/prompt_modules/general_instructions/_prompt/_prompt.py                                                           11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_constraint_assign/__init__.py                                                              3      3      0      0   0.00%   1-8
cli/decompose/prompt_modules/subtask_constraint_assign/_exceptions.py                                                          12     12      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/__init__.py                                                      2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/__init__.py                                        2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_1/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_1/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_2/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_2/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_3/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_3/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_4/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_4/_example.py                             9      9      0      0   0.00%   1-32
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_icl_examples.py                                   6      6      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_types.py                                          7      7      0      0   0.00%   1-9
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_prompt.py                                                      12     12      0      0   0.00%   1-25
cli/decompose/prompt_modules/subtask_constraint_assign/_subtask_constraint_assign.py                                           51     51     10      0   0.00%   1-247
cli/decompose/prompt_modules/subtask_constraint_assign/_types.py                                                                6      6      0      0   0.00%   1-26
cli/decompose/prompt_modules/subtask_list/__init__.py                                                                           3      3      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_list/_exceptions.py                                                                       15     15      0      0   0.00%   1-23
cli/decompose/prompt_modules/subtask_list/_prompt/__init__.py                                                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/__init__.py                                                     2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_1/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_1/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_2/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_2/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_3/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_3/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_icl_examples.py                                                5      5      0      0   0.00%   1-6
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_types.py                                                       5      5      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_list/_prompt/_prompt.py                                                                   11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_subtask_list.py                                                                     50     50      4      0   0.00%   1-166
cli/decompose/prompt_modules/subtask_list/_types.py                                                                             4      4      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_prompt_generator/__init__.py                                                               3      3      0      0   0.00%   1-8
cli/decompose/prompt_modules/subtask_prompt_generator/_exceptions.py                                                           12     12      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/__init__.py                                                       2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/__init__.py                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/__init__.py                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_1/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_1/_example.py       4      4      0      0   0.00%   3-19
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_2/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_2/_example.py       4      4      0      0   0.00%   3-19
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_3/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_3/_example.py       4      4      0      0   0.00%   3-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_4/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_4/_example.py       4      4      0      0   0.00%   3-24
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_group.py           11     11      0      0   0.00%   1-16
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/__init__.py                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_1/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_1/_example.py       4      4      0      0   0.00%   3-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_2/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_2/_example.py       4      4      0      0   0.00%   3-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_3/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_3/_example.py       4      4      0      0   0.00%   3-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_4/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_4/_example.py       4      4      0      0   0.00%   3-24
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_5/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_5/_example.py       4      4      0      0   0.00%   3-25
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_group.py           12     12      0      0   0.00%   1-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_icl_example_groups.py                        4      4      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_types.py                                     9      9      0      0   0.00%   1-13
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_prompt.py                                                       11     11      0      0   0.00%   1-33
cli/decompose/prompt_modules/subtask_prompt_generator/_subtask_prompt_generator.py                                             50     50      8      0   0.00%   1-247
cli/decompose/prompt_modules/subtask_prompt_generator/_types.py                                                                 5      5      0      0   0.00%   1-23
cli/decompose/prompt_modules/validation_decision/__init__.py                                                                    2      2      0      0   0.00%   1-5
cli/decompose/prompt_modules/validation_decision/_exceptions.py                                                                12     12      0      0   0.00%   1-22
cli/decompose/prompt_modules/validation_decision/_prompt/__init__.py                                                            2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/__init__.py                                              2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_1/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_1/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_2/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_2/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_3/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_3/_example.py                                   5      5      0      0   0.00%   1-21
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_4/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_4/_example.py                                   5      5      0      0   0.00%   1-12
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_5/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_5/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_icl_examples.py                                         7      7      0      0   0.00%   1-10
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_types.py                                                5      5      0      0   0.00%   1-7
cli/decompose/prompt_modules/validation_decision/_prompt/_prompt.py                                                            11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/validation_decision/_validation_decision.py                                                       42     42      6      0   0.00%   1-128
cli/decompose/utils.py                                                                                                          6      6      2      0   0.00%   1-13
cli/eval/__init__.py                                                                                                            0      0      0      0 100.00%
cli/eval/commands.py                                                                                                            3      3      0      0   0.00%   5-50
cli/eval/runner.py                                                                                                            163    163     40      0   0.00%   1-353
cli/m.py                                                                                                                       12     12      0      0   0.00%   3-30
mellea/__init__.py                                                                                                              4      0      0      0 100.00%
mellea/backends/__init__.py                                                                                                     6      0      0      0 100.00%
mellea/backends/adapters/__init__.py                                                                                            2      0      0      0 100.00%
mellea/backends/adapters/adapter.py                                                                                            76     43     18      0  35.11%   28-37, 98-149, 167-178, 187, 199-200, 212
mellea/backends/adapters/catalog.py                                                                                            20      4      2      0  72.73%   78, 88-96
mellea/backends/backend.py                                                                                                     11      0      0      0 100.00%
mellea/backends/bedrock.py                                                                                                     42     42     10      0   0.00%   3-78
mellea/backends/cache.py                                                                                                       28      3      6      2  85.29%   42, 58, 61
mellea/backends/dummy.py                                                                                                       15     15      4      0   0.00%   3-45
mellea/backends/huggingface.py                                                                                                484    484    132      0   0.00%   6-1314
mellea/backends/kv_block_helpers.py                                                                                            24     24      2      0   0.00%   3-47
mellea/backends/litellm.py                                                                                                    248    248     90      0   0.00%   3-677
mellea/backends/model_ids.py                                                                                                   38      0      0      0 100.00%
mellea/backends/model_options.py                                                                                               46      3     20      3  90.91%   77, 87-91, 113->116
mellea/backends/ollama.py                                                                                                     273    273    104      0   0.00%   3-684
mellea/backends/openai.py                                                                                                     387    387    132      0   0.00%   3-1037
mellea/backends/tools.py                                                                                                      249    192     94      0  16.62%   33-35, 39, 44, 49-77, 86-91, 98-123, 131-143, 159-174, 182-202, 207-214, 261-405, 425-428, 443, 476-482, 499, 543-582, 590-630
mellea/backends/utils.py                                                                                                       39     15     16      1  56.36%   47, 63-84
mellea/backends/vllm.py                                                                                                       198     31     40      8  78.57%   27-28, 95-98, 131, 132->139, 166-198, 296-309, 381-383, 388, 411, 478, 511->exit
mellea/backends/watsonx.py                                                                                                    232    232     72      0   0.00%   3-642
mellea/core/__init__.py                                                                                                         7      0      0      0 100.00%
mellea/core/backend.py                                                                                                         40      4     12      4  84.62%   102, 117, 127, 133
mellea/core/base.py                                                                                                           327     95     76     14  65.51%   39, 42->44, 54, 70-73, 78-102, 107-109, 116-117, 157-158, 226, 232, 242-243, 246, 271-272, 275, 313->315, 317-318, 325-327, 341->360, 350-351, 361, 378-394, 399-423, 541, 545-548, 555-569, 651, 660-665, 670-684
mellea/core/formatter.py                                                                                                        5      0      0      0 100.00%
mellea/core/requirement.py                                                                                                     61     32      6      0  43.28%   34-38, 43, 48, 53, 58, 62, 66, 75-84, 125-145, 154, 158-161, 170
mellea/core/sampling.py                                                                                                        39      6      8      4  78.72%   36, 38, 40, 42, 72, 77
mellea/core/utils.py                                                                                                           65     11     10      3  78.67%   24-35, 43-55, 104->111, 106
mellea/formatters/__init__.py                                                                                                   4      0      0      0 100.00%
mellea/formatters/chat_formatter.py                                                                                            27      2     12      3  87.18%   35, 40, 48->52
mellea/formatters/template_formatter.py                                                                                       132     26     68     21  75.50%   76, 84-87, 95, 102, 106-110, 131, 134-137, 140->145, 151->186, 155, 159-160, 162->183, 168-173, 183->151, 187, 199-207, 209->211, 234->243, 248->243, 257, 266->264, 269, 283
mellea/helpers/__init__.py                                                                                                      5      0      0      0 100.00%
mellea/helpers/async_helpers.py                                                                                                47     21     14      2  49.18%   17, 27, 35-36, 45-49, 73-74, 78, 82-88, 92-99
mellea/helpers/event_loop_helper.py                                                                                            34     14      6      1  52.50%   30, 34-54, 60
mellea/helpers/openai_compatible_helpers.py                                                                                    88     77     46      0   8.21%   17-42, 54-122, 127-138, 159-172
mellea/helpers/server_type.py                                                                                                  19     10      4      0  39.13%   19-28
mellea/stdlib/__init__.py                                                                                                       0      0      0      0 100.00%
mellea/stdlib/components/__init__.py                                                                                            8      0      0      0 100.00%
mellea/stdlib/components/chat.py                                                                                               70     39     24      6  39.36%   53, 60, 62, 71, 103-125, 129, 134, 169-173, 177-181, 193-212
mellea/stdlib/components/docs/__init__.py                                                                                       2      0      0      0 100.00%
mellea/stdlib/components/docs/document.py                                                                                      16     11      4      0  25.00%   12-14, 25-32, 36
mellea/stdlib/components/docs/richdocument.py                                                                                  84     84      6      0   0.00%   3-192
mellea/stdlib/components/genslot.py                                                                                           236    163     66      0  24.17%   89, 100-109, 117, 120, 123, 126, 141-142, 150-151, 163, 183-193, 209-222, 272-291, 310-386, 390-394, 398, 417-423, 473-554, 606-694, 828-831
mellea/stdlib/components/instruction.py                                                                                        66     28     18      2  47.62%   52-107, 135, 173-174, 183-185
mellea/stdlib/components/intrinsic/__init__.py                                                                                  2      0      0      0 100.00%
mellea/stdlib/components/intrinsic/intrinsic.py                                                                                14      7      2      0  43.75%   29-32, 37, 50, 66
mellea/stdlib/components/intrinsic/rag.py                                                                                      46     46      8      0   0.00%   3-313
mellea/stdlib/components/mify.py                                                                                              125     94     56      0  17.13%   44, 54, 64, 73-75, 88-123, 138-168, 180-193, 210, 217-220, 311-315, 335-392, 399-407, 415-433
mellea/stdlib/components/mobject.py                                                                                            60     28      4      0  50.00%   23-24, 28, 32-33, 54, 67-68, 72, 76-77, 98, 166-167, 171, 179, 187, 195, 202-218, 227-230, 240
mellea/stdlib/components/react.py                                                                                              33     33      2      0   0.00%   3-96
mellea/stdlib/components/simple.py                                                                                             33     23     12      0  22.22%   11-15, 19, 22-28, 33, 40-49, 53, 57
mellea/stdlib/components/unit_test_eval.py                                                                                     71     71     14      0   0.00%   3-148
mellea/stdlib/context.py                                                                                                       17      0      0      0 100.00%
mellea/stdlib/frameworks/__init__.py                                                                                            0      0      0      0 100.00%
mellea/stdlib/frameworks/react.py                                                                                              39     39     14      0   0.00%   4-121
mellea/stdlib/functional.py                                                                                                   205    118     78     10  36.40%   239, 277-291, 318-333, 360-414, 489, 498, 505, 573-575, 578-581, 668-686, 712-733, 753, 759, 766-769, 772-788, 817-832, 859-913, 921-937, 948-975
mellea/stdlib/requirements/__init__.py                                                                                          5      0      0      0 100.00%
mellea/stdlib/requirements/md.py                                                                                               45     36     10      0  16.36%   17-21, 29-46, 50, 65-77
mellea/stdlib/requirements/python_reqs.py                                                                                      65     57     30      0   8.42%   32-58, 63-98, 118-137, 160-193
mellea/stdlib/requirements/requirement.py                                                                                      45     32     14      0  22.03%   23-36, 50-59, 71-76, 81, 86, 121-147
mellea/stdlib/requirements/safety/__init__.py                                                                                   0      0      0      0 100.00%
mellea/stdlib/requirements/safety/guardian.py                                                                                 157    157     74      0   0.00%   3-355
mellea/stdlib/requirements/tool_reqs.py                                                                                        44     40     24      0   5.88%   9-15, 27-36, 65-109
mellea/stdlib/sampling/__init__.py                                                                                              3      0      0      0 100.00%
mellea/stdlib/sampling/base.py                                                                                                 98     31     16      5  63.16%   143, 144->146, 160, 220-249, 278, 300, 322, 344-365, 387, 409-424
mellea/stdlib/sampling/budget_forcing.py                                                                                       71     71     10      0   0.00%   3-249
mellea/stdlib/sampling/majority_voting.py                                                                                      86     86     22      0   0.00%   3-292
mellea/stdlib/sampling/sampling_algos/__init__.py                                                                               2      2      0      0   0.00%   3-5
mellea/stdlib/sampling/sampling_algos/budget_forcing_alg.py                                                                    76     76     28      0   0.00%   3-181
mellea/stdlib/sampling/sofai.py                                                                                               200    174     72      0   9.56%   82-103, 128-151, 171, 186-196, 201-224, 236-244, 257-263, 281-345, 360-366, 389-408, 436-493, 522-550, 603-768
mellea/stdlib/session.py                                                                                                      169    114     40      1  26.79%   47-52, 57-96, 156-198, 233-239, 243-249, 253-257, 282, 290-292, 345-363, 456-457, 502, 534-544, 567-576, 629-647, 721-746, 760-773, 786, 818-828, 851-860, 865-866, 877-887
mellea/stdlib/tools/__init__.py                                                                                                 2      0      0      0 100.00%
mellea/stdlib/tools/interpreter.py                                                                                            106     72     30      0  25.00%   53-65, 77, 89-112, 127-138, 141-179, 187-221, 232-255, 260, 269-270, 279-280
mellea/telemetry/__init__.py                                                                                                   98     60     42      4  30.00%   31-35, 56-78, 87-90, 95, 100, 115-119, 138-145, 163-172, 181-182, 193-208, 220, 230-232
mellea/telemetry/backend_instrumentation.py                                                                                    90     64     44      8  25.37%   25-26, 40, 42, 44, 46, 48, 52, 64-71, 91-94, 130-135, 196-220, 233-267
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                        7032   5371   1912    102  20.78%
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
============================== 8 passed in 52.97s ==============================
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /u/jonesn/.conda/envs/mellea/bin/python3
cachedir: .pytest_cache
rootdir: /proj/dmfexp/eiger/users/jonesn/mellea-b
configfile: pyproject.toml
plugins: nbmake-1.5.5, asyncio-1.3.0, Faker-40.1.2, timeout-2.4.0, langsmith-0.6.6, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 1 item

test/backends/test_vllm_tools.py::test_tool PASSED                       [100%]

=============================== warnings summary ===============================
test/backends/test_vllm_tools.py::test_tool
  /u/jonesn/.conda/envs/mellea/lib/python3.12/site-packages/mistral_common/tokens/tokenizers/sentencepiece.py:203: FutureWarning: Using the tokenizer's special token policy `None` is deprecated. It will be removed in 1.10.0. Please pass a special token policy explicitly. Future default will be SpecialTokenPolicy.IGNORE.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================ tests coverage ================================
_______________ coverage: platform linux, python 3.12.12-final-0 _______________

Name                                                                                                                        Stmts   Miss Branch BrPart   Cover   Missing
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cli/__init__.py                                                                                                                 0      0      0      0 100.00%
cli/alora/__init__.py                                                                                                           0      0      0      0 100.00%
cli/alora/commands.py                                                                                                           4      4      0      0   0.00%   1-50
cli/alora/train.py                                                                                                             75     75     16      0   0.00%   1-173
cli/alora/upload.py                                                                                                            15     15      4      0   0.00%   1-45
cli/decompose/__init__.py                                                                                                       4      4      0      0   0.00%   1-12
cli/decompose/decompose.py                                                                                                     92     92     36      0   0.00%   1-333
cli/decompose/pipeline.py                                                                                                      56     56      8      0   0.00%   1-179
cli/decompose/prompt_modules/__init__.py                                                                                        6      6      0      0   0.00%   1-10
cli/decompose/prompt_modules/_prompt_modules.py                                                                                13     13      0      0   0.00%   1-37
cli/decompose/prompt_modules/constraint_extractor/__init__.py                                                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_constraint_extractor.py                                                     36     36      6      0   0.00%   1-136
cli/decompose/prompt_modules/constraint_extractor/_exceptions.py                                                               12     12      0      0   0.00%   1-18
cli/decompose/prompt_modules/constraint_extractor/_prompt/__init__.py                                                           2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/__init__.py                                             2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_1/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_1/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_2/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_2/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_3/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_3/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_4/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_4/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_5/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_5/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_6/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_6/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_icl_examples.py                                        8      8      0      0   0.00%   1-9
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_types.py                                               4      4      0      0   0.00%   1-6
cli/decompose/prompt_modules/constraint_extractor/_prompt/_prompt.py                                                           11     11      0      0   0.00%   1-24
cli/decompose/prompt_modules/general_instructions/__init__.py                                                                   2      2      0      0   0.00%   1-5
cli/decompose/prompt_modules/general_instructions/_exceptions.py                                                               12     12      0      0   0.00%   1-18
cli/decompose/prompt_modules/general_instructions/_general_instructions.py                                                     33     33      4      0   0.00%   1-76
cli/decompose/prompt_modules/general_instructions/_prompt/__init__.py                                                           2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/__init__.py                                             2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_1/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_1/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_2/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_2/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_3/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_3/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_icl_examples.py                                        5      5      0      0   0.00%   1-6
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_types.py                                               4      4      0      0   0.00%   1-6
cli/decompose/prompt_modules/general_instructions/_prompt/_prompt.py                                                           11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_constraint_assign/__init__.py                                                              3      3      0      0   0.00%   1-8
cli/decompose/prompt_modules/subtask_constraint_assign/_exceptions.py                                                          12     12      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/__init__.py                                                      2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/__init__.py                                        2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_1/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_1/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_2/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_2/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_3/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_3/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_4/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_4/_example.py                             9      9      0      0   0.00%   1-32
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_icl_examples.py                                   6      6      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_types.py                                          7      7      0      0   0.00%   1-9
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_prompt.py                                                      12     12      0      0   0.00%   1-25
cli/decompose/prompt_modules/subtask_constraint_assign/_subtask_constraint_assign.py                                           51     51     10      0   0.00%   1-247
cli/decompose/prompt_modules/subtask_constraint_assign/_types.py                                                                6      6      0      0   0.00%   1-26
cli/decompose/prompt_modules/subtask_list/__init__.py                                                                           3      3      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_list/_exceptions.py                                                                       15     15      0      0   0.00%   1-23
cli/decompose/prompt_modules/subtask_list/_prompt/__init__.py                                                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/__init__.py                                                     2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_1/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_1/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_2/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_2/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_3/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_3/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_icl_examples.py                                                5      5      0      0   0.00%   1-6
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_types.py                                                       5      5      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_list/_prompt/_prompt.py                                                                   11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_subtask_list.py                                                                     50     50      4      0   0.00%   1-166
cli/decompose/prompt_modules/subtask_list/_types.py                                                                             4      4      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_prompt_generator/__init__.py                                                               3      3      0      0   0.00%   1-8
cli/decompose/prompt_modules/subtask_prompt_generator/_exceptions.py                                                           12     12      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/__init__.py                                                       2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/__init__.py                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/__init__.py                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_1/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_1/_example.py       4      4      0      0   0.00%   3-19
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_2/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_2/_example.py       4      4      0      0   0.00%   3-19
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_3/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_3/_example.py       4      4      0      0   0.00%   3-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_4/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_4/_example.py       4      4      0      0   0.00%   3-24
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_group.py           11     11      0      0   0.00%   1-16
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/__init__.py                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_1/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_1/_example.py       4      4      0      0   0.00%   3-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_2/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_2/_example.py       4      4      0      0   0.00%   3-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_3/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_3/_example.py       4      4      0      0   0.00%   3-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_4/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_4/_example.py       4      4      0      0   0.00%   3-24
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_5/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_5/_example.py       4      4      0      0   0.00%   3-25
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_group.py           12     12      0      0   0.00%   1-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_icl_example_groups.py                        4      4      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_types.py                                     9      9      0      0   0.00%   1-13
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_prompt.py                                                       11     11      0      0   0.00%   1-33
cli/decompose/prompt_modules/subtask_prompt_generator/_subtask_prompt_generator.py                                             50     50      8      0   0.00%   1-247
cli/decompose/prompt_modules/subtask_prompt_generator/_types.py                                                                 5      5      0      0   0.00%   1-23
cli/decompose/prompt_modules/validation_decision/__init__.py                                                                    2      2      0      0   0.00%   1-5
cli/decompose/prompt_modules/validation_decision/_exceptions.py                                                                12     12      0      0   0.00%   1-22
cli/decompose/prompt_modules/validation_decision/_prompt/__init__.py                                                            2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/__init__.py                                              2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_1/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_1/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_2/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_2/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_3/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_3/_example.py                                   5      5      0      0   0.00%   1-21
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_4/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_4/_example.py                                   5      5      0      0   0.00%   1-12
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_5/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_5/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_icl_examples.py                                         7      7      0      0   0.00%   1-10
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_types.py                                                5      5      0      0   0.00%   1-7
cli/decompose/prompt_modules/validation_decision/_prompt/_prompt.py                                                            11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/validation_decision/_validation_decision.py                                                       42     42      6      0   0.00%   1-128
cli/decompose/utils.py                                                                                                          6      6      2      0   0.00%   1-13
cli/eval/__init__.py                                                                                                            0      0      0      0 100.00%
cli/eval/commands.py                                                                                                            3      3      0      0   0.00%   5-50
cli/eval/runner.py                                                                                                            163    163     40      0   0.00%   1-353
cli/m.py                                                                                                                       12     12      0      0   0.00%   3-30
mellea/__init__.py                                                                                                              4      0      0      0 100.00%
mellea/backends/__init__.py                                                                                                     6      0      0      0 100.00%
mellea/backends/adapters/__init__.py                                                                                            2      0      0      0 100.00%
mellea/backends/adapters/adapter.py                                                                                            76     43     18      0  35.11%   28-37, 98-149, 167-178, 187, 199-200, 212
mellea/backends/adapters/catalog.py                                                                                            20      4      2      0  72.73%   78, 88-96
mellea/backends/backend.py                                                                                                     11      0      0      0 100.00%
mellea/backends/bedrock.py                                                                                                     42     42     10      0   0.00%   3-78
mellea/backends/cache.py                                                                                                       28      6      6      3  73.53%   42, 50-52, 58, 61
mellea/backends/dummy.py                                                                                                       15     15      4      0   0.00%   3-45
mellea/backends/huggingface.py                                                                                                484    484    132      0   0.00%   6-1314
mellea/backends/kv_block_helpers.py                                                                                            24     24      2      0   0.00%   3-47
mellea/backends/litellm.py                                                                                                    248    248     90      0   0.00%   3-677
mellea/backends/model_ids.py                                                                                                   38      0      0      0 100.00%
mellea/backends/model_options.py                                                                                               46      3     20      3  90.91%   77, 87-91, 113->116
mellea/backends/ollama.py                                                                                                     273    273    104      0   0.00%   3-684
mellea/backends/openai.py                                                                                                     387    387    132      0   0.00%   3-1037
mellea/backends/tools.py                                                                                                      249     87     94     21  59.77%   49-77, 100, 109-116, 132, 136, 142-143, 169-172, 183, 189->196, 194, 200-202, 212->210, 292-309, 320, 332, 349->356, 353, 360, 365, 371-405, 425-428, 443, 476-482, 499, 545, 553, 575, 579-580, 613-614
mellea/backends/utils.py                                                                                                       39      7     16      5  78.18%   47, 53-54, 67-70, 76, 84
mellea/backends/vllm.py                                                                                                       198     68     40     11  59.24%   27-28, 95-98, 131, 132->139, 166-198, 241->246, 295->311, 297, 333-345, 381-383, 388, 392->394, 410->413, 470-546, 573
mellea/backends/watsonx.py                                                                                                    232    232     72      0   0.00%   3-642
mellea/core/__init__.py                                                                                                         7      0      0      0 100.00%
mellea/core/backend.py                                                                                                         40     10     12      3  67.31%   102, 111-120, 127, 133
mellea/core/base.py                                                                                                           327    110     76     16  59.80%   39, 42->44, 54, 70-73, 78-102, 107-109, 116-117, 157-158, 220, 226, 232, 242-243, 246, 271-272, 275, 290, 296->308, 313->315, 317-327, 345-351, 360-363, 378-394, 399-423, 491, 499, 520-529, 545-548, 555-569, 660-665, 670-684
mellea/core/formatter.py                                                                                                        5      0      0      0 100.00%
mellea/core/requirement.py                                                                                                     61     32      6      0  43.28%   34-38, 43, 48, 53, 58, 62, 66, 75-84, 125-145, 154, 158-161, 170
mellea/core/sampling.py                                                                                                        39      6      8      4  78.72%   36, 38, 40, 42, 72, 77
mellea/core/utils.py                                                                                                           65     11     10      3  78.67%   24-35, 43-55, 104->111, 106
mellea/formatters/__init__.py                                                                                                   4      0      0      0 100.00%
mellea/formatters/chat_formatter.py                                                                                            27     11     12      4  51.28%   25-40, 44, 48->52, 53-54
mellea/formatters/template_formatter.py                                                                                       132     29     68     23  73.00%   70-71, 76, 84-87, 95, 102, 106-110, 131, 134-137, 140->145, 143, 151->186, 155, 159-160, 162->183, 168-173, 183->151, 187, 199-207, 209->211, 234->243, 248->243, 257, 266->264, 269, 283
mellea/helpers/__init__.py                                                                                                      5      0      0      0 100.00%
mellea/helpers/async_helpers.py                                                                                                47     21     14      2  49.18%   17, 27, 35-36, 45-49, 73-74, 78, 82-88, 92-99
mellea/helpers/event_loop_helper.py                                                                                            34     14      6      1  52.50%   30, 34-54, 60
mellea/helpers/openai_compatible_helpers.py                                                                                    88     77     46      0   8.21%   17-42, 54-122, 127-138, 159-172
mellea/helpers/server_type.py                                                                                                  19     10      4      0  39.13%   19-28
mellea/stdlib/__init__.py                                                                                                       0      0      0      0 100.00%
mellea/stdlib/components/__init__.py                                                                                            8      0      0      0 100.00%
mellea/stdlib/components/chat.py                                                                                               70     50     24      0  21.28%   52-54, 58-63, 71, 97-144, 169-173, 177-181, 193-212
mellea/stdlib/components/docs/__init__.py                                                                                       2      0      0      0 100.00%
mellea/stdlib/components/docs/document.py                                                                                      16     11      4      0  25.00%   12-14, 25-32, 36
mellea/stdlib/components/docs/richdocument.py                                                                                  84     84      6      0   0.00%   3-192
mellea/stdlib/components/genslot.py                                                                                           236    163     66      0  24.17%   89, 100-109, 117, 120, 123, 126, 141-142, 150-151, 163, 183-193, 209-222, 272-291, 310-386, 390-394, 398, 417-423, 473-554, 606-694, 828-831
mellea/stdlib/components/instruction.py                                                                                        66     28     18      2  47.62%   52-107, 135, 173-174, 183-185
mellea/stdlib/components/intrinsic/__init__.py                                                                                  2      0      0      0 100.00%
mellea/stdlib/components/intrinsic/intrinsic.py                                                                                14      7      2      0  43.75%   29-32, 37, 50, 66
mellea/stdlib/components/intrinsic/rag.py                                                                                      46     46      8      0   0.00%   3-313
mellea/stdlib/components/mify.py                                                                                              125     94     56      0  17.13%   44, 54, 64, 73-75, 88-123, 138-168, 180-193, 210, 217-220, 311-315, 335-392, 399-407, 415-433
mellea/stdlib/components/mobject.py                                                                                            60     28      4      0  50.00%   23-24, 28, 32-33, 54, 67-68, 72, 76-77, 98, 166-167, 171, 179, 187, 195, 202-218, 227-230, 240
mellea/stdlib/components/react.py                                                                                              33     33      2      0   0.00%   3-96
mellea/stdlib/components/simple.py                                                                                             33     23     12      0  22.22%   11-15, 19, 22-28, 33, 40-49, 53, 57
mellea/stdlib/components/unit_test_eval.py                                                                                     71     71     14      0   0.00%   3-148
mellea/stdlib/context.py                                                                                                       17      1      0      0  94.12%   37
mellea/stdlib/frameworks/__init__.py                                                                                            0      0      0      0 100.00%
mellea/stdlib/frameworks/react.py                                                                                              39     39     14      0   0.00%   4-121
mellea/stdlib/functional.py                                                                                                   205    132     78     10  30.04%   238-259, 277-291, 318-333, 360-414, 489, 498, 504-521, 554->558, 573-575, 578-581, 668-686, 712-733, 753, 759, 766-769, 772-788, 817-832, 859-913, 921-937, 948-975
mellea/stdlib/requirements/__init__.py                                                                                          5      0      0      0 100.00%
mellea/stdlib/requirements/md.py                                                                                               45     36     10      0  16.36%   17-21, 29-46, 50, 65-77
mellea/stdlib/requirements/python_reqs.py                                                                                      65     57     30      0   8.42%   32-58, 63-98, 118-137, 160-193
mellea/stdlib/requirements/requirement.py                                                                                      45     32     14      0  22.03%   23-36, 50-59, 71-76, 81, 86, 121-147
mellea/stdlib/requirements/safety/__init__.py                                                                                   0      0      0      0 100.00%
mellea/stdlib/requirements/safety/guardian.py                                                                                 157    157     74      0   0.00%   3-355
mellea/stdlib/requirements/tool_reqs.py                                                                                        44     40     24      0   5.88%   9-15, 27-36, 65-109
mellea/stdlib/sampling/__init__.py                                                                                              3      0      0      0 100.00%
mellea/stdlib/sampling/base.py                                                                                                 98     31     16      5  63.16%   143, 144->146, 160, 220-249, 278, 300, 322, 344-365, 387, 409-424
mellea/stdlib/sampling/budget_forcing.py                                                                                       71     71     10      0   0.00%   3-249
mellea/stdlib/sampling/majority_voting.py                                                                                      86     86     22      0   0.00%   3-292
mellea/stdlib/sampling/sampling_algos/__init__.py                                                                               2      2      0      0   0.00%   3-5
mellea/stdlib/sampling/sampling_algos/budget_forcing_alg.py                                                                    76     76     28      0   0.00%   3-181
mellea/stdlib/sampling/sofai.py                                                                                               200    174     72      0   9.56%   82-103, 128-151, 171, 186-196, 201-224, 236-244, 257-263, 281-345, 360-366, 389-408, 436-493, 522-550, 603-768
mellea/stdlib/session.py                                                                                                      169    117     40      1  25.36%   47-52, 57-96, 156-198, 233-239, 243-249, 253-257, 282, 290-292, 345-363, 456-457, 476-489, 502, 534-544, 567-576, 629-647, 721-746, 760-773, 786, 818-828, 851-860, 865-866, 877-887
mellea/stdlib/tools/__init__.py                                                                                                 2      0      0      0 100.00%
mellea/stdlib/tools/interpreter.py                                                                                            106     72     30      0  25.00%   53-65, 77, 89-112, 127-138, 141-179, 187-221, 232-255, 260, 269-270, 279-280
mellea/telemetry/__init__.py                                                                                                   98     62     42      3  27.86%   31-35, 56-78, 87-90, 95, 100, 115-119, 137-147, 163-172, 181-182, 193-208, 220, 230-232
mellea/telemetry/backend_instrumentation.py                                                                                    90     90     44      0   0.00%   7-270
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                        7032   5388   1912    120  20.59%
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
======================== 1 passed, 1 warning in 54.33s =========================
collected 380 items / 371 deselected / 1 skipped / 9 selected

======================================================================
Heavy GPU Test Process Isolation Active
======================================================================
Running 2 heavy GPU test module(s) in separate processes
to ensure GPU memory is fully released between modules.


[1/2] Running: /proj/dmfexp/eiger/users/jonesn/mellea-b/test/backends/test_vllm.py
----------------------------------------------------------------------
✓ Module passed: /proj/dmfexp/eiger/users/jonesn/mellea-b/test/backends/test_vllm.py

[2/2] Running: /proj/dmfexp/eiger/users/jonesn/mellea-b/test/backends/test_vllm_tools.py
----------------------------------------------------------------------
✓ Module passed: /proj/dmfexp/eiger/users/jonesn/mellea-b/test/backends/test_vllm_tools.py

======================================================================
All heavy GPU modules passed!
======================================================================


=============================== Skipped Examples ===============================
Examples with the following names were skipped because they cannot be easily run in the pytest framework; please run them manually:
simple_rag_with_filter.py
__init__.py
m_decomp_result.py
client.py
mcp_example.py
python_decompose_result.py
pii_serve.py
mellea_pdf.py
================ 1 skipped, 371 deselected in 139.13s (0:02:19) ================
!!!! _pytest.outcomes.Exit: Heavy GPU tests completed in isolated processes !!!!

------------------------------------------------------------
Sender: LSF System <lsfadmin@p4-r29-n3>
Subject: Job 452680: <___vllm_tests__marker_> in cluster <BLUEVELA_LSF> Done

Job <___vllm_tests__marker_> was submitted from host <login3> by user <jonesn> in cluster <BLUEVELA_LSF> at Tue Feb 10 10:38:24 2026
Job was executed on host(s) <p4-r29-n3>, in queue <normal>, as user <jonesn> in cluster <BLUEVELA_LSF> at Tue Feb 10 10:38:26 2026
</u/jonesn> was used as the home directory.
</proj/dmfexp/eiger/users/jonesn/mellea-b> was used as the working directory.
Started at Tue Feb 10 10:38:26 2026
Terminated at Tue Feb 10 10:40:56 2026
Results reported at Tue Feb 10 10:40:56 2026

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
export VLLM_USE_V1=0 && export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True && source /opt/share/miniforge/etc/profile.d/conda.sh && conda activate mellea && pytest -m vllm --no-cov -v
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                                   231.00 sec.
    Max Memory :                                 4852 MB
    Average Memory :                             1939.68 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              5
    Max Threads :                                491
    Run time :                                   154 sec.
    Turnaround time :                            152 sec.

The output (if any) is above this job summary.



PS:

Read file </proj/dmfexp/eiger/users/jonesn/mellea-b/logs/___vllm_tests__marker__452680.stderr> for stderr output of this job.



=== STDERR ===


@planetf1
Copy link
Contributor Author

planetf1 commented Feb 10, 2026

I noticed the responses were unreliable and, while trying to address some warnings, found a potential issue in how we handle tokenization. Raised issue #431 to discuss further.

@jakelorocco
Contributor

@planetf1, in the example output above, is each collection of tests a separate run of pytest?

Also, I'm looking at the output code; it looks like if the module fails, it doesn't tell us which test failed? Is it possible to do this gpu memory cleaning through a regular pytest auto-run fixture so that we keep the results in a regular pytest form?

@planetf1
Copy link
Contributor Author

planetf1 commented Feb 11, 2026

Yes, each collection is run in a separate pytest process. Because of the way CUDA allocation works, I was unable to get the required isolation without some juggling at the process level (this is not a problem on macOS). It would be wonderful if someone could show this is possible and that the explanation below is wrong, but I couldn't get there without splitting processes.

I'll add a longer (generated) explanation below.

Errors from individual modules are reported, but you have to scroll back through the output to find them. I'll try to add a workaround for this.

The CUDA Memory Problem

CUDA's Memory Model:

  1. When you load a model onto GPU, CUDA driver allocates memory
  2. This memory is tracked at the operating system process level, not Python object level
  3. The CUDA driver maintains internal state about allocated memory
  4. The CUDA context (and any memory still referenced) persists for the lifetime of the process

What Happens with Fixtures:

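import gc
import pytest
import torch
# VLLMBackend is an illustrative name for the project's vLLM backend class; import it from wherever it lives.

@pytest.fixture(scope="module")
def vllm_backend():
    backend = VLLMBackend(model="mistralai/Mistral-7B-Instruct-v0.3")
    yield backend
    # Teardown runs here, after the last test in the module
    del backend
    gc.collect()
    torch.cuda.empty_cache()  # Releases only unused cached blocks; memory still referenced elsewhere stays allocated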

The Problem:

  • del backend removes this function's reference to the object
  • gc.collect() runs Python's garbage collector
  • torch.cuda.empty_cache() returns PyTorch's unused cached blocks to the driver
  • BUT: anything still referenced elsewhere (for example inside vLLM's engine or distributed state) stays allocated, and the process's CUDA context is only destroyed when the process exits
  • So the memory is not reliably returned to the system between modules
  • Next test module tries to load another model → OOM error

Why Process Isolation Works

With Separate Processes:

Process 1: test_vllm.py
  ├─ Load model (8GB GPU memory allocated by CUDA driver)
  ├─ Run tests
  └─ Process exits → OS reclaims ALL memory from this process
  
Process 2: test_vllm_tools.py
  ├─ Fresh process, clean GPU memory state
  ├─ Load model (8GB GPU memory allocated)
  ├─ Run tests
  └─ Process exits → OS reclaims ALL memory

Key Difference:

  • Process termination triggers OS-level cleanup
  • OS forcibly reclaims all resources (memory, file handles, etc.)
  • CUDA driver's process-level state is destroyed
  • GPU memory is truly freed back to the system
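A minimal sketch of the per-module isolation idea, assuming the list of heavy modules is already known. `cuda_available`, `build_command` and `run_isolated` are hypothetical helpers, not the code in this PR's conftest.py: each module gets a fresh `python -m pytest` child, run strictly one after another, so the OS reclaims that child's GPU memory before the next one starts.

```python
from __future__ import annotations

import subprocess
import sys


def cuda_available() -> bool:
    """Best-effort CUDA check; returns False when torch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def build_command(module: str, extra_args: list[str] | None = None) -> list[str]:
    """Command line for a child pytest run limited to one test module."""
    return [sys.executable, "-m", "pytest", module, *(extra_args or [])]


def run_isolated(modules: list[str], extra_args: list[str] | None = None) -> dict[str, int]:
    """Run each module in its own pytest process, sequentially.

    Each child exits before the next starts, so its GPU memory and CUDA
    context are reclaimed by the OS rather than by in-process cleanup.
    Returns the pytest exit code per module (0 means all tests passed).
    """
    results: dict[str, int] = {}
    for i, module in enumerate(modules, start=1):
        print(f"[{i}/{len(modules)}] Running: {module}")
        proc = subprocess.run(build_command(module, extra_args))  # blocks until the child exits
        results[module] = proc.returncode
    return results
```

On a CUDA machine one might call `run_isolated(["test/backends/test_vllm.py", "test/backends/test_vllm_tools.py"], ["--no-cov", "-v"])` only when `cuda_available()` is true, and treat any non-zero exit code as a failed module. Running the children sequentially rather than in parallel is what ensures only one of them holds the GPU at a time.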

Real-World Impact

Without Process Isolation (using fixtures only):

test_vllm.py: Load 8GB model → Run tests → "Cleanup" (memory still held)
test_vllm_tools.py: Try to load 8GB model → OOM! (16GB needed in total, only 12GB available)

With Process Isolation:

Process 1 (test_vllm.py): Load 8GB → Run tests → Exit (memory freed)
Process 2 (test_vllm_tools.py): Load 8GB → Run tests → Exit (memory freed)
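The two timelines above can be captured in a toy model. This is an illustration, not a measurement of real CUDA behaviour: it assumes in-process cleanup frees nothing and process exit frees everything, with 8GB per module on a GPU that has 12GB available. `simulate` is a hypothetical helper.

```python
def simulate(modules_gb: list[float], gpu_gb: float, isolated: bool) -> list[str]:
    """Toy model: return 'ok' or 'OOM' for each module in order.

    isolated=False assumes nothing a module loaded is released before the
    next module starts; isolated=True assumes each module starts on an
    empty GPU because the previous process has exited.
    """
    held = 0.0
    outcome = []
    for need in modules_gb:
        if isolated:
            held = 0.0  # previous process exited; the OS reclaimed its memory
        if held + need > gpu_gb:
            outcome.append("OOM")
        else:
            held += need
            outcome.append("ok")
    return outcome


print(simulate([8, 8], 12, isolated=False))  # ['ok', 'OOM']
print(simulate([8, 8], 12, isolated=True))   # ['ok', 'ok']
```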

Why This Matters for vLLM/HuggingFace

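  • vLLM: pre-allocates a fixed fraction of total GPU memory (gpu_memory_utilization) for weights and KV cache, largely independent of model size
  • HuggingFace models: footprint depends on model size and dtype
  • Typical GPU: 12-24GB total memory
  • Loading a second heavy model in the same process before the first is fully released → OOM is likely
  • Process isolation: each module gets a fresh GPU memory state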
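vLLM's startup log in the forced-failure run later in this thread shows where its memory goes: the engine claims `total_gpu_memory x gpu_memory_utilization` up front, and whatever the weights, non-torch allocations and activation peak don't use goes to the KV cache. A short sketch that reproduces that arithmetic from the logged numbers; `vllm_memory_budget` is a hypothetical helper, and vLLM's internal accounting may include details not modelled here.

```python
def vllm_memory_budget(total_gib: float, utilization: float, weights_gib: float,
                       non_torch_gib: float, activation_peak_gib: float) -> dict[str, float]:
    """Reproduce the budget arithmetic printed in vLLM's startup log (GiB)."""
    budget = total_gib * utilization  # memory this engine is allowed to claim
    kv_cache = budget - weights_gib - non_torch_gib - activation_peak_gib
    left_over = total_gib - budget  # what remains on the card for anything else
    return {name: round(value, 2) for name, value in
            {"budget": budget, "kv_cache": kv_cache, "left_over": left_over}.items()}


# Values from the log: 79.19GiB card, utilization 0.80, weights 1.12GiB,
# non_torch 0.16GiB, activation peak 0.23GiB.
print(vllm_memory_budget(79.19, 0.80, 1.12, 0.16, 0.23))
# {'budget': 63.35, 'kv_cache': 61.84, 'left_over': 15.84}
```

Even with a 0.6B-parameter model, the first engine claims about 63.35GiB and leaves roughly 15.84GiB, so a second engine asking for the same fraction cannot fit until the first one's memory is actually released.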

Technical Details

CUDA Memory Hierarchy:

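  1. OS Process Level (what CUDA driver manages)

    • Holds the CUDA context and any allocation still referenced; only reliably freed on process exit
    • Cannot be forced from Python
  2. PyTorch Cache Level (what torch.cuda.empty_cache() affects)

    • Unused cached blocks can be returned to the driver
    • But memory backing live tensors, or allocated outside PyTorch's allocator, is untouched
  3. Python Object Level (what del and gc.collect() affect)

    • Can be cleaned up
    • But only frees GPU memory once no other references to the tensors remain

Fixtures can act at levels 2 and 3, but the memory that outlives that cleanup is only reclaimed at level 1.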
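One way to see which layer is still holding memory is to compare PyTorch's counters with the driver's view before and after cleanup. A minimal diagnostic sketch, assuming PyTorch is installed; `gpu_memory_report` is a hypothetical helper. vLLM also allocates some memory outside PyTorch's allocator (its log reports this as non_torch_memory), which only shows up in the device-wide figure.

```python
from __future__ import annotations


def gpu_memory_report(device: int = 0) -> dict[str, int] | None:
    """Return GPU memory in bytes as seen by PyTorch and by the driver.

    Returns None when torch is missing or no CUDA device is available.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free, total = torch.cuda.mem_get_info(device)  # driver's view of the whole device
    return {
        "allocated": torch.cuda.memory_allocated(device),  # memory backing live tensors
        "reserved": torch.cuda.memory_reserved(device),  # allocated plus PyTorch's cache
        "device_used": total - free,  # includes CUDA contexts, non-torch memory, other processes
    }
```

Calling it before and after `del backend; gc.collect(); torch.cuda.empty_cache()` should show `reserved` dropping toward `allocated`. If `allocated` stays high, something still references the tensors; `device_used` also counts this process's CUDA context, so it never returns to zero while the process is alive.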

Summary

Why fixtures don't work:

  • Fixtures run in same process
  • CUDA holds memory at process level
  • Python has no reliable way to make the driver release the CUDA context or memory still referenced inside the engine
  • Only process termination triggers OS-level cleanup

Why process isolation works:

  • Each module runs in separate process
  • Process exit forces OS to reclaim all resources
  • CUDA driver state is destroyed
  • GPU memory is truly freed

This is not a pytest limitation; it is a consequence of how CUDA manages GPU memory per process.

@planetf1
Contributor Author

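PyTorch documentation notes that, because PyTorch uses a caching allocator, "the unused memory managed by the allocator
will still show as if used in nvidia-smi" until torch.cuda.empty_cache() releases it, and that memory occupied by live tensors is not freed by that call (https://docs.pytorch.org/docs/stable/notes/cuda.html).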

More specifically for vLLM, the vLLM maintainers themselves recommend process termination as the
only reliable approach. From
vllm-project/vllm#1908 (comment):

"In general it is very difficult to clean up all resources correctly, especially when we use
multiple GPUs, and might be prone to deadlocks. I would say, the most stable way to terminate vLLM
is to shut down the process."
(@youkaichao, vLLM maintainer)

@planetf1
Contributor Author

Added code to the CUDA/heavy-test path in conftest.py that prints a summary at the end, to make any failures easier to spot.
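As a possible follow-up to the question about which test failed: pytest can write per-module results with its --junitxml=PATH option, and the parent process could parse those files to name the failing tests in the final summary. A sketch of the parsing side, assuming one JUnit XML file per isolated module; `failed_tests` is a hypothetical helper, not the summary code added in this PR.

```python
import xml.etree.ElementTree as ET


def failed_tests(junit_xml: str) -> list[str]:
    """Return 'classname::name' for every testcase with a <failure> or <error>."""
    root = ET.fromstring(junit_xml)
    failed = []
    for case in root.iter("testcase"):  # works whether the root is <testsuites> or <testsuite>
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(f"{case.get('classname')}::{case.get('name')}")
    return failed


sample = """<testsuites><testsuite name="pytest" tests="2" failures="1">
<testcase classname="test.backends.test_vllm" name="test_system_prompt"><failure message="boom"/></testcase>
<testcase classname="test.backends.test_vllm" name="test_instruct"/>
</testsuite></testsuites>"""
print(failed_tests(sample))  # ['test.backends.test_vllm::test_system_prompt']
```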

@planetf1
Contributor Author

planetf1 commented Feb 11, 2026

Here's a log with a forced error (one test intentionally broken):

=== STDOUT ===
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /u/jonesn/.conda/envs/mellea/bin/python3
cachedir: .pytest_cache
rootdir: /proj/dmfexp/eiger/users/jonesn/mellea
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, asyncio-1.3.0, Faker-40.1.2, timeout-2.4.0, langsmith-0.6.6, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 380 items / 371 deselected / 1 skipped / 9 selected

======================================================================
Heavy GPU Test Process Isolation Active
======================================================================
Running 2 heavy GPU test module(s) in separate processes
to ensure GPU memory is fully released between modules.


[1/2] Running: /proj/dmfexp/eiger/users/jonesn/mellea/test/backends/test_vllm.py
----------------------------------------------------------------------
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /u/jonesn/.conda/envs/mellea/bin/python3
cachedir: .pytest_cache
rootdir: /proj/dmfexp/eiger/users/jonesn/mellea
configfile: pyproject.toml
plugins: nbmake-1.5.5, asyncio-1.3.0, Faker-40.1.2, timeout-2.4.0, langsmith-0.6.6, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 8 items

test/backends/test_vllm.py::test_system_prompt FAILED                    [ 12%]
test/backends/test_vllm.py::test_instruct PASSED                         [ 25%]
test/backends/test_vllm.py::test_multiturn PASSED                        [ 37%]
test/backends/test_vllm.py::test_format PASSED                           [ 50%]
test/backends/test_vllm.py::test_generate_from_raw PASSED                [ 62%]
test/backends/test_vllm.py::test_generate_from_raw_with_format PASSED    [ 75%]
test/backends/test_vllm.py::test_async_parallel_requests PASSED          [ 87%]
test/backends/test_vllm.py::test_async_avalue PASSED                     [100%]

=================================== FAILURES ===================================
______________________________ test_system_prompt ______________________________

session = <mellea.stdlib.session.MelleaSession object at 0x1490779a3d10>

    @pytest.mark.qualitative
    def test_system_prompt(session) -> None:
        result = session.chat(
            "Where are we going?",
            model_options={ModelOption.SYSTEM_PROMPT: "Talk like a pirate."},
        )
        print(result)
        # INJECTED FAILURE FOR TESTING - REMOVE AFTER VERIFICATION
>       assert False, "Intentional test failure to verify failure summary reporting"
E       AssertionError: Intentional test failure to verify failure summary reporting
E       assert False

test/backends/test_vllm.py:77: AssertionError
---------------------------- Captured stdout setup -----------------------------
=== 10:17:38-INFO ======
Instantiating vllm with the following model parameters:
gpu_memory_utilization: 0.8
max_model_len: 8192
max_num_seqs: 8
INFO 02-11 10:17:48 [config.py:823] This model supports multiple tasks: {'classify', 'generate', 'score', 'embed', 'reward'}. Defaulting to 'generate'.
INFO 02-11 10:17:48 [llm_engine.py:230] Initializing a V0 LLM engine (v0.9.1) with config: model='Qwen/Qwen3-0.6B', speculative_config=None, tokenizer='Qwen/Qwen3-0.6B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='xgrammar', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=None, served_model_name=Qwen/Qwen3-0.6B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":false,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":8,"local_cache_dir":null}, use_cached_outputs=False, 
INFO 02-11 10:17:52 [cuda.py:327] Using Flash Attention backend.
INFO 02-11 10:17:52 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 02-11 10:17:52 [model_runner.py:1171] Starting to load model Qwen/Qwen3-0.6B...
INFO 02-11 10:17:53 [weight_utils.py:292] Using model weights format ['*.safetensors']
INFO 02-11 10:17:53 [weight_utils.py:345] No model.safetensors.index.json found in remote.
INFO 02-11 10:17:53 [default_loader.py:272] Loading weights took 0.58 seconds
INFO 02-11 10:17:54 [model_runner.py:1203] Model loading took 1.1201 GiB and 0.898221 seconds
INFO 02-11 10:17:54 [worker.py:294] Memory profiling takes 0.44 seconds
INFO 02-11 10:17:54 [worker.py:294] the current vLLM instance can use total_gpu_memory (79.19GiB) x gpu_memory_utilization (0.80) = 63.35GiB
INFO 02-11 10:17:54 [worker.py:294] model weights take 1.12GiB; non_torch_memory takes 0.16GiB; PyTorch activation peak memory takes 0.23GiB; the rest of the memory reserved for KV Cache is 61.84GiB.
INFO 02-11 10:17:54 [executor_base.py:113] # cuda blocks: 36184, # CPU blocks: 2340
INFO 02-11 10:17:54 [executor_base.py:118] Maximum concurrency for 8192 tokens per request: 70.67x
INFO 02-11 10:17:57 [model_runner.py:1513] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 02-11 10:17:59 [model_runner.py:1671] Graph capturing finished in 2 secs, took 0.13 GiB
INFO 02-11 10:17:59 [llm_engine.py:428] init engine (profile, create kv cache, warmup model) took 5.13 seconds
=== 10:17:59-INFO ======
vllm instantiated.
final model parameters:
gpu_memory_utilization: 0.8
max_model_len: 8192
max_num_seqs: 8
---------------------------- Captured stderr setup -----------------------------

Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]

Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.86it/s]



Capturing CUDA graph shapes:   0%|          | 0/4 [00:00<?, ?it/s]
Capturing CUDA graph shapes:  25%|██▌       | 1/4 [00:00<00:01,  2.21it/s]
Capturing CUDA graph shapes:  50%|█████     | 2/4 [00:00<00:00,  2.44it/s]
Capturing CUDA graph shapes:  75%|███████▌  | 3/4 [00:01<00:00,  2.42it/s]
Capturing CUDA graph shapes: 100%|██████████| 4/4 [00:01<00:00,  2.48it/s]
Capturing CUDA graph shapes: 100%|██████████| 4/4 [00:01<00:00,  2.44it/s]
------------------------------ Captured log setup ------------------------------
INFO     fancy_logger:vllm.py:152 Instantiating vllm with the following model parameters:
gpu_memory_utilization: 0.8
max_model_len: 8192
max_num_seqs: 8

INFO     fancy_logger:vllm.py:206 vllm instantiated.
final model parameters:
gpu_memory_utilization: 0.8
max_model_len: 8192
max_num_seqs: 8
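For anyone reading the memory-profiling lines in the setup output, the KV-cache budget is plain subtraction from `total_gpu_memory x gpu_memory_utilization`. A minimal sketch that reproduces the logged numbers, assuming vLLM's default KV-cache block size of 16 tokens (the block size is not printed in this log); the helper names are hypothetical and not part of vLLM or mellea:

```python
# Reproduces the KV-cache arithmetic from the vLLM profiling lines above.
# Assumption: one KV-cache block holds 16 tokens (vLLM's default on CUDA);
# the log itself does not print the block size.
BLOCK_SIZE_TOKENS = 16


def kv_cache_budget_gib(total_gpu_gib: float, utilization: float,
                        weights_gib: float, non_torch_gib: float,
                        activation_peak_gib: float) -> float:
    """Memory left for the KV cache after weights, non-torch memory and activations."""
    usable = total_gpu_gib * utilization
    return usable - weights_gib - non_torch_gib - activation_peak_gib


def max_concurrency(num_gpu_blocks: int, max_model_len: int,
                    block_size: int = BLOCK_SIZE_TOKENS) -> float:
    """How many full-length (max_model_len) requests fit in the KV cache at once."""
    return num_gpu_blocks * block_size / max_model_len


print(f"usable:      {79.19 * 0.80:.2f} GiB")
print(f"KV cache:    {kv_cache_budget_gib(79.19, 0.80, 1.12, 0.16, 0.23):.2f} GiB")
print(f"concurrency: {max_concurrency(36184, 8192):.2f}x")
```

This prints 63.35 GiB, 61.84 GiB and 70.67x, matching the log. Because vLLM preallocates the KV cache up to the utilization fraction, a single instance at `gpu_memory_utilization: 0.8` likely holds most of the card even for a 0.6B model, which leaves little room for other GPU tests sharing the same process or device.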
----------------------------- Captured stdout call -----------------------------
INFO 02-11 10:18:00 [async_llm_engine.py:210] Added request 22619518960896.
INFO 02-11 10:18:00 [async_llm_engine.py:178] Finished request 22619518960896.
mellea.Message(role="assistant", content="
<think>
Okay, the user asked "Where are we going?" as", images="[]", documents="[]")
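The `Cover` column in the coverage report mixes statements and branch arcs: coverage.py computes (executed statements + executed branch arcs) / (statements + branch arcs). Note that `BrPart` counts partially covered branch points, not missing arcs, so the missing-arc counts in this sketch are back-solved from the printed percentages rather than read from the table. `coverage_percent` is a hypothetical helper, not a coverage.py API:

```python
def coverage_percent(stmts: int, miss: int, branches: int, missing_branches: int) -> float:
    """Total coverage as (executed statements + executed arcs) / (statements + arcs)."""
    total = stmts + branches
    if total == 0:
        # Empty modules (e.g. a bare __init__.py) are reported as 100%.
        return 100.0
    covered = (stmts - miss) + (branches - missing_branches)
    return 100.0 * covered / total


# (stmts, miss, branches, missing arcs) for three rows of the report.
# The missing-arc counts are inferred; coverage.py does not print them here.
rows = {
    "mellea/backends/cache.py": (28, 3, 6, 2),
    "mellea/backends/model_options.py": (46, 3, 20, 3),
    "mellea/backends/adapters/adapter.py": (76, 43, 18, 18),
}
for name, counts in rows.items():
    print(f"{name}: {coverage_percent(*counts):.2f}%")
```

This gives 85.29%, 90.91% and 35.11%, matching the report. In `adapter.py` every branch sits in unexecuted code, so all 18 arcs are missing even though `BrPart` is 0.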
================================ tests coverage ================================
_______________ coverage: platform linux, python 3.12.12-final-0 _______________

Name                                                                                                                        Stmts   Miss Branch BrPart   Cover   Missing
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cli/__init__.py                                                                                                                 0      0      0      0 100.00%
cli/alora/__init__.py                                                                                                           0      0      0      0 100.00%
cli/alora/commands.py                                                                                                           4      4      0      0   0.00%   1-50
cli/alora/train.py                                                                                                             75     75     16      0   0.00%   1-173
cli/alora/upload.py                                                                                                            15     15      4      0   0.00%   1-45
cli/decompose/__init__.py                                                                                                       4      4      0      0   0.00%   1-12
cli/decompose/decompose.py                                                                                                     92     92     36      0   0.00%   1-333
cli/decompose/pipeline.py                                                                                                      56     56      8      0   0.00%   1-179
cli/decompose/prompt_modules/__init__.py                                                                                        6      6      0      0   0.00%   1-10
cli/decompose/prompt_modules/_prompt_modules.py                                                                                13     13      0      0   0.00%   1-37
cli/decompose/prompt_modules/constraint_extractor/__init__.py                                                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_constraint_extractor.py                                                     36     36      6      0   0.00%   1-136
cli/decompose/prompt_modules/constraint_extractor/_exceptions.py                                                               12     12      0      0   0.00%   1-18
cli/decompose/prompt_modules/constraint_extractor/_prompt/__init__.py                                                           2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/__init__.py                                             2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_1/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_1/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_2/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_2/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_3/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_3/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_4/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_4/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_5/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_5/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_6/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_6/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_icl_examples.py                                        8      8      0      0   0.00%   1-9
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_types.py                                               4      4      0      0   0.00%   1-6
cli/decompose/prompt_modules/constraint_extractor/_prompt/_prompt.py                                                           11     11      0      0   0.00%   1-24
cli/decompose/prompt_modules/general_instructions/__init__.py                                                                   2      2      0      0   0.00%   1-5
cli/decompose/prompt_modules/general_instructions/_exceptions.py                                                               12     12      0      0   0.00%   1-18
cli/decompose/prompt_modules/general_instructions/_general_instructions.py                                                     33     33      4      0   0.00%   1-76
cli/decompose/prompt_modules/general_instructions/_prompt/__init__.py                                                           2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/__init__.py                                             2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_1/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_1/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_2/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_2/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_3/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_3/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_icl_examples.py                                        5      5      0      0   0.00%   1-6
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_types.py                                               4      4      0      0   0.00%   1-6
cli/decompose/prompt_modules/general_instructions/_prompt/_prompt.py                                                           11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_constraint_assign/__init__.py                                                              3      3      0      0   0.00%   1-8
cli/decompose/prompt_modules/subtask_constraint_assign/_exceptions.py                                                          12     12      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/__init__.py                                                      2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/__init__.py                                        2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_1/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_1/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_2/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_2/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_3/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_3/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_4/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_4/_example.py                             9      9      0      0   0.00%   1-32
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_icl_examples.py                                   6      6      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_types.py                                          7      7      0      0   0.00%   1-9
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_prompt.py                                                      12     12      0      0   0.00%   1-25
cli/decompose/prompt_modules/subtask_constraint_assign/_subtask_constraint_assign.py                                           51     51     10      0   0.00%   1-247
cli/decompose/prompt_modules/subtask_constraint_assign/_types.py                                                                6      6      0      0   0.00%   1-26
cli/decompose/prompt_modules/subtask_list/__init__.py                                                                           3      3      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_list/_exceptions.py                                                                       15     15      0      0   0.00%   1-23
cli/decompose/prompt_modules/subtask_list/_prompt/__init__.py                                                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/__init__.py                                                     2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_1/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_1/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_2/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_2/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_3/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_3/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_icl_examples.py                                                5      5      0      0   0.00%   1-6
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_types.py                                                       5      5      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_list/_prompt/_prompt.py                                                                   11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_subtask_list.py                                                                     50     50      4      0   0.00%   1-166
cli/decompose/prompt_modules/subtask_list/_types.py                                                                             4      4      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_prompt_generator/__init__.py                                                               3      3      0      0   0.00%   1-8
cli/decompose/prompt_modules/subtask_prompt_generator/_exceptions.py                                                           12     12      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/__init__.py                                                       2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/__init__.py                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/__init__.py                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_1/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_1/_example.py       4      4      0      0   0.00%   3-19
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_2/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_2/_example.py       4      4      0      0   0.00%   3-19
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_3/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_3/_example.py       4      4      0      0   0.00%   3-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_4/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_4/_example.py       4      4      0      0   0.00%   3-24
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_group.py           11     11      0      0   0.00%   1-16
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/__init__.py                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_1/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_1/_example.py       4      4      0      0   0.00%   3-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_2/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_2/_example.py       4      4      0      0   0.00%   3-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_3/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_3/_example.py       4      4      0      0   0.00%   3-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_4/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_4/_example.py       4      4      0      0   0.00%   3-24
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_5/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_5/_example.py       4      4      0      0   0.00%   3-25
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_group.py           12     12      0      0   0.00%   1-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_icl_example_groups.py                        4      4      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_types.py                                     9      9      0      0   0.00%   1-13
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_prompt.py                                                       11     11      0      0   0.00%   1-33
cli/decompose/prompt_modules/subtask_prompt_generator/_subtask_prompt_generator.py                                             50     50      8      0   0.00%   1-247
cli/decompose/prompt_modules/subtask_prompt_generator/_types.py                                                                 5      5      0      0   0.00%   1-23
cli/decompose/prompt_modules/validation_decision/__init__.py                                                                    2      2      0      0   0.00%   1-5
cli/decompose/prompt_modules/validation_decision/_exceptions.py                                                                12     12      0      0   0.00%   1-22
cli/decompose/prompt_modules/validation_decision/_prompt/__init__.py                                                            2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/__init__.py                                              2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_1/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_1/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_2/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_2/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_3/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_3/_example.py                                   5      5      0      0   0.00%   1-21
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_4/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_4/_example.py                                   5      5      0      0   0.00%   1-12
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_5/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_5/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_icl_examples.py                                         7      7      0      0   0.00%   1-10
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_types.py                                                5      5      0      0   0.00%   1-7
cli/decompose/prompt_modules/validation_decision/_prompt/_prompt.py                                                            11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/validation_decision/_validation_decision.py                                                       42     42      6      0   0.00%   1-128
cli/decompose/utils.py                                                                                                          6      6      2      0   0.00%   1-13
cli/eval/__init__.py                                                                                                            0      0      0      0 100.00%
cli/eval/commands.py                                                                                                            3      3      0      0   0.00%   5-50
cli/eval/runner.py                                                                                                            163    163     40      0   0.00%   1-353
cli/m.py                                                                                                                       12     12      0      0   0.00%   3-30
mellea/__init__.py                                                                                                              4      0      0      0 100.00%
mellea/backends/__init__.py                                                                                                     6      0      0      0 100.00%
mellea/backends/adapters/__init__.py                                                                                            2      0      0      0 100.00%
mellea/backends/adapters/adapter.py                                                                                            76     43     18      0  35.11%   28-37, 98-149, 167-178, 187, 199-200, 212
mellea/backends/adapters/catalog.py                                                                                            20      4      2      0  72.73%   78, 88-96
mellea/backends/backend.py                                                                                                     11      0      0      0 100.00%
mellea/backends/bedrock.py                                                                                                     42     42     10      0   0.00%   3-78
mellea/backends/cache.py                                                                                                       28      3      6      2  85.29%   42, 58, 61
mellea/backends/dummy.py                                                                                                       15     15      4      0   0.00%   3-45
mellea/backends/huggingface.py                                                                                                484    484    132      0   0.00%   6-1314
mellea/backends/kv_block_helpers.py                                                                                            24     24      2      0   0.00%   3-47
mellea/backends/litellm.py                                                                                                    248    248     90      0   0.00%   3-677
mellea/backends/model_ids.py                                                                                                   38      0      0      0 100.00%
mellea/backends/model_options.py                                                                                               46      3     20      3  90.91%   77, 87-91, 113->116
mellea/backends/ollama.py                                                                                                     273    273    104      0   0.00%   3-684
mellea/backends/openai.py                                                                                                     387    387    132      0   0.00%   3-1037
mellea/backends/tools.py                                                                                                      249    192     94      0  16.62%   33-35, 39, 44, 49-77, 86-91, 98-123, 131-143, 159-174, 182-202, 207-214, 261-405, 425-428, 443, 476-482, 499, 543-582, 590-630
mellea/backends/utils.py                                                                                                       39     15     16      1  56.36%   47, 63-84
mellea/backends/vllm.py                                                                                                       198     31     40      8  78.57%   27-28, 95-98, 131, 132->139, 166-198, 296-309, 381-383, 388, 411, 478, 511->exit
mellea/backends/watsonx.py                                                                                                    232    232     72      0   0.00%   3-642
mellea/core/__init__.py                                                                                                         7      0      0      0 100.00%
mellea/core/backend.py                                                                                                         40      4     12      4  84.62%   102, 117, 127, 133
mellea/core/base.py                                                                                                           327     95     76     14  65.51%   39, 42->44, 54, 70-73, 78-102, 107-109, 116-117, 157-158, 226, 232, 242-243, 246, 271-272, 275, 313->315, 317-318, 325-327, 341->360, 350-351, 361, 378-394, 399-423, 541, 545-548, 555-569, 651, 660-665, 670-684
mellea/core/formatter.py                                                                                                        5      0      0      0 100.00%
mellea/core/requirement.py                                                                                                     61     32      6      0  43.28%   34-38, 43, 48, 53, 58, 62, 66, 75-84, 125-145, 154, 158-161, 170
mellea/core/sampling.py                                                                                                        39      6      8      4  78.72%   36, 38, 40, 42, 72, 77
mellea/core/utils.py                                                                                                           65     11     10      3  78.67%   24-35, 43-55, 104->111, 106
mellea/formatters/__init__.py                                                                                                   4      0      0      0 100.00%
mellea/formatters/chat_formatter.py                                                                                            27      2     12      3  87.18%   35, 40, 48->52
mellea/formatters/template_formatter.py                                                                                       132     26     68     21  75.50%   76, 84-87, 95, 102, 106-110, 131, 134-137, 140->145, 151->186, 155, 159-160, 162->183, 168-173, 183->151, 187, 199-207, 209->211, 234->243, 248->243, 257, 266->264, 269, 283
mellea/helpers/__init__.py                                                                                                      5      0      0      0 100.00%
mellea/helpers/async_helpers.py                                                                                                47     21     14      2  49.18%   17, 27, 35-36, 45-49, 73-74, 78, 82-88, 92-99
mellea/helpers/event_loop_helper.py                                                                                            34     14      6      1  52.50%   30, 34-54, 60
mellea/helpers/openai_compatible_helpers.py                                                                                    88     77     46      0   8.21%   17-42, 54-122, 127-138, 159-172
mellea/helpers/server_type.py                                                                                                  19     10      4      0  39.13%   19-28
mellea/stdlib/__init__.py                                                                                                       0      0      0      0 100.00%
mellea/stdlib/components/__init__.py                                                                                            8      0      0      0 100.00%
mellea/stdlib/components/chat.py                                                                                               70     39     24      6  39.36%   53, 60, 62, 71, 103-125, 129, 134, 169-173, 177-181, 193-212
mellea/stdlib/components/docs/__init__.py                                                                                       2      0      0      0 100.00%
mellea/stdlib/components/docs/document.py                                                                                      16     11      4      0  25.00%   12-14, 25-32, 36
mellea/stdlib/components/docs/richdocument.py                                                                                  84     84      6      0   0.00%   3-192
mellea/stdlib/components/genslot.py                                                                                           236    163     66      0  24.17%   89, 100-109, 117, 120, 123, 126, 141-142, 150-151, 163, 183-193, 209-222, 272-291, 310-386, 390-394, 398, 417-423, 473-554, 606-694, 828-831
mellea/stdlib/components/instruction.py                                                                                        66     28     18      2  47.62%   52-107, 135, 173-174, 183-185
mellea/stdlib/components/intrinsic/__init__.py                                                                                  2      0      0      0 100.00%
mellea/stdlib/components/intrinsic/intrinsic.py                                                                                14      7      2      0  43.75%   29-32, 37, 50, 66
mellea/stdlib/components/intrinsic/rag.py                                                                                      46     46      8      0   0.00%   3-313
mellea/stdlib/components/mify.py                                                                                              125     94     56      0  17.13%   44, 54, 64, 73-75, 88-123, 138-168, 180-193, 210, 217-220, 311-315, 335-392, 399-407, 415-433
mellea/stdlib/components/mobject.py                                                                                            60     28      4      0  50.00%   23-24, 28, 32-33, 54, 67-68, 72, 76-77, 98, 166-167, 171, 179, 187, 195, 202-218, 227-230, 240
mellea/stdlib/components/react.py                                                                                              33     33      2      0   0.00%   3-96
mellea/stdlib/components/simple.py                                                                                             33     23     12      0  22.22%   11-15, 19, 22-28, 33, 40-49, 53, 57
mellea/stdlib/components/unit_test_eval.py                                                                                     71     71     14      0   0.00%   3-148
mellea/stdlib/context.py                                                                                                       17      0      0      0 100.00%
mellea/stdlib/frameworks/__init__.py                                                                                            0      0      0      0 100.00%
mellea/stdlib/frameworks/react.py                                                                                              39     39     14      0   0.00%   4-121
mellea/stdlib/functional.py                                                                                                   205    118     78     10  36.40%   239, 277-291, 318-333, 360-414, 489, 498, 505, 573-575, 578-581, 668-686, 712-733, 753, 759, 766-769, 772-788, 817-832, 859-913, 921-937, 948-975
mellea/stdlib/requirements/__init__.py                                                                                          5      0      0      0 100.00%
mellea/stdlib/requirements/md.py                                                                                               45     36     10      0  16.36%   17-21, 29-46, 50, 65-77
mellea/stdlib/requirements/python_reqs.py                                                                                      65     57     30      0   8.42%   32-58, 63-98, 118-137, 160-193
mellea/stdlib/requirements/requirement.py                                                                                      45     32     14      0  22.03%   23-36, 50-59, 71-76, 81, 86, 121-147
mellea/stdlib/requirements/safety/__init__.py                                                                                   0      0      0      0 100.00%
mellea/stdlib/requirements/safety/guardian.py                                                                                 157    157     74      0   0.00%   3-355
mellea/stdlib/requirements/tool_reqs.py                                                                                        44     40     24      0   5.88%   9-15, 27-36, 65-109
mellea/stdlib/sampling/__init__.py                                                                                              3      0      0      0 100.00%
mellea/stdlib/sampling/base.py                                                                                                 98     31     16      5  63.16%   143, 144->146, 160, 220-249, 278, 300, 322, 344-365, 387, 409-424
mellea/stdlib/sampling/budget_forcing.py                                                                                       71     71     10      0   0.00%   3-249
mellea/stdlib/sampling/majority_voting.py                                                                                      86     86     22      0   0.00%   3-292
mellea/stdlib/sampling/sampling_algos/__init__.py                                                                               2      2      0      0   0.00%   3-5
mellea/stdlib/sampling/sampling_algos/budget_forcing_alg.py                                                                    76     76     28      0   0.00%   3-181
mellea/stdlib/sampling/sofai.py                                                                                               200    174     72      0   9.56%   82-103, 128-151, 171, 186-196, 201-224, 236-244, 257-263, 281-345, 360-366, 389-408, 436-493, 522-550, 603-768
mellea/stdlib/session.py                                                                                                      169    114     40      1  26.79%   47-52, 57-96, 156-198, 233-239, 243-249, 253-257, 282, 290-292, 345-363, 456-457, 502, 534-544, 567-576, 629-647, 721-746, 760-773, 786, 818-828, 851-860, 865-866, 877-887
mellea/stdlib/tools/__init__.py                                                                                                 2      0      0      0 100.00%
mellea/stdlib/tools/interpreter.py                                                                                            106     72     30      0  25.00%   53-65, 77, 89-112, 127-138, 141-179, 187-221, 232-255, 260, 269-270, 279-280
mellea/telemetry/__init__.py                                                                                                   98     60     42      4  30.00%   31-35, 56-78, 87-90, 95, 100, 115-119, 138-145, 163-172, 181-182, 193-208, 220, 230-232
mellea/telemetry/backend_instrumentation.py                                                                                    90     64     44      8  25.37%   25-26, 40, 42, 44, 46, 48, 52, 64-71, 91-94, 130-135, 196-220, 233-267
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                        7032   5371   1912    102  20.78%
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
=========================== short test summary info ============================
FAILED test/backends/test_vllm.py::test_system_prompt - AssertionError: Inten...
========================= 1 failed, 7 passed in 41.30s =========================
✗ Module failed: /proj/dmfexp/eiger/users/jonesn/mellea/test/backends/test_vllm.py

[2/2] Running: /proj/dmfexp/eiger/users/jonesn/mellea/test/backends/test_vllm_tools.py
----------------------------------------------------------------------
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /u/jonesn/.conda/envs/mellea/bin/python3
cachedir: .pytest_cache
rootdir: /proj/dmfexp/eiger/users/jonesn/mellea
configfile: pyproject.toml
plugins: nbmake-1.5.5, asyncio-1.3.0, Faker-40.1.2, timeout-2.4.0, langsmith-0.6.6, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collecting ... collected 1 item

test/backends/test_vllm_tools.py::test_tool PASSED                       [100%]

=============================== warnings summary ===============================
test/backends/test_vllm_tools.py::test_tool
  /u/jonesn/.conda/envs/mellea/lib/python3.12/site-packages/vllm/transformers_utils/tokenizer_group.py:24: FutureWarning: It is strongly recommended to run mistral models with `--tokenizer-mode "mistral"` to ensure correct encoding and decoding.
    self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================ tests coverage ================================
_______________ coverage: platform linux, python 3.12.12-final-0 _______________

Name                                                                                                                        Stmts   Miss Branch BrPart   Cover   Missing
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cli/__init__.py                                                                                                                 0      0      0      0 100.00%
cli/alora/__init__.py                                                                                                           0      0      0      0 100.00%
cli/alora/commands.py                                                                                                           4      4      0      0   0.00%   1-50
cli/alora/train.py                                                                                                             75     75     16      0   0.00%   1-173
cli/alora/upload.py                                                                                                            15     15      4      0   0.00%   1-45
cli/decompose/__init__.py                                                                                                       4      4      0      0   0.00%   1-12
cli/decompose/decompose.py                                                                                                     92     92     36      0   0.00%   1-333
cli/decompose/pipeline.py                                                                                                      56     56      8      0   0.00%   1-179
cli/decompose/prompt_modules/__init__.py                                                                                        6      6      0      0   0.00%   1-10
cli/decompose/prompt_modules/_prompt_modules.py                                                                                13     13      0      0   0.00%   1-37
cli/decompose/prompt_modules/constraint_extractor/__init__.py                                                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_constraint_extractor.py                                                     36     36      6      0   0.00%   1-136
cli/decompose/prompt_modules/constraint_extractor/_exceptions.py                                                               12     12      0      0   0.00%   1-18
cli/decompose/prompt_modules/constraint_extractor/_prompt/__init__.py                                                           2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/__init__.py                                             2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_1/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_1/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_2/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_2/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_3/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_3/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_4/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_4/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_5/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_5/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_6/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_example_6/_example.py                                  7      7      0      0   0.00%   1-15
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_icl_examples.py                                        8      8      0      0   0.00%   1-9
cli/decompose/prompt_modules/constraint_extractor/_prompt/_icl_examples/_types.py                                               4      4      0      0   0.00%   1-6
cli/decompose/prompt_modules/constraint_extractor/_prompt/_prompt.py                                                           11     11      0      0   0.00%   1-24
cli/decompose/prompt_modules/general_instructions/__init__.py                                                                   2      2      0      0   0.00%   1-5
cli/decompose/prompt_modules/general_instructions/_exceptions.py                                                               12     12      0      0   0.00%   1-18
cli/decompose/prompt_modules/general_instructions/_general_instructions.py                                                     33     33      4      0   0.00%   1-76
cli/decompose/prompt_modules/general_instructions/_prompt/__init__.py                                                           2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/__init__.py                                             2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_1/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_1/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_2/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_2/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_3/__init__.py                                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_example_3/_example.py                                  8      8      0      0   0.00%   1-13
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_icl_examples.py                                        5      5      0      0   0.00%   1-6
cli/decompose/prompt_modules/general_instructions/_prompt/_icl_examples/_types.py                                               4      4      0      0   0.00%   1-6
cli/decompose/prompt_modules/general_instructions/_prompt/_prompt.py                                                           11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_constraint_assign/__init__.py                                                              3      3      0      0   0.00%   1-8
cli/decompose/prompt_modules/subtask_constraint_assign/_exceptions.py                                                          12     12      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/__init__.py                                                      2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/__init__.py                                        2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_1/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_1/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_2/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_2/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_3/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_3/_example.py                             9      9      0      0   0.00%   1-37
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_4/__init__.py                             1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_example_4/_example.py                             9      9      0      0   0.00%   1-32
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_icl_examples.py                                   6      6      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_icl_examples/_types.py                                          7      7      0      0   0.00%   1-9
cli/decompose/prompt_modules/subtask_constraint_assign/_prompt/_prompt.py                                                      12     12      0      0   0.00%   1-25
cli/decompose/prompt_modules/subtask_constraint_assign/_subtask_constraint_assign.py                                           51     51     10      0   0.00%   1-247
cli/decompose/prompt_modules/subtask_constraint_assign/_types.py                                                                6      6      0      0   0.00%   1-26
cli/decompose/prompt_modules/subtask_list/__init__.py                                                                           3      3      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_list/_exceptions.py                                                                       15     15      0      0   0.00%   1-23
cli/decompose/prompt_modules/subtask_list/_prompt/__init__.py                                                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/__init__.py                                                     2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_1/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_1/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_2/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_2/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_3/__init__.py                                          1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_example_3/_example.py                                          9      9      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_icl_examples.py                                                5      5      0      0   0.00%   1-6
cli/decompose/prompt_modules/subtask_list/_prompt/_icl_examples/_types.py                                                       5      5      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_list/_prompt/_prompt.py                                                                   11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/subtask_list/_subtask_list.py                                                                     50     50      4      0   0.00%   1-166
cli/decompose/prompt_modules/subtask_list/_types.py                                                                             4      4      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_prompt_generator/__init__.py                                                               3      3      0      0   0.00%   1-8
cli/decompose/prompt_modules/subtask_prompt_generator/_exceptions.py                                                           12     12      0      0   0.00%   1-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/__init__.py                                                       2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/__init__.py                                   2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/__init__.py                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_1/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_1/_example.py       4      4      0      0   0.00%   3-19
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_2/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_2/_example.py       4      4      0      0   0.00%   3-19
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_3/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_3/_example.py       4      4      0      0   0.00%   3-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_4/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_4/_example.py       4      4      0      0   0.00%   3-24
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_1/_example_group.py           11     11      0      0   0.00%   1-16
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/__init__.py                  1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_1/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_1/_example.py       4      4      0      0   0.00%   3-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_2/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_2/_example.py       4      4      0      0   0.00%   3-20
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_3/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_3/_example.py       4      4      0      0   0.00%   3-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_4/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_4/_example.py       4      4      0      0   0.00%   3-24
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_5/__init__.py       1      1      0      0   0.00%   1
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_5/_example.py       4      4      0      0   0.00%   3-25
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_example_group_2/_example_group.py           12     12      0      0   0.00%   1-23
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_icl_example_groups.py                        4      4      0      0   0.00%   1-7
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_icl_example_groups/_types.py                                     9      9      0      0   0.00%   1-13
cli/decompose/prompt_modules/subtask_prompt_generator/_prompt/_prompt.py                                                       11     11      0      0   0.00%   1-33
cli/decompose/prompt_modules/subtask_prompt_generator/_subtask_prompt_generator.py                                             50     50      8      0   0.00%   1-247
cli/decompose/prompt_modules/subtask_prompt_generator/_types.py                                                                 5      5      0      0   0.00%   1-23
cli/decompose/prompt_modules/validation_decision/__init__.py                                                                    2      2      0      0   0.00%   1-5
cli/decompose/prompt_modules/validation_decision/_exceptions.py                                                                12     12      0      0   0.00%   1-22
cli/decompose/prompt_modules/validation_decision/_prompt/__init__.py                                                            2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/__init__.py                                              2      2      0      0   0.00%   1-2
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_1/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_1/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_2/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_2/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_3/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_3/_example.py                                   5      5      0      0   0.00%   1-21
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_4/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_4/_example.py                                   5      5      0      0   0.00%   1-12
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_5/__init__.py                                   1      1      0      0   0.00%   1
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_example_5/_example.py                                   5      5      0      0   0.00%   1-13
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_icl_examples.py                                         7      7      0      0   0.00%   1-10
cli/decompose/prompt_modules/validation_decision/_prompt/_icl_examples/_types.py                                                5      5      0      0   0.00%   1-7
cli/decompose/prompt_modules/validation_decision/_prompt/_prompt.py                                                            11     11      0      0   0.00%   1-19
cli/decompose/prompt_modules/validation_decision/_validation_decision.py                                                       42     42      6      0   0.00%   1-128
cli/decompose/utils.py                                                                                                          6      6      2      0   0.00%   1-13
cli/eval/__init__.py                                                                                                            0      0      0      0 100.00%
cli/eval/commands.py                                                                                                            3      3      0      0   0.00%   5-50
cli/eval/runner.py                                                                                                            163    163     40      0   0.00%   1-353
cli/m.py                                                                                                                       12     12      0      0   0.00%   3-30
mellea/__init__.py                                                                                                              4      0      0      0 100.00%
mellea/backends/__init__.py                                                                                                     6      0      0      0 100.00%
mellea/backends/adapters/__init__.py                                                                                            2      0      0      0 100.00%
mellea/backends/adapters/adapter.py                                                                                            76     43     18      0  35.11%   28-37, 98-149, 167-178, 187, 199-200, 212
mellea/backends/adapters/catalog.py                                                                                            20      4      2      0  72.73%   78, 88-96
mellea/backends/backend.py                                                                                                     11      0      0      0 100.00%
mellea/backends/bedrock.py                                                                                                     42     42     10      0   0.00%   3-78
mellea/backends/cache.py                                                                                                       28      6      6      3  73.53%   42, 50-52, 58, 61
mellea/backends/dummy.py                                                                                                       15     15      4      0   0.00%   3-45
mellea/backends/huggingface.py                                                                                                484    484    132      0   0.00%   6-1314
mellea/backends/kv_block_helpers.py                                                                                            24     24      2      0   0.00%   3-47
mellea/backends/litellm.py                                                                                                    248    248     90      0   0.00%   3-677
mellea/backends/model_ids.py                                                                                                   38      0      0      0 100.00%
mellea/backends/model_options.py                                                                                               46      3     20      3  90.91%   77, 87-91, 113->116
mellea/backends/ollama.py                                                                                                     273    273    104      0   0.00%   3-684
mellea/backends/openai.py                                                                                                     387    387    132      0   0.00%   3-1037
mellea/backends/tools.py                                                                                                      249     87     94     21  59.77%   49-77, 100, 109-116, 132, 136, 142-143, 169-172, 183, 189->196, 194, 200-202, 212->210, 292-309, 320, 332, 349->356, 353, 360, 365, 371-405, 425-428, 443, 476-482, 499, 545, 553, 575, 579-580, 613-614
mellea/backends/utils.py                                                                                                       39      7     16      5  78.18%   47, 53-54, 67-70, 76, 84
mellea/backends/vllm.py                                                                                                       198     68     40     11  59.24%   27-28, 95-98, 131, 132->139, 166-198, 241->246, 295->311, 297, 333-345, 381-383, 388, 392->394, 410->413, 470-546, 573
mellea/backends/watsonx.py                                                                                                    232    232     72      0   0.00%   3-642
mellea/core/__init__.py                                                                                                         7      0      0      0 100.00%
mellea/core/backend.py                                                                                                         40     10     12      3  67.31%   102, 111-120, 127, 133
mellea/core/base.py                                                                                                           327    110     76     16  59.80%   39, 42->44, 54, 70-73, 78-102, 107-109, 116-117, 157-158, 220, 226, 232, 242-243, 246, 271-272, 275, 290, 296->308, 313->315, 317-327, 345-351, 360-363, 378-394, 399-423, 491, 499, 520-529, 545-548, 555-569, 660-665, 670-684
mellea/core/formatter.py                                                                                                        5      0      0      0 100.00%
mellea/core/requirement.py                                                                                                     61     32      6      0  43.28%   34-38, 43, 48, 53, 58, 62, 66, 75-84, 125-145, 154, 158-161, 170
mellea/core/sampling.py                                                                                                        39      6      8      4  78.72%   36, 38, 40, 42, 72, 77
mellea/core/utils.py                                                                                                           65     11     10      3  78.67%   24-35, 43-55, 104->111, 106
mellea/formatters/__init__.py                                                                                                   4      0      0      0 100.00%
mellea/formatters/chat_formatter.py                                                                                            27     11     12      4  51.28%   25-40, 44, 48->52, 53-54
mellea/formatters/template_formatter.py                                                                                       132     29     68     23  73.00%   70-71, 76, 84-87, 95, 102, 106-110, 131, 134-137, 140->145, 143, 151->186, 155, 159-160, 162->183, 168-173, 183->151, 187, 199-207, 209->211, 234->243, 248->243, 257, 266->264, 269, 283
mellea/helpers/__init__.py                                                                                                      5      0      0      0 100.00%
mellea/helpers/async_helpers.py                                                                                                47     21     14      2  49.18%   17, 27, 35-36, 45-49, 73-74, 78, 82-88, 92-99
mellea/helpers/event_loop_helper.py                                                                                            34     14      6      1  52.50%   30, 34-54, 60
mellea/helpers/openai_compatible_helpers.py                                                                                    88     77     46      0   8.21%   17-42, 54-122, 127-138, 159-172
mellea/helpers/server_type.py                                                                                                  19     10      4      0  39.13%   19-28
mellea/stdlib/__init__.py                                                                                                       0      0      0      0 100.00%
mellea/stdlib/components/__init__.py                                                                                            8      0      0      0 100.00%
mellea/stdlib/components/chat.py                                                                                               70     50     24      0  21.28%   52-54, 58-63, 71, 97-144, 169-173, 177-181, 193-212
mellea/stdlib/components/docs/__init__.py                                                                                       2      0      0      0 100.00%
mellea/stdlib/components/docs/document.py                                                                                      16     11      4      0  25.00%   12-14, 25-32, 36
mellea/stdlib/components/docs/richdocument.py                                                                                  84     84      6      0   0.00%   3-192
mellea/stdlib/components/genslot.py                                                                                           236    163     66      0  24.17%   89, 100-109, 117, 120, 123, 126, 141-142, 150-151, 163, 183-193, 209-222, 272-291, 310-386, 390-394, 398, 417-423, 473-554, 606-694, 828-831
mellea/stdlib/components/instruction.py                                                                                        66     28     18      2  47.62%   52-107, 135, 173-174, 183-185
mellea/stdlib/components/intrinsic/__init__.py                                                                                  2      0      0      0 100.00%
mellea/stdlib/components/intrinsic/intrinsic.py                                                                                14      7      2      0  43.75%   29-32, 37, 50, 66
mellea/stdlib/components/intrinsic/rag.py                                                                                      46     46      8      0   0.00%   3-313
mellea/stdlib/components/mify.py                                                                                              125     94     56      0  17.13%   44, 54, 64, 73-75, 88-123, 138-168, 180-193, 210, 217-220, 311-315, 335-392, 399-407, 415-433
mellea/stdlib/components/mobject.py                                                                                            60     28      4      0  50.00%   23-24, 28, 32-33, 54, 67-68, 72, 76-77, 98, 166-167, 171, 179, 187, 195, 202-218, 227-230, 240
mellea/stdlib/components/react.py                                                                                              33     33      2      0   0.00%   3-96
mellea/stdlib/components/simple.py                                                                                             33     23     12      0  22.22%   11-15, 19, 22-28, 33, 40-49, 53, 57
mellea/stdlib/components/unit_test_eval.py                                                                                     71     71     14      0   0.00%   3-148
mellea/stdlib/context.py                                                                                                       17      1      0      0  94.12%   37
mellea/stdlib/frameworks/__init__.py                                                                                            0      0      0      0 100.00%
mellea/stdlib/frameworks/react.py                                                                                              39     39     14      0   0.00%   4-121
mellea/stdlib/functional.py                                                                                                   205    132     78     10  30.04%   238-259, 277-291, 318-333, 360-414, 489, 498, 504-521, 554->558, 573-575, 578-581, 668-686, 712-733, 753, 759, 766-769, 772-788, 817-832, 859-913, 921-937, 948-975
mellea/stdlib/requirements/__init__.py                                                                                          5      0      0      0 100.00%
mellea/stdlib/requirements/md.py                                                                                               45     36     10      0  16.36%   17-21, 29-46, 50, 65-77
mellea/stdlib/requirements/python_reqs.py                                                                                      65     57     30      0   8.42%   32-58, 63-98, 118-137, 160-193
mellea/stdlib/requirements/requirement.py                                                                                      45     32     14      0  22.03%   23-36, 50-59, 71-76, 81, 86, 121-147
mellea/stdlib/requirements/safety/__init__.py                                                                                   0      0      0      0 100.00%
mellea/stdlib/requirements/safety/guardian.py                                                                                 157    157     74      0   0.00%   3-355
mellea/stdlib/requirements/tool_reqs.py                                                                                        44     40     24      0   5.88%   9-15, 27-36, 65-109
mellea/stdlib/sampling/__init__.py                                                                                              3      0      0      0 100.00%
mellea/stdlib/sampling/base.py                                                                                                 98     31     16      5  63.16%   143, 144->146, 160, 220-249, 278, 300, 322, 344-365, 387, 409-424
mellea/stdlib/sampling/budget_forcing.py                                                                                       71     71     10      0   0.00%   3-249
mellea/stdlib/sampling/majority_voting.py                                                                                      86     86     22      0   0.00%   3-292
mellea/stdlib/sampling/sampling_algos/__init__.py                                                                               2      2      0      0   0.00%   3-5
mellea/stdlib/sampling/sampling_algos/budget_forcing_alg.py                                                                    76     76     28      0   0.00%   3-181
mellea/stdlib/sampling/sofai.py                                                                                               200    174     72      0   9.56%   82-103, 128-151, 171, 186-196, 201-224, 236-244, 257-263, 281-345, 360-366, 389-408, 436-493, 522-550, 603-768
mellea/stdlib/session.py                                                                                                      169    117     40      1  25.36%   47-52, 57-96, 156-198, 233-239, 243-249, 253-257, 282, 290-292, 345-363, 456-457, 476-489, 502, 534-544, 567-576, 629-647, 721-746, 760-773, 786, 818-828, 851-860, 865-866, 877-887
mellea/stdlib/tools/__init__.py                                                                                                 2      0      0      0 100.00%
mellea/stdlib/tools/interpreter.py                                                                                            106     72     30      0  25.00%   53-65, 77, 89-112, 127-138, 141-179, 187-221, 232-255, 260, 269-270, 279-280
mellea/telemetry/__init__.py                                                                                                   98     62     42      3  27.86%   31-35, 56-78, 87-90, 95, 100, 115-119, 137-147, 163-172, 181-182, 193-208, 220, 230-232
mellea/telemetry/backend_instrumentation.py                                                                                    90     90     44      0   0.00%   7-270
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                        7032   5388   1912    120  20.59%
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
======================== 1 passed, 1 warning in 41.52s =========================
✓ Module passed: /proj/dmfexp/eiger/users/jonesn/mellea/test/backends/test_vllm_tools.py

======================================================================
Failed modules (1):
  /proj/dmfexp/eiger/users/jonesn/mellea/test/backends/test_vllm.py:
    - test/backends/test_vllm.py::test_system_prompt
======================================================================


=============================== Skipped Examples ===============================
Examples with the following names were skipped because they cannot be easily run in the pytest framework; please run them manually:
pii_serve.py
simple_rag_with_filter.py
python_decompose_result.py
m_decomp_result.py
mellea_pdf.py
mcp_example.py
__init__.py
client.py
================ 1 skipped, 371 deselected in 107.18s (0:01:47) ================
!!!! _pytest.outcomes.Exit: Heavy GPU tests completed in isolated processes !!!!

------------------------------------------------------------
Sender: LSF System <lsfadmin@p3-r31-n2>
Subject: Job 456426: <___vllm_tests__marker_> in cluster <BLUEVELA_LSF> Exited

Job <___vllm_tests__marker_> was submitted from host <login3> by user <jonesn> in cluster <BLUEVELA_LSF> at Wed Feb 11 10:17:11 2026
Job was executed on host(s) <p3-r31-n2>, in queue <normal>, as user <jonesn> in cluster <BLUEVELA_LSF> at Wed Feb 11 10:17:14 2026
</u/jonesn> was used as the home directory.
</proj/dmfexp/eiger/users/jonesn/mellea> was used as the working directory.
Started at Wed Feb 11 10:17:14 2026
Terminated at Wed Feb 11 10:19:09 2026
Results reported at Wed Feb 11 10:19:09 2026

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
export VLLM_USE_V1=0 && export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True && source /opt/share/miniforge/etc/profile.d/conda.sh && conda activate mellea && pytest -m vllm --no-cov -v
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   233.00 sec.
    Max Memory :                                 2339 MB
    Average Memory :                             1345.74 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              5
    Max Threads :                                475
    Run time :                                   115 sec.
    Turnaround time :                            118 sec.

The output (if any) is above this job summary.



PS:

Read file </proj/dmfexp/eiger/users/jonesn/mellea/logs/___vllm_tests__marker__456426.stderr> for stderr output of this job.



=== STDERR ===


Basically an extra summary

======================== 1 passed, 1 warning in 41.52s =========================
✓ Module passed: /proj/dmfexp/eiger/users/jonesn/mellea/test/backends/test_vllm_tools.py

======================================================================
Failed modules (1):
  /proj/dmfexp/eiger/users/jonesn/mellea/test/backends/test_vllm.py:
    - test/backends/test_vllm.py::test_system_prompt
======================================================================

@jakelorocco Does that help? It only affects the CUDA/high-memory tests (or a batch that includes them).
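A minimal sketch of the behaviour described in the comment above. It assumes module-level granularity, which the "Module passed" lines in the job log suggest: when CUDA is present, each heavy test module runs in its own fresh `pytest` process and the failing modules are collected. When CUDA is absent, isolation is skipped. The helper name `run_modules_isolated` and its injectable `runner` parameter are hypothetical and are not the PR's actual conftest code.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_modules_isolated(
    modules: Sequence[str],
    cuda_available: bool,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run each test module in a separate pytest process (hypothetical helper).

    Returns the modules whose pytest run exited non-zero. Without CUDA,
    isolation is skipped and an empty list is returned, so the caller can
    run the tests in-process as usual.
    """
    if not cuda_available:
        return []
    failed = []
    for module in modules:
        # A fresh interpreter per module releases all GPU memory when it exits.
        result = runner([sys.executable, "-m", "pytest", module, "--no-cov"])
        if result.returncode != 0:
            failed.append(module)
    return failed
```

Spawning a full interpreter per module costs startup time. That is why gating on CUDA matters: macOS and CPU-only machines never pay the overhead.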

- Add pytest skip mechanism with capability detection and CLI options
- Implement process isolation for GPU-intensive vLLM tests
- Enhance test configuration with safe option registration
- Fix vLLM structured output token limits and update documentation
- Extract duplicated cleanup code to cleanup_vllm_backend() in conftest.py
- Add NCCL process group cleanup to suppress warnings
- Add Mistral tokenizer mode to suppress FutureWarning
- Simplify comments for conciseness
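A minimal sketch of "safe option registration" and a RAM capability check, under stated assumptions. The option names `--ignore-gpu-check` and `--ignore-ram-check` come from the PR description. The helpers `safe_addoption` and `ram_skip_reason` and their help texts are hypothetical. The sketch relies on pytest raising `ValueError` when the same option name is registered twice, for example by two `conftest.py` files. `os.sysconf` is POSIX-only, so `total_ram_gb` works on Linux and macOS but not on Windows.

```python
import os
from typing import Optional


def safe_addoption(parser, *names, **kwargs) -> bool:
    """Register a pytest CLI option, tolerating duplicate registration.

    Returns True if the option was newly registered, False if it already existed.
    """
    try:
        parser.addoption(*names, **kwargs)
        return True
    except ValueError:
        # pytest raises ValueError("option names ... already added") on duplicates.
        return False


def total_ram_gb() -> float:
    """Physical RAM in GiB via POSIX sysconf (Linux and macOS only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def ram_skip_reason(required_gb: float, available_gb: float, ignore_check: bool) -> Optional[str]:
    """Return a skip reason if the machine has too little RAM, else None."""
    if ignore_check or available_gb >= required_gb:
        return None
    return (f"needs {required_gb:g} GB RAM, found {available_gb:.0f} GB "
            "(pass --ignore-ram-check to run anyway)")


def pytest_addoption(parser):
    # Option names from the PR description; help text is illustrative only.
    safe_addoption(parser, "--ignore-gpu-check", action="store_true", default=False,
                   help="Run GPU tests even if no GPU is detected")
    safe_addoption(parser, "--ignore-ram-check", action="store_true", default=False,
                   help="Run high-RAM tests even on smaller machines")
```

A collection hook could then attach `pytest.mark.skip(reason=...)` to any high-memory test for which `ram_skip_reason(...)` returns a string.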
The tokenizer_mode='mistral' parameter caused a mismatch between vLLM's
internal tokenizer (with Mistral mode) and the backend's separate tokenizer
(without Mistral mode). This resulted in malformed tool call formatting.

Reverts to default tokenizer mode to restore tool calling functionality.
Parse pytest output to show which tests failed across isolated modules.
Addresses review feedback on test failure visibility.
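A minimal sketch of parsing per-module pytest output into the "Failed modules" summary shown in the job log. It assumes the failures appear in pytest's short test summary, which lists `FAILED`/`ERROR` node IDs by default. The functions `failed_tests` and `format_failure_summary` are hypothetical names, not the PR's code.

```python
import re

# Matches short-summary lines such as
# "FAILED test/backends/test_vllm.py::test_system_prompt - AssertionError"
_SUMMARY_LINE = re.compile(r"^(?:FAILED|ERROR) (\S+::\S+)")


def failed_tests(output: str) -> list[str]:
    """Extract failing test node IDs from one module's pytest output."""
    return [m.group(1) for line in output.splitlines() if (m := _SUMMARY_LINE.match(line))]


def format_failure_summary(results: dict[str, str]) -> str:
    """Build a summary from {module path: captured pytest output}."""
    failures = {mod: failed_tests(out) for mod, out in results.items()}
    failures = {mod: tests for mod, tests in failures.items() if tests}
    bar = "=" * 70
    lines = [bar, f"Failed modules ({len(failures)}):"]
    for mod, tests in failures.items():
        lines.append(f"  {mod}:")
        lines.extend(f"    - {test}" for test in tests)
    lines.append(bar)
    return "\n".join(lines)
```

This sketch does not list a module that crashes before pytest prints its summary. A real implementation should also check each subprocess's exit code. Parametrized IDs containing spaces would also be truncated by the `\S+` pattern.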
@guicho271828
Contributor

guicho271828 commented Feb 11, 2026

What do you think about using pytest-xdist for process-level isolation, rather than making a new one?
It runs each test in a subprocess.
https://pytest-xdist.readthedocs.io/en/stable/

@guicho271828
Contributor

Actually, it seems pytest-xdist reuses its worker subprocesses across tests.
pytest-forked, by contrast, runs each test in its own forked subprocess that exits once the test finishes.
https://github.com/pytest-dev/pytest-forked

If running every test in a subprocess is overkill, pytest-forked also provides a per-test marker:

import pytest

@pytest.mark.forked
def test_with_leaky_state():
    run_some_monkey_patches()  # placeholder for code that leaves global state behind

So we should mark the vLLM tests with it.
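A minimal sketch of applying that marker automatically, so individual vLLM tests need no edits. It assumes the pytest-forked plugin is installed, since the plugin supplies the `forked` marker, and it reuses the existing `vllm` marker seen in the `pytest -m vllm` job command. `Item.get_closest_marker` and `Item.add_marker` (which accepts a marker name string) are standard pytest APIs. Placing this in `conftest.py` is an assumption, and it would need to be merged with any existing `pytest_collection_modifyitems` hook.

```python
def pytest_collection_modifyitems(config, items):
    """Run every test carrying the `vllm` marker in its own forked process.

    Without pytest-forked installed, the `forked` marker has no effect
    (and may trigger an unknown-marker warning).
    """
    for item in items:
        if item.get_closest_marker("vllm") is not None:
            item.add_marker("forked")
```

One caveat is that CUDA cannot be re-initialized in a forked child once the parent process has initialized it. This approach therefore only helps if the parent pytest process never touches the GPU, for example by loading a model at import or collection time.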

@jakelorocco
Contributor

What do you think about using pytest-xdist for process-level isolation, rather than making a new one? It runs each test in a subprocess. https://pytest-xdist.readthedocs.io/en/stable/

I think the current output looks clean, especially with the summary. If we can get pytest-xdist working, I'd be fine with that as well; however, I believe there were issues the last time that was attempted (even with the number of parallel processes set to 1).

@guicho271828
Contributor

however, I believe there were issues the last time that was attempted (even with the number of parallel processes set to 1).

I hope that was because the worker processes were reused, which pytest-forked avoids.

@guicho271828
Contributor

Agreed, this can be merged now; later, if time allows, a follow-up PR can refactor it to use pytest-forked.

@planetf1
Contributor Author

I'd be happy to merge this now; we can then iterate if we find a better option.
If someone would like to approve ;-)

Contributor

@jakelorocco jakelorocco left a comment


lgtm; future improvements can be discussed / created in new issues

@planetf1 planetf1 added this pull request to the merge queue Feb 11, 2026
Merged via the queue into generative-computing:main with commit 66a90c7 Feb 11, 2026
4 checks passed


Development

Successfully merging this pull request may close these issues.

vllm test cleanup

4 participants