Conversation

@DipayanDasgupta (Contributor) commented Oct 3, 2025

Description

This PR significantly increases the test coverage for two core modules, ShortTermMemory and LLMAgent, to improve the overall quality and robustness of the codebase.

During development, this effort also uncovered and fixed several underlying issues in the test environment, leading to a more stable and reliable test suite for future contributions.

Key Contributions

  • Increased st_memory.py Test Coverage: Boosted coverage from 37% to 80%.
  • Increased llm_agent.py Test Coverage: Raised coverage from 63% to 67%.
  • Resolved Test Failures: Fixed persistent AttributeError and environmental ModuleNotFoundError issues.
  • Improved Test Robustness: Refactored test setup to be more consistent and resilient to environmental problems.

Detailed Breakdown: Before and After

1. mesa_llm/memory/st_memory.py

  • Before: Test coverage was ~37%. The module's core logic was largely untested.
  • After: Test coverage is now 80%.

Changes Made:
A new test file (tests/test_memory/test_st_memory.py) was created to add comprehensive tests for:

  • Correct initialization of the memory deque.
  • The two-stage process_step logic for pre- and post-step execution.
  • Correct formatting of an empty memory via format_short_term.
  • Extraction of conversation history via get_communication_history.
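As a rough illustration of the two-stage process_step behavior these tests cover, here is a minimal stand-in (an assumption for illustration only, not the real ShortTermMemory API in mesa_llm, whose names and signatures may differ):

```python
from collections import deque

# Minimal stand-in for the two-stage process_step logic; the real
# ShortTermMemory class in mesa_llm may use different names/signatures.
class TinyMemory:
    def __init__(self):
        self.memory = deque()   # unbounded, mirroring maxlen=None
        self.step_content = {}  # staged content for the current step

    def process_step(self, step=None, pre_step=False):
        if pre_step:
            # Pre-step: append a new entry with the step number still unknown.
            self.memory.append({"step": None, "content": dict(self.step_content)})
        else:
            # Post-step: merge staged content into the last entry and stamp the step.
            last = self.memory[-1]
            last["step"] = step
            last["content"].update(self.step_content)
        self.step_content.clear()

mem = TinyMemory()
mem.step_content = {"observation": "saw neighbor"}
mem.process_step(pre_step=True)
mem.step_content = {"action": "move"}
mem.process_step(step=1)
# mem.memory[-1] now holds both observation and action under step 1
```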

2. mesa_llm/llm_agent.py

  • Before: Test coverage was ~63%. Key conditional branches in the generate_obs method were not covered.
  • After: Test coverage is now 67%.

Changes Made:
Added two new test functions to tests/test_llm_agent.py to verify agent perception:

  • test_generate_obs_zero_vision: Confirms that an agent with vision=0 sees no neighbors.
  • test_generate_obs_limited_vision: Confirms that an agent with vision=1 only sees adjacent neighbors and ignores those outside its radius.
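The radius check these two tests exercise can be sketched independently of Mesa; this toy function is an assumption for illustration, not the actual generate_obs implementation, which queries the grid's neighborhood instead:

```python
# Toy model of the vision cutoff the tests exercise; the real generate_obs
# uses Mesa's grid neighborhood rather than computing distances directly.
def visible_neighbors(pos, others, vision):
    if vision == 0:
        return []  # vision=0: the agent sees no neighbors at all
    x, y = pos
    # Chebyshev distance <= vision keeps only adjacent cells when vision=1
    return [p for p in others if max(abs(p[0] - x), abs(p[1] - y)) <= vision]

print(visible_neighbors((2, 2), [(2, 3), (4, 4)], 0))  # []
print(visible_neighbors((2, 2), [(2, 3), (4, 4)], 1))  # [(2, 3)]
```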

Technical Fixes and Test Suite Improvements

During development, the test suite was failing due to two primary issues:

  1. AttributeError: 'DummyModel' object has no attribute 'next_id':

    • Diagnosis: The DummyModel used in tests was calling self.next_id() instead of self.schedule.next_id().
    • Fix: The test setup was corrected to call next_id() on the scheduler object, aligning with Mesa's design.
  2. ModuleNotFoundError: No module named 'mesa.time':

    • Diagnosis: The import of mesa.time fails under mesa==3.3.0. Although this was initially attributed to a corrupted local pip installation, Mesa 3.x removed the mesa.time scheduler module, so the failure stems from the installed Mesa version rather than from this repository's code; either way it is an environmental/dependency issue, not a code issue.
    • Fix: To make the test suite robust and resilient to such environmental problems, the dependency on mesa.time was removed from the test file. A MockScheduler class was introduced directly within tests/test_llm_agent.py to provide the necessary next_id() and add() methods for the test models. This ensures the tests are self-contained and reliable.
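A scheduler mock along these lines would satisfy the description above. This is a sketch; the actual MockScheduler in tests/test_llm_agent.py may differ in detail:

```python
# Sketch of a self-contained scheduler mock with pre-increment IDs
# (first call returns 1) and simple agent tracking, as described above.
class MockScheduler:
    def __init__(self):
        self.agents = []
        self.current_id = 0

    def next_id(self):
        self.current_id += 1  # pre-increment: first ID handed out is 1
        return self.current_id

    def add(self, agent):
        self.agents.append(agent)

sched = MockScheduler()
ids = [sched.next_id(), sched.next_id()]
sched.add("agent-a")
```

Because the mock has no dependency on mesa.time, tests built on it run the same way regardless of which Mesa version is installed.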

How to Verify

  1. Check out this branch.
  2. Ensure development dependencies are installed with pip install -e ".[dev]".
  3. Run the test suite:
    pytest --cov=mesa_llm tests/
  4. Confirm that all 161 tests pass and that the coverage report reflects the improvements noted above.

Summary by CodeRabbit

  • Tests
    • Standardized agent test setup to improve consistency and reduce duplication.
    • Updated agent behavior tests to use dynamic identifiers, larger grid scenarios, and vision edge cases, with refined memory assertions.
    • Added comprehensive Short-Term Memory tests covering initialization, step processing (pre/post), formatting of empty memory, and communication history extraction.
    • Simplified test flows and improved clarity by consolidating shared utilities and expectations.
  • Refactor
    • Streamlined test logic and data setup for maintainability and clearer intent, reducing reliance on implicit behaviors.

Increases coverage for st_memory.py to 80% and llm_agent.py to 67%. Also fixes test setup inconsistencies and resolves environmental import errors.
coderabbitai bot commented Oct 3, 2025

Walkthrough

Standardizes LLMAgent tests with a MockScheduler and create_dummy_model, updates ID-dependent assertions, and adds edge-case vision tests. Introduces a new ShortTermMemory test module validating initialization, step processing (pre/post), formatting, and communication history extraction.

Changes

  • LLMAgent test refactor and vision edge cases (tests/test_llm_agent.py): Replaces ad-hoc models with create_dummy_model and MockScheduler; assigns dynamic unique_id via schedule.next_id(); updates the grid to 5x5 with a fixed system_prompt; revises memory setup and assertions to use unique_id; adds zero/limited-vision tests; consolidates test flows.
  • ShortTermMemory tests (tests/test_memory/test_st_memory.py): New tests for ShortTermMemory covering initialization invariants, pre/post process_step behavior, empty-memory formatting, and communication history extraction using a mock_agent.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester
  participant Agent
  participant ShortTermMemory as STM

  rect rgb(240,248,255)
    note over Tester,STM: Pre-step (observation)
    Tester->>STM: set step_content = {"observation": "..."}
    Tester->>STM: process_step(pre_step=True)
    STM->>STM: append MemoryEntry(step=None, content={"observation": "..."})
    STM->>STM: clear step_content
  end

  rect rgb(245,255,240)
    note over Tester,STM: Post-step (action)
    Tester->>STM: set step_content = {"action": "..."}
    Tester->>STM: process_step(pre_step=False)
    STM->>STM: merge into last entry<br/>step=1, content={"observation": "...", "action": "..."}
    STM->>STM: clear step_content
  end

  Tester->>STM: format_short_term() / get_communication_history()
  STM-->>Tester: formatted text / messages (step N: ...)
sequenceDiagram
  autonumber
  participant Test as Test Suite
  participant Model as DummyModel
  participant Sched as MockScheduler
  participant Agent as LLMAgent

  Test->>Model: create_dummy_model(seed)
  Test->>Model: add_agent(Agent)
  Model->>Sched: next_id()
  Sched-->>Model: unique_id
  Model->>Agent: set unique_id
  Model->>Sched: register(agent)
  Model->>Model: place on 5x5 grid
  Test->>Agent: run assertions (IDs, memory, vision)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I thump my paws on grids so wide,
A scheduler counts—new IDs applied.
Short-term whispers, step by step,
Obs, then act, in tidy prep.
With vision small or vision keen,
I map the warren, crisp and clean.
Hop! Tests pass—carrots green.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 25.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly and concisely summarizes the main purpose of the PR, indicating that tests are being added for the LLMAgent and ShortTermMemory modules, which matches the changeset precisely.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae0877e and 83e1e5a.

📒 Files selected for processing (2)
  • tests/test_llm_agent.py (1 hunks)
  • tests/test_memory/test_st_memory.py (1 hunks)
🔇 Additional comments (11)
tests/test_llm_agent.py (7)

14-24: LGTM!

The MockScheduler correctly implements next_id() with pre-increment (first ID is 1), and provides simple agent tracking appropriate for test isolation.


28-51: LGTM!

The helper function correctly standardizes test setup and implements the fix for the test environment issue (calling schedule.next_id() instead of self.next_id()). This eliminates test duplication and ensures consistent agent initialization.


54-69: LGTM!

The test correctly verifies that apply_plan adds tool execution results to the agent's memory, using appropriate mocking and assertions.


72-86: LGTM!

The test correctly validates observation generation with dynamic unique_id values from the MockScheduler, ensuring the test adapts to the new ID assignment scheme.


89-106: LGTM!

The test correctly verifies that send_message updates memory for both sender and recipient agents using an appropriate call counter approach.


109-119: LGTM!

The new test correctly validates the edge case where vision=0 results in an empty local_state, ensuring agents with no vision cannot observe neighbors.


122-137: LGTM!

The new test correctly validates that limited vision (vision=1) allows observing only adjacent agents, properly excluding distant agents. This provides good edge case coverage for the vision mechanism.

tests/test_memory/test_st_memory.py (4)

12-19: LGTM!

The test correctly verifies ShortTermMemory initialization, including the important detail that the deque is unbounded (maxlen=None).


21-47: LGTM!

The test comprehensively validates the two-phase memory processing logic, correctly verifying that pre_step creates an entry with step=None, post_step updates it with the actual step number, and content is properly merged between phases.


49-52: LGTM!

The test correctly validates the edge case of formatting empty memory, ensuring it returns the expected default message.


54-71: LGTM!

The test correctly validates that get_communication_history filters memory entries to include only messages while excluding other content types like actions.
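The filtering idea behind that test can be sketched as follows. The entry shape and formatting here are hypothetical; the real get_communication_history may store and format entries differently:

```python
# Hypothetical memory entries; only "message" content should survive the
# filter, mirroring what get_communication_history is tested to do.
entries = [
    {"step": 1, "content": {"message": "hello", "action": "move"}},
    {"step": 2, "content": {"action": "wait"}},
]
history = [
    f"step {e['step']}: {e['content']['message']}"
    for e in entries
    if "message" in e["content"]
]
print(history)  # ['step 1: hello']
```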



@DipayanDasgupta (Contributor, Author) commented:

@coderabbitai review

coderabbitai bot commented Oct 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

codecov bot commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.27%. Comparing base (ae0877e) to head (83e1e5a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #20      +/-   ##
==========================================
+ Coverage   85.81%   87.27%   +1.45%     
==========================================
  Files          17       17              
  Lines        1234     1234              
==========================================
+ Hits         1059     1077      +18     
+ Misses        175      157      -18     

☔ View full report in Codecov by Sentry.
coderabbitai bot commented Oct 3, 2025

Note

Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

1 similar comment

coderabbitai bot commented Oct 3, 2025

Caution

CodeRabbit failed during planning: Script execution failed: Stream setup permanently failed: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)
