Conversation

@DipayanDasgupta (Contributor) commented Oct 3, 2025

Description

This PR significantly increases the test coverage for two core modules, ShortTermMemory and LLMAgent, to improve the overall quality and robustness of the codebase.

During development, this effort also uncovered and fixed several underlying issues in the test environment, leading to a more stable and reliable test suite for future contributions.

Key Contributions

  • Increased st_memory.py Test Coverage: Boosted coverage from 37% to 80%.
  • Increased llm_agent.py Test Coverage: Raised coverage from 63% to 67%.
  • Resolved Test Failures: Fixed persistent AttributeError and environmental ModuleNotFoundError issues.
  • Improved Test Robustness: Refactored test setup to be more consistent and resilient to environmental problems.

Detailed Breakdown: Before and After

1. mesa_llm/memory/st_memory.py

  • Before: Test coverage was ~37%. The module's core logic was largely untested.
  • After: Test coverage is now 80%.

Changes Made:
A new test file (tests/test_memory/test_st_memory.py) was created to add comprehensive tests for:

  • Correct initialization of the memory deque.
  • The two-stage process_step logic for pre- and post-step execution.
  • Correct formatting of an empty memory via format_short_term.
  • Extraction of conversation history via get_communication_history.
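As a rough illustration of the two-stage process_step behavior these tests cover, here is a minimal stand-in (an assumption for illustration only, not the real ShortTermMemory API in mesa_llm, whose names and signatures may differ):

```python
from collections import deque

# Minimal stand-in for the two-stage process_step logic; the real
# ShortTermMemory class in mesa_llm may use different names/signatures.
class TinyMemory:
    def __init__(self):
        self.memory = deque()   # unbounded, mirroring maxlen=None
        self.step_content = {}  # staged content for the current step

    def process_step(self, step=None, pre_step=False):
        if pre_step:
            # Pre-step: append a new entry with the step number still unknown.
            self.memory.append({"step": None, "content": dict(self.step_content)})
        else:
            # Post-step: merge staged content into the last entry and stamp the step.
            last = self.memory[-1]
            last["step"] = step
            last["content"].update(self.step_content)
        self.step_content.clear()

mem = TinyMemory()
mem.step_content = {"observation": "saw neighbor"}
mem.process_step(pre_step=True)
mem.step_content = {"action": "move"}
mem.process_step(step=1)
# mem.memory[-1] now holds both observation and action under step 1
```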

2. mesa_llm/llm_agent.py

  • Before: Test coverage was ~63%. Key conditional branches in the generate_obs method were not covered.
  • After: Test coverage is now 67%.

Changes Made:
Added two new test functions to tests/test_llm_agent.py to verify agent perception:

  • test_generate_obs_zero_vision: Confirms that an agent with vision=0 sees no neighbors.
  • test_generate_obs_limited_vision: Confirms that an agent with vision=1 only sees adjacent neighbors and ignores those outside its radius.
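The radius check these two tests exercise can be sketched independently of Mesa; this toy function is an assumption for illustration, not the actual generate_obs implementation, which queries the grid's neighborhood instead:

```python
# Toy model of the vision cutoff the tests exercise; the real generate_obs
# uses Mesa's grid neighborhood rather than computing distances directly.
def visible_neighbors(pos, others, vision):
    if vision == 0:
        return []  # vision=0: the agent sees no neighbors at all
    x, y = pos
    # Chebyshev distance <= vision keeps only adjacent cells when vision=1
    return [p for p in others if max(abs(p[0] - x), abs(p[1] - y)) <= vision]

print(visible_neighbors((2, 2), [(2, 3), (4, 4)], 0))  # []
print(visible_neighbors((2, 2), [(2, 3), (4, 4)], 1))  # [(2, 3)]
```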

Technical Fixes and Test Suite Improvements

During development, the test suite was failing due to two primary issues:

  1. AttributeError: 'DummyModel' object has no attribute 'next_id':

    • Diagnosis: The DummyModel used in tests was calling self.next_id() instead of self.schedule.next_id().
    • Fix: The test setup was corrected to call next_id() on the scheduler object, aligning with Mesa's design.
  2. ModuleNotFoundError: No module named 'mesa.time':

    • Diagnosis: The import of mesa.time fails under mesa==3.3.0. Although this was initially attributed to a corrupted local pip installation, Mesa 3.x removed the mesa.time scheduler module, so the failure stems from the installed Mesa version rather than from this repository's code; either way it is an environmental/dependency issue, not a code issue.
    • Fix: To make the test suite robust and resilient to such environmental problems, the dependency on mesa.time was removed from the test file. A MockScheduler class was introduced directly within tests/test_llm_agent.py to provide the necessary next_id() and add() methods for the test models. This ensures the tests are self-contained and reliable.
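A scheduler mock along these lines would satisfy the description above. This is a sketch; the actual MockScheduler in tests/test_llm_agent.py may differ in detail:

```python
# Sketch of a self-contained scheduler mock with pre-increment IDs
# (first call returns 1) and simple agent tracking, as described above.
class MockScheduler:
    def __init__(self):
        self.agents = []
        self.current_id = 0

    def next_id(self):
        self.current_id += 1  # pre-increment: first ID handed out is 1
        return self.current_id

    def add(self, agent):
        self.agents.append(agent)

sched = MockScheduler()
ids = [sched.next_id(), sched.next_id()]
sched.add("agent-a")
```

Because the mock has no dependency on mesa.time, tests built on it run the same way regardless of which Mesa version is installed.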

How to Verify

  1. Check out this branch.
  2. Ensure development dependencies are installed with pip install -e ".[dev]".
  3. Run the test suite:
    pytest --cov=mesa_llm tests/
  4. Confirm that all 161 tests pass and that the coverage report reflects the improvements noted above.

Summary by CodeRabbit

  • Tests
    • Standardized agent test setup to improve consistency and reduce duplication.
    • Updated agent behavior tests to use dynamic identifiers, larger grid scenarios, and vision edge cases, with refined memory assertions.
    • Added comprehensive Short-Term Memory tests covering initialization, step processing (pre/post), formatting of empty memory, and communication history extraction.
    • Simplified test flows and improved clarity by consolidating shared utilities and expectations.
  • Refactor
    • Streamlined test logic and data setup for maintainability and clearer intent, reducing reliance on implicit behaviors.

Increases coverage for st_memory.py to 80% and llm_agent.py to 67%. Also fixes test setup inconsistencies and resolves environmental import errors.
coderabbitai bot commented Oct 3, 2025

Walkthrough

Standardizes LLMAgent tests with a MockScheduler and create_dummy_model, updates ID-dependent assertions, and adds edge-case vision tests. Introduces a new ShortTermMemory test module validating initialization, step processing (pre/post), formatting, and communication history extraction.

Changes

  • LLMAgent test refactor and vision edge cases (tests/test_llm_agent.py): Replaces ad-hoc models with create_dummy_model and MockScheduler; assigns dynamic unique_id via schedule.next_id(); updates the grid to 5x5 with a fixed system_prompt; revises memory setup and assertions to use unique_id; adds zero/limited-vision tests; consolidates test flows.
  • ShortTermMemory tests (tests/test_memory/test_st_memory.py): New tests for ShortTermMemory covering initialization invariants, pre/post process_step behavior, empty-memory formatting, and communication history extraction using a mock_agent.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester
  participant Agent
  participant ShortTermMemory as STM

  rect rgb(240,248,255)
    note over Tester,STM: Pre-step (observation)
    Tester->>STM: set step_content = {"observation": "..."}
    Tester->>STM: process_step(pre_step=True)
    STM->>STM: append MemoryEntry(step=None, content={"observation": "..."})
    STM->>STM: clear step_content
  end

  rect rgb(245,255,240)
    note over Tester,STM: Post-step (action)
    Tester->>STM: set step_content = {"action": "..."}
    Tester->>STM: process_step(pre_step=False)
    STM->>STM: merge into last entry<br/>step=1, content={"observation": "...", "action": "..."}
    STM->>STM: clear step_content
  end

  Tester->>STM: format_short_term() / get_communication_history()
  STM-->>Tester: formatted text / messages (step N: ...)
sequenceDiagram
  autonumber
  participant Test as Test Suite
  participant Model as DummyModel
  participant Sched as MockScheduler
  participant Agent as LLMAgent

  Test->>Model: create_dummy_model(seed)
  Test->>Model: add_agent(Agent)
  Model->>Sched: next_id()
  Sched-->>Model: unique_id
  Model->>Agent: set unique_id
  Model->>Sched: register(agent)
  Model->>Model: place on 5x5 grid
  Test->>Agent: run assertions (IDs, memory, vision)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I thump my paws on grids so wide,
A scheduler counts—new IDs applied.
Short-term whispers, step by step,
Obs, then act, in tidy prep.
With vision small or vision keen,
I map the warren, crisp and clean.
Hop! Tests pass—carrots green.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 25.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly and concisely summarizes the main purpose of the PR, indicating that tests are being added for the LLMAgent and ShortTermMemory modules, which matches the changeset precisely.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae0877e and 83e1e5a.

📒 Files selected for processing (2)
  • tests/test_llm_agent.py (1 hunks)
  • tests/test_memory/test_st_memory.py (1 hunks)
🔇 Additional comments (11)
tests/test_llm_agent.py (7)

14-24: LGTM!

The MockScheduler correctly implements next_id() with pre-increment (first ID is 1), and provides simple agent tracking appropriate for test isolation.


28-51: LGTM!

The helper function correctly standardizes test setup and implements the fix for the test environment issue (calling schedule.next_id() instead of self.next_id()). This eliminates test duplication and ensures consistent agent initialization.


54-69: LGTM!

The test correctly verifies that apply_plan adds tool execution results to the agent's memory, using appropriate mocking and assertions.


72-86: LGTM!

The test correctly validates observation generation with dynamic unique_id values from the MockScheduler, ensuring the test adapts to the new ID assignment scheme.


89-106: LGTM!

The test correctly verifies that send_message updates memory for both sender and recipient agents using an appropriate call counter approach.


109-119: LGTM!

The new test correctly validates the edge case where vision=0 results in an empty local_state, ensuring agents with no vision cannot observe neighbors.


122-137: LGTM!

The new test correctly validates that limited vision (vision=1) allows observing only adjacent agents, properly excluding distant agents. This provides good edge case coverage for the vision mechanism.

tests/test_memory/test_st_memory.py (4)

12-19: LGTM!

The test correctly verifies ShortTermMemory initialization, including the important detail that the deque is unbounded (maxlen=None).


21-47: LGTM!

The test comprehensively validates the two-phase memory processing logic, correctly verifying that pre_step creates an entry with step=None, post_step updates it with the actual step number, and content is properly merged between phases.


49-52: LGTM!

The test correctly validates the edge case of formatting empty memory, ensuring it returns the expected default message.


54-71: LGTM!

The test correctly validates that get_communication_history filters memory entries to include only messages while excluding other content types like actions.
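The filtering idea behind that test can be sketched as follows. The entry shape and formatting here are hypothetical; the real get_communication_history may store and format entries differently:

```python
# Hypothetical memory entries; only "message" content should survive the
# filter, mirroring what get_communication_history is tested to do.
entries = [
    {"step": 1, "content": {"message": "hello", "action": "move"}},
    {"step": 2, "content": {"action": "wait"}},
]
history = [
    f"step {e['step']}: {e['content']['message']}"
    for e in entries
    if "message" in e["content"]
]
print(history)  # ['step 1: hello']
```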



@DipayanDasgupta (Contributor, Author) commented:

@coderabbitai review

coderabbitai bot commented Oct 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

codecov bot commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.27%. Comparing base (ae0877e) to head (83e1e5a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #20      +/-   ##
==========================================
+ Coverage   85.81%   87.27%   +1.45%     
==========================================
  Files          17       17              
  Lines        1234     1234              
==========================================
+ Hits         1059     1077      +18     
+ Misses        175      157      -18     

☔ View full report in Codecov by Sentry.
coderabbitai bot commented Oct 3, 2025

Note

Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

1 similar comment

coderabbitai bot commented Oct 3, 2025

Caution

CodeRabbit failed during planning: Script execution failed: Stream setup permanently failed: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)
