feat(testing): Add tests for LLMAgent and ShortTermMemory #20
Increases coverage for st_memory.py to 80% and llm_agent.py to 67%. Also fixes test setup inconsistencies and resolves environmental import errors.
Walkthrough
Standardizes LLMAgent tests with a MockScheduler and create_dummy_model, updates ID-dependent assertions, and adds edge-case vision tests. Introduces a new ShortTermMemory test module validating initialization, step processing (pre/post), formatting, and communication-history extraction.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Tester
participant Agent
participant STM as ShortTermMemory
rect rgb(240,248,255)
note over Tester,STM: Pre-step (observation)
Tester->>STM: set step_content = {"observation": "..."}
Tester->>STM: process_step(pre_step=True)
STM->>STM: append MemoryEntry(step=None, content={"observation": "..."})
STM->>STM: clear step_content
end
rect rgb(245,255,240)
note over Tester,STM: Post-step (action)
Tester->>STM: set step_content = {"action": "..."}
Tester->>STM: process_step(pre_step=False)
STM->>STM: merge into last entry<br/>step=1, content={"observation": "...", "action": "..."}
STM->>STM: clear step_content
end
Tester->>STM: format_short_term() / get_communication_history()
STM-->>Tester: formatted text / messages (step N: ...)
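The pre/post-step merge in the first diagram can be illustrated with a small, self-contained toy. This is a hypothetical sketch, not mesa_llm's ShortTermMemory: the class name ToyShortTermMemory, the explicit step argument to process_step, and the "step N: key: value" output format are assumptions chosen only to mirror the diagram.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryEntry:
    # step stays None until the post-step phase stamps it (as in the diagram).
    step: Optional[int]
    content: dict = field(default_factory=dict)


class ToyShortTermMemory:
    """Hypothetical stand-in that mirrors the diagram; not mesa_llm's real class."""

    def __init__(self) -> None:
        self.entries: list = []
        self.step_content: dict = {}

    def process_step(self, pre_step: bool, step: Optional[int] = None) -> None:
        if pre_step:
            # Pre-step: record the buffered observation as a new, unnumbered entry.
            self.entries.append(MemoryEntry(step=None, content=dict(self.step_content)))
        else:
            # Toy fallback: create an empty entry if no pre-step ran first.
            if not self.entries:
                self.entries.append(MemoryEntry(step=None))
            # Post-step: merge the buffered action into the latest entry and stamp its step.
            last = self.entries[-1]
            last.content.update(self.step_content)
            last.step = step
        # Both phases clear the buffer afterwards.
        self.step_content = {}

    def format_short_term(self) -> str:
        # One line per entry, keys in insertion order (observation before action).
        lines = []
        for entry in self.entries:
            parts = ", ".join(f"{key}: {value}" for key, value in entry.content.items())
            lines.append(f"step {entry.step}: {parts}")
        return "\n".join(lines)


if __name__ == "__main__":
    mem = ToyShortTermMemory()
    mem.step_content = {"observation": "..."}
    mem.process_step(pre_step=True)
    mem.step_content = {"action": "..."}
    mem.process_step(pre_step=False, step=1)
    print(mem.format_short_term())  # step 1: observation: ..., action: ...
```

Keeping step=None until the post-step phase makes a half-finished step (observation recorded, action pending) distinguishable from a completed one, matching the step=None to step=1 transition shown in the diagram.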
sequenceDiagram
autonumber
participant Test as Test Suite
participant Model as DummyModel
participant Sched as MockScheduler
participant Agent as LLMAgent
Test->>Model: create_dummy_model(seed)
Test->>Model: add_agent(Agent)
Model->>Sched: next_id()
Sched-->>Model: unique_id
Model->>Agent: set unique_id
Model->>Sched: register(agent)
Model->>Model: place on 5x5 grid
Test->>Agent: run assertions (IDs, memory, vision)
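The setup flow in the second diagram, plus the two vision edge cases, can be sketched without Mesa installed. Everything below is hypothetical: MockScheduler, DummyModel, ToyAgent, create_dummy_model and visible_neighbors are illustrative stand-ins, not the code in tests/test_llm_agent.py or mesa_llm, and the Chebyshev-radius (Moore neighbourhood) vision rule is an assumed approximation of what generate_obs checks. Registration uses add(), the method name given in the PR description (the diagram labels this call register).

```python
import random
from dataclasses import dataclass


class MockScheduler:
    """Hypothetical test double exposing next_id() and add()."""

    def __init__(self) -> None:
        self._last_id = 0
        self.agents: list = []

    def next_id(self) -> int:
        # Sequential, unique integer IDs starting at 1.
        self._last_id += 1
        return self._last_id

    def add(self, agent) -> None:
        self.agents.append(agent)


@dataclass
class ToyAgent:
    vision: int
    unique_id: int = -1
    pos: tuple = (0, 0)


class DummyModel:
    """Toy model on a 5x5 grid; names mirror the diagram but are illustrative."""

    def __init__(self, seed: int = 42, width: int = 5, height: int = 5) -> None:
        self.random = random.Random(seed)  # seeded RNG kept for reproducibility
        self.width, self.height = width, height
        self.schedule = MockScheduler()

    def add_agent(self, agent: ToyAgent, pos: tuple) -> ToyAgent:
        x, y = pos
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise ValueError(f"{pos} is outside the {self.width}x{self.height} grid")
        # IDs come from the scheduler (self.schedule.next_id()), not the model.
        agent.unique_id = self.schedule.next_id()
        self.schedule.add(agent)
        agent.pos = pos
        return agent

    def visible_neighbors(self, agent: ToyAgent) -> list:
        # Assumed rule: others within Chebyshev distance <= vision, excluding self.
        x, y = agent.pos
        return [
            other
            for other in self.schedule.agents
            if other is not agent
            and max(abs(other.pos[0] - x), abs(other.pos[1] - y)) <= agent.vision
        ]


def create_dummy_model(seed: int = 42) -> DummyModel:
    return DummyModel(seed=seed)


if __name__ == "__main__":
    model = create_dummy_model(seed=42)
    center = model.add_agent(ToyAgent(vision=1), (2, 2))
    near = model.add_agent(ToyAgent(vision=1), (3, 3))  # adjacent (diagonal)
    far = model.add_agent(ToyAgent(vision=1), (4, 4))   # two cells away
    print([a.unique_id for a in model.visible_neighbors(center)])  # [2]
    center.vision = 0
    print(model.visible_neighbors(center))  # []
```

Drawing IDs from the scheduler keeps them deterministic (1, 2, 3 in insertion order), which is what makes ID-dependent assertions stable across test runs.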
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ❌ 1 failed (warning), ✅ 2 passed
@coderabbitai review
✅ Actions performed: Review triggered.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff            @@
##             main      #20   +/-   ##
==========================================
+ Coverage   85.81%   87.27%   +1.45%
==========================================
  Files          17       17
  Lines        1234     1234
==========================================
+ Hits         1059     1077      +18
+ Misses        175      157      -18
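The Codecov percentages can be cross-checked from the hit and line counts. The figures shown are consistent with truncating hits / lines to two decimals (rounding would give 85.82% and 87.28%, and a +1.46% delta); that truncation rule is inferred from these numbers, not taken from Codecov documentation. The helper name floor_pct is hypothetical.

```python
def floor_pct(numerator: int, denominator: int) -> str:
    """Non-negative percentage truncated (not rounded) to two decimals, via integer math."""
    hundredths = numerator * 10_000 // denominator  # hundredths of a percent
    return f"{hundredths // 100}.{hundredths % 100:02d}"


LINES = 1234
BEFORE_HITS, AFTER_HITS = 1059, 1077
BEFORE_MISSES, AFTER_MISSES = 175, 157

# Hits + misses must add up to the tracked line count in both columns.
assert BEFORE_HITS + BEFORE_MISSES == LINES
assert AFTER_HITS + AFTER_MISSES == LINES

print(floor_pct(BEFORE_HITS, LINES))               # 85.81
print(floor_pct(AFTER_HITS, LINES))                # 87.27
print(floor_pct(AFTER_HITS - BEFORE_HITS, LINES))  # 1.45 (same line count, so delta = 18 / 1234)
```

Integer arithmetic avoids floating-point edge cases when truncating at the second decimal place.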
Note: Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it. Generating unit tests... This may take up to 20 minutes.
Caution: CodeRabbit failed during planning: Script execution failed: Stream setup permanently failed: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)
Description
This PR increases test coverage for two core modules, ShortTermMemory and LLMAgent, to improve the overall quality and robustness of the codebase.
During development, this work also uncovered and fixed several underlying issues in the test environment, leaving a more stable and reliable test suite for future contributions.
Key Contributions
- st_memory.py test coverage: boosted from 37% to 80%.
- llm_agent.py test coverage: increased from 63% to 67%.
- Resolved the AttributeError and environmental ModuleNotFoundError issues in the test suite.
Detailed Breakdown: Before and After
1. mesa_llm/memory/st_memory.py
Changes Made:
A new test file (tests/test_memory/test_st_memory.py) was created with comprehensive tests for:
- the process_step logic for pre- and post-step execution;
- format_short_term;
- get_communication_history.
2. mesa_llm/llm_agent.py
… generate_obs method were not covered.
Changes Made:
Added two new test functions to tests/test_llm_agent.py to verify agent perception:
- test_generate_obs_zero_vision: confirms that an agent with vision=0 sees no neighbors.
- test_generate_obs_limited_vision: confirms that an agent with vision=1 sees only adjacent neighbors and ignores those outside its radius.
Technical Fixes and Test Suite Improvements
During development, the test suite was failing due to two primary issues:
1. AttributeError: 'DummyModel' object has no attribute 'next_id'
The DummyModel used in tests was calling self.next_id() instead of self.schedule.next_id(). The fix calls next_id() on the scheduler object instead, aligning with Mesa's design.
2. ModuleNotFoundError: No module named 'mesa.time'
The pip installation of mesa==3.3.0 was corrupted and missing the time.py module. This was an environmental issue, not a code issue. The mesa.time import was removed from the test file, and a MockScheduler class was introduced directly in tests/test_llm_agent.py to provide the next_id() and add() methods the test models need. This keeps the tests self-contained and reliable.
How to Verify
pip install -e ".[dev]"