feat(memory): implement scope and acceptance criteria #3
Add memory type taxonomy, isolation strategy, and acceptance metrics:
- docs: memory_types.md, memory_isolation.md, P0 design doc
- feat: MemoryMetricCollector for tracking 5 P0 metrics
- feat: MemoryEvalMetricModel for persisting metrics
- feat: P0 hooks in research routes (false_positive, deletion_compliance)
- feat: /research/memory/feedback endpoint for retrieval_hit_rate
- feat: /memory/metrics endpoints for viewing recorded metrics
- feat: get_items_by_ids() method in memory_store
- test: deletion_compliance and retrieval_hit_rate eval scripts
- ci: add memory P0 acceptance tests to CI workflow

P0 Acceptance Metrics:
- extraction_precision: >= 85%
- false_positive_rate: <= 5%
- retrieval_hit_rate: >= 80%
- injection_pollution_rate: <= 2%
- deletion_compliance: 100%
Fix import error in collector: DbProvider doesn't exist; use SessionProvider from sqlalchemy_db.
Add python-multipart to requirements.txt and requirements-ci.txt (required for FastAPI file uploads in /memory/ingest).
memory.py: use MemoryMetricCollector() without args; collector.py: update docstring to show correct usage.
Add: markdown, aiofiles, python-dotenv, json-repair, uvicorn.
Unit tests (tests/unit/test_memory_metric_collector.py):
- Test all 5 metric recording methods
- Test pass/fail target validation
- Test metrics summary and history
- Test edge cases (zero totals)

Unit tests (tests/unit/test_memory_module.py):
- Add test for get_items_by_ids method

Integration tests (tests/integration/test_scope_and_acceptance_criteria_hooks.py):
- Test memory feedback records retrieval_hit_rate
- Test bulk_moderate rejection records false_positive_rate
- Test clear_track_memory records deletion_compliance
- Test /memory/metrics endpoints
📝 Walkthrough

This PR introduces a comprehensive memory evaluation framework with metric collection infrastructure, evaluation tests, and API endpoints. It adds CI pipeline steps for acceptance testing, a new database model for storing metrics, metric recording hooks in API routes, and integration/unit tests validating the system.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant API as API Route
    participant Collector as MemoryMetricCollector
    participant DB as Database
    participant Client
    User->>API: POST /research/memory/feedback
    activate API
    API->>Collector: record_retrieval_hit_rate(hits, expected)
    activate Collector
    Collector->>Collector: compute hit_rate, validate targets
    Collector->>DB: INSERT MemoryEvalMetricModel
    deactivate Collector
    API-->>User: 200 MemoryFeedbackResponse
    deactivate API
    User->>API: GET /memory/metrics
    activate API
    API->>Collector: get_metrics_summary()
    activate Collector
    Collector->>DB: SELECT latest per metric_name
    Collector->>Collector: evaluate each against TARGETS
    Collector-->>API: {metrics, targets, pass/fail}
    deactivate Collector
    API-->>User: 200 MetricsSummaryResponse
    deactivate API
    User->>API: GET /memory/metrics/{metric_name}
    activate API
    API->>Collector: get_metric_history(metric_name, limit=30)
    activate Collector
    Collector->>DB: SELECT * WHERE metric_name ORDER BY evaluated_at DESC
    Collector-->>API: [history items]
    deactivate Collector
    API-->>User: 200 MetricHistoryResponse
    deactivate API
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
Pre-merge checks: ❌ 1 failed check (warning), ✅ 2 passed.
Summary of Changes

Hello @CJBshuosi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the memory subsystem by formalizing its operational boundaries and introducing a robust framework for quality assurance. It defines clear memory types and isolation strategies, alongside measurable acceptance criteria. The changes include the implementation of a metric collection system, new API endpoints for monitoring performance, and integrated hooks to automatically track key quality indicators. This foundational work ensures the memory system's reliability, accuracy, and compliance with defined standards, paving the way for future feature development with a strong emphasis on data integrity and user experience.
Code Review
This pull request introduces a comprehensive memory scope and acceptance criteria feature, including documentation, metric collection, new API endpoints, and extensive tests. The implementation is solid and well-tested. I've identified a few areas for improvement, primarily concerning a logical error in the design documentation's pseudo-code, a discrepancy between the documented and implemented query logic, and opportunities to align with Python and FastAPI best practices in the test and API code.
```python
def resolve_visible_memories(user_id, workspace_id=None, scope_type=None, scope_id=None):
    """
    Resolution order (most specific to least):
    1. Exact scope match (scope_type + scope_id)
    2. Global scope within workspace
    3. Global scope for user (if no workspace specified)
    """
    base_filter = (user_id == user_id)

    if workspace_id:
        base_filter &= (workspace_id == workspace_id)

    if scope_type and scope_id:
        # Include both specific scope AND global
        scope_filter = (
            (scope_type == scope_type AND scope_id == scope_id) |
            (scope_type == 'global')
        )
    else:
        scope_filter = (scope_type == 'global')

    return base_filter & scope_filter
```
The pseudo-code for resolve_visible_memories contains a logical error. The comparisons like user_id == user_id and scope_type == scope_type will always evaluate to true, as they compare a variable to itself. The intention is likely to compare database fields with the function's arguments. This could be misleading for developers implementing this logic.
Here is a corrected version assuming item is the object being filtered:
```python
def resolve_visible_memories(user_id, workspace_id=None, scope_type=None, scope_id=None):
    """
    Resolution order (most specific to least):
    1. Exact scope match (scope_type + scope_id)
    2. Global scope within workspace
    3. Global scope for user (if no workspace specified)
    """
    # Assuming 'item' is the memory item object being filtered
    base_filter = (item.user_id == user_id)
    if workspace_id:
        base_filter &= (item.workspace_id == workspace_id)
    if scope_type and scope_id:
        # Include both specific scope AND global
        scope_filter = (
            ((item.scope_type == scope_type) & (item.scope_id == scope_id)) |
            (item.scope_type == 'global')
        )
    else:
        scope_filter = (item.scope_type == 'global')
    return base_filter & scope_filter
```

**Implementation:** `memory_store.py` lines 296-301:

```python
if scope_type is not None:
    if scope_type == "global":
        stmt = stmt.where(or_(MemoryItemModel.scope_type == scope_type, MemoryItemModel.scope_type.is_(None)))
    else:
        stmt = stmt.where(MemoryItemModel.scope_type == scope_type)
if scope_id is not None:
    stmt = stmt.where(MemoryItemModel.scope_id == scope_id)
```
There is a discrepancy between the query logic described in the main design document (P0_memory_scope_acceptance_design.md) and the implementation snippet shown here. The design doc states that queries for a specific scope (e.g., track) should also include global items. However, the snippet here (and the actual implementation) only filters for the specific scope_type, excluding global items. This could lead to unexpected behavior where global memories are not retrieved when working within a specific track. The documentation and implementation should be aligned with the intended design.
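A minimal sketch of what aligning the query with the design doc could look like, assuming the `MemoryItemModel` fields and `stmt` from the snippet above (the exact fallback semantics remain the authors' call):

```python
from sqlalchemy import and_, or_

# Hypothetical variant: a specific-scope query also returns global items,
# matching the design doc's stated resolution order.
if scope_type is not None and scope_type != "global":
    stmt = stmt.where(
        or_(
            and_(
                MemoryItemModel.scope_type == scope_type,
                MemoryItemModel.scope_id == scope_id,
            ),
            MemoryItemModel.scope_type == "global",
        )
    )
```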
```python
# Add src to path for imports
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "src"))
```
Manipulating sys.path is generally considered an anti-pattern and can make the test suite fragile and harder to run in different environments. A better approach is to make your project installable in editable mode. By running pip install -e . from the project root, the src directory will be added to Python's path correctly, and you can use absolute imports like from paperbot.infrastructure.stores.memory_store import SqlAlchemyMemoryStore without modifying sys.path. This change should be applied to other new test files in this PR as well.
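As a sketch of the suggested alternative (module paths taken from this review; run `pip install -e .` from the project root first):

```python
# With an editable install, absolute imports resolve without sys.path edits.
from paperbot.infrastructure.stores.memory_store import SqlAlchemyMemoryStore
from paperbot.memory.eval.collector import MemoryMetricCollector
```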
```python
_metric_collector: Optional[MemoryMetricCollector] = None


def _get_metric_collector() -> MemoryMetricCollector:
    """Lazy initialization of metric collector."""
    global _metric_collector
    if _metric_collector is None:
        _metric_collector = MemoryMetricCollector()
    return _metric_collector
```
The use of a global, lazily-initialized _metric_collector is not ideal in FastAPI as it introduces global mutable state, which can be problematic for testing and concurrency. A more idiomatic FastAPI approach is to use the dependency injection system. This improves testability by allowing you to easily inject a mock collector during tests.
You could refactor this to:
```python
from fastapi import Depends

# You can initialize it once at the module level
_metric_collector = MemoryMetricCollector()


def get_metric_collector() -> MemoryMetricCollector:
    """FastAPI dependency to get the metric collector instance."""
    return _metric_collector
```

And then in your route functions, you would use `Depends`:
```python
@router.post("/research/memory/bulk_moderate", response_model=BulkModerateResponse)
def bulk_moderate(
    req: BulkModerateRequest,
    background_tasks: BackgroundTasks,
    collector: MemoryMetricCollector = Depends(get_metric_collector),
):
    # ...
    if high_confidence_rejected > 0:
        collector.record_false_positive_rate(...)
    # ...
```
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/paperbot/api/routes/research.py (1)
374-407: Verify `total_approved_count` denominator in false positive calculation.

The metric is recorded with `total_approved_count=len(items_before)`, but `items_before` may include items that were already rejected or pending (not just approved). This could skew the false positive rate calculation. Consider filtering to only count items that were previously approved:
🔎 Proposed fix
```diff
 if req.status == "rejected" and items_before:
+    # Only count items that were previously approved as the denominator
+    previously_approved = [
+        item for item in items_before
+        if item.get("status") == "approved"
+    ]
     high_confidence_rejected = sum(
-        1 for item in items_before
+        1 for item in previously_approved
         if item.get("confidence", 0) >= 0.60
         and item.get("status") == "approved"
     )
     if high_confidence_rejected > 0:
         collector = _get_metric_collector()
         collector.record_false_positive_rate(
             false_positive_count=high_confidence_rejected,
-            total_approved_count=len(items_before),
+            total_approved_count=len(previously_approved),
             evaluator_id=f"user:{req.user_id}",
```
🧹 Nitpick comments (12)
docs/memory_types.md (1)
64-87: Add language identifier to fenced code block.

The status workflow diagram should specify a language identifier (e.g., `text` or `mermaid`) for better Markdown rendering and to satisfy linter requirements.

🔎 Proposed fix
````diff
-```
+```text
 Extracted (MemoryCandidate)
 │
````

Alternatively, consider using Mermaid syntax for better visualization:
````diff
-```
+```mermaid
+graph TD
+  A[Extracted MemoryCandidate] --> B{confidence >= 0.60 AND pii_risk < 2}
+  B -->|Yes| C[approved]
+  B -->|No| D[pending]
+  D --> E[approved]
+  D --> F[rejected]
+  C --> G[Can be retrieved and injected]
````

tests/unit/test_memory_module.py (1)
143-143: Fix unused variable.

The `rows` variable is unpacked but never used in this test. Prefix it with an underscore to indicate it's intentionally unused.

🔎 Proposed fix
```diff
- created, _, rows = store.add_memories(user_id="u1", memories=candidates)
+ created, _, _rows = store.add_memories(user_id="u1", memories=candidates)
```

evals/memory/test_deletion_compliance.py (3)
18-19: Consider using a proper package structure or pytest configuration.

The `sys.path` manipulation is fragile; paths may break if the file is moved. For standalone scripts this is acceptable, but consider using `pytest` with a proper `conftest.py` or installing the package in editable mode.
53-57: Prefix unused variables with underscore.

Per the static analysis hint, `created` and `skipped` are never used.

🔎 Proposed fix
```diff
- created, skipped, _rows = store.add_memories(
+ _created, _skipped, _rows = store.add_memories(
      user_id=user_id,
      memories=test_memories,
      actor_id="test",
  )
```
140-140: Remove extraneous f-prefix.

Per static analysis, line 140 has an f-string without placeholders.
🔎 Proposed fix
```diff
- print(f"\nResults:")
+ print("\nResults:")
```

src/paperbot/api/routes/memory.py (1)
270-270: Module-level singleton initialized at import time.

`_metric_collector = MemoryMetricCollector()` is instantiated when the module loads. This will create a database session provider and run `create_all()` immediately. If the database isn't available at import time, this could cause startup failures. Consider lazy initialization (similar to `_get_metric_collector()` in `research.py`) for consistency and to avoid import-time side effects.

🔎 Proposed lazy initialization
```diff
-_metric_collector = MemoryMetricCollector()
+_metric_collector: Optional[MemoryMetricCollector] = None
+
+
+def _get_metric_collector() -> MemoryMetricCollector:
+    global _metric_collector
+    if _metric_collector is None:
+        _metric_collector = MemoryMetricCollector()
+    return _metric_collector
```

Then update usages:
```diff
- return _metric_collector.get_metrics_summary()
+ return _get_metric_collector().get_metrics_summary()
```

evals/memory/test_retrieval_hit_rate.py (3)
31-39: Consider adding error handling for missing fixture files.

If the fixture files don't exist, the test will fail with a generic `FileNotFoundError`. Providing a more descriptive error message would help debugging.

🔎 Proposed improvement
```diff
 def load_fixtures():
     """Load test fixtures."""
-    with open(FIXTURES_DIR / "sample_memories.json") as f:
-        memories_data = json.load(f)
-
-    with open(FIXTURES_DIR / "retrieval_queries.json") as f:
-        queries_data = json.load(f)
+    memories_path = FIXTURES_DIR / "sample_memories.json"
+    queries_path = FIXTURES_DIR / "retrieval_queries.json"
+
+    if not memories_path.exists():
+        raise FileNotFoundError(f"Missing fixture: {memories_path}")
+    if not queries_path.exists():
+        raise FileNotFoundError(f"Missing fixture: {queries_path}")
+
+    with open(memories_path) as f:
+        memories_data = json.load(f)
+
+    with open(queries_path) as f:
+        queries_data = json.load(f)
     return memories_data, queries_data
```
53-56: Close the store and collector to prevent resource leaks.

Neither `SqlAlchemyMemoryStore` nor `MemoryMetricCollector` are explicitly closed. Based on the codebase, `SqlAlchemyMemoryStore` has a `close()` method that disposes the engine. Without closing, SQLite file handles may remain open, potentially causing issues with the `os.unlink(db_path)` cleanup.

🔎 Proposed fix
```diff
 try:
     db_url = f"sqlite:///{db_path}"
     store = SqlAlchemyMemoryStore(db_url=db_url)
     collector = MemoryMetricCollector(db_url=db_url)
     # ... rest of the code ...
     return {
         "passed": passed,
         "total_queries": total_queries,
         "hits": hits,
         "hit_rate": hit_rate,
         "details": details,
     }
 finally:
     # Cleanup
+    store.close()
     if os.path.exists(db_path):
         os.unlink(db_path)
```
150-155: Remove extraneous `f` prefix from strings without placeholders.

Lines 150 and 154 use f-strings without any interpolation. As flagged by static analysis (Ruff F541), these should be regular strings.
🔎 Proposed fix
```diff
- print(f"\nResults:")
+ print("\nResults:")
  print(f"  Total queries: {result['total_queries']}")
  print(f"  Hits: {result['hits']}")
  print(f"  Hit rate: {result['hit_rate']:.1%}")
- print(f"  Target: >= 80%")
+ print("  Target: >= 80%")
  print(f"  Status: {'PASS' if result['passed'] else 'FAIL'}")
```

tests/integration/test_scope_and_acceptance_criteria_hooks.py (2)
91-95: Consider asserting the expected `false_positive_rate` value.

Unlike other tests that verify both existence and value of the metric, this test only checks that `false_positive_rate` exists in the metrics. For consistency and stronger verification, consider asserting the expected value.

🔎 Proposed improvement
```diff
 # Verify false_positive_rate metric was recorded
 metrics_response = test_client.get("/api/memory/metrics")
 assert metrics_response.status_code == 200
 metrics = metrics_response.json()
 assert "false_positive_rate" in metrics["metrics"]
+# One high-confidence item rejected = 100% false positive rate for this batch
+assert metrics["metrics"]["false_positive_rate"]["value"] == 1.0
```
108-120: Add response status assertions when creating test memories.

The loop creates memories but doesn't verify the responses succeeded. If memory creation fails, the test could produce misleading results (e.g., `deleted_count` would be 0 instead of 3).

🔎 Proposed fix
```diff
 # Create some memories in the track
 for i in range(3):
-    test_client.post(
+    resp = test_client.post(
         "/api/research/memory/items",
         json={
             "user_id": "test_user",
             "kind": "fact",
             "content": f"Test fact {i}",
             "confidence": 0.7,
             "scope_type": "track",
             "scope_id": str(track_id),
         },
     )
+    assert resp.status_code == 200, f"Failed to create memory {i}: {resp.text}"
```

src/paperbot/memory/eval/collector.py (1)
43-50: Annotate `TARGETS` as `ClassVar` for type correctness.

As flagged by static analysis (Ruff RUF012), mutable class attributes should be annotated with `typing.ClassVar` to indicate they belong to the class, not instances.
Add
ClassVarto imports:-from typing import Any, Dict, List, Optional +from typing import Any, ClassVar, Dict, List, OptionalThen annotate the constant:
# P0 target thresholds - TARGETS = { + TARGETS: ClassVar[Dict[str, float]] = { "extraction_precision": 0.85, "false_positive_rate": 0.05, "retrieval_hit_rate": 0.80, "injection_pollution_rate": 0.02, "deletion_compliance": 1.00, }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- .github/workflows/ci.yml
- docs/P0_memory_scope_acceptance_design.md
- docs/memory_isolation.md
- docs/memory_types.md
- evals/memory/__init__.py
- evals/memory/fixtures/retrieval_queries.json
- evals/memory/fixtures/sample_memories.json
- evals/memory/test_deletion_compliance.py
- evals/memory/test_retrieval_hit_rate.py
- requirements-ci.txt
- requirements.txt
- src/paperbot/api/routes/memory.py
- src/paperbot/api/routes/research.py
- src/paperbot/infrastructure/stores/memory_store.py
- src/paperbot/infrastructure/stores/models.py
- src/paperbot/memory/eval/__init__.py
- src/paperbot/memory/eval/collector.py
- tests/integration/test_scope_and_acceptance_criteria_hooks.py
- tests/unit/test_memory_metric_collector.py
- tests/unit/test_memory_module.py
🧰 Additional context used
🧬 Code graph analysis (8)
tests/unit/test_memory_module.py (2)
- src/paperbot/memory/schema.py (1): `MemoryCandidate` (39-48)
- src/paperbot/infrastructure/stores/memory_store.py (2): `list_memories` (233-269), `get_items_by_ids` (271-287)

tests/unit/test_memory_metric_collector.py (1)
- src/paperbot/memory/eval/collector.py (8): `MemoryMetricCollector` (24-243), `record_extraction_precision` (57-74), `get_latest_metrics` (179-200), `record_false_positive_rate` (76-93), `record_retrieval_hit_rate` (95-112), `record_injection_pollution_rate` (114-131), `record_deletion_compliance` (133-155), `get_metrics_summary` (202-210)

evals/memory/test_deletion_compliance.py (1)
- src/paperbot/memory/eval/collector.py (1): `MemoryMetricCollector` (24-243)

src/paperbot/api/routes/memory.py (1)
- src/paperbot/memory/eval/collector.py (3): `MemoryMetricCollector` (24-243), `get_metrics_summary` (202-210), `get_metric_history` (223-243)

src/paperbot/memory/eval/__init__.py (1)
- src/paperbot/memory/eval/collector.py (1): `MemoryMetricCollector` (24-243)

evals/memory/test_retrieval_hit_rate.py (3)
- src/paperbot/infrastructure/stores/memory_store.py (3): `SqlAlchemyMemoryStore` (40-657), `add_memories` (138-231), `search_memories` (289-352)
- src/paperbot/memory/schema.py (1): `MemoryCandidate` (39-48)
- src/paperbot/memory/eval/collector.py (2): `MemoryMetricCollector` (24-243), `record_retrieval_hit_rate` (95-112)

src/paperbot/memory/eval/collector.py (3)
- src/paperbot/infrastructure/stores/models.py (2): `Base` (11-12), `MemoryEvalMetricModel` (322-347)
- src/paperbot/infrastructure/stores/sqlalchemy_db.py (3): `SessionProvider` (48-56), `get_db_url` (15-16), `session` (55-56)
- src/paperbot/api/routes/memory.py (1): `get_metric_history` (300-308)

src/paperbot/api/routes/research.py (2)
- src/paperbot/memory/eval/collector.py (4): `MemoryMetricCollector` (24-243), `record_false_positive_rate` (76-93), `record_retrieval_hit_rate` (95-112), `record_deletion_compliance` (133-155)
- src/paperbot/infrastructure/stores/memory_store.py (3): `get_items_by_ids` (271-287), `list_memories` (233-269), `search_memories` (289-352)
🪛 LanguageTool
docs/P0_memory_scope_acceptance_design.md
[uncategorized] ~648-~648: The official name of this software platform is spelled with a capital “H”.
Context: ...ynthetic test data ### 8.4 CI/CD - [ ] .github/workflows/ci.yml - Memory eval job add...
(GITHUB)
[uncategorized] ~739-~739: The official name of this software platform is spelled with a capital “H”.
Context: ...---------------------------------| | .github/workflows/ci.yml | Modified (added memo...
(GITHUB)
🪛 markdownlint-cli2 (0.18.1)
docs/memory_isolation.md
16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
106-106: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/P0_memory_scope_acceptance_design.md
16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
106-106: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
147-147: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
323-323: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
437-437: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
603-603: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
721-721: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
721-721: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
729-729: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
729-729: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
docs/memory_types.md
16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
106-106: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.10)
tests/unit/test_memory_module.py
143-143: Unpacked variable rows is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
evals/memory/test_deletion_compliance.py
53-53: Unpacked variable created is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
53-53: Unpacked variable skipped is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
140-140: f-string without any placeholders
Remove extraneous f prefix
(F541)
evals/memory/test_retrieval_hit_rate.py
150-150: f-string without any placeholders
Remove extraneous f prefix
(F541)
154-154: f-string without any placeholders
Remove extraneous f prefix
(F541)
src/paperbot/memory/eval/collector.py
44-50: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🔇 Additional comments (30)
evals/memory/__init__.py (1)
1-7: LGTM! Clear package documentation. The docstring provides clear documentation of the memory evaluation test suite and its components.
docs/memory_isolation.md (1)
1-144: LGTM! Comprehensive isolation strategy documentation. This document provides clear, detailed documentation of the memory namespace hierarchy and isolation rules. The inclusion of implementation references, code examples, and explicit behavior specifications makes it valuable for both developers and maintainers.
docs/P0_memory_scope_acceptance_design.md (1)
1-739: LGTM! Excellent comprehensive design document. This design document is thorough, well-structured, and provides valuable context for the P0 memory scope implementation. Key strengths:
- Clear architecture mapping across all 5 layers
- Detailed acceptance criteria with measurable metrics
- Comprehensive risk analysis and mitigation strategies
- Realistic workload estimation and timeline
- Strong references to academic papers and industry practices
The document effectively bridges strategic planning with implementation details, making it a valuable resource for both current development and future maintenance.
docs/memory_types.md (1)
1-141: Excellent documentation structure. The memory taxonomy is comprehensive and well-organized. The breakdown of memory types, confidence thresholds, PII risk levels, and status workflows provides clear guidance for implementation.
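For illustration only, the gating rule that workflow implies could be sketched as follows (function name hypothetical; thresholds taken from the diagram earlier in this review):

```python
def initial_status(confidence: float, pii_risk: int) -> str:
    # Approved only when confidence clears 0.60 and PII risk stays below 2.
    return "approved" if confidence >= 0.60 and pii_risk < 2 else "pending"
```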
.github/workflows/ci.yml (1)
51-56: LGTM! Clean integration of memory acceptance tests. The new CI step properly sets `PYTHONPATH` and executes the memory acceptance tests. This aligns well with the PR's goal of validating deletion compliance and retrieval hit rate metrics.

evals/memory/fixtures/sample_memories.json (1)
1-42: LGTM! Well-structured test fixture. The fixture provides good coverage of memory types with realistic confidence scores that align with the taxonomy thresholds defined in `memory_types.md`.

evals/memory/fixtures/retrieval_queries.json (1)
1-35: LGTM! Comprehensive retrieval test cases. The fixture defines clear test cases with expected outcomes for validating retrieval hit rates across different memory kinds.
tests/unit/test_memory_module.py (1)
130-161: Excellent test coverage. The test validates:
- Basic retrieval by IDs
- Empty list edge case
- User isolation guarantees
The approach of using `list_memories` to get IDs is correct since SQLAlchemy session objects are detached after the transaction.

tests/unit/test_memory_metric_collector.py (4)
1-27: Well-structured unit tests with good isolation. Tests correctly use `tmp_path` for database isolation and verify boundary conditions (e.g., `0.85 == target` meeting the `>= 0.85` threshold). Coverage of the extraction precision recording and retrieval is complete.
29-59: Good coverage of pass/fail target logic for false positive rate. Tests correctly verify both the passing case (3% ≤ 5% target) and failing case (10% > 5% target), confirming the "lower is better" logic for this metric.
96-128: Deletion compliance calculation verified correctly. The test at line 127 correctly validates the formula `1.0 - (deleted_retrieved / deleted_total)` = `1.0 - (2/10) = 0.8`, and properly asserts this fails the 100% target.
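Spelled out as a quick worked example:

```python
# Worked example of the compliance formula from the test above.
deleted_retrieved, deleted_total = 2, 10
compliance = 1.0 - (deleted_retrieved / deleted_total)
assert compliance == 0.8  # 0.8 < 1.00 target, so deletion_compliance fails
```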
160-189: Essential edge case coverage for zero totals and history ordering.
- `test_collector_get_metric_history` validates descending order (most recent first)
- `test_collector_skips_zero_totals` ensures no division-by-zero and no spurious metrics recorded

Both are critical for robustness.
src/paperbot/memory/eval/__init__.py (1)
1-14: Clean module initialization with proper public API exposure. The docstring clearly documents the 5 P0 metrics, and `__all__` correctly limits the public surface to `MemoryMetricCollector`.

evals/memory/test_deletion_compliance.py (1)
26-129: Test logic is sound and comprehensive. The test correctly:
- Creates memories, soft-deletes a subset
- Attempts retrieval via multiple search queries and `list_memories`
- Verifies deleted items are not returned
- Records the metric via `MemoryMetricCollector`

The binary compliance calculation (line 104) is appropriate for a pass/fail acceptance test.
src/paperbot/infrastructure/stores/models.py (1)
322-347: Well-designed metrics model with appropriate indexing. The schema correctly:
- Indexes `metric_name` and `evaluated_at` to support efficient queries for latest values and history
- Uses nullable `sample_size` for flexibility
- Provides the safe default `"{}"` for `detail_json`

The docstring clearly documents the metric definitions and references the design doc.
src/paperbot/infrastructure/stores/memory_store.py (1)
271-287: Implementation is correct with proper user isolation. The method efficiently handles the empty-list case and correctly filters by `user_id`. Note that unlike `list_memories`, this method does not filter out soft-deleted items (`deleted_at IS NOT NULL`). This appears intentional for the current use case (checking item state before moderation), but consider adding an `include_deleted` parameter for consistency with other methods if broader reuse is expected.

src/paperbot/api/routes/memory.py (1)
279-308: Metrics endpoints are well-designed.
- Input validation for `metric_name` against `TARGETS` prevents invalid queries
- Response models clearly define the contract
- The `limit` parameter has sensible bounds (1-100)
- Error message helpfully lists valid metric names
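A hedged sketch of the validation pattern these bullets describe (route path, parameter names, and the `TARGETS` lookup are assumptions based on this review, not the actual file; `_get_metric_collector` is assumed defined as earlier in this review):

```python
from fastapi import APIRouter, HTTPException, Query

from paperbot.memory.eval.collector import MemoryMetricCollector

router = APIRouter()


@router.get("/memory/metrics/{metric_name}")
def get_metric_history(metric_name: str, limit: int = Query(30, ge=1, le=100)):
    # Reject unknown metric names up front, listing the valid ones.
    if metric_name not in MemoryMetricCollector.TARGETS:
        raise HTTPException(
            status_code=400,
            detail=f"Unknown metric '{metric_name}'. "
                   f"Valid names: {sorted(MemoryMetricCollector.TARGETS)}",
        )
    return _get_metric_collector().get_metric_history(metric_name, limit=limit)
```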
src/paperbot/api/routes/research.py (2)
21-29: Lazy initialization pattern is appropriate. This avoids import-time database connections and is consistent with the recommended pattern.
446-509: Feedback endpoint is well-designed. The endpoint clearly documents its purpose and correctly calculates `hit_rate = hits / total`. The detail dictionary captures comprehensive context for debugging and analysis.

evals/memory/test_retrieval_hit_rate.py (1)
42-134: Well-structured acceptance test with proper isolation. The test logic correctly:
- Uses an isolated temporary database
- Loads fixtures and inserts memories
- Runs queries and tracks hits with intersection checking
- Records the metric with detailed test case information
- Handles edge case of zero queries
tests/integration/test_scope_and_acceptance_criteria_hooks.py (4)
16-24: LGTM - Good use of lazy import for proper database isolation. The import of `api_main` inside the fixture after setting `PAPERBOT_DB_URL` ensures the application is initialized with the test database URL. This is a correct pattern for FastAPI integration testing with environment-based configuration.
27-51: Good end-to-end test verifying both response and side effect. The test correctly validates the feedback endpoint response and then verifies that the `retrieval_hit_rate` metric was recorded with the expected value.
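For flavor, the shape of such a check might be the following (route paths and payload fields are guesses based on this review, not the actual test):

```python
# Hypothetical end-to-end flow: POST feedback, then confirm the metric landed.
resp = test_client.post(
    "/api/research/memory/feedback",
    json={"user_id": "test_user", "hits": 4, "expected": 5},
)
assert resp.status_code == 200

metrics = test_client.get("/api/memory/metrics").json()
assert metrics["metrics"]["retrieval_hit_rate"]["value"] == 0.8
```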
139-166: LGTM - Good coverage of metrics endpoint functionality. The test validates the initial empty state, records a metric through the feedback endpoint, and verifies the history endpoint returns the expected data.
169-186: LGTM - Validates pass status for metrics meeting target. The test correctly verifies that a metric exceeding the target reports `meets_target: True`. Consider adding a test case for a failing metric (e.g., hit rate below 80%) in future iterations for more complete coverage.

src/paperbot/memory/eval/collector.py (6)
57-155: LGTM - Recording methods are well-structured with proper guards. All five recording methods:
- Guard against division by zero with early returns
- Follow a consistent pattern
- Delegate to `_record_metric` with appropriate parameters
- Have clear docstrings explaining the metric semantics
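A minimal sketch of the guard pattern the first bullet describes (method and parameter names are assumptions drawn from this review's listings; `_record_metric` is the internal helper named above):

```python
class MemoryMetricCollector:  # excerpt sketch; the real class has more methods
    def record_retrieval_hit_rate(self, hits: int, total: int, **context) -> None:
        """Record hit rate; skip silently when there is nothing to measure."""
        if total <= 0:
            return  # early return: no division by zero, no spurious metric rows
        self._record_metric(
            metric_name="retrieval_hit_rate",
            metric_value=hits / total,
            sample_size=total,
            **context,
        )
```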
179-200: LGTM - Query pattern is acceptable for fixed number of metrics. While this issues 5 separate queries (one per metric type), this is acceptable given:
- The number of metrics is fixed and small (5)
- This is an evaluation/monitoring endpoint, not a hot path
- Code clarity is prioritized over micro-optimization
202-210: Edge case: Empty metrics returns "pass" status. When no metrics have been recorded, `latest.values()` is empty, and `all()` on an empty iterable returns `True`. This means the summary will report "pass" even when no metrics exist. Consider whether this is the desired behavior, or if an empty state should report "unknown" or "incomplete".

🔎 Possible alternative if empty state should differ
```diff
 def get_metrics_summary(self) -> Dict[str, Any]:
     """Get summary of all metrics with pass/fail status."""
     latest = self.get_latest_metrics()
+    if not latest:
+        status = "no_data"
+    else:
-    all_pass = all(m.get("meets_target", False) for m in latest.values())
+        all_pass = all(m.get("meets_target", False) for m in latest.values())
+        status = "pass" if all_pass else "fail"
     return {
-        "status": "pass" if all_pass else "fail",
+        "status": status,
         "metrics": latest,
         "targets": self.TARGETS,
     }
```
212-221: LGTM - Correct threshold logic for different metric types. The `_meets_target` method correctly handles:
- "Lower is better" metrics (`false_positive_rate`, `injection_pollution_rate`): uses `<=`
- "Higher is better" metrics (`extraction_precision`, `retrieval_hit_rate`, `deletion_compliance`): uses `>=`
- Filters by metric name
- Orders by most recent first
- Respects the limit parameter
- Returns a well-structured list of historical values
167-177: No action needed. The code correctly uses the session context manager pattern. The project explicitly requires SQLAlchemy 2.0+, which natively supports the context manager protocol for Session objects. Thesessionmakeris configured withfuture=True, ensuring proper 2.0 behavior. The usagewith self._provider.session() as session:is safe and correct.
```text
# Keep this smaller than requirements.txt to avoid heavy system deps (e.g. weasyprint).

fastapi==0.115.0
python-multipart>=0.0.6
```
🧩 Analysis chain
🏁 Script executed:
```bash
find . -name "requirements-ci.txt" -type f
```

Repository: jerry609/PaperBot
Length of output: 82
🏁 Script executed:
```bash
cat -n requirements-ci.txt
```

Repository: jerry609/PaperBot
Length of output: 1047
🌐 Web query:
requests 2.28.0 CVE security vulnerability 2023 2024 2025
💡 Result:
Short answer: Yes — requests 2.28.0 is affected by multiple CVEs discovered in 2023–2025. Upgrade to a fixed release (see details below).
Details (affected in 2.28.0 and fixed releases)
- CVE-2023-32681 — unintended leak of Proxy-Authorization header. Affects many 2.x releases including 2.28.0; fixed in later releases (recommended >= 2.31.0). [1]
- CVE-2024-35195 — Sessions that first use verify=False can cause subsequent requests on that connection to skip cert verification; fixed in 2.32.0. [2]
- CVE-2024-47081 — URL parsing issue that may leak .netrc credentials; affects requests < 2.32.4 and is fixed in 2.32.4. For older versions you can mitigate by disabling .netrc usage (Session(trust_env=False)). [3]
Recommended action: Upgrade to requests >= 2.32.4 (covers the .netrc fix) or at minimum to the specific fixed versions noted above.
Sources:
[1] (CVE-2023-32681 / disclosure/fix guidance).
[2] (CVE-2024-35195 — fixed in 2.32.0).
[3] (CVE-2024-47081 — fixed in 2.32.4).
🌐 Web query:
python-dotenv 0.19.0 CVE security vulnerability
💡 Result:
I found no public CVE that specifically affects python-dotenv v0.19.0. Vendor/DB checks show no direct CVE for that version and the project's security page lists no published advisories. Sources: Snyk (package view), Safety DB, and the python-dotenv GitHub security page. [1][2][3]
Recommendation: if you can, upgrade to the latest non‑vulnerable release (Safety DB / Snyk show the current recommended version is 1.2.1) and monitor the project's security page or NVD for new advisories. [2][1]
Sources
- Snyk — python-dotenv 0.19.0 (no direct vulnerabilities listed). [1]
- Safety DB — python-dotenv versions / latest recommended version. [2]
- python-dotenv GitHub — security policy / no published advisories. [3]
🌐 Web query:
jinja2 3.1.0 CVE security vulnerability 2023 2024 2025
💡 Result:
Short answer: Yes — Jinja2 3.1.0 is vulnerable to multiple disclosed CVEs (December 2024–March 2025). Upgrade to a patched release (>= 3.1.6) or apply vendor fixes.
What affects 3.1.0 (examples):
- CVE-2024-56201 — sandbox breakout via malicious filenames (affects >=3.0.0 and <=3.1.4; fixed in 3.1.5). (github.com)
- CVE-2024-56326 — sandbox bypass via indirect str.format reference (affects <3.1.5; fixed in 3.1.5). (alas.aws.amazon.com)
- CVE-2024-22195 and CVE-2024-34064 — xmlattr / XSS issues reported in 2024 affecting 3.1.x versions (vendor advisories list these). (ibm.com)
- CVE-2025-27516 — |attr filter sandbox bypass / RCE (affects <3.1.6; fixed in 3.1.6). (alas.aws.amazon.com)
Recommended action:
- Upgrade Jinja2 to the latest patched release (at minimum 3.1.6 to cover the 2025 |attr issue). (alas.aws.amazon.com)
- If you cannot upgrade immediately, follow your OS/vendor security advisories for backported fixes (Ubuntu/Red Hat/SUSE/Amazon Linux advisories exist). (ubuntu.com)
If you want, I can (1) list exact patched release notes and links for each CVE, or (2) scan a requirements.txt / environment to see if you're using a vulnerable Jinja2 version.
Citations:
- 1: https://github.com/advisories/GHSA-gmj6-6f8f-6699?utm_source=openai
- 2: https://alas.aws.amazon.com/cve/html/CVE-2024-56326.html?utm_source=openai
- 3: https://www.ibm.com/support/pages/security-bulletin-jinja2-2113-py2py3-none-anywhl-and-jinja2-312-py3-none-anywhl-vulnerable-cve-2024-22195-used-ibm-maximo-application-suite-edge-data-collector?utm_source=openai
- 4: https://alas.aws.amazon.com/cve/html/CVE-2025-27516.html?utm_source=openai
- 5: https://alas.aws.amazon.com/cve/html/CVE-2025-27516.html?utm_source=openai
- 6: https://ubuntu.com/security/CVE-2024-56201?utm_source=openai
🌐 Web query:
gitpython 3.1.0 CVE security vulnerability 2023 2024 2025
💡 Result:
Short answer: Yes — GitPython 3.1.0 is affected.
Details:
- CVE-2023-40267 (RCE via insufficient sanitization in clone/clone_from) affects GitPython versions < 3.1.32 (so 3.1.0 is vulnerable). Update fixed in 3.1.32. [1][2]
- CVE-2024-22190 (untrusted search path / arbitrary code execution on Windows, an incomplete follow-up to the 2023 issue) affects versions prior to 3.1.41 (so 3.1.0 is vulnerable). Update fixed in 3.1.41. [3][4]
Recommended action: upgrade to the latest GitPython release (at minimum >= 3.1.41) and audit any code that passes user-controlled options to clone/clone_from or uses shells on Windows. [1][3]
Sources:
- GitLab Advisory / Debian / Ubuntu references for CVE-2023-40267 (fix in 3.1.32). [GitLab advisory / Ubuntu]. (advisories.gitlab.com)
- Additional writeups and advisories for CVE-2023-40267. (resolvedsecurity.com)
- NVD / INCIBE / other advisories for CVE-2024-22190 (fix in 3.1.41). (nvd.nist.gov)
- Vendor/third‑party summaries noting patch in 3.1.41. (advisories.gitlab.com)
If you want, I can check your project for which GitPython version is installed and show the exact upgrade command.
Citations:
- 1: https://advisories.gitlab.com/pkg/pypi/gitpython/CVE-2023-40267/?utm_source=openai
- 2: https://www.resolvedsecurity.com/vulnerability-catalog/CVE-2023-40267?utm_source=openai
- 3: https://nvd.nist.gov/vuln/detail/CVE-2024-22190?utm_source=openai
- 4: https://advisories.gitlab.com/pkg/pypi/gitpython/?utm_source=openai
🌐 Web query:
uvicorn 0.32.0 CVE security vulnerability
💡 Result:
Short answer: There are no public CVEs that specifically target uvicorn 0.32.0. The known CVEs (e.g., CVE-2020-7694 and CVE-2020-7695) affect much older releases (<0.11.7) and not 0.32.0. As of January 4, 2026, Uvicorn's release notes show 0.32.0 (Oct 15, 2024) with no CVE listed for that release; vulnerability databases likewise attribute the historical CVEs to older versions. [1][2][3]
Recommendation: If you run 0.32.0, upgrade to a recent patched release (check the latest on the Uvicorn release notes or PyPI) and monitor vendor advisories. [1]
Sources:
- Uvicorn release notes (0.32.0 entry). [1]
- Ubuntu / OSV entry for CVE-2020-7694 (log/ANSI escape injection) — affects older versions. [2]
- Ubuntu entry for CVE-2020-7695 (HTTP response splitting) — affects older versions. [3]
🌐 Web query:
markdown 3.4.0 CVE security vulnerability
💡 Result:
Short answer: There is no single CVE named for "markdown 3.4.0" (the Python-Markdown package). However, several Markdown-related projects and extensions (PyMdown, Python‑Markdown plugins, and other Markdown libraries) have had CVEs that can affect systems using older Markdown versions or extensions. See notable examples and advisories below.
- Arbitrary file inclusion in PyMdown Snippets (CVE-2023-32309) — affects PyMdown (an extensions package for Python‑Markdown); patched in newer PyMdown releases. [1][2]
- Security databases list older Python‑Markdown releases (including 3.4.x) among versions with known issues; maintainers recommend upgrading to a current, supported release. [3]
Recommendation: if you run Python‑Markdown 3.4.0 or use PyMdown/extensions, upgrade to the latest patched releases for both python-markdown and any extensions (or apply vendor advisories) and audit any markdown-processing that consumes untrusted input.
Sources:
- NVD — CVE-2023-32309 (PyMdown Snippets arbitrary file inclusion). [turn1search6].
- CVEDetails / advisories for CVE-2023-32309. [turn1search3].
- Safety DB / package security summary for python-markdown (version list & guidance). [turn1search4].
🌐 Web query:
aiofiles 22.1.0 CVE security vulnerability
💡 Result:
Short answer: There is no publicly reported CVE specifically affecting aiofiles 22.1.0 as of Jan 4, 2026. Security databases (Snyk, Safety DB, Meterian) list no direct CVEs for aiofiles v22.1.0. [1][2][3]
Related note: an unrelated older CVE (CVE-2020-8162) affected very early aiofiles/RHEL packaging in 2020/2021; it does not target aiofiles 22.1.0. [4]
Recommendation: if you depend on aiofiles, keep it updated to the latest stable release and run your own SBOM/vulnerability scan (e.g., Snyk, OSV, Safety) to catch new advisories.
Sources:
- Snyk package page for aiofiles (no known direct vulnerabilities). [1]
- Safety DB / package-versions listing showing 22.1.0 with no reported CVE. [2]
- Meterian component page for aiofiles 22.1.0 (no known vulnerabilities). [3]
- Snyk/RHEL advisory for CVE-2020-8162 (older, unrelated). [4]
Would you like me to run a live scan of a requirements.txt or check the latest aiofiles release?
Upgrade vulnerable dependencies to patch critical security issues.
Verification confirms several dependencies have known security vulnerabilities:
- requests>=2.28.0: CVE-2023-32681 (Proxy-Authorization header leak), CVE-2024-35195 (SSL verification bypass), CVE-2024-47081 (.netrc credential leak) → upgrade to >=2.32.4
- jinja2>=3.1.0: CVE-2024-56201 (sandbox breakout), CVE-2024-56326 (sandbox bypass), CVE-2024-22195 and CVE-2024-34064 (xmlattr/XSS), CVE-2025-27516 (|attr filter RCE) → upgrade to >=3.1.6
- gitpython>=3.1.0: CVE-2023-40267 (RCE via insufficient sanitization in clone operations), CVE-2024-22190 (untrusted search path / arbitrary code execution on Windows) → upgrade to >=3.1.41
No CVEs found in python-dotenv 0.19.0, uvicorn 0.32.0, markdown 3.4.0, or aiofiles 22.1.0, though these remain outdated and should be periodically reviewed.
🤖 Prompt for AI Agents
In requirements-ci.txt around lines 5 to 5, several dependencies are vulnerable;
update the package pins in the file to safe minimum versions: set requests to
>=2.32.4, jinja2 to >=3.1.6, and gitpython to >=3.1.41 (leave other packages
unchanged unless review dictates); after updating, regenerate any
lock/constraints files (pip-compile/poetry lock), run the test suite and
dependency-scan/safety checks locally, and commit the updated requirements and
lockfiles to ensure CI uses the patched versions.
```text
# FastAPI backend for CLI/Web
fastapi==0.115.0
python-multipart>=0.0.6  # Required for file uploads
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check python-multipart version info and security data from PyPI
echo "=== Checking python-multipart on PyPI ==="
curl -s https://pypi.org/pypi/python-multipart/json | jq '{
  current_version: .info.version,
  latest_release: .releases | keys[-1],
  all_versions: (.releases | keys | reverse | .[0:10]),
  requires_python: .info.requires_python,
  summary: .info.summary
}'
echo -e "\n=== Release history (last 15 versions) ==="
curl -s https://pypi.org/pypi/python-multipart/json | jq -r '.releases | keys | reverse | .[0:15] | .[]'
```

Repository: jerry609/PaperBot
Length of output: 549
🌐 Web query:
python-multipart security vulnerabilities CVE advisories
💡 Result:
Found multiple advisories for python-multipart DoS issues. Summary and action items:
-
CVE-2024-24762 — regex ReDoS in Content-Type parsing; fixed in python-multipart 0.0.7. Remediate by upgrading to >= 0.0.7. [1][2]
-
CVE-2024-53981 — excessive per-byte skipping/logging before first/after last boundary causing DoS; fixed in python-multipart 0.0.18. Remediate by upgrading to >= 0.0.18. [3][4]
-
Related: CVE-2023-30798 — unlimited multipart parts (too many fields/files) causes high CPU/memory; mitigated by Starlette 0.25.0 (defaults max_fields/max_files=1000) or FastAPI 0.92.0. If you can’t upgrade python-multipart, ensure Starlette/FastAPI are updated or set max_fields/max_files when parsing forms. [5]
Sources:
[1] NVD / CVE-2024-24762.
[2] Debian security tracker (python-multipart / CVE-2024-24762).
[3] NVD / CVE-2024-53981 (and GitHub advisory).
[4] Debian security tracker (python-multipart / CVE-2024-53981).
[5] Seal Security / advisory on multipart parser (CVE-2023-30798) and Starlette/FastAPI fixes.
Would you like exact upgrade commands (pip / requirements) or a quick scan command to detect vulnerable versions in your environment?
🌐 Web query:
FastAPI 0.115.0 python-multipart version requirements compatibility
💡 Result:
Short answer: FastAPI 0.115.0 (pre-0.115.4) expects the old "multipart" import, so use an older python-multipart release (pin to 0.0.12 / <=0.0.13). FastAPI added compatibility for newer python-multipart imports in 0.115.4 and then bumped the minimum to >=0.0.18 in later 0.115.x releases. [1][2]
Sources:
- FastAPI release notes (0.115.4: updated import/check for python-multipart; 0.115.7: bumped to >=0.0.18). [1]
- FastAPI GitHub discussion showing breakage when python-multipart switched import path (recommendation to use python-multipart==0.0.12 as hotfix). [2]
Tighten the python-multipart version constraint to address security vulnerabilities and FastAPI 0.115.0 compatibility.
The constraint >=0.0.6 allows vulnerable versions. CVE-2024-24762 (ReDoS in Content-Type parsing) requires >=0.0.7, and CVE-2024-53981 (DoS with boundary parsing) requires >=0.0.18. However, FastAPI 0.115.0 has import path compatibility issues with python-multipart >=0.0.14. Use python-multipart>=0.0.7,<=0.0.13 (recommend 0.0.12) for FastAPI 0.115.0, or upgrade FastAPI to 0.115.4+ to support newer python-multipart versions.
🤖 Prompt for AI Agents
In requirements.txt around line 65, the current loose constraint
"python-multipart>=0.0.6" permits vulnerable versions; tighten it to a safe,
FastAPI‑compatible range by changing the requirement to
"python-multipart>=0.0.7,<=0.0.13" (recommended pin to 0.0.12) to address
CVE-2024-24762 and CVE-2024-53981 while preserving FastAPI 0.115.0
compatibility; alternatively, if you prefer newer python-multipart versions,
upgrade FastAPI to 0.115.4+ and then allow python-multipart>=0.0.18.
```python
# P0 Hook: Verify deletion compliance - deleted items should not be retrievable
if deleted > 0:
    # Try to retrieve items from the cleared scope (should return empty)
    retrieved_after_delete = _memory_store.list_memories(
        user_id=user_id,
        scope_type="track",
        scope_id=str(track_id),
        include_deleted=False,
        include_pending=True,
        limit=100,
    )
    # Also try searching
    search_results = _memory_store.search_memories(
        user_id=user_id,
        query="*",  # broad query
        scope_type="track",
        scope_id=str(track_id),
        limit=100,
    )
    retrieved_count = len(retrieved_after_delete) + len(search_results)

    collector = _get_metric_collector()
    collector.record_deletion_compliance(
        deleted_retrieved_count=retrieved_count,
        deleted_total_count=deleted,
        evaluator_id=f"user:{user_id}",
        detail={
            "track_id": track_id,
            "deleted_count": deleted,
            "retrieved_after_delete": retrieved_count,
            "action": "clear_track_memory",
        },
    )
```
Potential double-counting and ineffective search query in deletion compliance check.
Two issues:

1. Search query `"*"`: The `search_memories` method tokenizes the query and filters by `content.contains(token)`. A single `"*"` won't match typical content, so this search may return nothing or fall back to `list_memories`, potentially returning the same items.
2. Double-counting: Both `list_memories` and `search_memories` could return the same items, inflating `retrieved_count`.

Consider either:
- Using only `list_memories` for the compliance check (simpler, sufficient)
if deleted > 0:
# Try to retrieve items from the cleared scope (should return empty)
retrieved_after_delete = _memory_store.list_memories(
user_id=user_id,
scope_type="track",
scope_id=str(track_id),
include_deleted=False,
include_pending=True,
limit=100,
)
- # Also try searching
- search_results = _memory_store.search_memories(
- user_id=user_id,
- query="*", # broad query
- scope_type="track",
- scope_id=str(track_id),
- limit=100,
- )
- retrieved_count = len(retrieved_after_delete) + len(search_results)
+ retrieved_count = len(retrieved_after_delete)
collector = _get_metric_collector()
collector.record_deletion_compliance(📝 Committable suggestion
🤖 Prompt for AI Agents
In src/paperbot/api/routes/research.py around lines 535 to 568, the
deletion-compliance check is using both list_memories and search_memories with a
tokenized "*" query which is ineffective and may double-count items; simplify by
using only list_memories to verify that no items remain (remove the
search_memories call and use the length of retrieved_after_delete as
retrieved_count), or if you must keep search_memories, replace the "*" query
with a meaningful query and deduplicate results by memory ID before summing
counts (build a set of IDs from both results and use its size as
retrieved_count) and update the metric payload accordingly.