@CJBshuosi CJBshuosi commented Jan 4, 2026

feat(memory): implement scope and acceptance criteria
Add memory type taxonomy, isolation strategy, and acceptance metrics:

  • docs: memory_types.md, memory_isolation.md, P0 design doc
  • feat: MemoryMetricCollector for tracking metrics
  • feat: MemoryEvalMetricModel for persisting metrics
  • feat: hooks in research routes (false_positive, deletion_compliance)
  • feat: /research/memory/feedback endpoint for retrieval_hit_rate
  • feat: /memory/metrics endpoints for viewing recorded metrics
  • feat: get_items_by_ids() method in memory_store
  • test: deletion_compliance and retrieval_hit_rate eval scripts
  • ci: add memory scope and acceptance tests to CI workflow

Unit tests (tests/unit/test_memory_metric_collector.py):

  • Test all 5 metric recording methods
  • Test pass/fail target validation
  • Test metrics summary and history
  • Test edge cases (zero totals)
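The zero-totals edge case mentioned above typically reduces to a guarded division; a minimal sketch (the helper name is illustrative, not the PR's actual API):

```python
# Sketch: rate computation guarded against a zero denominator, the kind
# of edge case the tests above cover. Helper name is illustrative.
def safe_rate(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

print(safe_rate(8, 10))  # 0.8
print(safe_rate(0, 0))   # 0.0
```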

Unit tests (tests/unit/test_memory_module.py):

  • Add test for get_items_by_ids method

Integration tests (tests/integration/test_scope_and_acceptance_criteria_hooks.py):

  • Test memory feedback records retrieval_hit_rate
  • Test bulk_moderate rejection records false_positive_rate
  • Test clear_track_memory records deletion_compliance
  • Test /memory/metrics endpoints

Acceptance Metrics:

  • extraction_precision: >= 85%
  • false_positive_rate: <= 5%
  • retrieval_hit_rate: >= 80%
  • injection_pollution_rate: <= 2%
  • deletion_compliance: 100%
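Note that some of these targets are lower-bounds and others are upper-bounds. A minimal sketch of how the pass/fail check could work (names and the direction encoding are illustrative assumptions, not the PR's actual API):

```python
# Sketch: evaluating observed metric values against the acceptance targets.
# The (threshold, direction) encoding is an illustrative assumption.
TARGETS = {
    "extraction_precision": (0.85, ">="),
    "false_positive_rate": (0.05, "<="),
    "retrieval_hit_rate": (0.80, ">="),
    "injection_pollution_rate": (0.02, "<="),
    "deletion_compliance": (1.00, ">="),
}

def passes(metric_name: str, value: float) -> bool:
    """Return True if the observed value meets the metric's target."""
    target, direction = TARGETS[metric_name]
    return value >= target if direction == ">=" else value <= target

print(passes("retrieval_hit_rate", 0.82))   # meets the >= 0.80 target
print(passes("false_positive_rate", 0.07))  # misses the <= 0.05 target
```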

Generated with [LIU BOYU] <oor2020@163.com>

Summary by CodeRabbit

Release Notes

  • New Features

    • Memory isolation with three-level namespace hierarchy (user, workspace, scope-based)
    • Formal memory type taxonomy (user, episodic, workspace/project memory)
    • Memory quality metrics endpoints for tracking system health
    • Memory feedback API to capture user feedback on retrieved memories
    • Memory deletion count reporting
  • Documentation

    • Memory scope and acceptance design documentation
    • Memory isolation strategy and implementation guidelines
    • Memory types taxonomy and status workflow documentation


Additional commit messages:

- Fix import error: DbProvider doesn't exist; use SessionProvider from sqlalchemy_db.
- Add python-multipart to requirements.txt and requirements-ci.txt (required for FastAPI file uploads in /memory/ingest).
- memory.py: use MemoryMetricCollector() without args; collector.py: update docstring to show correct usage.
- Add markdown, aiofiles, python-dotenv, json-repair, uvicorn to CI requirements.
@coderabbitai

coderabbitai bot commented Jan 4, 2026

📝 Walkthrough

This PR introduces a comprehensive memory evaluation framework with metric collection infrastructure, evaluation tests, and API endpoints. It adds CI pipeline steps for acceptance testing, a new database model for storing metrics, metric recording hooks in API routes, and integration/unit tests validating the system.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **CI & Configuration**<br>`.github/workflows/ci.yml`, `requirements.txt`, `requirements-ci.txt` | Added CI step for memory acceptance tests; added `python-multipart>=0.0.6` for file uploads; appended CI dependencies (gitpython, requests, jinja2, markdown, aiofiles, python-dotenv, json-repair, uvicorn). |
| **Memory Design Documentation**<br>`docs/P0_memory_scope_acceptance_design.md`, `docs/memory_isolation.md`, `docs/memory_types.md` | Added three design documents: P0 scope & acceptance metrics, isolation strategy with namespace hierarchy, and formal memory taxonomy with status workflow and confidence guidelines. |
| **Memory Evaluation Tests & Fixtures**<br>`evals/memory/__init__.py`, `evals/memory/fixtures/retrieval_queries.json`, `evals/memory/fixtures/sample_memories.json`, `evals/memory/test_deletion_compliance.py`, `evals/memory/test_retrieval_hit_rate.py` | Introduced memory evaluation test suite: deletion compliance test and retrieval hit-rate test with synthetic queries and sample memory data; tests use temporary SQLite DBs and compute acceptance metrics. |
| **Metrics Infrastructure**<br>`src/paperbot/memory/eval/__init__.py`, `src/paperbot/memory/eval/collector.py`, `src/paperbot/infrastructure/stores/models.py` | Added MemoryMetricCollector class with recording methods for five metrics (extraction_precision, false_positive_rate, retrieval_hit_rate, injection_pollution_rate, deletion_compliance), target thresholds, and query/summary APIs; added MemoryEvalMetricModel ORM table. |
| **Memory Store Enhancement**<br>`src/paperbot/infrastructure/stores/memory_store.py` | Added get_items_by_ids() method to retrieve memory items by a list of IDs with user isolation. |
| **API Routes & Metrics Endpoints**<br>`src/paperbot/api/routes/memory.py`, `src/paperbot/api/routes/research.py` | Added /memory/metrics and /memory/metrics/{metric_name} endpoints with MetricsSummaryResponse and MetricHistoryResponse models; integrated metric recording hooks in bulk_moderate (false_positive_rate), clear_track_memory (deletion_compliance), and a new POST /research/memory/feedback endpoint (retrieval_hit_rate); updated ClearTrackMemoryResponse with a deleted_count field. |
| **Integration & Unit Tests**<br>`tests/integration/test_scope_and_acceptance_criteria_hooks.py`, `tests/unit/test_memory_metric_collector.py`, `tests/unit/test_memory_module.py` | Added end-to-end tests validating metrics recording in API flows (feedback, moderation, deletion); unit tests for MemoryMetricCollector recording/retrieval and SqlAlchemyMemoryStore.get_items_by_ids(). |
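The get_items_by_ids() enhancement described above reduces to an ID lookup with user isolation. An in-memory sketch of that behavior (illustrative only; the real method queries through SQLAlchemy):

```python
# Sketch: ID-based lookup that only returns items belonging to the given
# user, mirroring the described get_items_by_ids() semantics. The dict
# store is a stand-in for the real database-backed memory store.
from typing import Dict, List

def get_items_by_ids(items: Dict[str, dict], user_id: str, ids: List[str]) -> List[dict]:
    """Return the requested items, silently dropping missing or foreign IDs."""
    return [
        items[i] for i in ids
        if i in items and items[i]["user_id"] == user_id
    ]

store = {
    "m1": {"user_id": "u1", "content": "fact A"},
    "m2": {"user_id": "u2", "content": "fact B"},
}
print(get_items_by_ids(store, "u1", ["m1", "m2", "m3"]))  # only u1's m1
```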

Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant API as API Route
    participant Collector as MemoryMetricCollector
    participant DB as Database

    User->>API: POST /research/memory/feedback
    activate API
    API->>Collector: record_retrieval_hit_rate(hits, expected)
    activate Collector
    Collector->>Collector: compute hit_rate, validate targets
    Collector->>DB: INSERT MemoryEvalMetricModel
    deactivate Collector
    API-->>User: 200 MemoryFeedbackResponse
    deactivate API

    User->>API: GET /memory/metrics
    activate API
    API->>Collector: get_metrics_summary()
    activate Collector
    Collector->>DB: SELECT latest per metric_name
    Collector->>Collector: evaluate each against TARGETS
    Collector-->>API: {metrics, targets, pass/fail}
    deactivate Collector
    API-->>User: 200 MetricsSummaryResponse
    deactivate API

    User->>API: GET /memory/metrics/{metric_name}
    activate API
    API->>Collector: get_metric_history(metric_name, limit=30)
    activate Collector
    Collector->>DB: SELECT * WHERE metric_name ORDER BY evaluated_at DESC
    Collector-->>API: [history items]
    deactivate Collector
    API-->>User: 200 MetricHistoryResponse
    deactivate API
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Metrics bloom in databases deep,
Hit rates hop and deletions keep,
Acceptance tests bound through the night,
Memory's measured—targets in sight!
Five metrics dance, targets all tight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — docstring coverage is 76.60%, below the required 80.00% threshold. Run `@coderabbitai generate docstrings` to improve coverage.

✅ Passed checks (2 passed)

  • Description Check ✅ — check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — the title 'feat(memory): implement scope and acceptance criteria' directly and concisely summarizes the main changes: memory scope definition, the acceptance metrics framework, and related infrastructure.

@CJBshuosi CJBshuosi requested a review from jerry609 January 4, 2026 15:50
@gemini-code-assist

Summary of Changes

Hello @CJBshuosi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the memory subsystem by formalizing its operational boundaries and introducing a robust framework for quality assurance. It defines clear memory types and isolation strategies, alongside measurable acceptance criteria. The changes include the implementation of a metric collection system, new API endpoints for monitoring performance, and integrated hooks to automatically track key quality indicators. This foundational work ensures the memory system's reliability, accuracy, and compliance with defined standards, paving the way for future feature development with a strong emphasis on data integrity and user experience.

Highlights

  • Memory Type Taxonomy: Formalized the taxonomy for memory items, categorizing them into User, Episodic, and Workspace/Project Memory, with clear definitions, persistence characteristics, and confidence thresholds. Explicitly defined what is NOT considered part of the memory system.
  • Namespace and Isolation Strategy: Implemented a three-level isolation hierarchy (user_id, workspace_id, scope_type:scope_id) to ensure data privacy and context relevance. This includes rules for visibility, content hash deduplication, and auto-status assignment based on confidence and PII risk.
  • Acceptance Criteria and Metrics: Defined five key acceptance metrics for the memory system: Extraction Precision (>= 85%), False Positive Rate (<= 5%), Retrieval Hit Rate (>= 80%), Injection Pollution Rate (<= 2%), and Deletion Compliance (100%). Measurement methods and storage schema for these metrics have been established.
  • Metric Collection and API Endpoints: Introduced a MemoryMetricCollector class and a new memory_eval_metrics database table to store and manage these metrics. New API endpoints (/api/memory/metrics and /api/memory/metrics/{metric_name}) have been added to retrieve metric summaries and historical data.
  • Integration of Metric Hooks: Integrated metric collection hooks into existing API routes. Specifically, false_positive_rate is recorded when high-confidence items are rejected via bulk_moderate, retrieval_hit_rate is recorded through a new /research/memory/feedback endpoint, and deletion_compliance is checked when clear_track_memory is used.
  • Enhanced Memory Store Functionality: Added a get_items_by_ids() method to the memory_store for efficient retrieval of specific memory items by their IDs, improving data access for evaluation and moderation workflows.
  • Comprehensive Testing and CI Integration: Added new unit tests for the MemoryMetricCollector and get_items_by_ids method, along with integration tests to verify the correct functioning of the new metric collection hooks in the API routes. The CI workflow has been updated to include these new memory scope and acceptance tests.
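The content-hash deduplication mentioned in the isolation highlights could work along these lines (a sketch; the normalization and key layout are illustrative assumptions, not the PR's actual scheme):

```python
# Sketch: content-hash deduplication scoped to a (user, workspace, scope)
# namespace. Normalization and key layout are illustrative assumptions.
import hashlib

def memory_key(user_id: str, workspace_id: str, scope: str, content: str) -> str:
    """Build a dedup key: identical normalized content collapses per namespace."""
    norm = " ".join(content.lower().split())  # lowercase, collapse whitespace
    digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
    return f"{user_id}:{workspace_id}:{scope}:{digest}"

a = memory_key("u1", "w1", "track:t1", "Prefers  concise answers")
b = memory_key("u1", "w1", "track:t1", "prefers concise answers")
print(a == b)  # True: duplicates collapse within the same namespace
```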


Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml

@gemini-code-assist bot left a comment
Code Review

This pull request introduces a comprehensive memory scope and acceptance criteria feature, including documentation, metric collection, new API endpoints, and extensive tests. The implementation is solid and well-tested. I've identified a few areas for improvement, primarily concerning a logical error in the design documentation's pseudo-code, a discrepancy between the documented and implemented query logic, and opportunities to align with Python and FastAPI best practices in the test and API code.

Comment on lines +358 to +381
```python
def resolve_visible_memories(user_id, workspace_id=None, scope_type=None, scope_id=None):
    """
    Resolution order (most specific to least):
    1. Exact scope match (scope_type + scope_id)
    2. Global scope within workspace
    3. Global scope for user (if no workspace specified)
    """
    base_filter = (user_id == user_id)

    if workspace_id:
        base_filter &= (workspace_id == workspace_id)

    if scope_type and scope_id:
        # Include both specific scope AND global
        scope_filter = (
            (scope_type == scope_type AND scope_id == scope_id) |
            (scope_type == 'global')
        )
    else:
        scope_filter = (scope_type == 'global')

    return base_filter & scope_filter
```

Severity: medium

The pseudo-code for resolve_visible_memories contains a logical error. The comparisons like user_id == user_id and scope_type == scope_type will always evaluate to true, as they compare a variable to itself. The intention is likely to compare database fields with the function's arguments. This could be misleading for developers implementing this logic.

Here is a corrected version assuming item is the object being filtered:

```python
def resolve_visible_memories(user_id, workspace_id=None, scope_type=None, scope_id=None):
    """
    Resolution order (most specific to least):
    1. Exact scope match (scope_type + scope_id)
    2. Global scope within workspace
    3. Global scope for user (if no workspace specified)
    """
    # Assuming 'item' is the memory item object being filtered
    base_filter = (item.user_id == user_id)

    if workspace_id:
        base_filter &= (item.workspace_id == workspace_id)

    if scope_type and scope_id:
        # Include both the specific scope AND global
        scope_filter = (
            ((item.scope_type == scope_type) & (item.scope_id == scope_id)) |
            (item.scope_type == 'global')
        )
    else:
        scope_filter = (item.scope_type == 'global')

    return base_filter & scope_filter
```

Note that `&` is used instead of Python's `and` for the combined condition: `and` short-circuits on truthiness and would not build a composite filter expression.

Comment on lines +50 to +59
**Implementation:** `memory_store.py` lines 296-301:
```python
if scope_type is not None:
    if scope_type == "global":
        stmt = stmt.where(or_(MemoryItemModel.scope_type == scope_type, MemoryItemModel.scope_type.is_(None)))
    else:
        stmt = stmt.where(MemoryItemModel.scope_type == scope_type)
if scope_id is not None:
    stmt = stmt.where(MemoryItemModel.scope_id == scope_id)
```

Severity: medium

There is a discrepancy between the query logic described in the main design document (P0_memory_scope_acceptance_design.md) and the implementation snippet shown here. The design doc states that queries for a specific scope (e.g., track) should also include global items. However, the snippet here (and the actual implementation) only filters for the specific scope_type, excluding global items. This could lead to unexpected behavior where global memories are not retrieved when working within a specific track. The documentation and implementation should be aligned with the intended design.
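One way to align the implementation with the design doc's intent, sketched here with plain-Python predicates rather than SQLAlchemy expressions (names and the in-memory representation are illustrative):

```python
# Sketch: a scope filter that includes global items alongside a specific
# scope, matching the design doc's stated intent. Plain-Python predicate;
# the real code would express this with SQLAlchemy's or_().
from typing import Optional

def scope_matches(item: dict, scope_type: Optional[str], scope_id: Optional[str]) -> bool:
    item_type = item.get("scope_type")
    if scope_type is None:
        return True  # no scope filter requested
    if scope_type == "global":
        return item_type in ("global", None)  # treat NULL as global
    # Specific scope: include exact matches AND global items.
    return (item_type == scope_type and item.get("scope_id") == scope_id) or item_type == "global"

items = [
    {"id": 1, "scope_type": "track", "scope_id": "t1"},
    {"id": 2, "scope_type": "global"},
    {"id": 3, "scope_type": "track", "scope_id": "t2"},
]
visible = [i["id"] for i in items if scope_matches(i, "track", "t1")]
print(visible)  # [1, 2]
```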

Comment on lines +18 to +19
# Add src to path for imports
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "src"))

Severity: medium

Manipulating sys.path is generally considered an anti-pattern and can make the test suite fragile and harder to run in different environments. A better approach is to make your project installable in editable mode. By running pip install -e . from the project root, the src directory will be added to Python's path correctly, and you can use absolute imports like from paperbot.infrastructure.stores.memory_store import SqlAlchemyMemoryStore without modifying sys.path. This change should be applied to other new test files in this PR as well.

Comment on lines +21 to +29
```python
_metric_collector: Optional[MemoryMetricCollector] = None


def _get_metric_collector() -> MemoryMetricCollector:
    """Lazy initialization of metric collector."""
    global _metric_collector
    if _metric_collector is None:
        _metric_collector = MemoryMetricCollector()
    return _metric_collector
```

Severity: medium

The use of a global, lazily-initialized _metric_collector is not ideal in FastAPI as it introduces global mutable state, which can be problematic for testing and concurrency. A more idiomatic FastAPI approach is to use the dependency injection system. This improves testability by allowing you to easily inject a mock collector during tests.

You could refactor this to:

```python
from fastapi import Depends

# You can initialize it once at the module level
_metric_collector = MemoryMetricCollector()

def get_metric_collector() -> MemoryMetricCollector:
    """FastAPI dependency to get the metric collector instance."""
    return _metric_collector
```

And then in your route functions, you would use Depends:

```python
@router.post("/research/memory/bulk_moderate", response_model=BulkModerateResponse)
def bulk_moderate(
    req: BulkModerateRequest,
    background_tasks: BackgroundTasks,
    collector: MemoryMetricCollector = Depends(get_metric_collector),
):
    # ...
    if high_confidence_rejected > 0:
        collector.record_false_positive_rate(...)
    # ...
```
@coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/paperbot/api/routes/research.py (1)

374-407: Verify total_approved_count denominator in false positive calculation.

The metric is recorded with total_approved_count=len(items_before), but items_before may include items that were already rejected or pending (not just approved). This could skew the false positive rate calculation.

Consider filtering to only count items that were previously approved:

🔎 Proposed fix
```diff
     if req.status == "rejected" and items_before:
+        # Only count items that were previously approved as the denominator
+        previously_approved = [
+            item for item in items_before
+            if item.get("status") == "approved"
+        ]
         high_confidence_rejected = sum(
-            1 for item in items_before
+            1 for item in previously_approved
             if item.get("confidence", 0) >= 0.60 and item.get("status") == "approved"
         )
         if high_confidence_rejected > 0:
             collector = _get_metric_collector()
             collector.record_false_positive_rate(
                 false_positive_count=high_confidence_rejected,
-                total_approved_count=len(items_before),
+                total_approved_count=len(previously_approved),
                 evaluator_id=f"user:{req.user_id}",
```
🧹 Nitpick comments (12)
docs/memory_types.md (1)

64-87: Add language identifier to fenced code block.

The status workflow diagram should specify a language identifier (e.g., text or mermaid) for better Markdown rendering and to satisfy linter requirements.

🔎 Proposed fix
```diff
-```
+```text
 Extracted (MemoryCandidate)
          │
```

Alternatively, consider using Mermaid syntax for better visualization:

```diff
-```
+```mermaid
+graph TD
+    A[Extracted MemoryCandidate] --> B{confidence >= 0.60 AND pii_risk < 2}
+    B -->|Yes| C[approved]
+    B -->|No| D[pending]
+    D --> E[approved]
+    D --> F[rejected]
+    C --> G[Can be retrieved and injected]
```
tests/unit/test_memory_module.py (1)

143-143: Fix unused variable.

The rows variable is unpacked but never used in this test. Prefix it with an underscore to indicate it's intentionally unused.

🔎 Proposed fix
```diff
-    created, _, rows = store.add_memories(user_id="u1", memories=candidates)
+    created, _, _rows = store.add_memories(user_id="u1", memories=candidates)
```
evals/memory/test_deletion_compliance.py (3)

18-19: Consider using a proper package structure or pytest configuration.

The sys.path manipulation is fragile; paths may break if the file is moved. For standalone scripts this is acceptable, but consider using pytest with proper conftest.py or installing the package in editable mode.


53-57: Prefix unused variables with underscore.

Per the static analysis hint, created and skipped are never used.

🔎 Proposed fix
```diff
-        created, skipped, _rows = store.add_memories(
+        _created, _skipped, _rows = store.add_memories(
             user_id=user_id,
             memories=test_memories,
             actor_id="test",
         )
```

140-140: Remove extraneous f-prefix.

Per static analysis, line 140 has an f-string without placeholders.

🔎 Proposed fix
```diff
-    print(f"\nResults:")
+    print("\nResults:")
```
src/paperbot/api/routes/memory.py (1)

270-270: Module-level singleton initialized at import time.

_metric_collector = MemoryMetricCollector() is instantiated when the module loads. This will create a database session provider and run create_all() immediately. If the database isn't available at import time, this could cause startup failures.

Consider lazy initialization (similar to _get_metric_collector() in research.py) for consistency and to avoid import-time side effects.

🔎 Proposed lazy initialization
```diff
-_metric_collector = MemoryMetricCollector()
+_metric_collector: Optional[MemoryMetricCollector] = None
+
+
+def _get_metric_collector() -> MemoryMetricCollector:
+    global _metric_collector
+    if _metric_collector is None:
+        _metric_collector = MemoryMetricCollector()
+    return _metric_collector
```

Then update usages:

```diff
-    return _metric_collector.get_metrics_summary()
+    return _get_metric_collector().get_metrics_summary()
```
evals/memory/test_retrieval_hit_rate.py (3)

31-39: Consider adding error handling for missing fixture files.

If the fixture files don't exist, the test will fail with a generic FileNotFoundError. Providing a more descriptive error message would help debugging.

🔎 Proposed improvement
```diff
 def load_fixtures():
     """Load test fixtures."""
-    with open(FIXTURES_DIR / "sample_memories.json") as f:
-        memories_data = json.load(f)
-
-    with open(FIXTURES_DIR / "retrieval_queries.json") as f:
-        queries_data = json.load(f)
+    memories_path = FIXTURES_DIR / "sample_memories.json"
+    queries_path = FIXTURES_DIR / "retrieval_queries.json"
+
+    if not memories_path.exists():
+        raise FileNotFoundError(f"Missing fixture: {memories_path}")
+    if not queries_path.exists():
+        raise FileNotFoundError(f"Missing fixture: {queries_path}")
+
+    with open(memories_path) as f:
+        memories_data = json.load(f)
+
+    with open(queries_path) as f:
+        queries_data = json.load(f)

     return memories_data, queries_data
```

53-56: Close the store and collector to prevent resource leaks.

Neither SqlAlchemyMemoryStore nor MemoryMetricCollector are explicitly closed. Based on the codebase, SqlAlchemyMemoryStore has a close() method that disposes the engine. Without closing, SQLite file handles may remain open, potentially causing issues with the os.unlink(db_path) cleanup.

🔎 Proposed fix
```diff
     try:
         db_url = f"sqlite:///{db_path}"
         store = SqlAlchemyMemoryStore(db_url=db_url)
         collector = MemoryMetricCollector(db_url=db_url)

         # ... rest of the code ...

         return {
             "passed": passed,
             "total_queries": total_queries,
             "hits": hits,
             "hit_rate": hit_rate,
             "details": details,
         }

     finally:
         # Cleanup
+        store.close()
         if os.path.exists(db_path):
             os.unlink(db_path)
```
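The cleanup pattern above can also be expressed with contextlib.closing, which guarantees close() runs even if the body raises. A sketch with a stand-in class (it only assumes the object exposes a close() method, as SqlAlchemyMemoryStore is described to):

```python
# Sketch: guaranteed cleanup via contextlib.closing. FakeStore stands in
# for any object with a close() method, e.g. a SQLAlchemy-backed store.
from contextlib import closing

class FakeStore:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

store = FakeStore()
with closing(store):
    pass  # use the store; close() is called on exit, even on exceptions
print(store.closed)  # True
```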

150-155: Remove extraneous f prefix from strings without placeholders.

Lines 150 and 154 use f-strings without any interpolation. As flagged by static analysis (Ruff F541), these should be regular strings.

🔎 Proposed fix
```diff
-    print(f"\nResults:")
+    print("\nResults:")
     print(f"  Total queries: {result['total_queries']}")
     print(f"  Hits: {result['hits']}")
     print(f"  Hit rate: {result['hit_rate']:.1%}")
-    print(f"  Target: >= 80%")
+    print("  Target: >= 80%")
     print(f"  Status: {'PASS' if result['passed'] else 'FAIL'}")
```
tests/integration/test_scope_and_acceptance_criteria_hooks.py (2)

91-95: Consider asserting the expected false_positive_rate value.

Unlike other tests that verify both existence and value of the metric, this test only checks that false_positive_rate exists in the metrics. For consistency and stronger verification, consider asserting the expected value.

🔎 Proposed improvement
```diff
     # Verify false_positive_rate metric was recorded
     metrics_response = test_client.get("/api/memory/metrics")
     assert metrics_response.status_code == 200
     metrics = metrics_response.json()
     assert "false_positive_rate" in metrics["metrics"]
+    # One high-confidence item rejected = 100% false positive rate for this batch
+    assert metrics["metrics"]["false_positive_rate"]["value"] == 1.0
```

108-120: Add response status assertions when creating test memories.

The loop creates memories but doesn't verify the responses succeeded. If memory creation fails, the test could produce misleading results (e.g., deleted_count would be 0 instead of 3).

🔎 Proposed fix
```diff
     # Create some memories in the track
     for i in range(3):
-        test_client.post(
+        resp = test_client.post(
             "/api/research/memory/items",
             json={
                 "user_id": "test_user",
                 "kind": "fact",
                 "content": f"Test fact {i}",
                 "confidence": 0.7,
                 "scope_type": "track",
                 "scope_id": str(track_id),
             },
         )
+        assert resp.status_code == 200, f"Failed to create memory {i}: {resp.text}"
```
src/paperbot/memory/eval/collector.py (1)

43-50: Annotate TARGETS as ClassVar for type correctness.

As flagged by static analysis (Ruff RUF012), mutable class attributes should be annotated with typing.ClassVar to indicate they belong to the class, not instances.

🔎 Proposed fix

Add ClassVar to imports:

```diff
-from typing import Any, Dict, List, Optional
+from typing import Any, ClassVar, Dict, List, Optional
```

Then annotate the constant:

```diff
     # P0 target thresholds
-    TARGETS = {
+    TARGETS: ClassVar[Dict[str, float]] = {
         "extraction_precision": 0.85,
         "false_positive_rate": 0.05,
         "retrieval_hit_rate": 0.80,
         "injection_pollution_rate": 0.02,
         "deletion_compliance": 1.00,
     }
```
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e39cde and 982cb19.

📒 Files selected for processing (20)
  • .github/workflows/ci.yml
  • docs/P0_memory_scope_acceptance_design.md
  • docs/memory_isolation.md
  • docs/memory_types.md
  • evals/memory/__init__.py
  • evals/memory/fixtures/retrieval_queries.json
  • evals/memory/fixtures/sample_memories.json
  • evals/memory/test_deletion_compliance.py
  • evals/memory/test_retrieval_hit_rate.py
  • requirements-ci.txt
  • requirements.txt
  • src/paperbot/api/routes/memory.py
  • src/paperbot/api/routes/research.py
  • src/paperbot/infrastructure/stores/memory_store.py
  • src/paperbot/infrastructure/stores/models.py
  • src/paperbot/memory/eval/__init__.py
  • src/paperbot/memory/eval/collector.py
  • tests/integration/test_scope_and_acceptance_criteria_hooks.py
  • tests/unit/test_memory_metric_collector.py
  • tests/unit/test_memory_module.py
🧰 Additional context used
🧬 Code graph analysis (8)
tests/unit/test_memory_module.py (2)
src/paperbot/memory/schema.py (1)
  • MemoryCandidate (39-48)
src/paperbot/infrastructure/stores/memory_store.py (2)
  • list_memories (233-269)
  • get_items_by_ids (271-287)
tests/unit/test_memory_metric_collector.py (1)
src/paperbot/memory/eval/collector.py (8)
  • MemoryMetricCollector (24-243)
  • record_extraction_precision (57-74)
  • get_latest_metrics (179-200)
  • record_false_positive_rate (76-93)
  • record_retrieval_hit_rate (95-112)
  • record_injection_pollution_rate (114-131)
  • record_deletion_compliance (133-155)
  • get_metrics_summary (202-210)
evals/memory/test_deletion_compliance.py (1)
src/paperbot/memory/eval/collector.py (1)
  • MemoryMetricCollector (24-243)
src/paperbot/api/routes/memory.py (1)
src/paperbot/memory/eval/collector.py (3)
  • MemoryMetricCollector (24-243)
  • get_metrics_summary (202-210)
  • get_metric_history (223-243)
src/paperbot/memory/eval/__init__.py (1)
src/paperbot/memory/eval/collector.py (1)
  • MemoryMetricCollector (24-243)
evals/memory/test_retrieval_hit_rate.py (3)
src/paperbot/infrastructure/stores/memory_store.py (3)
  • SqlAlchemyMemoryStore (40-657)
  • add_memories (138-231)
  • search_memories (289-352)
src/paperbot/memory/schema.py (1)
  • MemoryCandidate (39-48)
src/paperbot/memory/eval/collector.py (2)
  • MemoryMetricCollector (24-243)
  • record_retrieval_hit_rate (95-112)
src/paperbot/memory/eval/collector.py (3)
src/paperbot/infrastructure/stores/models.py (2)
  • Base (11-12)
  • MemoryEvalMetricModel (322-347)
src/paperbot/infrastructure/stores/sqlalchemy_db.py (3)
  • SessionProvider (48-56)
  • get_db_url (15-16)
  • session (55-56)
src/paperbot/api/routes/memory.py (1)
  • get_metric_history (300-308)
src/paperbot/api/routes/research.py (2)
src/paperbot/memory/eval/collector.py (4)
  • MemoryMetricCollector (24-243)
  • record_false_positive_rate (76-93)
  • record_retrieval_hit_rate (95-112)
  • record_deletion_compliance (133-155)
src/paperbot/infrastructure/stores/memory_store.py (3)
  • get_items_by_ids (271-287)
  • list_memories (233-269)
  • search_memories (289-352)
🪛 LanguageTool
docs/P0_memory_scope_acceptance_design.md

[uncategorized] ~648-~648: The official name of this software platform is spelled with a capital “H”.
Context: ...ynthetic test data ### 8.4 CI/CD - [ ] .github/workflows/ci.yml - Memory eval job add...

(GITHUB)


[uncategorized] ~739-~739: The official name of this software platform is spelled with a capital “H”.
Context: ...---------------------------------| | .github/workflows/ci.yml | Modified (added memo...

(GITHUB)

🪛 markdownlint-cli2 (0.18.1)
docs/memory_isolation.md

16-16: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


106-106: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/P0_memory_scope_acceptance_design.md

16-16: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


106-106: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


147-147: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


323-323: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


437-437: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


603-603: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


721-721: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


721-721: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


729-729: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


729-729: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)

docs/memory_types.md

16-16: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


106-106: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.10)
tests/unit/test_memory_module.py

143-143: Unpacked variable rows is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

evals/memory/test_deletion_compliance.py

53-53: Unpacked variable created is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


53-53: Unpacked variable skipped is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


140-140: f-string without any placeholders

Remove extraneous f prefix

(F541)

evals/memory/test_retrieval_hit_rate.py

150-150: f-string without any placeholders

Remove extraneous f prefix

(F541)


154-154: f-string without any placeholders

Remove extraneous f prefix

(F541)

src/paperbot/memory/eval/collector.py

44-50: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)
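The RUF059 and F541 findings above have mechanical fixes. A small illustrative sketch (the function and values here are hypothetical, not taken from the PR):

```python
# RUF059: prefix unused unpacked variables with an underscore
def load_fixture():
    return 3, 0, ["m1", "m2", "m3"]

# was: created, skipped, rows = load_fixture()  (created/skipped never used)
_created, _skipped, rows = load_fixture()

# F541: drop the f prefix from strings with no placeholders
header = "Retrieval hit rate report"  # was: f"Retrieval hit rate report"
print(f"{header}: {len(rows)} rows")  # f-string with placeholders is fine
```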

🔇 Additional comments (30)
evals/memory/__init__.py (1)

1-7: LGTM! Clear package documentation.

The docstring provides clear documentation of the memory evaluation test suite and its components.

docs/memory_isolation.md (1)

1-144: LGTM! Comprehensive isolation strategy documentation.

This document provides clear, detailed documentation of the memory namespace hierarchy and isolation rules. The inclusion of implementation references, code examples, and explicit behavior specifications makes it valuable for both developers and maintainers.

docs/P0_memory_scope_acceptance_design.md (1)

1-739: LGTM! Excellent comprehensive design document.

This design document is thorough, well-structured, and provides valuable context for the P0 memory scope implementation. Key strengths:

  • Clear architecture mapping across all 5 layers
  • Detailed acceptance criteria with measurable metrics
  • Comprehensive risk analysis and mitigation strategies
  • Realistic workload estimation and timeline
  • Strong references to academic papers and industry practices

The document effectively bridges strategic planning with implementation details, making it a valuable resource for both current development and future maintenance.

docs/memory_types.md (1)

1-141: Excellent documentation structure.

The memory taxonomy is comprehensive and well-organized. The breakdown of memory types, confidence thresholds, PII risk levels, and status workflows provides clear guidance for implementation.

.github/workflows/ci.yml (1)

51-56: LGTM! Clean integration of memory acceptance tests.

The new CI step properly sets PYTHONPATH and executes the memory acceptance tests. This aligns well with the PR's goal of validating deletion compliance and retrieval hit rate metrics.

evals/memory/fixtures/sample_memories.json (1)

1-42: LGTM! Well-structured test fixture.

The fixture provides good coverage of memory types with realistic confidence scores that align with the taxonomy thresholds defined in memory_types.md.

evals/memory/fixtures/retrieval_queries.json (1)

1-35: LGTM! Comprehensive retrieval test cases.

The fixture defines clear test cases with expected outcomes for validating retrieval hit rates across different memory kinds.

tests/unit/test_memory_module.py (1)

130-161: Excellent test coverage.

The test validates:

  • Basic retrieval by IDs
  • Empty list edge case
  • User isolation guarantees

The approach of using list_memories to get IDs is correct since SQLAlchemy session objects are detached after the transaction.

tests/unit/test_memory_metric_collector.py (4)

1-27: Well-structured unit tests with good isolation.

Tests correctly use tmp_path for database isolation and verify boundary conditions (e.g., 0.85 == target meeting the >= 0.85 threshold). Coverage of the extraction precision recording and retrieval is complete.


29-59: Good coverage of pass/fail target logic for false positive rate.

Tests correctly verify both the passing case (3% ≤ 5% target) and failing case (10% > 5% target), confirming the "lower is better" logic for this metric.


96-128: Deletion compliance calculation verified correctly.

The test at line 127 correctly validates the formula 1.0 - (deleted_retrieved / deleted_total) = 1.0 - (2/10) = 0.8, and properly asserts this fails the 100% target.


160-189: Essential edge case coverage for zero totals and history ordering.

  • test_collector_get_metric_history validates descending order (most recent first)
  • test_collector_skips_zero_totals ensures no division-by-zero and no spurious metrics recorded

Both are critical for robustness.

src/paperbot/memory/eval/__init__.py (1)

1-14: Clean module initialization with proper public API exposure.

The docstring clearly documents the 5 P0 metrics, and __all__ correctly limits the public surface to MemoryMetricCollector.

evals/memory/test_deletion_compliance.py (1)

26-129: Test logic is sound and comprehensive.

The test correctly:

  1. Creates memories, soft-deletes a subset
  2. Attempts retrieval via multiple search queries and list_memories
  3. Verifies deleted items are not returned
  4. Records the metric via MemoryMetricCollector

The binary compliance calculation (line 104) is appropriate for a pass/fail acceptance test.

src/paperbot/infrastructure/stores/models.py (1)

322-347: Well-designed metrics model with appropriate indexing.

The schema correctly:

  • Indexes metric_name and evaluated_at to support efficient queries for latest values and history
  • Uses nullable sample_size for flexibility
  • Provides safe default "{}" for detail_json

The docstring clearly documents the metric definitions and references the design doc.

src/paperbot/infrastructure/stores/memory_store.py (1)

271-287: Implementation is correct with proper user isolation.

The method efficiently handles the empty-list case and correctly filters by user_id. Note that unlike list_memories, this method does not filter out soft-deleted items (deleted_at IS NOT NULL). This appears intentional for the current use case (checking item state before moderation), but consider adding an include_deleted parameter for consistency with other methods if broader reuse is expected.

src/paperbot/api/routes/memory.py (1)

279-308: Metrics endpoints are well-designed.

  • Input validation for metric_name against TARGETS prevents invalid queries
  • Response models clearly define the contract
  • The limit parameter has sensible bounds (1-100)
  • Error message helpfully lists valid metric names
src/paperbot/api/routes/research.py (2)

21-29: Lazy initialization pattern is appropriate.

This avoids import-time database connections and is consistent with the recommended pattern.


446-509: Feedback endpoint is well-designed.

The endpoint clearly documents its purpose and correctly calculates hit_rate = hits / total. The detail dictionary captures comprehensive context for debugging and analysis.

evals/memory/test_retrieval_hit_rate.py (1)

42-134: Well-structured acceptance test with proper isolation.

The test logic correctly:

  • Uses an isolated temporary database
  • Loads fixtures and inserts memories
  • Runs queries and tracks hits with intersection checking
  • Records the metric with detailed test case information
  • Handles edge case of zero queries
tests/integration/test_scope_and_acceptance_criteria_hooks.py (4)

16-24: LGTM - Good use of lazy import for proper database isolation.

The import of api_main inside the fixture after setting PAPERBOT_DB_URL ensures the application is initialized with the test database URL. This is a correct pattern for FastAPI integration testing with environment-based configuration.


27-51: Good end-to-end test verifying both response and side effect.

The test correctly validates the feedback endpoint response and then verifies that the retrieval_hit_rate metric was recorded with the expected value.


139-166: LGTM - Good coverage of metrics endpoint functionality.

The test validates the initial empty state, records a metric through the feedback endpoint, and verifies the history endpoint returns the expected data.


169-186: LGTM - Validates pass status for metrics meeting target.

The test correctly verifies that a metric exceeding the target reports meets_target: True. Consider adding a test case for a failing metric (e.g., hit rate below 80%) in future iterations for more complete coverage.

src/paperbot/memory/eval/collector.py (6)

57-155: LGTM - Recording methods are well-structured with proper guards.

All five recording methods:

  • Guard against division by zero with early returns
  • Follow a consistent pattern
  • Delegate to _record_metric with appropriate parameters
  • Have clear docstrings explaining the metric semantics

179-200: LGTM - Query pattern is acceptable for fixed number of metrics.

While this issues 5 separate queries (one per metric type), this is acceptable given:

  • The number of metrics is fixed and small (5)
  • This is an evaluation/monitoring endpoint, not a hot path
  • Code clarity is prioritized over micro-optimization

202-210: Edge case: Empty metrics returns "pass" status.

When no metrics have been recorded, latest.values() is empty, and all() on an empty iterable returns True. This means the summary will report "pass" even when no metrics exist.

Consider whether this is the desired behavior, or if an empty state should report "unknown" or "incomplete".

🔎 Possible alternative if empty state should differ
     def get_metrics_summary(self) -> Dict[str, Any]:
         """Get summary of all metrics with pass/fail status."""
         latest = self.get_latest_metrics()
+        if not latest:
+            status = "no_data"
+        else:
-        all_pass = all(m.get("meets_target", False) for m in latest.values())
+            all_pass = all(m.get("meets_target", False) for m in latest.values())
+            status = "pass" if all_pass else "fail"
         return {
-            "status": "pass" if all_pass else "fail",
+            "status": status,
             "metrics": latest,
             "targets": self.TARGETS,
         }

212-221: LGTM - Correct threshold logic for different metric types.

The _meets_target method correctly handles:

  • "Lower is better" metrics (false_positive_rate, injection_pollution_rate): uses <=
  • "Higher is better" metrics (extraction_precision, retrieval_hit_rate, deletion_compliance): uses >=
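That comparison can be expressed as a standalone sketch (thresholds copied from TARGETS; the real logic lives in the collector's _meets_target method):

```python
# Metrics where a smaller observed value is better
LOWER_IS_BETTER = {"false_positive_rate", "injection_pollution_rate"}

TARGETS = {
    "extraction_precision": 0.85,
    "false_positive_rate": 0.05,
    "retrieval_hit_rate": 0.80,
    "injection_pollution_rate": 0.02,
    "deletion_compliance": 1.00,
}

def meets_target(metric_name: str, value: float) -> bool:
    target = TARGETS[metric_name]
    if metric_name in LOWER_IS_BETTER:
        return value <= target
    return value >= target

assert meets_target("false_positive_rate", 0.03)      # 3% <= 5% passes
assert not meets_target("false_positive_rate", 0.10)  # 10% > 5% fails
assert meets_target("retrieval_hit_rate", 0.80)       # boundary passes >=
assert not meets_target("deletion_compliance", 0.8)   # must be exactly 100%
```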

223-242: LGTM - Clean implementation of metric history retrieval.

The method correctly:

  • Filters by metric name
  • Orders by most recent first
  • Respects the limit parameter
  • Returns a well-structured list of historical values

167-177: No action needed. The code correctly uses the session context manager pattern. The project explicitly requires SQLAlchemy 2.0+, which natively supports the context manager protocol for Session objects. The sessionmaker is configured with future=True, ensuring proper 2.0 behavior. The usage with self._provider.session() as session: is safe and correct.

# Keep this smaller than requirements.txt to avoid heavy system deps (e.g. weasyprint).

fastapi==0.115.0
python-multipart>=0.0.6

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -name "requirements-ci.txt" -type f

Repository: jerry609/PaperBot

Length of output: 82


🏁 Script executed:

cat -n requirements-ci.txt

Repository: jerry609/PaperBot

Length of output: 1047


🌐 Web query:

requests 2.28.0 CVE security vulnerability 2023 2024 2025

💡 Result:

Short answer: Yes — requests 2.28.0 is affected by multiple CVEs discovered in 2023–2025. Upgrade to a fixed release (see details below).

Details (affected in 2.28.0 and fixed releases)

  • CVE-2023-32681 — unintended leak of Proxy-Authorization header. Affects many 2.x releases including 2.28.0; fixed in later releases (recommended >= 2.31.0). [1]
  • CVE-2024-35195 — Sessions that first use verify=False can cause subsequent requests on that connection to skip cert verification; fixed in 2.32.0. [2]
  • CVE-2024-47081 — URL parsing issue that may leak .netrc credentials; affects requests < 2.32.4 and is fixed in 2.32.4. For older versions you can mitigate by disabling .netrc usage (Session(trust_env=False)). [3]

Recommended action: Upgrade to requests >= 2.32.4 (covers the .netrc fix) or at minimum to the specific fixed versions noted above.

Sources:
[1] (CVE-2023-32681 / disclosure/fix guidance).
[2] (CVE-2024-35195 — fixed in 2.32.0).
[3] (CVE-2024-47081 — fixed in 2.32.4).


🌐 Web query:

python-dotenv 0.19.0 CVE security vulnerability

💡 Result:

I found no public CVE that specifically affects python-dotenv v0.19.0. Vendor/DB checks show no direct CVE for that version and the project's security page lists no published advisories. Sources: Snyk (package view), Safety DB, and the python-dotenv GitHub security page. [1][2][3]

Recommendation: if you can, upgrade to the latest non‑vulnerable release (Safety DB / Snyk show the current recommended version is 1.2.1) and monitor the project's security page or NVD for new advisories. [2][1]

Sources

  • Snyk — python-dotenv 0.19.0 (no direct vulnerabilities listed). [1]
  • Safety DB — python-dotenv versions / latest recommended version. [2]
  • python-dotenv GitHub — security policy / no published advisories. [3]

🌐 Web query:

jinja2 3.1.0 CVE security vulnerability 2023 2024 2025

💡 Result:

Short answer: Yes — Jinja2 3.1.0 is vulnerable to multiple disclosed CVEs (December 2024–March 2025). Upgrade to a patched release (>= 3.1.6) or apply vendor fixes.

What affects 3.1.0 (examples):

  • CVE-2024-56201 — sandbox breakout via a crafted template filename; fixed in 3.1.5.
  • CVE-2024-56326 — sandbox bypass through an indirect reference to str.format; fixed in 3.1.5.
  • CVE-2025-27516 — sandbox breakout through the |attr filter; fixed in 3.1.6.

Recommended action:

  • Upgrade Jinja2 to the latest patched release (at minimum 3.1.6 to cover the 2025 |attr issue). (alas.aws.amazon.com)
  • If you cannot upgrade immediately, follow your OS/vendor security advisories for backported fixes (Ubuntu/Red Hat/SUSE/Amazon Linux advisories exist). (ubuntu.com)

If you want, I can (1) list exact patched release notes and links for each CVE, or (2) scan a requirements.txt / environment to see if you're using a vulnerable Jinja2 version.

🌐 Web query:

gitpython 3.1.0 CVE security vulnerability 2023 2024 2025

💡 Result:

Short answer: Yes — GitPython 3.1.0 is affected.

Details:

  • CVE-2023-40267 (RCE via insufficient sanitization in clone/clone_from) affects GitPython versions < 3.1.32 (so 3.1.0 is vulnerable). Update fixed in 3.1.32. [1][2]
  • CVE-2024-22190 (untrusted search path / arbitrary code execution on Windows, an incomplete follow-up to the 2023 issue) affects versions prior to 3.1.41 (so 3.1.0 is vulnerable). Update fixed in 3.1.41. [3][4]

Recommended action: upgrade to the latest GitPython release (at minimum >= 3.1.41) and audit any code that passes user-controlled options to clone/clone_from or uses shells on Windows. [1][3]

Sources:

  1. GitLab Advisory / Debian / Ubuntu references for CVE-2023-40267 (fix in 3.1.32). [GitLab advisory / Ubuntu]. (advisories.gitlab.com)
  2. Additional writeups and advisories for CVE-2023-40267. (resolvedsecurity.com)
  3. NVD / INCIBE / other advisories for CVE-2024-22190 (fix in 3.1.41). (nvd.nist.gov)
  4. Vendor/third‑party summaries noting patch in 3.1.41. (advisories.gitlab.com)

If you want, I can check your project for which GitPython version is installed and show the exact upgrade command.


🌐 Web query:

uvicorn 0.32.0 CVE security vulnerability

💡 Result:

Short answer: There are no public CVEs that specifically target uvicorn 0.32.0. The known CVEs (e.g., CVE-2020-7694 and CVE-2020-7695) affect much older releases (<0.11.7) and not 0.32.0. As of January 4, 2026, Uvicorn's release notes show 0.32.0 (Oct 15, 2024) with no CVE listed for that release; vulnerability databases likewise attribute the historical CVEs to older versions. [1][2][3]

Recommendation: If you run 0.32.0, upgrade to a recent patched release (check the latest on the Uvicorn release notes or PyPI) and monitor vendor advisories. [1]

Sources:

  • Uvicorn release notes (0.32.0 entry). [1]
  • Ubuntu / OSV entry for CVE-2020-7694 (log/ANSI escape injection) — affects older versions. [2]
  • Ubuntu entry for CVE-2020-7695 (HTTP response splitting) — affects older versions. [3]

🌐 Web query:

markdown 3.4.0 CVE security vulnerability

💡 Result:

Short answer: There is no single CVE named for "markdown 3.4.0" (the Python-Markdown package). However, several Markdown-related projects and extensions (PyMdown, Python‑Markdown plugins, and other Markdown libraries) have had CVEs that can affect systems using older Markdown versions or extensions. See notable examples and advisories below.

  • Arbitrary file inclusion in PyMdown Snippets (CVE-2023-32309) — affects PyMdown (an extensions package for Python‑Markdown); patched in newer PyMdown releases. [1][2]
  • Security databases list older Python‑Markdown releases (including 3.4.x) among versions with known issues; maintainers recommend upgrading to a current, supported release. [3]

Recommendation: if you run Python‑Markdown 3.4.0 or use PyMdown/extensions, upgrade to the latest patched releases for both python-markdown and any extensions (or apply vendor advisories) and audit any markdown-processing that consumes untrusted input.

Sources:

  1. NVD — CVE-2023-32309 (PyMdown Snippets arbitrary file inclusion). [turn1search6].
  2. CVEDetails / advisories for CVE-2023-32309. [turn1search3].
  3. Safety DB / package security summary for python-markdown (version list & guidance). [turn1search4].

🌐 Web query:

aiofiles 22.1.0 CVE security vulnerability

💡 Result:

Short answer: There is no publicly reported CVE specifically affecting aiofiles 22.1.0 as of Jan 4, 2026. Security databases (Snyk, Safety DB, Meterian) list no direct CVEs for aiofiles v22.1.0. [1][2][3]

Related note: an unrelated older CVE (CVE-2020-8162) affected very early aiofiles/RHEL packaging in 2020/2021; it does not target aiofiles 22.1.0. [4]

Recommendation: if you depend on aiofiles, keep it updated to the latest stable release and run your own SBOM/vulnerability scan (e.g., Snyk, OSV, Safety) to catch new advisories.

Sources:

  • Snyk package page for aiofiles (no known direct vulnerabilities). [1]
  • Safety DB / package-versions listing showing 22.1.0 with no reported CVE. [2]
  • Meterian component page for aiofiles 22.1.0 (no known vulnerabilities). [3]
  • Snyk/RHEL advisory for CVE-2020-8162 (older, unrelated). [4]

Would you like me to run a live scan of a requirements.txt or check the latest aiofiles release?


Upgrade vulnerable dependencies to patch critical security issues.

Verification confirms several dependencies have known security vulnerabilities:

  • requests 2.28.0 — CVE-2023-32681, CVE-2024-35195, CVE-2024-47081; upgrade to >= 2.32.4
  • jinja2 3.1.0 — multiple sandbox-escape CVEs disclosed 2024–2025; upgrade to >= 3.1.6
  • gitpython 3.1.0 — CVE-2023-40267 (RCE via clone/clone_from), CVE-2024-22190 (Windows untrusted search path); upgrade to >= 3.1.41

No CVEs found in python-dotenv 0.19.0, uvicorn 0.32.0, markdown 3.4.0, or aiofiles 22.1.0, though these remain outdated and should be periodically reviewed.

🤖 Prompt for AI Agents
In requirements-ci.txt around lines 5 to 5, several dependencies are vulnerable;
update the package pins in the file to safe minimum versions: set requests to
>=2.32.4, jinja2 to >=3.1.6, and gitpython to >=3.1.41 (leave other packages
unchanged unless review dictates); after updating, regenerate any
lock/constraints files (pip-compile/poetry lock), run the test suite and
dependency-scan/safety checks locally, and commit the updated requirements and
lockfiles to ensure CI uses the patched versions.


# FastAPI backend for CLI/Web
fastapi==0.115.0
python-multipart>=0.0.6 # Required for file uploads

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check python-multipart version info and security data from PyPI

echo "=== Checking python-multipart on PyPI ==="
curl -s https://pypi.org/pypi/python-multipart/json | jq '{
  current_version: .info.version,
  latest_release: .releases | keys[-1],
  all_versions: (.releases | keys | reverse | .[0:10]),
  requires_python: .info.requires_python,
  summary: .info.summary
}'

echo -e "\n=== Release history (last 15 versions) ==="
curl -s https://pypi.org/pypi/python-multipart/json | jq -r '.releases | keys | reverse | .[0:15] | .[]'

Repository: jerry609/PaperBot

Length of output: 549


🌐 Web query:

python-multipart security vulnerabilities CVE advisories

💡 Result:

Found multiple advisories for python-multipart DoS issues. Summary and action items:

  • CVE-2024-24762 — regex ReDoS in Content-Type parsing; fixed in python-multipart 0.0.7. Remediate by upgrading to >= 0.0.7. [1][2]

  • CVE-2024-53981 — excessive per-byte skipping/logging before first/after last boundary causing DoS; fixed in python-multipart 0.0.18. Remediate by upgrading to >= 0.0.18. [3][4]

  • Related: CVE-2023-30798 — unlimited multipart parts (too many fields/files) causes high CPU/memory; mitigated by Starlette 0.25.0 (defaults max_fields/max_files=1000) or FastAPI 0.92.0. If you can’t upgrade python-multipart, ensure Starlette/FastAPI are updated or set max_fields/max_files when parsing forms. [5]

Sources:
[1] NVD / CVE-2024-24762.
[2] Debian security tracker (python-multipart / CVE-2024-24762).
[3] NVD / CVE-2024-53981 (and GitHub advisory).
[4] Debian security tracker (python-multipart / CVE-2024-53981).
[5] Seal Security / advisory on multipart parser (CVE-2023-30798) and Starlette/FastAPI fixes.

Would you like exact upgrade commands (pip / requirements) or a quick scan command to detect vulnerable versions in your environment?


🌐 Web query:

FastAPI 0.115.0 python-multipart version requirements compatibility

💡 Result:

Short answer: FastAPI 0.115.0 (pre-0.115.4) expects the old "multipart" import, so use an older python-multipart release (pin to 0.0.12 / <=0.0.13). FastAPI added compatibility for newer python-multipart imports in 0.115.4 and then bumped the minimum to >=0.0.18 in later 0.115.x releases. [1][2]

Sources:

  • FastAPI release notes (0.115.4: updated import/check for python-multipart; 0.115.7: bumped to >=0.0.18). [1]
  • FastAPI GitHub discussion showing breakage when python-multipart switched import path (recommendation to use python-multipart==0.0.12 as hotfix). [2]

Tighten the python-multipart version constraint to address security vulnerabilities and FastAPI 0.115.0 compatibility.

The constraint >=0.0.6 allows vulnerable versions. CVE-2024-24762 (ReDoS in Content-Type parsing) requires >=0.0.7, and CVE-2024-53981 (DoS with boundary parsing) requires >=0.0.18. However, FastAPI 0.115.0 has import path compatibility issues with python-multipart >=0.0.14. Use python-multipart>=0.0.7,<=0.0.13 (recommend 0.0.12) for FastAPI 0.115.0, or upgrade FastAPI to 0.115.4+ to support newer python-multipart versions.

🤖 Prompt for AI Agents
In requirements.txt around line 65, the current loose constraint
"python-multipart>=0.0.6" permits vulnerable versions; tighten it to a safe,
FastAPI‑compatible range by changing the requirement to
"python-multipart>=0.0.7,<=0.0.13" (recommended pin to 0.0.12) to address
CVE-2024-24762 and CVE-2024-53981 while preserving FastAPI 0.115.0
compatibility; alternatively, if you prefer newer python-multipart versions,
upgrade FastAPI to 0.115.4+ and then allow python-multipart>=0.0.18.

Comment on lines +535 to +568

    # P0 Hook: Verify deletion compliance - deleted items should not be retrievable
    if deleted > 0:
        # Try to retrieve items from the cleared scope (should return empty)
        retrieved_after_delete = _memory_store.list_memories(
            user_id=user_id,
            scope_type="track",
            scope_id=str(track_id),
            include_deleted=False,
            include_pending=True,
            limit=100,
        )
        # Also try searching
        search_results = _memory_store.search_memories(
            user_id=user_id,
            query="*",  # broad query
            scope_type="track",
            scope_id=str(track_id),
            limit=100,
        )
        retrieved_count = len(retrieved_after_delete) + len(search_results)

        collector = _get_metric_collector()
        collector.record_deletion_compliance(
            deleted_retrieved_count=retrieved_count,
            deleted_total_count=deleted,
            evaluator_id=f"user:{user_id}",
            detail={
                "track_id": track_id,
                "deleted_count": deleted,
                "retrieved_after_delete": retrieved_count,
                "action": "clear_track_memory",
            },
        )

⚠️ Potential issue | 🟡 Minor

Potential double-counting and ineffective search query in deletion compliance check.

Two issues:

  1. Search query "*": The search_memories method tokenizes the query and filters by content.contains(token). A single "*" won't match typical content, so this search may return nothing or fall back to list_memories, potentially returning the same items.

  2. Double-counting: Both list_memories and search_memories could return the same items, inflating retrieved_count.

Consider either:

  • Using only list_memories for the compliance check (simpler, sufficient)
  • Or deduplicating by ID if both are needed
🔎 Proposed simplification
     if deleted > 0:
         # Try to retrieve items from the cleared scope (should return empty)
         retrieved_after_delete = _memory_store.list_memories(
             user_id=user_id,
             scope_type="track",
             scope_id=str(track_id),
             include_deleted=False,
             include_pending=True,
             limit=100,
         )
-        # Also try searching
-        search_results = _memory_store.search_memories(
-            user_id=user_id,
-            query="*",  # broad query
-            scope_type="track",
-            scope_id=str(track_id),
-            limit=100,
-        )
-        retrieved_count = len(retrieved_after_delete) + len(search_results)
+        retrieved_count = len(retrieved_after_delete)

         collector = _get_metric_collector()
         collector.record_deletion_compliance(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    # P0 Hook: Verify deletion compliance - deleted items should not be retrievable
    if deleted > 0:
        # Try to retrieve items from the cleared scope (should return empty)
        retrieved_after_delete = _memory_store.list_memories(
            user_id=user_id,
            scope_type="track",
            scope_id=str(track_id),
            include_deleted=False,
            include_pending=True,
            limit=100,
        )
        # Also try searching
        search_results = _memory_store.search_memories(
            user_id=user_id,
            query="*",  # broad query
            scope_type="track",
            scope_id=str(track_id),
            limit=100,
        )
        retrieved_count = len(retrieved_after_delete) + len(search_results)
        collector = _get_metric_collector()
        collector.record_deletion_compliance(
            deleted_retrieved_count=retrieved_count,
            deleted_total_count=deleted,
            evaluator_id=f"user:{user_id}",
            detail={
                "track_id": track_id,
                "deleted_count": deleted,
                "retrieved_after_delete": retrieved_count,
                "action": "clear_track_memory",
            },
        )
    # P0 Hook: Verify deletion compliance - deleted items should not be retrievable
    if deleted > 0:
        # Try to retrieve items from the cleared scope (should return empty)
        retrieved_after_delete = _memory_store.list_memories(
            user_id=user_id,
            scope_type="track",
            scope_id=str(track_id),
            include_deleted=False,
            include_pending=True,
            limit=100,
        )
        retrieved_count = len(retrieved_after_delete)
        collector = _get_metric_collector()
        collector.record_deletion_compliance(
            deleted_retrieved_count=retrieved_count,
            deleted_total_count=deleted,
            evaluator_id=f"user:{user_id}",
            detail={
                "track_id": track_id,
                "deleted_count": deleted,
                "retrieved_after_delete": retrieved_count,
                "action": "clear_track_memory",
            },
        )
🤖 Prompt for AI Agents
In src/paperbot/api/routes/research.py around lines 535 to 568, the
deletion-compliance check is using both list_memories and search_memories with a
tokenized "*" query which is ineffective and may double-count items; simplify by
using only list_memories to verify that no items remain (remove the
search_memories call and use the length of retrieved_after_delete as
retrieved_count), or if you must keep search_memories, replace the "*" query
with a meaningful query and deduplicate results by memory ID before summing
counts (build a set of IDs from both results and use its size as
retrieved_count) and update the metric payload accordingly.

@jerry609 jerry609 merged commit 1e6d2f0 into jerry609:master Jan 5, 2026
2 checks passed