Skip to content

Conversation

@xinquiry
Copy link
Collaborator

@xinquiry xinquiry commented Jan 18, 2026

Summary

Changes Reverted

  • Removed literature cleaners module (DOI normalization, record merging)
  • Removed OpenAlex provider for fetching literature data
  • Removed XLSX export functionality
  • Removed literature models

🤖 Generated with Claude Code

Summary by Sourcery

回滚先前引入的文献相关工具,并在使用知识集时收紧 MCP 工具参数注入行为。

Bug Fixes(错误修复):

  • 当没有可用的代理或绑定的知识集时,阻止执行声明了 knowledge_set_id 的 MCP 工具,从而减少配置错误调用的可能性。

Enhancements(功能增强):

  • 简化 MCP 工具参数注入逻辑,在工具 schema 要求时,仅依赖代理所绑定的 knowledge_set_id

Chores(日常维护):

  • 移除与文献相关的模块,包括清洗器、模型、提供方以及导出工具,这些模块曾在现已回滚的提交中被添加。
Original summary in English

Summary by Sourcery

Revert the previously introduced literature utilities and tighten MCP tool argument injection behavior around knowledge set usage.

Bug Fixes:

  • Prevent execution of MCP tools that declare a knowledge_set_id when no agent or bound knowledge set is available, reducing the chance of misconfigured calls.

Enhancements:

  • Simplify MCP tool argument injection to rely solely on the agent’s bound knowledge_set_id when required by the tool schema.

Chores:

  • Remove literature-related modules including cleaners, models, providers, and export utilities that were added in the reverted commit.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Jan 18, 2026

审阅者指南(在小型 PR 上折叠)

审阅者指南

此 PR 主要回滚之前添加的文献工具(清洗、模型、provider 和导出功能),并简化了 MCP 工具在执行时对 knowledge_set_id 的参数注入逻辑:现在只依赖 agent 上下文,同时移除了对 user_id 注入的支持。

MCP 工具执行中简化后的 knowledge_set_id 注入时序图

sequenceDiagram
    actor Caller
    participant execute_tool_call
    participant agent
    participant mcp_tool

    Caller ->> execute_tool_call: call with tool_name, input_schema, args_dict, agent
    activate execute_tool_call

    execute_tool_call ->> execute_tool_call: check input_schema.properties for knowledge_set_id
    alt knowledge_set_id present
        execute_tool_call ->> execute_tool_call: log Tool requires knowledge_set_id
        alt agent is present
            execute_tool_call ->> agent: get knowledge_set_id
            agent -->> execute_tool_call: ks_id
            alt ks_id is truthy
                execute_tool_call ->> execute_tool_call: inject args_dict[knowledge_set_id] = str(ks_id)
                execute_tool_call ->> execute_tool_call: log injected knowledge_set_id
            else ks_id missing
                execute_tool_call ->> execute_tool_call: log agent has no knowledge_set_id bound
            end
        else no agent
            execute_tool_call ->> execute_tool_call: log no agent context available for injection
        end
    else knowledge_set_id not present
        execute_tool_call ->> execute_tool_call: log tool does not require knowledge_set_id
    end

    execute_tool_call ->> mcp_tool: call_mcp_tool(tool_name, args_dict)
    mcp_tool -->> execute_tool_call: result
    deactivate execute_tool_call
    execute_tool_call -->> Caller: result
Loading

execute_tool_call 中决定是否注入 knowledge_set_id 的流程图

flowchart TD
    A[start execute_tool_call with tool_name, input_schema, args_dict, agent] --> B{input_schema.properties contains knowledge_set_id}
    B -- Yes --> C[log: Tool requires knowledge_set_id]
    B -- No --> J[log: Tool does NOT require knowledge_set_id] --> K[proceed to call_mcp_tool]

    C --> D{agent is not None}
    D -- No --> E[log: No agent context available for injection] --> K
    D -- Yes --> F[get ks_id from agent.knowledge_set_id]

    F --> G{ks_id is truthy}
    G -- Yes --> H[set args_dict.knowledge_set_id to string of ks_id] --> I[log: Injected knowledge_set_id into args] --> K
    G -- No --> L[log: Agent has NO knowledge_set_id bound] --> K

    K --> M[call_mcp_tool with tool_name and args_dict]
    M --> N[end execute_tool_call]
Loading

文件级变更

变更 详情 文件
简化 MCP 工具参数注入逻辑,仅在工具 schema 需要时,从 agent 上下文中注入 knowledge_set_id
  • 在准备 MCP 工具调用参数时,移除了基于 session 的 knowledge_set_iduser_id 解析。
  • 完全停止向 MCP 工具参数中注入 user_id,即使工具 schema 中存在该字段。
  • 现在只在工具 schema 拥有 knowledge_set_id 属性且存在绑定了 knowledge_set_id 的 agent 时才注入,并在缺少 agent 或 knowledge_set_id 时增加日志记录。
service/app/agents/mcp_tools.py
移除了在被回滚 PR 中引入的所有与文献相关的工具、providers、模型以及 MCP 端点。
  • 删除用于暴露文献功能的 MCP 文献入口/模块。
  • 移除文献聚合、清洗(包括 DOI 规范化和记录合并)以及导出工具。
  • 移除文献模型以及用于获取文献元数据的 OpenAlex provider。
service/app/mcp/literature.py
service/app/utils/literature/aggregator.py
service/app/utils/literature/cleaners/__init__.py
service/app/utils/literature/cleaners/base.py
service/app/utils/literature/cleaners/doi.py
service/app/utils/literature/exporter.py
service/app/utils/literature/models.py
service/app/utils/literature/providers/base.py
service/app/utils/literature/providers/openalex.py

提示与命令

与 Sourcery 交互

  • 触发新的审阅: 在 pull request 中评论 @sourcery-ai review
  • 继续讨论: 直接回复 Sourcery 的审阅评论。
  • 从审阅评论生成 GitHub issue: 在审阅评论下回复,要求 Sourcery 从该评论创建 issue。你也可以在审阅评论下回复 @sourcery-ai issue 来从该评论创建 issue。
  • 生成 pull request 标题: 在 pull request 标题的任意位置写上 @sourcery-ai,即可随时生成标题。你也可以在 pull request 中评论 @sourcery-ai title 来(重新)生成标题。
  • 生成 pull request 摘要: 在 pull request 正文任意位置写上 @sourcery-ai summary,即可在该处生成 PR 摘要。你也可以在 pull request 中评论 @sourcery-ai summary 来在任意时间(重新)生成摘要。
  • 生成审阅者指南: 在 pull request 中评论 @sourcery-ai guide,即可随时(重新)生成审阅者指南。
  • 解决所有 Sourcery 评论: 在 pull request 中评论 @sourcery-ai resolve,以解决所有 Sourcery 评论。如果你已经处理完所有评论且不想再看到它们,这非常有用。
  • 驳回所有 Sourcery 审阅: 在 pull request 中评论 @sourcery-ai dismiss,以驳回所有现有的 Sourcery 审阅。特别适用于你想从头开始一个新的审阅时——别忘了再评论 @sourcery-ai review 以触发新的审阅!

自定义你的体验

访问你的 dashboard 以:

  • 启用或禁用审阅功能,例如 Sourcery 生成的 pull request 摘要、审阅者指南等。
  • 更改审阅语言。
  • 添加、删除或编辑自定义审阅说明。
  • 调整其他审阅设置。

获取帮助

Original review guide in English
Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

This PR mostly reverts the previously added literature utilities (cleaning, models, provider, and export functionality) and simplifies the MCP tool argument injection logic for knowledge_set_id by relying solely on the agent context while removing user_id injection support.

Sequence diagram for simplified knowledge_set_id injection in MCP tool execution

sequenceDiagram
    actor Caller
    participant execute_tool_call
    participant agent
    participant mcp_tool

    Caller ->> execute_tool_call: call with tool_name, input_schema, args_dict, agent
    activate execute_tool_call

    execute_tool_call ->> execute_tool_call: check input_schema.properties for knowledge_set_id
    alt knowledge_set_id present
        execute_tool_call ->> execute_tool_call: log Tool requires knowledge_set_id
        alt agent is present
            execute_tool_call ->> agent: get knowledge_set_id
            agent -->> execute_tool_call: ks_id
            alt ks_id is truthy
                execute_tool_call ->> execute_tool_call: inject args_dict[knowledge_set_id] = str(ks_id)
                execute_tool_call ->> execute_tool_call: log injected knowledge_set_id
            else ks_id missing
                execute_tool_call ->> execute_tool_call: log agent has no knowledge_set_id bound
            end
        else no agent
            execute_tool_call ->> execute_tool_call: log no agent context available for injection
        end
    else knowledge_set_id not present
        execute_tool_call ->> execute_tool_call: log tool does not require knowledge_set_id
    end

    execute_tool_call ->> mcp_tool: call_mcp_tool(tool_name, args_dict)
    mcp_tool -->> execute_tool_call: result
    deactivate execute_tool_call
    execute_tool_call -->> Caller: result
Loading

Flow diagram for knowledge_set_id injection decision in execute_tool_call

flowchart TD
    A[start execute_tool_call with tool_name, input_schema, args_dict, agent] --> B{input_schema.properties contains knowledge_set_id}
    B -- Yes --> C[log: Tool requires knowledge_set_id]
    B -- No --> J[log: Tool does NOT require knowledge_set_id] --> K[proceed to call_mcp_tool]

    C --> D{agent is not None}
    D -- No --> E[log: No agent context available for injection] --> K
    D -- Yes --> F[get ks_id from agent.knowledge_set_id]

    F --> G{ks_id is truthy}
    G -- Yes --> H[set args_dict.knowledge_set_id to string of ks_id] --> I[log: Injected knowledge_set_id into args] --> K
    G -- No --> L[log: Agent has NO knowledge_set_id bound] --> K

    K --> M[call_mcp_tool with tool_name and args_dict]
    M --> N[end execute_tool_call]
Loading

File-Level Changes

Change Details Files
Simplified MCP tool argument injection logic to only inject knowledge_set_id from the agent context when required by the tool schema.
  • Removed session-based resolution for knowledge_set_id and user_id when preparing MCP tool call arguments.
  • Stopped injecting user_id into MCP tool arguments entirely, even if present in the tool schema.
  • Now only injects knowledge_set_id when the tool schema has a knowledge_set_id property and an agent with a bound knowledge_set_id is available, with additional logging for missing agent or knowledge_set_id.
service/app/agents/mcp_tools.py
Removed all literature-related utilities, providers, models, and MCP endpoints introduced in the reverted PR.
  • Deleted the MCP literature entrypoint/module used to expose literature functionality.
  • Removed literature aggregation, cleaning (including DOI normalization and record merging), and export utilities.
  • Removed literature models and the OpenAlex provider used for fetching literature metadata.
service/app/mcp/literature.py
service/app/utils/literature/aggregator.py
service/app/utils/literature/cleaners/__init__.py
service/app/utils/literature/cleaners/base.py
service/app/utils/literature/cleaners/doi.py
service/app/utils/literature/exporter.py
service/app/utils/literature/models.py
service/app/utils/literature/providers/base.py
service/app/utils/literature/providers/openalex.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@xinquiry xinquiry merged commit f4b1616 into main Jan 18, 2026
7 of 8 checks passed
@xinquiry xinquiry deleted the revert-literature-utils branch January 18, 2026 11:17
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - 我发现了 3 个问题,并给出了一些总体反馈:

  • 回退后的工具参数注入逻辑移除了基于 session 的覆盖和 user_id 传递;如果这次回退是有意为之,建议记录说明或审查之前依赖会话级 knowledge_set_id/user_id 的工具,以便它们要么能优雅降级,要么相应更新。
  • 新的日志消息 Tool '{tool_name}' requires knowledge_set_id 可能有些误导,因为 schema 只表明它“支持”该字段,而不是“强制要求”;建议将措辞改回 “supports”,以避免调试时产生混淆。
  • 在移除文献工具和 OpenAlex provider 之后,请确保所有残留引用(例如对 mcp/literature 工具或模型的引用)也一并清理掉,以避免由悬空的 import 或工具注册导致的运行时错误。
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The reverted tool argument injection logic removes session-based overrides and user_id propagation; if this revert is intentional, consider documenting or auditing tools that relied on session-level knowledge_set_id/user_id so they fail gracefully or are updated accordingly.
- The new log message `Tool '{tool_name}' requires knowledge_set_id` may be misleading since the schema only indicates it is supported, not strictly required; consider changing the wording back to 'supports' to avoid confusion when debugging.
- With the literature utilities and OpenAlex provider removed, ensure any remaining references (e.g., to mcp/literature tools or models) are also cleaned up to avoid runtime errors from dangling imports or tool registrations.

## Individual Comments

### Comment 1
<location> `service/app/agents/mcp_tools.py:167-176` </location>
<code_context>
+                            if "knowledge_set_id" in input_schema.get("properties", {}):
</code_context>

<issue_to_address>
**issue (bug_risk):** 请评估移除会话级 knowledge_set_id 覆盖逻辑,对所有现有调用方是否都是可接受的。

之前的行为是优先使用会话级的 knowledge_set_id(通过 SessionRepository),仅在缺失时才回退到 agent。现在的新行为完全依赖 agent。任何使用每会话覆盖(例如在共享 agent 下按对话切换知识库)的流程,都会失去这种行为,可能会查询到错误的知识库,或者导致依赖特定 ID 的工具调用失败。如果这项变更是有意的,请验证所有依赖会话级 knowledge_set_id 的调用方,或考虑保留 session 级的回退逻辑。
</issue_to_address>

### Comment 2
<location> `service/app/agents/mcp_tools.py:168-176` </location>
<code_context>
+                                logger.info(f"Tool '{tool_name}' requires knowledge_set_id.")
</code_context>

<issue_to_address>
**suggestion:** 如果 knowledge_set_id 在设计上是可选或允许缺失的,那么新的日志文案和日志级别可能会产生误导或噪音。

由于在很多 schema 中,knowledge_set_id 可能是可选字段,把该字段描述为工具“requires”的是不准确的,而且在其缺失时发出警告(包括没有 agent 上下文的情况)会让合法流程也产生日志噪音。建议弱化措辞(例如改为 “supports knowledge_set_id”),并且将日志级别降低到 info/debug,或者仅在该字段确实在 schema 的 `required` 列表中时才使用 warning。
</issue_to_address>

### Comment 3
<location> `service/app/agents/mcp_tools.py:196-205` </location>
<code_context>
-                            # Inject user_id when tool schema supports it.
</code_context>

<issue_to_address>
**🚨 issue (security):** 完全移除 user_id 注入可能会破坏那些依赖它进行范围控制/授权的工具。

之前,只要工具 schema 暴露了 user_id 字段,就会从 session/agent 中注入 user_id。现在该逻辑被移除,这些工具将收不到任何 user_id,这可能会破坏个性化/范围控制,甚至在依赖 user_id 过滤时扩大数据可见范围,或导致结果为空。如果用户范围控制已经迁移到其他机制,请确认所有在 schema 中声明了 user_id 字段的现有工具都已更新为使用新的方案。
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
帮我变得更有用!请对每条评论点 👍 或 👎,我会根据你的反馈改进后续的评审。
Original comment in English

Hey - I've found 3 issues, and left some high level feedback:

  • The reverted tool argument injection logic removes session-based overrides and user_id propagation; if this revert is intentional, consider documenting or auditing tools that relied on session-level knowledge_set_id/user_id so they fail gracefully or are updated accordingly.
  • The new log message Tool '{tool_name}' requires knowledge_set_id may be misleading since the schema only indicates it is supported, not strictly required; consider changing the wording back to 'supports' to avoid confusion when debugging.
  • With the literature utilities and OpenAlex provider removed, ensure any remaining references (e.g., to mcp/literature tools or models) are also cleaned up to avoid runtime errors from dangling imports or tool registrations.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The reverted tool argument injection logic removes session-based overrides and user_id propagation; if this revert is intentional, consider documenting or auditing tools that relied on session-level knowledge_set_id/user_id so they fail gracefully or are updated accordingly.
- The new log message `Tool '{tool_name}' requires knowledge_set_id` may be misleading since the schema only indicates it is supported, not strictly required; consider changing the wording back to 'supports' to avoid confusion when debugging.
- With the literature utilities and OpenAlex provider removed, ensure any remaining references (e.g., to mcp/literature tools or models) are also cleaned up to avoid runtime errors from dangling imports or tool registrations.

## Individual Comments

### Comment 1
<location> `service/app/agents/mcp_tools.py:167-176` </location>
<code_context>
+                            if "knowledge_set_id" in input_schema.get("properties", {}):
</code_context>

<issue_to_address>
**issue (bug_risk):** Consider whether removing the session-level knowledge_set_id override is acceptable for all existing callers.

The previous behavior preferred a session-level knowledge_set_id (via SessionRepository) and only fell back to the agent. The new behavior relies solely on the agent. Any flows that used per-session overrides (e.g., per-conversation knowledge set switching with a shared agent) would now lose that behavior and might query the wrong knowledge set or break tool calls expecting a specific ID. If this change is intentional, please verify all callers that rely on session-level knowledge_set_id, or consider retaining the session fallback.
</issue_to_address>

### Comment 2
<location> `service/app/agents/mcp_tools.py:168-176` </location>
<code_context>
+                                logger.info(f"Tool '{tool_name}' requires knowledge_set_id.")
</code_context>

<issue_to_address>
**suggestion:** The new log wording and levels may be misleading/noisy if knowledge_set_id is optional or missing by design.

Since knowledge_set_id may be optional in many schemas, saying the tool "requires" it is inaccurate, and warning when it’s absent (including when there’s no agent context) will be noisy for valid flows. Consider softening the wording (e.g., "supports knowledge_set_id") and/or lowering the log level to info/debug, or only using a warning when the field is actually required per the schema’s `required` list.
</issue_to_address>

### Comment 3
<location> `service/app/agents/mcp_tools.py:196-205` </location>
<code_context>
-                            # Inject user_id when tool schema supports it.
</code_context>

<issue_to_address>
**🚨 issue (security):** Dropping user_id injection entirely may break tools that relied on it for scoping/authorization.

Previously, user_id was injected from the session/agent whenever the tool schema exposed a user_id field. With that logic removed, those tools will now receive no user_id, which may break personalization/scoping and could widen data visibility or cause empty results if they rely on user_id filters. If user scoping is being handled elsewhere now, please verify that all existing tools with a user_id field in their schema have been updated to use the new approach.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +167 to +176
if "knowledge_set_id" in input_schema.get("properties", {}):
logger.info(f"Tool '{tool_name}' requires knowledge_set_id.")
if agent:
ks_id = getattr(agent, "knowledge_set_id", None)
logger.info(f"Agent found. Knowledge Set ID: {ks_id}")
if ks_id:
args_dict["knowledge_set_id"] = str(ks_id)
logger.info(f"Injected knowledge_set_id: {ks_id} into args")
else:
logger.warning(f"Agent {agent.id} has NO knowledge_set_id bound!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): 请评估移除会话级 knowledge_set_id 覆盖逻辑,对所有现有调用方是否都是可接受的。

之前的行为是优先使用会话级的 knowledge_set_id(通过 SessionRepository),仅在缺失时才回退到 agent。现在的新行为完全依赖 agent。任何使用每会话覆盖(例如在共享 agent 下按对话切换知识库)的流程,都会失去这种行为,可能会查询到错误的知识库,或者导致依赖特定 ID 的工具调用失败。如果这项变更是有意的,请验证所有依赖会话级 knowledge_set_id 的调用方,或考虑保留 session 级的回退逻辑。

Original comment in English

issue (bug_risk): Consider whether removing the session-level knowledge_set_id override is acceptable for all existing callers.

The previous behavior preferred a session-level knowledge_set_id (via SessionRepository) and only fell back to the agent. The new behavior relies solely on the agent. Any flows that used per-session overrides (e.g., per-conversation knowledge set switching with a shared agent) would now lose that behavior and might query the wrong knowledge set or break tool calls expecting a specific ID. If this change is intentional, please verify all callers that rely on session-level knowledge_set_id, or consider retaining the session fallback.

Comment on lines +168 to +176
logger.info(f"Tool '{tool_name}' requires knowledge_set_id.")
if agent:
ks_id = getattr(agent, "knowledge_set_id", None)
logger.info(f"Agent found. Knowledge Set ID: {ks_id}")
if ks_id:
args_dict["knowledge_set_id"] = str(ks_id)
logger.info(f"Injected knowledge_set_id: {ks_id} into args")
else:
logger.warning(f"Agent {agent.id} has NO knowledge_set_id bound!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: 如果 knowledge_set_id 在设计上是可选或允许缺失的,那么新的日志文案和日志级别可能会产生误导或噪音。

由于在很多 schema 中,knowledge_set_id 可能是可选字段,把该字段描述为工具“requires”的是不准确的,而且在其缺失时发出警告(包括没有 agent 上下文的情况)会让合法流程也产生日志噪音。建议弱化措辞(例如改为 “supports knowledge_set_id”),并且将日志级别降低到 info/debug,或者仅在该字段确实在 schema 的 required 列表中时才使用 warning。

Original comment in English

suggestion: The new log wording and levels may be misleading/noisy if knowledge_set_id is optional or missing by design.

Since knowledge_set_id may be optional in many schemas, saying the tool "requires" it is inaccurate, and warning when it’s absent (including when there’s no agent context) will be noisy for valid flows. Consider softening the wording (e.g., "supports knowledge_set_id") and/or lowering the log level to info/debug, or only using a warning when the field is actually required per the schema’s required list.

@codecov
Copy link

codecov bot commented Jan 18, 2026

Codecov Report

❌ Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
service/app/agents/mcp_tools.py 0.00% 11 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants