@SoberPizza (Collaborator) commented Jan 27, 2026

Summary by Sourcery

Add a toggleable built-in literature search tool that can query multiple academic sources and expose it in the chat toolbar.

New Features:

  • Introduce a literature_search built-in tool with a LangChain-based implementation that queries multiple academic literature data sources and returns structured markdown/JSON results.
  • Expose literature search as a toggleable tool in the chat toolbar UI with localized labels in English and Chinese.

Enhancements:

  • Register the literature search tool in the backend tool registry, capabilities, and loading pipeline so it is available alongside existing web search and knowledge tools.

Documentation:

  • Update i18n app strings to document the new literature search option in the toolbar.

@sourcery-ai (bot, Contributor) commented Jan 27, 2026

Reviewer's Guide

Adds a new toggleable built-in literature search tool that can be enabled per-agent in the chat toolbar, wires it through the frontend agent tool config, registers and loads it in the backend tool registry, and implements the underlying literature search LangChain tool using multiple data sources with a structured result format.

Sequence diagram for literature search tool invocation

sequenceDiagram
    actor User
    participant ChatUI as ChatToolbar_ToolSelector
    participant Frontend as Frontend_AgentConfig
    participant Backend as Backend_API
    participant Prep as ToolsPrepare_load_all_builtin_tools
    participant Registry as BuiltinToolRegistry
    participant Tool as literature_search_tool
    participant Dist as WorkDistributor
    participant Sources as Literature_Data_Sources

    User->>ChatUI: Toggle literature_search_enabled
    ChatUI->>Frontend: onUpdateAgent(graph_config with literature_search filter)
    Frontend->>Backend: UpdateAgentRequest(agent_config)
    Backend->>Registry: register_builtin_tools()
    Registry->>Registry: create_literature_search_tool()
    Registry->>Registry: register(tool_id=literature_search)

    Backend->>Prep: _load_all_builtin_tools(agent, session)
    Prep->>Registry: get(literature_search)
    Registry-->>Prep: literature_search_tool
    Prep-->>Backend: tools_list includes literature_search

    User->>Backend: ChatCompletionRequest(message requiring literature search)
    Backend->>Tool: coroutine _search_literature(input from LiteratureSearchInput)
    Tool->>Dist: __aenter__()
    Tool->>Dist: search(SearchRequest)
    Dist->>Sources: Query OpenAlex/SemanticScholar/PubMed
    Sources-->>Dist: Works JSON
    Dist-->>Tool: Aggregated_result
    Tool->>Dist: __aexit__()
    Tool->>Tool: _format_search_result(request, result)
    Tool-->>Backend: Markdown_report
    Backend-->>User: Assistant message with literature summary and links
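
The invocation half of the sequence diagram above can be sketched in Python. This is only an illustration under stated assumptions: `StubWorkDistributor` is a hypothetical stand-in that returns canned data instead of querying OpenAlex, Semantic Scholar, or PubMed, the `SearchRequest` here carries only a subset of the fields from the class diagram, and the result keys (`works`, `warnings`) and report layout are invented for the example rather than taken from the PR.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchRequest:
    # Subset of the fields listed in the class diagram.
    query: str
    max_results: int = 10
    data_sources: List[str] = field(default_factory=lambda: ["openalex"])


class StubWorkDistributor:
    """Hypothetical stand-in for WorkDistributor; returns canned data."""

    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "StubWorkDistributor":
        # A real implementation would open HTTP sessions here.
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # A real implementation would release those sessions here.
        self.closed = True

    async def search(self, request: SearchRequest) -> dict:
        # One fake work per requested data source (assumed result shape).
        works = [
            {"title": f"Paper about {request.query}", "source": source}
            for source in request.data_sources
        ]
        return {"works": works[: request.max_results], "warnings": []}


async def search_literature(query: str, distributor: StubWorkDistributor) -> str:
    """Enter the distributor, search, leave it, then format a short report."""
    request = SearchRequest(query=query)
    async with distributor as dist:
        result = await dist.search(request)
    lines = [
        f"# Literature search: {request.query}",
        f"Found {len(result['works'])} work(s).",
    ]
    lines += [f"- {w['title']} ({w['source']})" for w in result["works"]]
    return "\n".join(lines)


distributor = StubWorkDistributor()
report = asyncio.run(search_literature("graph neural networks", distributor))
print(report)
```

Using `async with` keeps the `__aenter__`/`__aexit__` pairing from the diagram intact even when `search` raises, which is presumably why the tool wraps the call this way.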

Class diagram for the new literature search tool

classDiagram
    class LiteratureSearchInput {
        +str query
        +str mailto
        +str author
        +str institution
        +str source
        +int year_from
        +int year_to
        +bool is_oa
        +str work_type
        +str language
        +bool is_retracted
        +bool has_abstract
        +bool has_fulltext
        +"Literal[relevance,cited_by_count,publication_date]" sort_by
        +List~str~ data_sources
    }

    class SearchRequest {
        +str query
        +str author
        +str institution
        +str source
        +int year_from
        +int year_to
        +bool is_oa
        +str work_type
        +str language
        +bool is_retracted
        +bool has_abstract
        +bool has_fulltext
        +str sort_by
        +int max_results
        +List~str~ data_sources
    }

    class WorkDistributor {
        +str openalex_email
        +search(request SearchRequest) dict
        +__aenter__() WorkDistributor
        +__aexit__()
    }

    class BaseTool {
    }

    class StructuredTool {
        +str name
        +str description
        +type args_schema
        +callable coroutine
    }

    class literature_search_tool_factory {
        +create_literature_search_tool() BaseTool
        +_search_literature(query str, mailto str, author str, institution str, source str, year_from int, year_to int, is_oa bool, work_type str, language str, is_retracted bool, has_abstract bool, has_fulltext bool, sort_by str, data_sources List~str~) str
        +_format_search_result(request SearchRequest, result dict, include_abstract bool) str
    }

    StructuredTool --|> BaseTool
    literature_search_tool_factory ..> StructuredTool : returns
    literature_search_tool_factory ..> LiteratureSearchInput : uses_as_args_schema
    literature_search_tool_factory ..> SearchRequest : builds
    literature_search_tool_factory ..> WorkDistributor : uses

File-Level Changes

Change: Expose literature search as a toggleable tool in the chat toolbar UI and wire it to the agent graph_config.
Details:
  • Compute literatureSearchEnabled from the agent using a new isLiteratureSearchEnabled helper.
  • Include literatureSearchEnabled in the count of enabled tools to keep the layout consistent.
  • Add a handler that toggles literature search on or off by updating the agent graph_config via updateLiteratureSearchEnabled.
  • Render a new Literature Search button with an appropriate icon, label, description, state styling, and checkmark, replacing the previously commented-out memory search button.
  • Add i18n strings for the literature search label and description in English and Chinese.
Files:
  • web/src/components/layouts/components/ChatToolbar/ToolSelector.tsx
  • web/src/i18n/locales/en/app.json
  • web/src/i18n/locales/zh/app.json

Change: Extend the agent tool configuration model to recognize the new literature_search built-in tool and provide enable/disable helpers.
Details:
  • Add LITERATURE_SEARCH to the BUILTIN_TOOLS enum and the ALL_BUILTIN_TOOL_IDS list so it participates in generic tool handling.
  • Add isLiteratureSearchEnabled and updateLiteratureSearchEnabled helpers that delegate to isToolEnabled/updateToolFilter for the new tool id.
Files:
  • web/src/core/agent/toolConfig.ts

Change: Register, load, and capability-tag the literature_search tool in the backend tool system so it is available to agents.
Details:
  • Import and re-export create_literature_search_tool in the builtin tools package.
  • Register a literature_search built-in tool in the registry with category search, ui_toggleable enabled, default_enabled false, and a simple cost config.
  • Always load the literature_search tool in the builtin tool preparation pipeline if it is registered.
  • Tag literature_search with the WEB_SEARCH capability so capability-based selection and routing can treat it as a search tool.
Files:
  • service/app/tools/registry.py
  • service/app/tools/prepare.py
  • service/app/tools/builtin/__init__.py
  • service/app/tools/capabilities.py

Change: Implement the literature_search LangChain tool that queries multiple literature data sources and returns a structured markdown+JSON report.
Details:
  • Define a LiteratureSearchInput Pydantic schema with rich filters (query, author, institution, source, year range, OA/retracted flags, abstract/fulltext requirements, language, sort, data_sources).
  • Implement the async _search_literature coroutine, which validates input, clamps/sanitizes year ranges, builds a SearchRequest, invokes WorkDistributor.search, handles warnings, and formats results or errors.
  • Implement _format_search_result to produce a markdown report that summarizes search conditions and statistics and embeds a JSON array of works (optionally with abstracts) plus next-step guidance.
  • Add create_literature_search_tool, which builds a StructuredTool named literature_search from the above input schema and coroutine; its description explains how callers should present results (including access_url) to users.
  • Remove the previous MCP-based literature tool module, service/app/mcp/literature.py (deleted in favor of the builtin tool implementation).
Files:
  • service/app/tools/builtin/literature.py
  • service/app/mcp/literature.py
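
To make the "markdown+JSON report" described above more concrete, here is a minimal sketch of a formatter in that spirit. It is not the PR's `_format_search_result`: the function name, the section headings, and the fields of each work are assumptions chosen for illustration; only the general shape (search conditions, statistics, an embedded JSON array, optional abstracts, next-step guidance) comes from the description above.

```python
import json
from typing import Dict, List


def format_search_result(conditions: Dict[str, object],
                         works: List[dict],
                         include_abstract: bool = False) -> str:
    """Build a markdown report with an embedded JSON array of works."""
    entries = []
    for work in works:
        # Drop the abstract unless the caller explicitly asked for it.
        entry = {k: v for k, v in work.items() if k != "abstract"}
        if include_abstract and work.get("abstract"):
            entry["abstract"] = work["abstract"]
        entries.append(entry)

    # Only report the filters that were actually set.
    active = {k: v for k, v in conditions.items() if v is not None}
    fence = "`" * 3  # markdown code fence, built here to keep this sketch simple

    lines = ["## Search conditions"]
    lines += [f"- {key}: {value}" for key, value in active.items()]
    lines += [
        "",
        "## Statistics",
        f"- Works returned: {len(entries)}",
        "",
        "## Works (JSON)",
        fence + "json",
        json.dumps(entries, indent=2, ensure_ascii=False),
        fence,
        "",
        "## Next steps",
        "- Narrow the year range or add filters if the results are too broad.",
    ]
    return "\n".join(lines)


sample_works = [
    {"title": "A", "access_url": "https://example.org/a", "abstract": "text"},
]
report = format_search_result({"query": "crispr", "year_from": None}, sample_works)
report_with_abstract = format_search_result(
    {"query": "crispr"}, sample_works, include_abstract=True
)
print(report)
```

Leaving abstracts out by default keeps the report small, which matters when the output is fed back into a model with a limited token budget.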

@sourcery-ai (bot, Contributor) left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The TRUE_VALUES and FALSE_VALUES constants in literature.py are currently unused and can either be removed or wired into input parsing if you intend to accept string-valued booleans from callers.
  • The literature search tool currently hard-codes max_results=10 and include_abstract=False; consider exposing these as input parameters (with sensible defaults) so callers can tune result volume and whether abstracts are returned based on their context and token budget.
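
Both points could be addressed along the following lines. This is a hedged sketch rather than the project's code: the PR does not show what `TRUE_VALUES` and `FALSE_VALUES` contain, so the sets below are guesses, and `parse_bool`, `ResultOptions`, and `MAX_RESULTS_CAP` are hypothetical names. A plain dataclass is used here to keep the example dependency-free; in the actual tool these would more naturally be fields on the Pydantic `LiteratureSearchInput` model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical contents; the PR does not show what these constants hold.
TRUE_VALUES = {"true", "1", "yes", "y"}
FALSE_VALUES = {"false", "0", "no", "n"}
MAX_RESULTS_CAP = 50  # assumed upper bound to protect the token budget


def parse_bool(value: object) -> Optional[bool]:
    """Coerce string-valued booleans; None and empty strings mean 'not set'."""
    if value is None or isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "":
        return None
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


@dataclass
class ResultOptions:
    """Caller-tunable knobs with the defaults the tool currently hard-codes."""
    max_results: int = 10
    include_abstract: bool = False

    def __post_init__(self) -> None:
        # Keep max_results within [1, MAX_RESULTS_CAP].
        self.max_results = max(1, min(int(self.max_results), MAX_RESULTS_CAP))
        # Treat an unset value as False.
        self.include_abstract = bool(parse_bool(self.include_abstract))


defaults = ResultOptions()
tuned = ResultOptions(max_results=500, include_abstract="true")
print(defaults, tuned)
```

With this shape, a caller that omits both options gets today's behavior, while a caller with a larger token budget can ask for more results or for abstracts.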
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `TRUE_VALUES` and `FALSE_VALUES` constants in `literature.py` are currently unused and can either be removed or wired into input parsing if you intend to accept string-valued booleans from callers.
- The literature search tool currently hard-codes `max_results=10` and `include_abstract=False`; consider exposing these as input parameters (with sensible defaults) so callers can tune result volume and whether abstracts are returned based on their context and token budget.

## Individual Comments

### Comment 1
<location> `service/app/tools/builtin/literature.py:133-141` </location>
<code_context>
-        year_to_int = int(year_to) if year_to and str(year_to).strip() else None
-
-        # Clamp year ranges (warn but don't block search)
-        max_year = datetime.now().year + 1
-        year_warning = ""
-        if year_from_int is not None and year_from_int > max_year:
</code_context>

<issue_to_address>
**suggestion:** Year clamping logic is asymmetric and may yield surprising ranges.

Currently you only clamp `year_from` when it’s above `max_year` and `year_to` when it’s below 1700, but not the opposite cases. That means ranges like `year_from=1500, year_to=2100` remain outside the intended bounds. Please clamp both `year_from` and `year_to` into `[1700, max_year]` so the actual filter and warning text remain consistent with the documented range.
</issue_to_address>

Comment on lines +133 to +141
max_year = datetime.now().year + 1
year_warning = ""
year_from_clamped = year_from
year_to_clamped = year_to

if year_from_clamped is not None and year_from_clamped > max_year:
    year_warning += f"year_from {year_from_clamped} clamped to {max_year}. "
    year_from_clamped = max_year
if year_to_clamped is not None and year_to_clamped < 1700:

suggestion: Year clamping logic is asymmetric and may yield surprising ranges.

Currently you only clamp year_from when it’s above max_year and year_to when it’s below 1700, but not the opposite cases. That means ranges like year_from=1500, year_to=2100 remain outside the intended bounds. Please clamp both year_from and year_to into [1700, max_year] so the actual filter and warning text remain consistent with the documented range.
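
A symmetric version of the clamp, as the suggestion describes, might look like the sketch below. The function name `clamp_year_range`, its return shape, and the `max_year` override parameter (added so the behavior can be checked deterministically) are assumptions for illustration; the 1700 lower bound and the `datetime.now().year + 1` upper bound are taken from the review comment and the quoted code.

```python
from datetime import datetime
from typing import List, Optional, Tuple

MIN_YEAR = 1700  # lower bound mentioned in the review comment


def clamp_year_range(
    year_from: Optional[int],
    year_to: Optional[int],
    max_year: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int], str]:
    """Clamp both ends into [MIN_YEAR, max_year] and collect warnings."""
    if max_year is None:
        max_year = datetime.now().year + 1
    warnings: List[str] = []

    def clamp(name: str, value: Optional[int]) -> Optional[int]:
        if value is None:
            return None
        clamped = min(max(value, MIN_YEAR), max_year)
        if clamped != value:
            # Warn but do not block the search, as in the original code.
            warnings.append(f"{name} {value} clamped to {clamped}.")
        return clamped

    clamped_from = clamp("year_from", year_from)
    clamped_to = clamp("year_to", year_to)
    return clamped_from, clamped_to, " ".join(warnings)


print(clamp_year_range(1500, 2100, max_year=2027))
```

Applying the same helper to both ends means the filter and the warning text cannot drift apart. This sketch deliberately does not reorder an inverted range (`year_from` greater than `year_to`); whether to swap or reject such input is a separate decision.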

@codecov (bot) commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 0% with 114 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| service/app/tools/builtin/literature.py | 0.00% | 107 Missing ⚠️ |
| service/app/tools/prepare.py | 0.00% | 3 Missing ⚠️ |
| service/app/tools/registry.py | 0.00% | 3 Missing ⚠️ |
| service/app/tools/builtin/__init__.py | 0.00% | 1 Missing ⚠️ |

@Mile-Away Mile-Away merged commit 7856703 into main Jan 28, 2026
11 of 13 checks passed
@Mile-Away Mile-Away deleted the test branch January 28, 2026 06:07
@Mile-Away (Contributor) commented

🎉 This PR is included in version 1.0.16 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
