sync: literature-tool #205
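Reviewer's Guide
Adds a toggleable built-in literature search tool to the chat toolbar that can be enabled per agent. The toggle is wired through the frontend agent tool configuration, the tool is registered and loaded in the backend tool registry, and the underlying literature search is implemented as a LangChain tool that queries multiple data sources and returns results in a structured format.
Sequence diagram for literature search tool invocation
sequenceDiagram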
actor User
participant ChatUI as ChatToolbar_ToolSelector
participant Frontend as Frontend_AgentConfig
participant Backend as Backend_API
participant Prep as ToolsPrepare_load_all_builtin_tools
participant Registry as BuiltinToolRegistry
participant Tool as literature_search_tool
participant Dist as WorkDistributor
participant Sources as Literature_Data_Sources
User->>ChatUI: Toggle literature_search_enabled
ChatUI->>Frontend: onUpdateAgent(graph_config with literature_search filter)
Frontend->>Backend: UpdateAgentRequest(agent_config)
Backend->>Registry: register_builtin_tools()
Registry->>Registry: create_literature_search_tool()
Registry->>Registry: register(tool_id=literature_search)
Backend->>Prep: _load_all_builtin_tools(agent, session)
Prep->>Registry: get(literature_search)
Registry-->>Prep: literature_search_tool
Prep-->>Backend: tools_list includes literature_search
User->>Backend: ChatCompletionRequest(message requiring literature search)
Backend->>Tool: coroutine _search_literature(input from LiteratureSearchInput)
Tool->>Dist: __aenter__()
Tool->>Dist: search(SearchRequest)
Dist->>Sources: Query OpenAlex/SemanticScholar/PubMed
Sources-->>Dist: Works JSON
Dist-->>Tool: Aggregated_result
Tool->>Dist: __aexit__()
Tool->>Tool: _format_search_result(request, result)
Tool-->>Backend: Markdown_report
Backend-->>User: Assistant message with literature summary and links
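The invocation path in the diagram can be sketched in a few lines of Python. This is a minimal, self-contained illustration and not the PR's code: `FakeWorkDistributor` and `ToolRegistry` are hypothetical stand-ins for `WorkDistributor` and `BuiltinToolRegistry`, a canned result replaces the real network calls, and the shape of the result dictionary is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class FakeWorkDistributor:
    """Hypothetical stand-in for WorkDistributor; returns canned data, no network."""

    def __init__(self) -> None:
        self.is_open = False

    async def __aenter__(self) -> "FakeWorkDistributor":
        self.is_open = True  # a real implementation would presumably open HTTP sessions here
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.is_open = False  # runs even if search() raised, so resources are released

    async def search(self, query: str) -> Dict[str, Any]:
        if not self.is_open:
            raise RuntimeError("search() called outside 'async with'")
        return {"works": [{"title": f"Result for {query}"}]}


class ToolRegistry:
    """Hypothetical stand-in for BuiltinToolRegistry: a plain id -> coroutine map."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Awaitable[str]]] = {}

    def register(self, tool_id: str, tool: Callable[..., Awaitable[str]]) -> None:
        self._tools[tool_id] = tool

    def get(self, tool_id: str) -> Callable[..., Awaitable[str]]:
        return self._tools[tool_id]


async def search_literature(query: str) -> str:
    # __aenter__ -> search -> __aexit__, in the same order as the diagram
    async with FakeWorkDistributor() as dist:
        result = await dist.search(query)
    # Format the aggregated result as a small Markdown list
    return "\n".join(f"- {work['title']}" for work in result["works"])


registry = ToolRegistry()
registry.register("literature_search", search_literature)
report = asyncio.run(registry.get("literature_search")("graph neural networks"))
print(report)
```

Using `async with` keeps the enter/exit pair balanced even when the search raises, which is presumably why the tool drives the distributor through `__aenter__` and `__aexit__`.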
Class diagram for the new literature search tool
classDiagram
class LiteratureSearchInput {
+str query
+str mailto
+str author
+str institution
+str source
+int year_from
+int year_to
+bool is_oa
+str work_type
+str language
+bool is_retracted
+bool has_abstract
+bool has_fulltext
+"Literal[relevance,cited_by_count,publication_date]" sort_by
+List~str~ data_sources
}
class SearchRequest {
+str query
+str author
+str institution
+str source
+int year_from
+int year_to
+bool is_oa
+str work_type
+str language
+bool is_retracted
+bool has_abstract
+bool has_fulltext
+str sort_by
+int max_results
+List~str~ data_sources
}
class WorkDistributor {
+str openalex_email
+search(request SearchRequest) dict
+__aenter__() WorkDistributor
+__aexit__()
}
class BaseTool {
}
class StructuredTool {
+str name
+str description
+type args_schema
+callable coroutine
}
class literature_search_tool_factory {
+create_literature_search_tool() BaseTool
+_search_literature(query str, mailto str, author str, institution str, source str, year_from int, year_to int, is_oa bool, work_type str, language str, is_retracted bool, has_abstract bool, has_fulltext bool, sort_by str, data_sources List~str~) str
+_format_search_result(request SearchRequest, result dict, include_abstract bool) str
}
StructuredTool --|> BaseTool
literature_search_tool_factory ..> StructuredTool : returns
literature_search_tool_factory ..> LiteratureSearchInput : uses_as_args_schema
literature_search_tool_factory ..> SearchRequest : builds
literature_search_tool_factory ..> WorkDistributor : uses
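The "builds" relation between `LiteratureSearchInput` and `SearchRequest` can be illustrated with plain dataclasses. The sketch below covers only a subset of the fields in the diagram and is not the PR's implementation: the default values, the helper name `build_request`, and the idea that `mailto` is handed to `WorkDistributor` rather than to the request are all assumptions inferred from the diagram (where `mailto` appears only on the input and `max_results` only on the request).

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class LiteratureSearchInput:
    """Subset of the tool's argument schema from the diagram (defaults assumed)."""
    query: str
    mailto: Optional[str] = None
    year_from: Optional[int] = None
    year_to: Optional[int] = None
    is_oa: Optional[bool] = None
    sort_by: str = "relevance"
    data_sources: List[str] = field(default_factory=list)


@dataclass
class SearchRequest:
    """Subset of the internal request: has max_results, has no mailto."""
    query: str
    year_from: Optional[int] = None
    year_to: Optional[int] = None
    is_oa: Optional[bool] = None
    sort_by: str = "relevance"
    max_results: int = 10
    data_sources: List[str] = field(default_factory=list)


def build_request(args: LiteratureSearchInput, max_results: int = 10) -> SearchRequest:
    """Hypothetical helper: copy the shared fields and attach max_results."""
    fields = asdict(args)
    fields.pop("mailto")  # assumed to be passed to WorkDistributor, not to the request
    return SearchRequest(max_results=max_results, **fields)


args = LiteratureSearchInput(query="CRISPR", mailto="me@example.org", year_from=2015)
request = build_request(args)
print(request)
```

Keeping the model-facing schema separate from the internal request lets the tool fix or cap values such as `max_results` without exposing them to the caller.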
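Hey - I've found 1 issue, and left some high level feedback:
- The `TRUE_VALUES` and `FALSE_VALUES` constants in `literature.py` are currently unused. Remove them if you do not plan to accept string-valued booleans from callers; otherwise, wire them into the input parsing logic.
- The literature search tool currently hard-codes `max_results=10` and `include_abstract=False`; consider exposing these as input parameters (with sensible defaults) so callers can tune result volume and whether abstracts are returned based on their context and token budget.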
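Both points can be sketched together. The example below is an illustration under stated assumptions, not the PR's code: the contents of `TRUE_VALUES` and `FALSE_VALUES` are guessed (the review does not show them), and `parse_bool`, `resolve_limits`, and the `hard_cap` of 50 are hypothetical names and values chosen for the example.

```python
from typing import Optional, Tuple, Union

# Assumed contents; the actual constants in literature.py are not shown in the review.
TRUE_VALUES = frozenset({"true", "1", "yes"})
FALSE_VALUES = frozenset({"false", "0", "no"})


def parse_bool(value: Union[bool, str, None]) -> Optional[bool]:
    """Map a bool or string-valued boolean to True/False; None or '' means no filter."""
    if value is None or isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "":
        return None
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean value: {value!r}")


def resolve_limits(
    max_results: Optional[int] = None,
    include_abstract: Optional[bool] = None,
    hard_cap: int = 50,  # hypothetical upper bound to protect the token budget
) -> Tuple[int, bool]:
    """Apply the current defaults (10 results, no abstracts) and bound max_results."""
    limit = 10 if max_results is None else max(1, min(int(max_results), hard_cap))
    return limit, bool(include_abstract)


print(parse_bool(" Yes "), parse_bool("0"), parse_bool(None))
print(resolve_limits(), resolve_limits(500, True))
```

Defaulting to the currently hard-coded values keeps existing callers unchanged, while the cap stops a caller from requesting an unbounded number of results.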
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `TRUE_VALUES` and `FALSE_VALUES` constants in `literature.py` are currently unused and can either be removed or wired into input parsing if you intend to accept string-valued booleans from callers.
- The literature search tool currently hard-codes `max_results=10` and `include_abstract=False`; consider exposing these as input parameters (with sensible defaults) so callers can tune result volume and whether abstracts are returned based on their context and token budget.
## Individual Comments
### Comment 1
<location> `service/app/tools/builtin/literature.py:133-141` </location>
<code_context>
- year_to_int = int(year_to) if year_to and str(year_to).strip() else None
-
- # Clamp year ranges (warn but don't block search)
- max_year = datetime.now().year + 1
- year_warning = ""
- if year_from_int is not None and year_from_int > max_year:
</code_context>
<issue_to_address>
**suggestion:** Year clamping logic is asymmetric and may yield surprising ranges.
Currently you only clamp `year_from` when it’s above `max_year` and `year_to` when it’s below 1700, but not the opposite cases. That means ranges like `year_from=1500, year_to=2100` remain outside the intended bounds. Please clamp both `year_from` and `year_to` into `[1700, max_year]` so the actual filter and warning text remain consistent with the documented range.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
max_year = datetime.now().year + 1
year_warning = ""
year_from_clamped = year_from
year_to_clamped = year_to

if year_from_clamped is not None and year_from_clamped > max_year:
    year_warning += f"year_from {year_from_clamped} clamped to {max_year}. "
    year_from_clamped = max_year
if year_to_clamped is not None and year_to_clamped < 1700:
suggestion: Year clamping logic is asymmetric and may yield surprising ranges.
Currently you only clamp year_from when it’s above max_year and year_to when it’s below 1700, but not the opposite cases. That means ranges like year_from=1500, year_to=2100 remain outside the intended bounds. Please clamp both year_from and year_to into [1700, max_year] so the actual filter and warning text remain consistent with the documented range.
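One way to make the clamping symmetric is to run both bounds through the same helper. This is a minimal sketch of the suggestion, not the code that was merged: `clamp_year`, `clamp_year_range`, and `MIN_YEAR` are hypothetical names, and the warning wording only imitates the format visible in the diff.

```python
from datetime import datetime
from typing import Optional, Tuple

MIN_YEAR = 1700  # lower bound mentioned in the review comment


def clamp_year(name: str, value: Optional[int], max_year: int) -> Tuple[Optional[int], str]:
    """Clamp one bound into [MIN_YEAR, max_year]; return the value and a warning (or '')."""
    if value is None:
        return None, ""
    clamped = min(max(value, MIN_YEAR), max_year)
    if clamped == value:
        return value, ""
    return clamped, f"{name} {value} clamped to {clamped}. "


def clamp_year_range(
    year_from: Optional[int],
    year_to: Optional[int],
    max_year: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int], str]:
    """Apply the same clamp to both bounds so filter and warning stay consistent."""
    if max_year is None:
        max_year = datetime.now().year + 1  # same upper bound as in the diff
    year_from, warn_from = clamp_year("year_from", year_from, max_year)
    year_to, warn_to = clamp_year("year_to", year_to, max_year)
    return year_from, year_to, warn_from + warn_to


# max_year is passed explicitly so the example does not depend on the current date
print(clamp_year_range(1500, 2100, max_year=2026))
```

The sketch does not reorder or reject a range where `year_from` ends up greater than `year_to`; that case would need its own decision.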
🎉 This PR is included in version 1.0.16 🎉
The release is available on GitHub release
Your semantic-release bot 📦🚀
Summary by Sourcery
Add a toggleable built-in literature search tool that can query multiple academic sources and expose it in the chat toolbar.
New Features:
- `literature_search` built-in tool, implemented with LangChain, that queries multiple academic literature data sources and returns results as structured markdown/JSON.
Enhancements:
Documentation: