Feature/literature mcp #189
Conversation
…earch (skip pre-commit due to Pyright false positive on __getattr__)
…/Xyzen into feature/literature-MCP
Reviewer's Guide
Introduces a new literature search subsystem (models, base client, OpenAlex client, DOI normalization/deduplication, multi-source work distributor, and an MCP server entrypoint) along with extensive unit tests, while removing an obsolete built-in tools test. Several new test files contain unresolved merge conflict markers that must be fixed before merging.
Sequence diagram for the literature MCP search flow
sequenceDiagram
actor User
participant LLM
participant FastMCP_literature as FastMCP_literature_tool
participant WorkDistributor
participant OpenAlexClient
participant OpenAlexAPI as OpenAlex_API
User->>LLM: Ask for literature search
LLM->>FastMCP_literature: Call search_literature(parameters)
FastMCP_literature->>FastMCP_literature: Parse and normalize parameters
FastMCP_literature->>FastMCP_literature: Build SearchRequest
FastMCP_literature->>WorkDistributor: create(openalex_email)
FastMCP_literature->>WorkDistributor: search(SearchRequest)
WorkDistributor->>WorkDistributor: _register_clients()
WorkDistributor->>OpenAlexClient: search(SearchRequest)
OpenAlexClient->>OpenAlexClient: _build_query_params(...)
loop optional_id_resolution
OpenAlexClient->>OpenAlexClient: _resolve_author_id()
OpenAlexClient->>OpenAlexClient: _resolve_institution_id()
OpenAlexClient->>OpenAlexClient: _resolve_source_id()
end
loop pagination
OpenAlexClient->>OpenAlexClient: _fetch_all_pages()
loop per_page_request
OpenAlexClient->>OpenAlexClient: _request_with_retry(works_endpoint)
OpenAlexClient->>OpenAlexAPI: GET /works
OpenAlexAPI-->>OpenAlexClient: JSON page
end
end
OpenAlexClient->>OpenAlexClient: _transform_work() for each item
OpenAlexClient-->>WorkDistributor: list~LiteratureWork~, warnings
WorkDistributor->>WorkDistributor: deduplicate_by_doi()
WorkDistributor->>WorkDistributor: _sort_works()
WorkDistributor-->>FastMCP_literature: aggregated_result
FastMCP_literature->>FastMCP_literature: _format_search_result()
FastMCP_literature-->>LLM: Markdown report
LLM-->>User: Present formatted results
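To make the flow above concrete, here is a minimal usage sketch of how the MCP layer might drive the distributor. The module paths, the `SearchRequest` keyword arguments, and the `"works"` key of the aggregated result are assumptions taken from the diagrams, not verified against the PR.

```python
# Hypothetical end-to-end call following the sequence diagram above.
# Module paths and the shape of the aggregated result dict are assumptions.
import asyncio

from app.utils.literature.models import SearchRequest              # assumed module path
from app.utils.literature.work_distributor import WorkDistributor  # assumed module path


async def demo_search() -> None:
    request = SearchRequest(query="perovskite solar cells", year_from=2020, max_results=20)
    # The distributor registers its clients (currently OpenAlex), fans the request
    # out, then deduplicates by DOI and sorts before returning.
    async with WorkDistributor(openalex_email="reader@example.org") as distributor:
        result = await distributor.search(request)
    for work in result.get("works", []):
        print(work.title, work.doi, work.cited_by_count)


if __name__ == "__main__":
    asyncio.run(demo_search())
```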
Class diagram for the new literature search subsystem
classDiagram
class SearchRequest {
+str query
+str author
+str institution
+str source
+int year_from
+int year_to
+bool is_oa
+str work_type
+str language
+bool is_retracted
+bool has_abstract
+bool has_fulltext
+str sort_by
+int max_results
+list~str~ data_sources
}
class LiteratureWork {
+str id
+str doi
+str title
+list~dict~ authors
+int publication_year
+int cited_by_count
+str abstract
+str journal
+bool is_oa
+str oa_url
+str source
+dict raw_data
}
class BaseLiteratureClient {
<<abstract>>
+search(request SearchRequest) async tuple~list~LiteratureWork~~ list~str~~
}
class _RateLimiter {
-float _min_interval
-Lock _lock
-float _last_request
-Semaphore _semaphore
+__aenter__() async
+__aexit__(exc_type type~BaseException~, exc BaseException, tb Any) async
-_throttle() async
}
class OpenAlexClient {
+str BASE_URL
+int MAX_PER_PAGE
+int MAX_RETRIES
+float TIMEOUT
-str email
-int rate_limit
-_RateLimiter rate_limiter
-AsyncClient client
+OpenAlexClient(email str, rate_limit int, timeout float)
+property pool_type str
+search(request SearchRequest) async tuple~list~LiteratureWork~~ list~str~~
-_build_query_params(request SearchRequest, author_id str, institution_id str, source_id str) dict~str,str~
-_resolve_author_id(author_name str) async tuple~str,bool,str~
-_resolve_institution_id(institution_name str) async tuple~str,bool,str~
-_resolve_source_id(source_name str) async tuple~str,bool,str~
-_fetch_all_pages(params dict~str,str~, max_results int) async list~dict~str,Any~~
-_request_with_retry(url str, params dict~str,str~) async dict~str,Any~
-_transform_work(work dict~str,Any~) LiteratureWork
-_reconstruct_abstract(inverted_index dict~str,list~int~~) str
+close() async
+__aenter__() async OpenAlexClient
+__aexit__(exc_type type~BaseException~, exc BaseException, tb Any) async
}
class WorkWithDOI {
<<protocol>>
+str doi
+int cited_by_count
+int publication_year
}
class DoiCleanerModule {
+normalize_doi(doi str) str
+deduplicate_by_doi(works list~WorkWithDOI~) list~WorkWithDOI~
}
class WorkDistributor {
-dict~str,Any~ clients
-str openalex_email
+WorkDistributor(openalex_email str)
-_register_clients() void
+close() async
+__aenter__() async WorkDistributor
+__aexit__(exc_type type~BaseException~, exc BaseException, tb Any) async
+search(request SearchRequest) async dict~str,Any~
-_sort_works(works list~LiteratureWork~, sort_by str) list~LiteratureWork~
}
BaseLiteratureClient <|-- OpenAlexClient
WorkDistributor o--> OpenAlexClient
OpenAlexClient ..> _RateLimiter
WorkDistributor ..> SearchRequest
WorkDistributor ..> LiteratureWork
OpenAlexClient ..> SearchRequest
OpenAlexClient ..> LiteratureWork
DoiCleanerModule ..> WorkWithDOI
WorkDistributor ..> DoiCleanerModule
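For orientation, a hedged sketch of the DOI helpers named in the class diagram. The normalization rules and the deduplication tie-break (higher citation count, then newer year) are guesses based on the `WorkWithDOI` protocol fields, not the PR's verified behavior.

```python
# Sketch only: normalization rules and dedup policy are assumptions.
from typing import Protocol


class WorkWithDOI(Protocol):
    doi: str | None
    cited_by_count: int
    publication_year: int


def normalize_doi(doi: str) -> str:
    """Lowercase and strip common URL prefixes so equivalent DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def deduplicate_by_doi(works: list[WorkWithDOI]) -> list[WorkWithDOI]:
    """Keep one work per normalized DOI, preferring the more-cited, newer record."""
    best: dict[str, WorkWithDOI] = {}
    no_doi: list[WorkWithDOI] = []
    for work in works:
        if not work.doi:
            no_doi.append(work)  # works without a DOI cannot be merged safely
            continue
        key = normalize_doi(work.doi)
        current = best.get(key)
        if current is None or (work.cited_by_count, work.publication_year) > (
            current.cited_by_count,
            current.publication_year,
        ):
            best[key] = work
    return list(best.values()) + no_doi
```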
Hey - I've found 8 issues and left some high-level feedback:
- There are unresolved merge conflict markers (e.g., `<<<<<<< HEAD` / `>>>>>>>`) left in several new test files such as `test_openalex_client.py` and `test_base_client.py`; these need to be resolved and the intended assertions/code retained before merging.
- The tests add expectations for `BaseLiteratureClient.get_by_doi`/`get_by_id` and corresponding OpenAlexClient behavior, but the `BaseLiteratureClient` and `OpenAlexClient` implementations only define `search`; either remove/adjust these tests or add the corresponding abstract methods and concrete implementations (a sketch of what that extended interface could look like follows below).
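If the second option is taken, the extended abstract interface might look roughly like this. The signatures are inferred from the tests' expectations and the class diagram, not from the PR's code.

```python
# Hypothetical sketch only; signatures are inferred from the failing tests.
from abc import ABC, abstractmethod


class BaseLiteratureClient(ABC):
    @abstractmethod
    async def search(self, request: "SearchRequest") -> tuple[list["LiteratureWork"], list[str]]:
        """Run a search and return (works, warnings)."""

    @abstractmethod
    async def get_by_doi(self, doi: str) -> "LiteratureWork | None":
        """Fetch a single work by DOI, or None if not found."""

    @abstractmethod
    async def get_by_id(self, work_id: str) -> "LiteratureWork | None":
        """Fetch a single work by provider-specific ID, or None if not found."""
```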
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There are unresolved merge conflict markers (e.g., `<<<<<<< HEAD` / `>>>>>>>`) left in several new test files such as `test_openalex_client.py` and `test_base_client.py`; these need to be resolved and the intended assertions/code retained before merging.
- The tests add expectations for `BaseLiteratureClient.get_by_doi`/`get_by_id` and corresponding OpenAlexClient behavior, but the `BaseLiteratureClient` and `OpenAlexClient` implementations only define `search`; either remove/adjust these tests or add the corresponding abstract methods and concrete implementations.
## Individual Comments
### Comment 1
<location> `service/app/utils/literature/openalex_client.py:266-275` </location>
<code_context>
+ sections.append("## Complete Works List (JSON)\n")
+ if include_abstract:
+ sections.append("The following JSON contains all works with full abstracts:\n")
+ else:
+ sections.append("The following JSON contains all works (abstracts excluded to save tokens):\n")
+ sections.append("```json")
</code_context>
<issue_to_address>
**issue (bug_risk):** 4xx errors (other than 403) are retried despite the comment saying they should not be.
Because `response.raise_for_status()` is wrapped by a generic `except Exception`, any non-403 4xx currently enters the retry loop, contradicting the “don’t retry on 4xx (except 403)” contract. Please either short‑circuit 4xx (e.g., `400 <= status_code < 500 and status_code != 403` → raise/return without retry) or narrow the `except` so `raise_for_status()` failures aren’t treated as transient errors.
</issue_to_address>
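One way to address this, sketched below under the assumption of an httpx-based client: catch `httpx.HTTPStatusError` separately so non-403 4xx responses fail fast while network errors still retry. The function name, retry count, and backoff are illustrative, not the PR's implementation.

```python
# Illustrative sketch only, not the PR's code: stop retrying on non-403 4xx
# responses while still retrying transient network failures.
import asyncio
from typing import Any

import httpx

MAX_RETRIES = 3


async def request_with_retry(client: httpx.AsyncClient, url: str, params: dict[str, str]) -> dict[str, Any]:
    last_error: Exception | None = None
    for attempt in range(MAX_RETRIES):
        try:
            response = await client.get(url, params=params)
            response.raise_for_status()
            return response.json()
        except httpx.HTTPStatusError as e:
            status = e.response.status_code
            if 400 <= status < 500 and status != 403:
                # Client errors other than 403 are not transient: fail fast.
                raise
            last_error = e
        except httpx.HTTPError as e:
            # Timeouts and connection errors are treated as transient.
            last_error = e
        await asyncio.sleep(2**attempt)
    raise last_error if last_error is not None else RuntimeError("retry loop exhausted")
```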
### Comment 2
<location> `service/app/utils/literature/work_distributor.py:90-95` </location>
<code_context>
+ """
+ # Clamp max_results to 50/1000 with warnings
+ all_warnings: list[str] = []
+ if request.max_results < 1:
+ all_warnings.append("⚠️ max_results < 1; using default 50")
+ request.max_results = 50
+ elif request.max_results > 1000:
+ all_warnings.append("⚠️ max_results > 1000; using 1000")
+ request.max_results = 1000
+
+ # Determine which data sources to use
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Mutating `request.max_results` in-place inside the distributor may have surprising side effects for callers.
Because the distributor writes back into the shared `SearchRequest` object, any caller that reuses this instance will see a different `max_results` than it originally set, which is subtle and hard to debug. Consider deriving a local `max_results` (or working on a shallow copy) instead of mutating the incoming request.
Suggested implementation:
```python
# Clamp max_results to 50/1000 with warnings, without mutating the request
all_warnings: list[str] = []
max_results = request.max_results
if max_results < 1:
    all_warnings.append("⚠️ max_results < 1; using default 50")
    max_results = 50
elif max_results > 1000:
    all_warnings.append("⚠️ max_results > 1000; using 1000")
    max_results = 1000
# Determine which data sources to use
sources = request.data_sources or ["openalex"]
```
Anywhere later in this method where `request.max_results` is used to control pagination or limit result counts should be updated to use the local `max_results` variable instead. For example, calls like `client.search(request, ...)` that rely on `request.max_results` should either (a) pass `max_results` explicitly as an argument, or (b) construct a shallow copy of `request` with the adjusted `max_results` to keep the original object immutable for callers.
</issue_to_address>
### Comment 3
<location> `service/app/utils/literature/openalex_client.py:274-283` </location>
<code_context>
+ except httpx.HTTPError as e:
+ logger.error(f"Literature search network error: {e}", exc_info=True)
+ return "❌ Network error while contacting literature sources. Please try again later."
+ except Exception as e:
+ logger.error(f"Literature search failed: {e}", exc_info=True)
+ return "❌ Unexpected error during search. Please retry or contact support."
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Page-fetch errors are swallowed, potentially returning partial results with no surfaced warning.
Currently, `_fetch_all_pages` logs exceptions and breaks the loop, returning only the pages fetched so far. To avoid silently treating partial results as a successful full search, either propagate the exception so callers can handle it explicitly, or return a structured result that includes both the works and an explicit partial-results/error indicator for the caller to surface.
</issue_to_address>
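One hedged way to do the latter is sketched below: return the fetched works together with explicit warnings so the distributor can surface partial results. The `(works, warnings)` return shape and the injected `get_json` helper are assumptions, not the PR's actual API.

```python
# Sketch assuming a get_json(path, params) coroutine exists; the tuple return
# shape is illustrative, not the PR's actual signature.
from typing import Any, Awaitable, Callable


async def fetch_all_pages(
    get_json: Callable[[str, dict[str, str]], Awaitable[dict[str, Any]]],
    params: dict[str, str],
    max_results: int,
) -> tuple[list[dict[str, Any]], list[str]]:
    works: list[dict[str, Any]] = []
    warnings: list[str] = []
    page = 1
    while len(works) < max_results:
        try:
            response = await get_json("works", {**params, "page": str(page)})
        except Exception as e:
            # Surface the failure instead of silently returning a truncated list.
            warnings.append(f"⚠️ Fetch stopped at page {page}: {e}; results may be partial.")
            break
        batch = response.get("results", [])
        if not batch:
            break
        works.extend(batch)
        if len(works) >= response.get("meta", {}).get("count", 0):
            break
        page += 1
    return works[:max_results], warnings
```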
### Comment 4
<location> `service/app/mcp/literature.py:261-262` </location>
<code_context>
+ conditions.append(f"- **Work Type**: {request.work_type}")
+ if request.language:
+ conditions.append(f"- **Language**: {request.language}")
+ if request.is_retracted is not None:
+ conditions.append(f"- **Exclude Retracted**: {'No' if request.is_retracted else 'Yes'}")
+ if request.has_abstract is not None:
+ conditions.append(f"- **Require Abstract**: {'Yes' if request.has_abstract else 'No'}")
</code_context>
<issue_to_address>
**suggestion:** The "Exclude Retracted" label is inverted/confusing relative to the `is_retracted` flag semantics.
Given the docstring, `is_retracted=True` means “only retracted” and `False` means “exclude retracted”. With the current rendering, `Exclude Retracted` will show `No` when `True` and `Yes` when `False`, which is logically consistent but hard to interpret. Consider renaming/rephrasing this line to describe the actual filter mode (e.g., `Retracted Filter: only retracted / exclude retracted`) rather than expressing it as an inverted “Exclude” flag.
```suggestion
        if request.is_retracted is not None:
            retracted_mode = "Only retracted" if request.is_retracted else "Exclude retracted"
            conditions.append(f"- **Retracted Filter**: {retracted_mode}")
```
</issue_to_address>
### Comment 5
<location> `service/tests/unit/test_literature/test_openalex_client.py:126-129` </location>
<code_context>
+
+ assert len(works) == 1
+ mock_resolve.assert_called_once_with("Jane Smith")
+<<<<<<< HEAD:service/tests/unit/test_literature/test_openalex_client.py
+ assert any("Author resolved" in msg for msg in warnings)
+=======
+>>>>>>> 1794485dfcc48ad6b089f2b31eb788a021eadea5:service/tests/unit/test_utils/test_openalex_client.py
+
+ @pytest.mark.asyncio
</code_context>
<issue_to_address>
**issue (bug_risk):** Resolve merge conflict markers in OpenAlex client tests to restore a valid test module.
There are unresolved merge conflict markers in this test file (around the author/institution/source filter tests and later in the file). In its current state the file will not parse and pytest will not collect these tests. Please resolve the conflicts (pick the intended assertions, e.g. the `assert any(... "Author resolved" ...)` checks, or restore any removed tests as needed) and ensure the file is syntactically valid before merging.
</issue_to_address>
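For reference, one possible resolution that keeps the HEAD-side assertion (assuming the client does emit an "Author resolved" warning) would leave the test body as:

```python
assert len(works) == 1
mock_resolve.assert_called_once_with("Jane Smith")
assert any("Author resolved" in msg for msg in warnings)
```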
### Comment 6
<location> `service/tests/unit/test_literature/test_base_client.py:16-25` </location>
<code_context>
+ """Dummy search implementation."""
+ return [], []
+
+<<<<<<< HEAD:service/tests/unit/test_literature/test_base_client.py
+=======
+ async def get_by_doi(self, doi: str) -> LiteratureWork | None:
+ """Dummy get_by_doi implementation."""
+ return None
+
+ async def get_by_id(self, work_id: str) -> LiteratureWork | None:
+ """Dummy get_by_id implementation."""
+ return None
+
+>>>>>>> 1794485dfcc48ad6b089f2b31eb788a021eadea5:service/tests/unit/test_utils/test_base_client.py
+
+class TestBaseLiteratureClientProtocol:
</code_context>
<issue_to_address>
**issue (testing):** Fix merge conflict in BaseLiteratureClient tests so they match the current abstract interface.
This file still has merge conflict markers and two conflicting `ConcreteClient` versions. The current `BaseLiteratureClient` only defines `search`, so tests relying on `get_by_doi`/`get_by_id` being abstract will no longer be valid. Please resolve the conflict by keeping a single `ConcreteClient` implementation aligned with the current base class and remove or adjust tests for methods that are no longer part of the abstraction.
</issue_to_address>
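A resolved test helper consistent with a search-only base class might look like the sketch below, assuming the `SearchRequest`/`LiteratureWork` types from the class diagram; the dummy body mirrors the HEAD side of the conflict shown above.

```python
# Sketch of the resolved helper, keeping only what the current abstraction needs.
class ConcreteClient(BaseLiteratureClient):
    async def search(self, request: SearchRequest) -> tuple[list[LiteratureWork], list[str]]:
        """Dummy search implementation."""
        return [], []
```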
### Comment 7
<location> `service/app/utils/literature/openalex_client.py:239` </location>
<code_context>
+
+ return params
+
+ async def _resolve_author_id(self, author_name: str) -> tuple[str | None, bool, str]:
+ """
+ Two-step lookup: author name -> author ID
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the OpenAlex client to use shared helpers and data-driven mappings to remove repeated request and filter-building logic while keeping behavior identical.
You can reduce duplication quite a bit without changing behavior by introducing a couple of small helpers and some data‑driven mappings.
### 1. Unify `_resolve_*_id` into a generic helper
All three resolution methods share the same structure. You can factor out the common logic and keep thin, readable wrappers:
```python
async def _resolve_entity_id(
    self,
    *,
    kind: str,        # "Author" / "Institution" / "Source"
    endpoint: str,    # "authors" / "institutions" / "sources"
    name: str,
    not_found_suggestions: str,
) -> tuple[str | None, bool, str]:
    async with self.rate_limiter:
        try:
            url = f"{self.BASE_URL}/{endpoint}"
            params: dict[str, str] = {"search": name}
            if self.email:
                params["mailto"] = self.email
            response = await self._request_with_retry(url, params)
            if results := response.get("results", []):
                entity = results[0]
                entity_id = entity["id"].split("/")[-1]
                display = entity.get("display_name", name)
                logger.info(f"Resolved {kind.lower()} '{name}' -> {entity_id}")
                return entity_id, True, f"✓ {kind} resolved: '{name}' -> '{display}'"
            msg = f"⚠️ {kind} '{name}' not found. {not_found_suggestions}"
            logger.warning(msg)
            return None, False, msg
        except Exception as e:
            msg = f"⚠️ Failed to resolve {kind.lower()} '{name}': {e}"
            logger.warning(msg)
            return None, False, msg
```
Then your specific resolvers become small and self‑documenting:
```python
async def _resolve_author_id(self, author_name: str) -> tuple[str | None, bool, str]:
    return await self._resolve_entity_id(
        kind="Author",
        endpoint="authors",
        name=author_name,
        not_found_suggestions=(
            "Suggestions: (1) Try full name format like 'Smith, John' or 'John Smith', "
            "(2) Check spelling, (3) Try removing middle name/initial."
        ),
    )

async def _resolve_institution_id(self, institution_name: str) -> tuple[str | None, bool, str]:
    return await self._resolve_entity_id(
        kind="Institution",
        endpoint="institutions",
        name=institution_name,
        not_found_suggestions=(
            "Suggestions: (1) Use full official name (e.g., 'Harvard University' not 'Harvard'), "
            "(2) Try variations (e.g., 'MIT' vs 'Massachusetts Institute of Technology'), "
            "(3) Check spelling."
        ),
    )

async def _resolve_source_id(self, source_name: str) -> tuple[str | None, bool, str]:
    return await self._resolve_entity_id(
        kind="Source/Journal",
        endpoint="sources",
        name=source_name,
        not_found_suggestions=(
            "Suggestions: (1) Use full journal name (e.g., 'Nature' or 'Science'), "
            "(2) Try alternative names (e.g., 'JAMA' vs 'Journal of the American Medical Association'), "
            "(3) Check spelling."
        ),
    )
```
This keeps all messages and behavior intact while removing ~100 lines of repetition.
---
### 2. Hide rate limiting + retry behind a single `_get_json` helper
You already have `_request_with_retry`; wrapping the rate limiter once will simplify all call sites:
```python
async def _get_json(self, path: str, params: dict[str, str]) -> dict[str, Any]:
    async with self.rate_limiter:
        if self.email and "mailto" not in params:
            params = {**params, "mailto": self.email}
        url = f"{self.BASE_URL}/{path}"
        return await self._request_with_retry(url, params)
```
Then, for example, `_fetch_all_pages` becomes:
```python
async def _fetch_all_pages(self, params: dict[str, str], max_results: int) -> list[dict[str, Any]]:
    all_works: list[dict[str, Any]] = []
    page = 1
    while len(all_works) < max_results:
        try:
            page_params = {**params, "page": str(page)}
            response = await self._get_json("works", page_params)
            works = response.get("results", [])
            if not works:
                break
            all_works.extend(works)
            logger.info(f"Fetched page {page}: {len(works)} works")
            meta = response.get("meta", {})
            total_count = meta.get("count", 0)
            if len(all_works) >= total_count:
                break
            page += 1
        except Exception as e:
            logger.error(f"Error fetching page {page}: {e}")
            break
    return all_works[:max_results]
```
And the generic resolver above can call `_get_json(endpoint, params)` instead of constructing URLs and `async with self.rate_limiter:` itself.
---
### 3. Make `_build_query_params` more declarative
The filter construction can be made shorter and more obvious using small mappings. For example, boolean filters:
```python
bool_filters = {
    "is_oa": request.is_oa,
    "is_retracted": request.is_retracted,
    "has_abstract": request.has_abstract,
    "has_fulltext": request.has_fulltext,
}
for field, value in bool_filters.items():
    if value is not None:
        filters.append(f"{field}:{str(value).lower()}")
```
Likewise for simple scalar filters:
```python
scalar_filters = {
    "type": request.work_type,
    "language": request.language,
}
for field, value in scalar_filters.items():
    if value:
        filters.append(f"{field}:{value}")
```
And for the ID‑based filters:
```python
id_filters = {
    "authorships.author.id": author_id,
    "authorships.institutions.id": institution_id,
    "primary_location.source.id": source_id,
}
for field, value in id_filters.items():
    if value:
        filters.append(f"{field}:{value}")
```
Combined, `_build_query_params` keeps the same semantics but with fewer branches and less repetition, making it easier to adjust the mapping from `SearchRequest` fields to OpenAlex filters later.
</issue_to_address>
### Comment 8
<location> `service/app/mcp/literature.py:139` </location>
<code_context>
+ year_warning += f"year_to {year_to_int}→1700. "
+ year_to_int = 1700
+
+ # Convert is_oa to boolean
+ is_oa_bool: bool | None = None
+ if is_oa is not None:
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting shared helpers for parameter normalization, boolean parsing, and work-to-dict conversion to simplify `search_literature` and `_format_search_result` while keeping behavior unchanged.
You can reduce complexity and duplication without changing behavior by extracting a few small helpers.
### 1. Deduplicate string→bool conversions
You currently repeat the same pattern for `is_oa`, `is_retracted`, `has_abstract`, `has_fulltext`, and `include_abstract`. Extract a single utility:
```python
def _str_to_bool(value: str | bool | None) -> bool | None:
    if value is None:
        return None
    if isinstance(value, bool):
        return value
    return str(value).lower() in ("true", "1", "yes")
```
Then use it in `search_literature`:
```python
# Convert is_oa to boolean
is_oa_bool = _str_to_bool(is_oa)
# Convert is_retracted to boolean
is_retracted_bool = _str_to_bool(is_retracted)
# Convert has_abstract to boolean
has_abstract_bool = _str_to_bool(has_abstract)
# Convert has_fulltext to boolean
has_fulltext_bool = _str_to_bool(has_fulltext)
# Convert include_abstract to bool (keep default False)
include_abstract_bool = _str_to_bool(include_abstract) or False
```
This keeps all behavior but centralizes the conversion logic.
### 2. Isolate parameter normalization from orchestration
`search_literature` does both input normalization and search orchestration. You can pull the normalization into a helper and keep the top-level tool thin:
```python
def _normalize_search_params(
    query: str,
    mailto: str | None,
    author: str | None,
    institution: str | None,
    source: str | None,
    year_from: str | None,
    year_to: str | None,
    is_oa: str | None,
    work_type: str | None,
    language: str | None,
    is_retracted: str | None,
    has_abstract: str | None,
    has_fulltext: str | None,
    sort_by: str,
    max_results: str | int,
    data_sources: list[str] | None,
    include_abstract: str | bool,
) -> tuple[SearchRequest, str, bool, str | None]:
    # year parsing & clamping
    year_from_int = int(year_from) if year_from and str(year_from).strip() else None
    year_to_int = int(year_to) if year_to and str(year_to).strip() else None
    max_year = datetime.now().year + 1
    year_warning = ""
    if year_from_int is not None and year_from_int > max_year:
        year_warning += f"year_from {year_from_int}→{max_year}. "
        year_from_int = max_year
    if year_to_int is not None and year_to_int < 1700:
        year_warning += f"year_to {year_to_int}→1700. "
        year_to_int = 1700
    request = SearchRequest(
        query=query,
        author=author,
        institution=institution,
        source=source,
        year_from=year_from_int,
        year_to=year_to_int,
        is_oa=_str_to_bool(is_oa),
        work_type=work_type,
        language=language,
        is_retracted=_str_to_bool(is_retracted),
        has_abstract=_str_to_bool(has_abstract),
        has_fulltext=_str_to_bool(has_fulltext),
        sort_by=sort_by,
        max_results=int(max_results) if max_results else 50,
        data_sources=data_sources,
    )
    include_abstract_bool = _str_to_bool(include_abstract) or False
    openalex_email = mailto.strip() if mailto and str(mailto).strip() else None
    return request, year_warning, include_abstract_bool, openalex_email
```
Then `search_literature` becomes simpler:
```python
@mcp.tool()
async def search_literature(... ) -> str:
    try:
        request, year_warning, include_abstract_bool, openalex_email = _normalize_search_params(
            query,
            mailto,
            author,
            institution,
            source,
            year_from,
            year_to,
            is_oa,
            work_type,
            language,
            is_retracted,
            has_abstract,
            has_fulltext,
            sort_by,
            max_results,
            data_sources,
            include_abstract,
        )
        logger.info(
            "Literature search requested: query='%s', mailto=%s, max_results=%s",
            request.query,
            openalex_email,
            request.max_results,
        )
        async with WorkDistributor(openalex_email=openalex_email) as distributor:
            result = await distributor.search(request)
        if year_warning:
            result.setdefault("warnings", []).append(f"⚠️ Year adjusted: {year_warning.strip()}")
        return _format_search_result(request, result, include_abstract_bool)
    ...
```
This keeps behavior identical but separates concerns.
### 3. Extract work→dict conversion from `_format_search_result`
The JSON block mixes formatting with data-shape logic. You can pull the mapping into a small helper:
```python
def _works_to_public_dicts(works: list, include_abstract: bool) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    for work in works:
        work_data: dict[str, Any] = {
            "id": work.id,
            "doi": work.doi,
            "title": work.title,
            "authors": work.authors[:5],
            "publication_year": work.publication_year,
            "cited_by_count": work.cited_by_count,
            "journal": work.journal,
            "is_oa": work.is_oa,
            "oa_url": work.oa_url,
            "source": work.source,
        }
        if include_abstract and work.abstract:
            work_data["abstract"] = work.abstract
        items.append(work_data)
    return items
```
Then in `_format_search_result`:
```python
sections.append("```json")
works_dict = _works_to_public_dicts(works, include_abstract)
sections.append(json.dumps(works_dict, indent=2, ensure_ascii=False))
sections.append("```")
```
This makes `_format_search_result` shorter and gives you a single place to evolve the JSON representation later.
</issue_to_address>
            else:
                msg = (
                    f"⚠️ Author '{author_name}' not found. "
                    f"Suggestions: (1) Try full name format like 'Smith, John' or 'John Smith', "
                    f"(2) Check spelling, (3) Try removing middle name/initial."
                )
                logger.warning(msg)
                return None, False, msg
        except Exception as e:
            msg = f"⚠️ Failed to resolve author '{author_name}': {e}"
issue (bug_risk): 4xx errors (other than 403) are retried despite the comment saying they should not be.
Because response.raise_for_status() is wrapped by a generic except Exception, any non-403 4xx currently enters the retry loop, contradicting the “don’t retry on 4xx (except 403)” contract. Please either short‑circuit 4xx (e.g., 400 <= status_code < 500 and status_code != 403 → raise/return without retry) or narrow the except so raise_for_status() failures aren’t treated as transient errors.
        if request.max_results < 1:
            all_warnings.append("⚠️ max_results < 1; using default 50")
            request.max_results = 50
        elif request.max_results > 1000:
            all_warnings.append("⚠️ max_results > 1000; using 1000")
            request.max_results = 1000
suggestion (bug_risk): Mutating request.max_results in-place inside the distributor may have surprising side effects for callers.
Because the distributor writes back into the shared SearchRequest object, any caller that reuses this instance will see a different max_results than it originally set, which is subtle and hard to debug. Consider deriving a local max_results (or working on a shallow copy) instead of mutating the incoming request.
Suggested implementation:
# Clamp max_results to 50/1000 with warnings, without mutating the request
all_warnings: list[str] = []
max_results = request.max_results
if max_results < 1:
    all_warnings.append("⚠️ max_results < 1; using default 50")
    max_results = 50
elif max_results > 1000:
    all_warnings.append("⚠️ max_results > 1000; using 1000")
    max_results = 1000

# Determine which data sources to use
sources = request.data_sources or ["openalex"]

Anywhere later in this method where `request.max_results` is used to control pagination or limit result counts should be updated to use the local `max_results` variable instead. For example, calls like `client.search(request, ...)` that rely on `request.max_results` should either (a) pass `max_results` explicitly as an argument, or (b) construct a shallow copy of `request` with the adjusted `max_results` to keep the original object immutable for callers.
        except Exception as e:
            msg = f"⚠️ Failed to resolve author '{author_name}': {e}"
            logger.warning(msg)
            return None, False, msg

    async def _resolve_institution_id(self, institution_name: str) -> tuple[str | None, bool, str]:
        """
        Two-step lookup: institution name -> institution ID

        Args:
suggestion (bug_risk): Page-fetch errors are swallowed, potentially returning partial results with no surfaced warning.
Currently, _fetch_all_pages logs exceptions and breaks the loop, returning only the pages fetched so far. To avoid silently treating partial results as a successful full search, either propagate the exception so callers can handle it explicitly, or return a structured result that includes both the works and an explicit partial-results/error indicator for the caller to surface.
        if request.is_retracted is not None:
            conditions.append(f"- **Exclude Retracted**: {'No' if request.is_retracted else 'Yes'}")
suggestion: The "Exclude Retracted" label is inverted/confusing relative to the is_retracted flag semantics.
Given the docstring, is_retracted=True means “only retracted” and False means “exclude retracted”. With the current rendering, Exclude Retracted will show No when True and Yes when False, which is logically consistent but hard to interpret. Consider renaming/rephrasing this line to describe the actual filter mode (e.g., Retracted Filter: only retracted / exclude retracted) rather than expressing it as an inverted “Exclude” flag.
Suggested change:
        if request.is_retracted is not None:
            retracted_mode = "Only retracted" if request.is_retracted else "Exclude retracted"
            conditions.append(f"- **Retracted Filter**: {retracted_mode}")
<<<<<<< HEAD:service/tests/unit/test_literature/test_openalex_client.py
        assert any("Author resolved" in msg for msg in warnings)
=======
>>>>>>> 1794485dfcc48ad6b089f2b31eb788a021eadea5:service/tests/unit/test_utils/test_openalex_client.py
issue (bug_risk): Resolve merge conflict markers in OpenAlex client tests to restore a valid test module.
There are unresolved merge conflict markers in this test file (around the author/institution/source filter tests and later in the file). In its current state the file will not parse and pytest will not collect these tests. Please resolve the conflicts (pick the intended assertions, e.g. the assert any(... "Author resolved" ...) checks, or restore any removed tests as needed) and ensure the file is syntactically valid before merging.
<<<<<<< HEAD:service/tests/unit/test_literature/test_base_client.py
=======
    async def get_by_doi(self, doi: str) -> LiteratureWork | None:
        """Dummy get_by_doi implementation."""
        return None

    async def get_by_id(self, work_id: str) -> LiteratureWork | None:
        """Dummy get_by_id implementation."""
        return None

>>>>>>> 1794485dfcc48ad6b089f2b31eb788a021eadea5:service/tests/unit/test_utils/test_base_client.py
issue (testing): Fix merge conflict in BaseLiteratureClient tests so they match the current abstract interface.
This file still has merge conflict markers and two conflicting ConcreteClient versions. The current BaseLiteratureClient only defines search, so tests relying on get_by_doi/get_by_id being abstract will no longer be valid. Please resolve the conflict by keeping a single ConcreteClient implementation aligned with the current base class and remove or adjust tests for methods that are no longer part of the abstraction.
refactor: Recover .gitignore, clean up test utilities and remove unused files
Summary by Sourcery
Add a new literature search subsystem with an OpenAlex API client, DOI normalization/deduplication helpers, a multi-source work distributor, and an MCP server entrypoint, along with comprehensive unit test coverage for these components.
New Features:
- `SearchRequest` and `LiteratureWork` models for representing literature search requests and results.
- `OpenAlexClient` with rate limiting and retries that converts OpenAlex responses into the shared models.
- `WorkDistributor` for aggregating, deduplicating, and sorting results from multiple literature data sources.
Enhancements:
Tests: