
Conversation

@ImoutoHeaven
Contributor

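  1. Implement TaskQueueManager for asynchronous index operations
  2. Queue update tasks and process them in batches every 30 seconds
  3. Check the status of pending tasks before executing new operations
  4. Optimize batch indexing and deletion logic (deletion is optimized for Meilisearch only)
  5. Fix an assignment bug in buildSearchDocumentFromResults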

Description

New files

  • internal/search/meilisearch/task_queue.go - task queue manager implementation

Modified files

  • internal/search/build.go

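    • Add task queue calls for Meilisearch
    • Batch-index newly added files and folders
    • Remove the recursive BuildIndex call during auto-update (needs confirmation that this matches the intended design)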
  • internal/search/meilisearch/init.go

    • Initialize and start TaskQueueManager
  • internal/search/meilisearch/search.go

    • Add a taskQueue field to the Meilisearch struct
    • Implement the EnqueueUpdate method for enqueuing update tasks
    • Add batchIndexWithTaskUID and batchDeleteWithTaskUID for batch operations with task tracking
  • internal/search/meilisearch/utils.go

    • Fix an assignment bug in buildSearchDocumentFromResults
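
For readers who want a picture of the queueing idea, here is a minimal sketch, not the code in task_queue.go. It assumes tasks are coalesced by parent path so that a newer snapshot replaces an older one; the names updateTask, taskQueue, Enqueue, Drain and Run are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// updateTask is a hypothetical snapshot of one directory listing.
type updateTask struct {
	Parent     string
	Names      []string
	EnqueuedAt time.Time
}

// taskQueue keeps at most one pending task per parent path (an assumption
// made for this sketch, not a statement about the real TaskQueueManager).
type taskQueue struct {
	mu      sync.Mutex
	pending map[string]updateTask
}

func newTaskQueue() *taskQueue {
	return &taskQueue{pending: make(map[string]updateTask)}
}

// Enqueue stores the latest snapshot for a parent, replacing any older one.
func (q *taskQueue) Enqueue(parent string, names []string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending[parent] = updateTask{Parent: parent, Names: names, EnqueuedAt: time.Now()}
}

// Drain removes and returns everything queued so far.
func (q *taskQueue) Drain() []updateTask {
	q.mu.Lock()
	defer q.mu.Unlock()
	out := make([]updateTask, 0, len(q.pending))
	for _, t := range q.pending {
		out = append(out, t)
	}
	q.pending = make(map[string]updateTask)
	return out
}

// Run drains the queue once per interval until stop is closed.
func (q *taskQueue) Run(interval time.Duration, stop <-chan struct{}, handle func([]updateTask)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if batch := q.Drain(); len(batch) > 0 {
				handle(batch)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	q := newTaskQueue()
	q.Enqueue("/movies", []string{"a.mkv"})
	q.Enqueue("/movies", []string{"a.mkv", "b.mkv"}) // replaces the first snapshot
	q.Enqueue("/music", []string{"c.flac"})
	fmt.Println(len(q.Drain())) // prints 2
}
```

In a real service, Run would be started in a goroutine with a 30-second interval, and handle would be the place to check pending Meilisearch tasks before submitting a batch.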

Motivation and Context

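  1. Meilisearch processes submitted tasks asynchronously, so tasks wait in its queue. If a user visits a folder again before the previously queued tasks have run, the index does not yet match the actual Objs in that folder, so the same index tasks are submitted again. In severe cases this floods the task queue.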

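  2. The original one-by-one index update logic was inefficient, and auto-update called BuildIndex whenever it encountered a folder. For Meilisearch this causes the same problem: index requests for the same path may be submitted repeatedly. Folders and files are therefore both treated as toAddObjs that need updating and are indexed directly through BatchIndex.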
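
The comparison between the index and the actual folder contents can be sketched roughly as follows. This is an illustration only: diffNames is a hypothetical helper that works on plain names, whereas the real code operates on Objs and Meilisearch documents.

```go
package main

import (
	"fmt"
	"sort"
)

// diffNames compares the names already in the index with the names currently
// listed in the folder and reports what must be added and what must be deleted.
func diffNames(indexed, actual []string) (toAdd, toDelete []string) {
	inIndex := make(map[string]struct{}, len(indexed))
	for _, n := range indexed {
		inIndex[n] = struct{}{}
	}
	inActual := make(map[string]struct{}, len(actual))
	for _, n := range actual {
		if _, seen := inActual[n]; seen {
			continue // ignore duplicate entries in the listing
		}
		inActual[n] = struct{}{}
		if _, ok := inIndex[n]; !ok {
			toAdd = append(toAdd, n)
		}
	}
	for n := range inIndex {
		if _, ok := inActual[n]; !ok {
			toDelete = append(toDelete, n)
		}
	}
	// Sort so the result does not depend on map iteration order.
	sort.Strings(toAdd)
	sort.Strings(toDelete)
	return toAdd, toDelete
}

func main() {
	indexed := []string{"a.txt", "old.txt"}
	actual := []string{"a.txt", "docs", "new.txt"}
	toAdd, toDelete := diffNames(indexed, actual)
	fmt.Println(toAdd, toDelete) // prints [docs new.txt] [old.txt]
}
```

Files and folders are treated alike here, which mirrors the idea of putting both into one batch instead of recursing into folders.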

Needs confirmation: once folders no longer call BuildIndex, recursive updating of the contents of subfolders inside a folder is lost.

Closes #1417

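How Has This Been Tested?

Ran TestBuild; the code compiles.

After deploying to a real machine, I used a Python script to traverse the entire OpenList site, issuing several thousand /api/fs/list requests in a very short time, and then ran the traversal script again. The Meilisearch task queue did not grow explosively, and it kept shrinking while /api/fs/list requests continued to arrive, which indicates that duplicate tasks are no longer being submitted and queued.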

Checklist

  • I have read the CONTRIBUTING document.
  • I have formatted my code with go fmt or prettier.
  • I have added appropriate labels to this PR (or mentioned needed labels in the description if lacking permissions).
  • I have requested review from relevant code authors using the "Request review" feature when applicable.
  • I have updated the repository accordingly (if needed).

I could not find where to add labels on the PR submission page; please add the enhancement label for me.

@jyxjjj jyxjjj requested a review from Copilot October 5, 2025 05:44
Contributor

Copilot AI left a comment


Pull Request Overview

Introduce a Meilisearch task queue to serialize and batch index updates, preventing duplicate async tasks and reducing race conditions; plus minor fixes and batching improvements.

  • Add TaskQueueManager with 30s consumption window, diffing against current index state before submitting Meilisearch tasks
  • Route Update() to enqueue for Meilisearch, and batch index add/delete operations
  • Fix buildSearchDocumentFromResults size assignment from JSON results

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

File | Description
internal/search/meilisearch/task_queue.go | New task queue manager, consumption logic, diffing, and submission to Meilisearch with pending task tracking.
internal/search/build.go | Use queue for Meilisearch; batch-index additions; remove recursive BuildIndex for directories.
internal/search/meilisearch/init.go | Instantiate and start TaskQueueManager at init.
internal/search/meilisearch/search.go | Add taskQueue field, EnqueueUpdate, batch index/delete with task UIDs.
internal/search/meilisearch/utils.go | Fix document building: correct assignment and float64-to-int64 size conversion.
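
As background for the float64-to-int64 conversion: when Go's encoding/json decodes into map[string]any, JSON numbers arrive as float64, so a direct assertion to int64 fails. The sketch below illustrates that behavior; sizeFromResult and the "size" key are assumptions for the example, not the actual code in utils.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sizeFromResult reads a numeric "size" field from a decoded JSON object.
// The key name is an assumption made for this example.
func sizeFromResult(result map[string]any) (int64, bool) {
	switch v := result["size"].(type) {
	case float64: // default representation of JSON numbers
		return int64(v), true
	case json.Number: // used when the decoder has UseNumber enabled
		n, err := v.Int64()
		return n, err == nil
	default:
		return 0, false
	}
}

func main() {
	var result map[string]any
	if err := json.Unmarshal([]byte(`{"name":"a.mkv","size":1048576}`), &result); err != nil {
		panic(err)
	}
	_, direct := result["size"].(int64) // fails: the value is a float64
	size, ok := sizeFromResult(result)
	fmt.Println(direct, size, ok) // prints false 1048576 true
}
```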


@ImoutoHeaven
Contributor Author

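Skipped tasks are now re-added to the execution queue, provided the queue does not already hold the same task with a newer enqueue time. When a user modifies a folder's contents, this lets the index sync sooner.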
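
A rough sketch of that re-enqueue rule, assuming the queue is a map keyed by parent path; task and requeueSkipped are hypothetical names for illustration and do not mirror the actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// task is a hypothetical queued update for one parent path.
type task struct {
	Parent     string
	EnqueuedAt time.Time
}

// requeueSkipped puts a skipped task back into the queue unless the queue
// already holds a task for the same parent that was enqueued later.
// It reports whether the skipped task was re-added.
func requeueSkipped(queue map[string]task, skipped task) bool {
	if cur, ok := queue[skipped.Parent]; ok && cur.EnqueuedAt.After(skipped.EnqueuedAt) {
		return false // keep the newer snapshot
	}
	queue[skipped.Parent] = skipped
	return true
}

func main() {
	t0 := time.Date(2025, 10, 5, 0, 0, 0, 0, time.UTC)
	queue := map[string]task{
		"/movies": {Parent: "/movies", EnqueuedAt: t0.Add(time.Minute)},
	}
	fmt.Println(requeueSkipped(queue, task{Parent: "/movies", EnqueuedAt: t0})) // prints false
	fmt.Println(requeueSkipped(queue, task{Parent: "/music", EnqueuedAt: t0}))  // prints true
}
```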

@jyxjjj
Member

jyxjjj commented Nov 3, 2025

Please resolve all conversations. For any that need no change, explain why and click the Resolve button directly.

@jyxjjj jyxjjj added the stale label Nov 13, 2025
ImoutoHeaven and others added 3 commits November 25, 2025 11:32
- Implement TaskQueueManager for async index operations
- Queue update tasks and process them in batches every 30 seconds
- Check pending task status before executing new operations
- Optimize batch indexing and deletion logic
- Fix type assertion bug in buildSearchDocumentFromResults
When tasks are skipped due to pending dependencies, they are now
re-enqueued if not already in queue. This prevents task loss while
avoiding overwriting newer snapshots for the same parent.
@jyxjjj jyxjjj force-pushed the meilisearch-taskqueue branch from cd53fb9 to 6b6961e Compare November 25, 2025 03:34
@jyxjjj
Member

jyxjjj commented Nov 25, 2025

Force pushed due to rebase main

@jyxjjj
Member

jyxjjj commented Nov 25, 2025

@PIKACHUIM @elysia-best LGTM

@jyxjjj jyxjjj removed the stale label Nov 25, 2025
@jyxjjj jyxjjj merged commit 316d4ca into OpenListTeam:main Nov 25, 2025
0 of 8 checks passed


Development

Successfully merging this pull request may close these issues.

[Feature] Meilisearch should call BatchIndex when automatic index updating is enabled, to reduce the number of POST requests

3 participants