⚡️ perf: fix slow delete file sql #4738

Merged
merged 1 commit into from
Nov 19, 2024

arvinxx (Contributor) commented on Nov 19, 2024

💻 Change Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 👷 build
  • ⚡️ perf
  • 📝 docs
  • 🔨 chore

🔀 Description of Change

For a while, LobeChat Cloud saw several hundred requests fail with 504 timeouts. The main cause was file deletion.

The statement responsible is await ctx.chunkModel.deleteOrphanChunks(), which deletes orphan chunks (chunks no longer linked to any file).

📝 Additional Information

-- List the existing indexes
SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'file_chunks';

-- Analyze the query plan
EXPLAIN ANALYZE 
SELECT c.id 
FROM chunks c 
WHERE NOT EXISTS (
  SELECT 1 
  FROM file_chunks fc 
  WHERE fc.chunk_id = c.id
);

Gather  (cost=20752.94..131022.90 rows=287250 width=16) (actual time=10773.043..11474.407 rows=282968 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Hash Anti Join  (cost=19752.94..101297.90 rows=119688 width=16) (actual time=10734.384..11362.630 rows=94323 loops=3)
        Hash Cond: (c.id = fc.chunk_id)
        ->  Parallel Index Only Scan using chunks_pkey on chunks c  (cost=0.43..71699.93 rows=509132 width=16) (actual time=1.441..10287.412 rows=407322 loops=3)
              Heap Fetches: 342362
        ->  Parallel Hash  (cost=12982.45..12982.45 rows=389445 width=16) (actual time=299.098..299.100 rows=313000 loops=3)
              Buckets: 262144  Batches: 8  Memory Usage: 7616kB
              ->  Parallel Seq Scan on file_chunks fc  (cost=0.00..12982.45 rows=389445 width=16) (actual time=0.007..107.330 rows=313000 loops=3)
Planning Time: 1.529 ms
Execution Time: 11495.522 ms

Can you explain this? I can't make sense of it.


Let me walk through this execution plan line by line:

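Gather  (cost=20752.94..131022.90 rows=287250 width=16) 
(actual time=10773.043..11474.407 rows=282968 loops=1)
  • This is a parallel query: Gather collects the rows produced by several worker processes
  • Estimated cost runs from 20752.94 (startup) to 131022.90 (total)
  • Actual time was about 10.8 s to the first row and about 11.5 s to the last
  • 282,968 rows were returned
Workers Planned: 2
Workers Launched: 2
  • 2 parallel workers were planned
  • 2 workers were actually launched
Parallel Hash Anti Join
  • Uses a hash anti join to find rows that have no match
  • This is how NOT EXISTS is implemented
Hash Cond: (c.id = fc.chunk_id)
  • The join condition is chunks.id = file_chunks.chunk_id
Parallel Index Only Scan using chunks_pkey on chunks c
(actual time=1.441..10287.412 rows=407322 loops=3)
Heap Fetches: 342362
  • Scans the chunks table through its primary-key index
  • Each process handled about 407,322 rows (an average over loops=3)
  • The scan still had to visit the heap (the table data) 342,362 times. That is a lot for an index-only scan and usually means the visibility map is out of date (for example, the table has not been vacuumed recently), so the index is not being used efficiently
Parallel Hash  
(actual time=299.098..299.100 rows=313000 loops=3)
Buckets: 262144  Batches: 8  Memory Usage: 7616kB
  • Builds a hash table over the file_chunks table
  • Each process handled about 313,000 rows
  • Used 262,144 hash buckets in 8 batches
  • Used 7616 kB of memory (about 7.4 MB)
Parallel Seq Scan on file_chunks fc
(actual time=0.007..107.330 rows=313000 loops=3)
  • Performs a parallel sequential scan of the file_chunks table
  • Each process scanned about 313,000 rows
Planning Time: 1.529 ms
Execution Time: 11495.522 ms
  • Time to generate the query plan: 1.5 ms
  • Total execution time: 11.5 s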
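
To see at a glance where the time goes, the "actual time" fields can be pulled out of the plan lines and compared. The following is a minimal sketch, not part of this PR: `NodeTiming` and `parseActual` are hypothetical helper names, and the regular expression assumes the text format of EXPLAIN ANALYZE shown above.

```typescript
// Timing fields taken from one EXPLAIN ANALYZE plan line.
interface NodeTiming {
  startupMs: number; // time until the node produced its first row
  totalMs: number; // time until the node produced its last row
  rows: number; // rows per loop (an average when loops > 1)
  loops: number; // number of times the node was executed
}

// Extracts "actual time=A..B rows=R loops=L" from a plan line.
// Returns null when the line carries no timing information.
function parseActual(line: string): NodeTiming | null {
  const match =
    /actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?) rows=(\d+) loops=(\d+)/.exec(line);
  if (!match) return null;
  return {
    startupMs: Number(match[1]),
    totalMs: Number(match[2]),
    rows: Number(match[3]),
    loops: Number(match[4]),
  };
}

// Two lines copied from the plan above.
const gatherLine =
  'Gather (cost=20752.94..131022.90 rows=287250 width=16) (actual time=10773.043..11474.407 rows=282968 loops=1)';
const scanLine =
  '-> Parallel Index Only Scan using chunks_pkey on chunks c (cost=0.43..71699.93 rows=509132 width=16) (actual time=1.441..10287.412 rows=407322 loops=3)';

const gather = parseActual(gatherLine);
const scan = parseActual(scanLine);

// Fraction of the Gather node's time that had elapsed when the scan on chunks finished.
const scanShare = gather && scan ? scan.totalMs / gather.totalMs : NaN;
```

Read this way, the index-only scan on chunks finishes at roughly 10.3 s of the Gather node's 11.5 s, while the sequential scan on file_chunks finishes in about 0.1 s. Times in a parallel plan are per-process averages, so the ratio is only a rough guide.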

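Main performance problems

  1. The large number of heap fetches (342,362) shows that the index-only scan on chunks is inefficient; that node alone runs for about 10.3 s
  2. The overall execution time of 11.5 s is long
  3. The file_chunks table has to be scanned in full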
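
One general way to avoid a whole-table anti join on every delete is to check only the chunks that belonged to the file being deleted, and to delete them in small batches. The sketch below illustrates that idea in memory; it is an assumption for illustration, not the diff of this PR. `FileChunkLink`, `orphanedByDeletion`, and `toBatches` are hypothetical names, and the `fileId` field assumes that file_chunks links a file to each chunk.

```typescript
// Hypothetical in-memory stand-in for rows of the file_chunks link table.
type FileChunkLink = { fileId: string; chunkId: string };

// Returns the chunk ids that were linked to the deleted file and are not
// linked to any other file, i.e. the chunks this deletion turns into orphans.
function orphanedByDeletion(links: FileChunkLink[], deletedFileId: string): string[] {
  const candidates = new Set<string>();
  const stillReferenced = new Set<string>();
  for (const link of links) {
    if (link.fileId === deletedFileId) candidates.add(link.chunkId);
    else stillReferenced.add(link.chunkId);
  }
  return Array.from(candidates).filter((id) => !stillReferenced.has(id));
}

// Splits a list into batches so that each delete statement touches a bounded number of rows.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('size must be positive');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: chunk "b" is shared with file f2, so only "a" and "c" become orphans.
const exampleLinks: FileChunkLink[] = [
  { fileId: 'f1', chunkId: 'a' },
  { fileId: 'f1', chunkId: 'b' },
  { fileId: 'f1', chunkId: 'c' },
  { fileId: 'f2', chunkId: 'b' },
];
const orphanIds = orphanedByDeletion(exampleLinks, 'f1');
const orphanBatches = toBatches(orphanIds, 2);
```

In a real database the same scoping would be expressed as a query filtered by the deleted file's chunk ids, so the work scales with the size of one file instead of the size of the chunks table.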

dosubot bot added the size:M label (this PR changes 30-99 lines, ignoring generated files) on Nov 19, 2024


dosubot bot added the ⚡️ Performance (Performance issue) label on Nov 19, 2024

codecov bot commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 70.27027% with 11 lines in your changes missing coverage. Please review.

Project coverage is 92.31%. Comparing base (7e9e71a) to head (5c1df57).
Report is 3 commits behind head on main.

Files with missing lines              Patch %   Lines
src/database/server/models/file.ts    70.27%    11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4738      +/-   ##
==========================================
- Coverage   92.34%   92.31%   -0.03%     
==========================================
  Files         528      528              
  Lines       36950    36987      +37     
  Branches     2365     2521     +156     
==========================================
+ Hits        34120    34146      +26     
- Misses       2830     2841      +11     
Flag      Coverage Δ
app       92.31% <70.27%> (-0.03%) ⬇️
server    96.17% <70.27%> (-0.25%) ⬇️

Labels: ⚡️ Performance (Performance issue), released, size:M (this PR changes 30-99 lines, ignoring generated files)