[BUG] The format requirements for txt documents are too strict #8

Closed
@cnliucheng

Description

Contact information

No response

MaxKB version

0.9

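Problem description

Whether I convert a Word document to txt or create a new txt document, the file is not recognized after upload: it shows 0 paragraphs and 0 characters, which leaves me at a loss for words. On top of that, docx and pdf files are not supported at all, which is far too limiting.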

Steps to reproduce

Convert a docx file to txt and upload it; the txt file is not recognized.
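The report does not say which encoding the converted file used or how MaxKB 0.9 decodes uploads, so the cause is not established here. One plausible factor to rule out is encoding: Word's plain-text export on a Chinese-locale Windows system may write GBK or UTF-16 rather than UTF-8. The Python sketch below is a hypothetical local check (`guess_encoding` and `inspect_txt` are illustrative names, not part of MaxKB) that reports which encoding a file appears to use and how many non-blank lines and characters it contains once decoded. Treating each non-blank line as a paragraph is a simplification, not MaxKB's actual segmentation rule.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Guess a text file's encoding from its BOM, then by trying UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a byte-order mark; the codec strips it
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads the BOM to pick the byte order
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: a non-UTF-8 file exported on Chinese-locale Windows is GBK.
        return "gbk"


def inspect_txt(raw: bytes) -> dict:
    """Decode the bytes and count non-blank lines (as paragraphs) and characters."""
    encoding = guess_encoding(raw)
    # errors="replace" keeps the check from crashing if the GBK guess is wrong
    text = raw.decode(encoding, errors="replace")
    paragraphs = [line for line in text.splitlines() if line.strip()]
    return {"encoding": encoding, "paragraphs": len(paragraphs), "characters": len(text)}
```

For a hypothetical `converted.txt`, call `inspect_txt(open("converted.txt", "rb").read())`. If it reports `gbk` or `utf-16`, re-saving the file as UTF-8 before uploading is one way to test whether encoding is behind the 0-paragraph, 0-character result.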

Expected result

txt files are recognized correctly.

Relevant log output

2024-03-22 12:56:13 [listener_manage INFO] 开始--->向量化文档:81e41878-e808-11ee-8dfc-0242ac120003
2024-03-22 12:56:13 [basehttp INFO] "POST /api/dataset/04c51c78-e805-11ee-b157-0242ac120003/document/_bach HTTP/1.1" 200 393
2024-03-22 12:56:13 [listener_manage INFO] 结束--->向量化文档:81e41878-e808-11ee-8dfc-0242ac120003
2024-03-22 12:56:14 [basehttp INFO] "GET /ui/assets/icon_document-2fa30876.svg HTTP/1.1" 200 1376
2024-03-22 12:56:14 [basehttp INFO] "GET /api/dataset HTTP/1.1" 200 366
2024-03-22 12:56:14 [basehttp INFO] "GET /api/dataset/04c51c78-e805-11ee-b157-0242ac120003/document/1/10 HTTP/1.1" 200 444
2024-03-22 12:56:14 [basehttp INFO] "GET /api/dataset/04c51c78-e805-11ee-b157-0242ac120003 HTTP/1.1" 200 391
2024-03-22 12:56:14 [basehttp INFO] - Broken pipe from ('222.90.143.25', 35901)
2024-03-22 12:56:14 [basehttp INFO] "GET /ui/assets/icon_document-2fa30876.svg HTTP/1.1" 200 1376
2024-03-22 12:56:15 [basehttp INFO] - Broken pipe from ('222.90.143.25', 36208)
2024-03-22 12:56:16 [basehttp INFO] "GET /ui/assets/index-3d9325d8.js HTTP/1.1" 200 5566
2024-03-22 12:56:16 [basehttp INFO] "GET /ui/assets/index-0432a3d8.css HTTP/1.1" 200 2044
2024-03-22 12:56:17 [basehttp INFO] "GET /ui/assets/icon_document-2fa30876.svg HTTP/1.1" 200 1376
2024-03-22 12:56:17 [basehttp INFO] - Broken pipe from ('222.90.143.25', 36209)
2024-03-22 12:56:20 [basehttp INFO] "GET /api/dataset/04c51c78-e805-11ee-b157-0242ac120003/hit_test?query_text=%E5%AE%89%E5%85%A8&similarity=0.6&top_number=5 HTTP/1.1" 200 52
2024-03-22 12:56:20 [basehttp INFO] "GET /ui/assets/user-icon-c413d294.svg HTTP/1.1" 304 0
2024-03-22 12:56:56 [log WARNING] Unauthorized: /api/dataset/04c51c78-e805-11ee-b157-0242ac120003/document/_bach
2024-03-22 12:56:56 [basehttp WARNING] "HEAD /api/dataset/04c51c78-e805-11ee-b157-0242ac120003/document/_bach HTTP/1.1" 401 86
2024-03-22 12:57:35 [basehttp INFO] "GET /api/user_manage/1/20 HTTP/1.1" 200 369
2024-03-22 12:58:59.115 CST [67] LOG:  checkpoint starting: time
2024-03-22 12:59:00.205 CST [67] LOG:  checkpoint complete: wrote 11 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.010 s, sync=0.034 s, total=1.090 s; sync files=10, longest=0.028 s, average=0.004 s; distance=1 kB, estimate=837 kB

Additional information

No response
