After updating the Document in Spring AI core, I ran into the following problem.

This is my code:

        Resource resource = new FileSystemResource(filePath);
        List<Document> documents = new TikaDocumentReader(resource).read();
        return new TokenTextSplitter(
                knowledgeBaseFileSlice.getDefaultChunkSize(),
                knowledgeBaseFileSlice.getMinChunkSizeChars(),
                knowledgeBaseFileSlice.getMinChunkLengthToEmbed(),
                knowledgeBaseFileSlice.getMaxNumChunks(),
                knowledgeBaseFileSlice.isKeepSeparator()).apply(documents);

After TikaDocumentReader reads a Word document, the content it returns includes not only the text of the document but also XML information from the file. If I call getText(), the output includes content like this:
docProps/app.xml
  Normal.dotm 1 0 0 0 0 0 false false 0 WPS Office_10.1.0.7698_F1E327BC-269C-435d-A152-05C5408002CA 0

docProps/core.xml
  2023-08-26T18:18:00Z admin admin 2023-08-26T18:18:41Z 1

docProps/custom.xml
   2052-10.1.0.7698

word/styles.xml

word/settings.xml

word/theme/theme1.xml

word/document.xml
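
The section names in this output (docProps/app.xml, word/styles.xml, word/document.xml and so on) look like the entries of the ZIP container that a .docx file is. That would suggest the file is being read as a generic archive rather than as a Word document, but this is an assumption and not a confirmed diagnosis. A minimal sketch to check it, using only the JDK: `DocxEntryLister` and `listEntries` are hypothetical names for illustration and are not part of Spring AI or Tika.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper; not part of Spring AI or Apache Tika.
public class DocxEntryLister {

    // Returns the names of all entries stored in the ZIP container at the given path.
    public static List<String> listEntries(Path file) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocxEntryLister <path-to-docx>");
            return;
        }
        // Print each entry name so it can be compared with the getText() output.
        for (String name : listEntries(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

If the printed names match the section names in the getText() output, the text is probably coming from the individual archive entries, and the next thing to look at would be which Tika parser actually handled the file.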

What went wrong?

After updating the Document in spring ai core, I encountered the following problem #1968
