Non-UTF-8 txt files upload without problems, but when talking to the file the document parser reports an error: Failed to decode plain text file #10691

Closed
5 tasks done
wgq910668 opened this issue Nov 14, 2024 · 1 comment · Fixed by #10698
Labels
🐞 bug (Something isn't working), good first issue (Good first issue for newcomers)

Comments

@wgq910668

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.10.1

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

Upload a txt file that is not UTF-8 encoded. The upload itself succeeds, but when talking to the file, the document parser reports an error: Failed to decode plain text file

Error message: Failed to decode plain text file
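
The underlying failure can be illustrated outside Dify with a few lines of Python. This is only a sketch of the general mechanism, not Dify's code: the helper name `fails_as_utf8` and the GBK sample bytes are assumptions chosen for illustration, since the report does not say which encoding the file used.

```python
def fails_as_utf8(data: bytes) -> bool:
    """Return True if the bytes cannot be decoded as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical sample: the same text saved in GBK and in UTF-8.
gbk_bytes = "中文".encode("gbk")
utf8_bytes = "中文".encode("utf-8")

print(fails_as_utf8(gbk_bytes))   # GBK bytes are not valid UTF-8 here
print(fails_as_utf8(utf8_bytes))  # UTF-8 bytes decode cleanly
```

Uploading works because the bytes are stored as-is; the error only appears later, when something tries to interpret them as UTF-8 text.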

image

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot bot commented Nov 14, 2024

The error you're encountering, "Failed to decode plain text file," occurs because the document parser in Dify attempts to decode text files using UTF-8 encoding. If the file is not encoded in UTF-8, this error will be raised. The specific part of the code responsible for this is the _extract_text_from_plain_text function, which tries to decode the file content using UTF-8 and raises a TextExtractionError if it fails due to a UnicodeDecodeError [1].

Currently, there is no indication of a fallback mechanism for handling non-UTF8 encoded files or a configuration setting to specify the encoding of uploaded text files in Dify. You might need to ensure that your text files are encoded in UTF-8 before uploading them to avoid this issue.
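
One possible shape for such a fallback is sketched below. It is an illustration under stated assumptions, not the change made in #10698 and not Dify's actual `_extract_text_from_plain_text`: the function name `decode_plain_text` and the candidate list `FALLBACK_ENCODINGS` are hypothetical, and the list should be adjusted to the encodings you expect to receive.

```python
from typing import Sequence

# Hypothetical candidate list, tried in order. "utf-8-sig" decodes
# ordinary UTF-8 and also strips a leading byte order mark if present.
FALLBACK_ENCODINGS: Sequence[str] = ("utf-8-sig", "gb18030")


def decode_plain_text(
    data: bytes, encodings: Sequence[str] = FALLBACK_ENCODINGS
) -> str:
    """Decode bytes by trying each candidate encoding in turn."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what is readable as UTF-8 and mark
    # undecodable bytes with U+FFFD instead of raising.
    return data.decode("utf-8", errors="replace")


text = decode_plain_text("中文".encode("gbk"))
print(text)
```

Trying encodings in order cannot truly detect an encoding; it only finds the first candidate that does not raise. Single-byte encodings such as latin-1 accept any byte sequence, so if one is added to the list it should come last, otherwise it will mask every candidate after it.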

@dosubot dosubot bot added the 🐞 bug Something isn't working label Nov 14, 2024
@crazywoola crazywoola added the good first issue Good first issue for newcomers label Nov 14, 2024
yihong0618 added a commit to yihong0618/dify that referenced this issue Nov 14, 2024
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
AlwaysBluer pushed a commit to AlwaysBluer/dify that referenced this issue Nov 14, 2024
Signed-off-by: yihong0618 <zouzou0208@gmail.com>