[Bug] Can't decode DOC files #303
Comments
@amomra How would you prefer to handle the legacy DOC format? Ignore and skip the files, perhaps logging an error?
@dluc I observed that the app enters an error loop when using RabbitMQ. The extractor throws an error on the DOC file, the message is requeued, and it is immediately retrieved again, which causes the loop.
## Motivation and Context (Why the change? What's the scenario?)
Proposal to fix #303: Can't decode DOC files.
## High level description (Approach, Design)
As described in the issue, legacy Word 97-2003 files are now ignored. The MIME types for the XML formats are also changed to reflect the actual content type of those files.
The old format is now automatically ignored unless a specific decoder is provided. The same approach is used for the old formats of Word, Excel, and PowerPoint.
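The behaviour described above can be sketched as a lookup of decoders by MIME type, where a file with no registered decoder is skipped with a warning instead of raising. This is an illustration in Python, not Kernel Memory's actual C# code: `DECODERS`, `decode_or_skip`, and the placeholder decoder are hypothetical names. The two MIME type strings are the standard registered ones for legacy and Open XML Word files.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("pipeline")

# Standard MIME types for legacy (.doc) and Open XML (.docx) Word files.
MIME_DOC = "application/msword"
MIME_DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def _decode_docx(data: bytes) -> str:
    # Hypothetical placeholder: a real decoder would parse the DOCX package.
    return f"<{len(data)} bytes of DOCX content>"


# Hypothetical registry: only formats with a decoder are listed.
DECODERS: Dict[str, Callable[[bytes], str]] = {MIME_DOCX: _decode_docx}


def decode_or_skip(mime_type: str, data: bytes) -> Optional[str]:
    """Return the decoded text, or None when no decoder is registered."""
    decoder = DECODERS.get(mime_type)
    if decoder is None:
        # Skipping (rather than raising) means the message is not requeued.
        logger.warning("No decoder registered for %s, skipping file", mime_type)
        return None
    return decoder(data)
```

Under these assumptions, supporting the legacy format later only requires registering a decoder under `MIME_DOC`; the skip path then no longer applies to it.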
Context / Scenario
A user uploads a document in the DOC format.
What happened?
When a user uploads a document in the legacy DOC format, the application throws an exception saying that the file is corrupt. The logs show that the application is trying to parse it as a DOCX file.
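The issue does not say how the file type was determined. As a rough sketch of why a DOC file cannot be parsed as DOCX, the two container formats can be told apart by their leading bytes: legacy Office files use the OLE2 compound file signature, while DOCX files are ZIP archives. The function name `sniff_word_format` and its return labels are hypothetical and written in Python purely for illustration.

```python
# Leading bytes of an OLE2 compound file (legacy DOC, XLS, PPT).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature of a ZIP archive (DOCX, XLSX, PPTX).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(header: bytes) -> str:
    """Classify a file by its first bytes; labels are illustrative only."""
    if header.startswith(OLE2_MAGIC):
        return "legacy-ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip-ooxml"
    return "unknown"
```

The signatures identify only the container, so this check cannot distinguish DOC from XLS, or DOCX from any other ZIP file. It would complement an extension or MIME type check, not replace it.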
Importance
I cannot use Kernel Memory
Platform, Language, Versions
It happens on Windows and Linux with .NET 8.
Relevant log output