[7.x] Attachment ingest processor: add resource_name field (#64389) #66301
In the current `ingest-attachment` plugin, a text file cannot be read properly if its encoding is not UTF-8 and it contains non-ASCII characters.

I looked into Tika, which `ingest-attachment` uses, and found that it recognizes a file better when it is told the file's name. So I added an attachment option, `file_name`: if a field is defined as `file_name`, that name is sent to Tika to improve the result.

One thing does not look right yet: `gradle check`. I wrote unit tests that read text in different encodings, but there seems to be a rule against committing non-UTF-8 files.
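The encoding problem described above can be reproduced with the JDK alone. The following is a minimal sketch, not the plugin's or Tika's code: `EncodingDemo` and its `decode` helper are hypothetical names, and ISO-8859-1 is just an assumed example of a non-UTF-8 encoding. It shows that bytes containing a non-ASCII character do not round-trip when they are decoded as UTF-8, while ASCII-only text is unaffected.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Hypothetical helper: decodes bytes with one fixed charset,
    // the way a reader that assumes a single encoding would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" contains one non-ASCII character (e with acute accent).
        String original = "caf\u00e9";
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the encoding the file was written in recovers the text.
        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        // Decoding the same bytes as UTF-8 hits a malformed sequence,
        // which Java replaces with U+FFFD instead of the original character.
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 round-trip ok: " + right.equals(original));
        System.out.println("UTF-8 round-trip ok: " + wrong.equals(original));
    }
}
```

This is why any hint that helps the parser choose the right encoding matters for such files. A test fixture like `latin1Bytes` can also be built in code from an escaped string, as done here, rather than stored as a non-UTF-8 file.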
Backport of #64389