FsCrawler 2.10 Rest Service upload with TAG "external" having a file larger than 20 Mb returns exception #1709
Comments
Sounds like you ran it in |
I have shared the --trace output Here. I just realized I had shared the wrong links to the JSON document indexed with FSCrawler 2.9 and to the test file that needs to be indexed, so I am sharing them again. Please note that this test file was successfully indexed using FSCrawler 2.9 and Elasticsearch 7.8. Click here for the test file. |
I think I understand. So you are trying to manually "attach" the binary file to the final document under That being said, I'm not a big fan of storing huge binary documents in Elasticsearch. Binary storage should be done elsewhere IMO, and you should only keep the URL to the storage.
If you really want to do it, and have the same behavior as before (more or less), we can probably use this https://github.com/FasterXML/jackson-core/pull/1019/files which introduced a way to configure the limits. I'd suggest adding a new setting in FSCrawler, like
StreamReadConstraints constraints = StreamReadConstraints.builder()
    .maxStringLength(strLen)
    .build();
Note that we might need to create another "setting file" which can be read from the framework. |
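For readers who want to see how such a constraint could be attached to a parser, here is a minimal sketch, assuming Jackson 2.15 or later on the classpath. The class name `RelaxedJsonLimits`, the helper `mapperWithMaxStringLength`, and the 50,000,000 limit are hypothetical and chosen for illustration only. How FSCrawler itself would wire the setting is not shown in this thread.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

class RelaxedJsonLimits {

    // Hypothetical helper: returns an ObjectMapper whose parser accepts
    // JSON string values of up to maxStringLength characters.
    static ObjectMapper mapperWithMaxStringLength(int maxStringLength) {
        StreamReadConstraints constraints = StreamReadConstraints.builder()
                .maxStringLength(maxStringLength)
                .build();
        // Attach the constraints to the factory that creates the parsers.
        JsonFactory factory = JsonFactory.builder()
                .streamReadConstraints(constraints)
                .build();
        return new ObjectMapper(factory);
    }

    public static void main(String[] args) throws Exception {
        // Assumed limit for illustration; pick a value that fits your heap.
        ObjectMapper mapper = mapperWithMaxStringLength(50_000_000);

        // A string value as long as the one reported in the error message.
        String data = "A".repeat(20_051_112);
        String json = "{\"external\":{\"data\":\"" + data + "\"}}";

        JsonNode node = mapper.readTree(json);
        System.out.println(node.get("external").get("data").asText().length());
    }
}
```

The limit applies per factory, so only mappers built this way are affected. Jackson also offers `StreamReadConstraints.overrideDefaultStreamReadConstraints(...)` to change the process-wide default, which is simpler but affects every parser in the JVM.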
Hi David, Thanks for looking into the issue!
Yes, you are absolutely correct, but I store the binary file in the external.data tag. I don't have any experience with Java coding, and I really need the email attachment to be indexed as a binary file; space and memory are not an issue. Adding a setting in FSCrawler seems like a good idea, as long as you think it will not affect anyone else's code in the future. Can I have a snapshot version of FSCrawler 2.10 with Jackson-core 2.13 for now, which does not have the string length validation? I am asking because more than 1 TB of data with attachments smaller than 20 MB has already been indexed, and I do not want to start again from scratch. |
Hi David, Also, in the recent snapshot version, after changing the Jackson library version to 2.13, a 50 MB file is ingested into Elasticsearch. |
I am trying to upload an email file with a PDF attachment larger than 20 MB using the .NET WebClient and the FSCrawler REST service. The attachment is added to the external tag, which contains the filename, the content type, and the data (the Base64-encoded content of the file).
The upload works for smaller attachments. As the error description suggests, there seems to be a limit on the size of the string:
String length (20051112) exceeds the maximum length (20000000)
The issue is also discussed here
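Because the data field holds Base64 text, the string that Jackson sees is about a third longer than the raw file. The sketch below estimates the encoded length and compares it with the limit quoted in the error. It assumes standard padded Base64 without line breaks, and the class and method names are hypothetical.

```java
import java.util.Base64;

class Base64LimitCheck {

    // Maximum string length quoted in the error message.
    static final long MAX_STRING_LENGTH = 20_000_000L;

    // Padded Base64 emits 4 characters for every group of 3 raw bytes,
    // with the last partial group rounded up.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // True when the Base64 text of a file of rawBytes bytes is longer than the limit.
    static boolean exceedsLimit(long rawBytes, long maxStringLength) {
        return encodedLength(rawBytes) > maxStringLength;
    }

    public static void main(String[] args) {
        // Cross-check the formula against the JDK encoder on a small input.
        byte[] sample = new byte[10];
        int actual = Base64.getEncoder().encodeToString(sample).length();
        System.out.println(actual == encodedLength(sample.length));

        // 15,000,000 raw bytes encode to exactly 20,000,000 characters.
        System.out.println(exceedsLimit(15_000_000L, MAX_STRING_LENGTH));
        // One more byte pushes the encoded text over the limit.
        System.out.println(exceedsLimit(15_000_001L, MAX_STRING_LENGTH));
    }
}
```

Under these assumptions, any attachment larger than 15,000,000 bytes trips the 20,000,000 character limit. A Base64 string of 20,051,112 characters, as in the error above, would correspond to roughly 15,038,334 raw bytes.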
Logs
Please check here. The issue is caused by Jackson-core's StreamReadConstraints.java, which validates the string length; the default maximum is 20000000.
StreamReadConstraints.java was introduced in version 2.15 of Jackson-core.
The bug is not reproducible in FSCrawler 2.9, because that version uses Jackson-core 2.13, which does not validate the string length. But I cannot use FSCrawler 2.9 with Elasticsearch 8.10.
Expected behavior
There should be a way to increase the default size limit for the tags, or the tags should allow data of unlimited size.
Versions: