This repository has been archived by the owner on May 4, 2023. It is now read-only.
I am using couchdb-lucene 2.2.0, installed on Windows Server 2019.
The CouchDB version is 3.1.1.
Full-text search works fine on document properties. I also wanted to index the content of document attachments, so I configured the design document as follows:
{
  "_id": "_design/fts",
  "_rev": "2-ec9dfea8eaa44056d74b44135776ef05",
  "fulltext": {
    "by_message": {
      "index": "function(doc) { var ret=new Document(); if (doc._attachments) {for(var a in doc._attachments){ret.attachment('file',a);}} return ret; }"
    }
  }
}
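For readability, here is the same index function expanded over several lines. This is only a sketch for reading and local experimentation: the `Document` stub below is a hypothetical stand-in for the class that couchdb-lucene provides to index functions, and `sample` is made-up test data.

```javascript
// Hypothetical stand-in for couchdb-lucene's Document class,
// added only so this sketch runs under plain Node.js.
function Document() {
  this.attachments = [];
}
Document.prototype.attachment = function (field, name) {
  this.attachments.push({ field: field, name: name });
};

// Same logic as the "index" string in the design document above.
function index(doc) {
  var ret = new Document();
  if (doc._attachments) {
    // Register every attachment under the field name 'file'.
    for (var a in doc._attachments) {
      ret.attachment('file', a);
    }
  }
  return ret;
}

// Made-up document with two attachments.
var sample = { _id: 'doc1', _attachments: { 'example.docx': {}, 'notes.txt': {} } };
console.log(index(sample).attachments);
```

Every attachment is passed to `ret.attachment` regardless of its type, so a docx or pptx attachment goes through the same path as a pdf or txt one.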
When I upload pdf, txt, or Word attachments to documents, everything works as expected. Below is a search result for the keyword "Sesame Street" in a ppt document, and it works fine.
Then I upload any docx file (even a nearly empty one containing only some plain text; for this specific problem my docx contains only the text 'This is an example document which I have indexing problem on couchdb-lucene') or a pptx attachment to any of the documents and re-run the above request. It gives a timeout error every time.
So indexing is somehow stuck forever. Restarting couchdb-lucene does not change anything. If I delete the document with the docx file from CouchDB and then restart couchdb-lucene, everything starts working again.
I believe the problem is related to ZIP-based document formats such as docx, xlsx, and pptx.
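If the ZIP-based formats really are the trigger, one possible stopgap is to skip those attachments in the index function so the rest of the database stays searchable. The sketch below is an untested assumption, not a fix: `isOoxml` is a hypothetical helper, the `Document` stub stands in for the class couchdb-lucene provides, and the check relies on the attachment's `content_type` stub field or its file extension.

```javascript
// Hypothetical stand-in for couchdb-lucene's Document class (for local runs only).
function Document() {
  this.attachments = [];
}
Document.prototype.attachment = function (field, name) {
  this.attachments.push({ field: field, name: name });
};

// Hypothetical helper: true for docx/xlsx/pptx, judged by MIME type or extension.
function isOoxml(name, meta) {
  var type = (meta && meta.content_type) || '';
  var prefix = 'application/vnd.openxmlformats-officedocument.';
  return type.indexOf(prefix) === 0 || /\.(docx|xlsx|pptx)$/i.test(name);
}

// Variant of the index function that leaves ZIP-based Office files unindexed.
function index(doc) {
  var ret = new Document();
  if (doc._attachments) {
    for (var a in doc._attachments) {
      if (isOoxml(a, doc._attachments[a])) {
        continue; // skip the formats suspected of stalling the indexer
      }
      ret.attachment('file', a);
    }
  }
  return ret;
}

// Made-up document mixing a suspect format with ones reported to work.
var sample = {
  _attachments: {
    'report.pdf': { content_type: 'application/pdf' },
    'example.docx': {
      content_type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    },
    'slides.PPTX': { content_type: 'application/octet-stream' }
  }
};
console.log(index(sample).attachments);
```

To try this in a design document, the helper would have to be inlined into the single `index` function string. The skipped attachments are simply not searchable, so this only narrows down the cause.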
I suspected the problem might be related to running Lucene on Windows, so I installed CouchDB and couchdb-lucene on Ubuntu 20.04 Server. The result is the same.
Everything works fine until I upload a docx or pptx document, whereas doc, rtf, txt, and pdf files are indexed without problems.
I am really stuck with this problem.
The log shows only the message below.
If I search for the keyword 'problem', which appears in the Word document, the result is the same timeout.
If I try with stale=ok, it responds with an empty result.
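For anyone trying to reproduce the two requests, the sketch below builds the query URLs with and without `stale=ok`. The base URL `http://localhost:5985`, the `local` key, and the database name `mydb` are assumptions for illustration (they are not given in this report); only the design document name `fts` and index name `by_message` come from the design document above.

```javascript
// Builds a couchdb-lucene search URL. All argument values used below
// except 'fts' and 'by_message' are illustrative assumptions.
function buildSearchUrl(base, key, db, ddoc, indexName, query, staleOk) {
  var url = base + '/' + encodeURIComponent(key) +
    '/' + encodeURIComponent(db) +
    '/_design/' + encodeURIComponent(ddoc) +
    '/' + encodeURIComponent(indexName) +
    '?q=' + encodeURIComponent(query);
  if (staleOk) {
    // stale=ok asks for results from the index as it currently is,
    // without waiting for pending updates to be indexed.
    url += '&stale=ok';
  }
  return url;
}

var fresh = buildSearchUrl('http://localhost:5985', 'local', 'mydb', 'fts', 'by_message', 'problem', false);
var stale = buildSearchUrl('http://localhost:5985', 'local', 'mydb', 'fts', 'by_message', 'problem', true);
console.log(fresh);
console.log(stale);
```

An empty result with `stale=ok` alongside a timeout without it would be consistent with the index never getting past the docx document, though that is an interpretation rather than something the log confirms.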