When T-Engine throws an Exception, the document gets indexed with no content #395

Open
hi-ko opened this issue Apr 4, 2022 · 0 comments

hi-ko commented Apr 4, 2022

This is a reference ticket for SEARCH-2974, opened here because I am not able to comment on alfresco.atlassian.net.

Details provided by @binduwavell:

What we discovered is that if the T-Engine throws, the node gets indexed with no content, and the only way to update the content cache is to update the content on the node or to PURGE/INDEX the node so the transform runs again.

We extended the NodeContentGet webscript so that if the transformer fails we throw in this webscript; Solr then marks the node as having a content transform error. Once we've grabbed our OCR text it is written as a rendition, which moves the node into a new transaction, and Solr re-indexes the node automatically and re-attempts the content transform...

I think that if a T-Engine fails, Solr should probably not cache empty content... I don't think Solr necessarily needs to retry getting the content, but the next time the node is indexed, Solr should not trust the empty content cache and should re-attempt NodeContentGet.

This leads to an unreliable index, because technical errors inevitably leave the index incomplete. It is not possible to fix these errors afterwards.
Writing empty content to the index when the node content is not empty should always be treated as wrong index content, and therefore as a bug.
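
To make the requested behaviour concrete, here is a minimal sketch in Python. It is not Alfresco or Solr code: `ContentCache`, `CacheEntry`, `TransformError`, the `fetch` callback (standing in for a NodeContentGet call) and the `source_size` parameter are all hypothetical names chosen for illustration. The idea is only that a failed transform is recorded as a failure rather than as valid empty text, and that an empty cache entry for a non-empty node is never trusted on the next indexing pass.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class TransformError(Exception):
    """Hypothetical error raised by the fetcher when the transform fails."""


@dataclass
class CacheEntry:
    text: str
    transform_failed: bool = False


class ContentCache:
    """Illustrative content cache that does not trust empty or failed entries."""

    def __init__(self, fetch: Callable[[int], str]) -> None:
        # fetch stands in for the NodeContentGet call (assumption).
        self._fetch = fetch
        self._entries: Dict[int, CacheEntry] = {}

    def content_for_indexing(self, node_id: int, source_size: int) -> CacheEntry:
        cached = self._entries.get(node_id)
        if cached is not None and self._is_trustworthy(cached, source_size):
            return cached
        try:
            entry = CacheEntry(text=self._fetch(node_id))
        except TransformError:
            # Record the failure explicitly instead of caching "valid" empty text.
            entry = CacheEntry(text="", transform_failed=True)
        self._entries[node_id] = entry
        return entry

    @staticmethod
    def _is_trustworthy(entry: CacheEntry, source_size: int) -> bool:
        if entry.transform_failed:
            return False
        # Empty text for a node whose content is not empty is treated as suspect.
        return not (entry.text == "" and source_size > 0)
```

With this policy no immediate retry happens after a failure; the fetch is simply re-attempted the next time the node is indexed, which matches the suggestion quoted above.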
