A PDF with a few hundred pages broke my indexing: after all pages had been OCR'ed, the run (via fulltextsearch:document:index or via full-index) crashed with:
[PDOException (HY000)]
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
Exception trace:
at /var/www/html/3rdparty/doctrine/dbal/src/Driver/PDO/Statement.php:92
PDOStatement->execute() at /var/www/html/3rdparty/doctrine/dbal/src/Driver/PDO/Statement.php:92
Doctrine\DBAL\Driver\PDO\Statement->execute() at /var/www/html/3rdparty/doctrine/dbal/src/Connection.php:1059
Doctrine\DBAL\Connection->executeQuery() at /var/www/html/lib/private/DB/Connection.php:261
OC\DB\Connection->executeQuery() at /var/www/html/3rdparty/doctrine/dbal/src/Query/QueryBuilder.php:345
Doctrine\DBAL\Query\QueryBuilder->execute() at /var/www/html/lib/private/DB/QueryBuilder/QueryBuilder.php:281
OC\DB\QueryBuilder\QueryBuilder->execute() at /var/www/html/lib/private/Comments/Manager.php:419
OC\Comments\Manager->getForObject() at /var/www/html/apps/files_fulltextsearch/lib/Service/FilesService.php:820
OCA\Files_FullTextSearch\Service\FilesService->updateCommentsFromFile() at /var/www/html/apps/files_fulltextsearch/lib/Service/FilesService.php:812
OCA\Files_FullTextSearch\Service\FilesService->updateContentFromFile() at /var/www/html/apps/files_fulltextsearch/lib/Service/FilesService.php:741
OCA\Files_FullTextSearch\Service\FilesService->updateFilesDocumentFromFile() at /var/www/html/apps/files_fulltextsearch/lib/Service/FilesService.php:657
OCA\Files_FullTextSearch\Service\FilesService->generateDocumentFromIndex() at /var/www/html/apps/files_fulltextsearch/lib/Service/FilesService.php:705
OCA\Files_FullTextSearch\Service\FilesService->updateDocument() at /var/www/html/apps/files_fulltextsearch/lib/Provider/FilesProvider.php:314
OCA\Files_FullTextSearch\Provider\FilesProvider->updateDocument() at /var/www/html/apps/fulltextsearch/lib/Command/DocumentIndex.php:112
So updateDocument seems to run into MySQL connection timeouts during the main loop over the PDF pages. If I limit the number of PDF pages, I can OCR 20 pages, but at 100 it times out. I presume the connection timeout is around 5 or 10 minutes.
So how do I deal with this problem?
I figure I could increase the MySQL connection timeout in the Nextcloud settings, but I'd rather not, as this could negatively affect many other apps and possibly core, especially since the connection timeout that OCR needs would be around 2 hours for 1000 PDF pages...
Ideally the "main loop" could ping the database connection in TesseractService.php:#L278, but as I don't see a database connection anywhere here, I presume this is handled in the general occ code. So is this even touchable in the app?
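To make the "ping in the main loop" idea concrete, here is a minimal, self-contained simulation in Python. It is only a sketch and does not use any Nextcloud, Doctrine, or MySQL API: `FakeClock`, `FakeConnection`, `ServerGoneAway`, `ocr_pages`, and the `ping_every` parameter are all hypothetical names, and the 300-second idle timeout and 10 seconds per page are assumed values. Whether the app can actually reach the database connection from the OCR loop is still the open question above.

```python
class ServerGoneAway(Exception):
    """Stands in for MySQL error 2006 in this simulation."""


class FakeClock:
    """Manually advanced clock, so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class FakeConnection:
    """Hypothetical connection that is dropped after `idle_timeout` idle seconds."""

    def __init__(self, idle_timeout, clock):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_used = clock()

    def execute(self, query):
        now = self.clock()
        if now - self.last_used > self.idle_timeout:
            raise ServerGoneAway("2006 MySQL server has gone away")
        self.last_used = now  # any successful query resets the idle timer
        return f"ok: {query}"


def ocr_pages(pages, conn, seconds_per_page, advance_clock, ping_every=None):
    """Simulate the per-page OCR loop, optionally pinging every `ping_every` pages."""
    text = []
    for number, page in enumerate(pages, start=1):
        advance_clock(seconds_per_page)  # stands in for the slow OCR call
        text.append(f"text of page {page}")
        if ping_every and number % ping_every == 0:
            conn.execute("SELECT 1")  # cheap keepalive query
    return text


if __name__ == "__main__":
    clock = FakeClock()
    conn = FakeConnection(idle_timeout=300, clock=clock)
    # 100 pages at 10 s each is 1000 s of OCR, well past the 300 s idle timeout.
    ocr_pages(range(100), conn, seconds_per_page=10,
              advance_clock=clock.advance, ping_every=10)
    print(conn.execute("SELECT comments"))  # succeeds because of the pings
```

Without `ping_every`, the same run leaves the connection idle for 1000 simulated seconds and the next query raises `ServerGoneAway`, which mirrors the crash in the trace above. An alternative under the same assumptions would be to skip the pings and open a fresh connection just before the first query that follows the OCR loop.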
I don't want to limit my whole FTS to fewer than 20 PDF pages. The safe limit also depends on --psm and on the general load of the server, so it would lead to random errors when indexing. I have a few hundred users who work on policymaking with big PDFs, so ideally it would not be necessary to limit PDF pages at all...
sistason changed the title from "Deal with mysql connection timeouts for long OCR jobs" to "How to deal with mysql connection timeouts for long OCR jobs?" on Apr 17, 2023