generate thumbnail only for first page #1166

Closed
freestyle68 opened this issue Jul 16, 2017 · 7 comments

@freestyle68

Hi again,

I noticed that the generate-thumbnail script makes a thumbnail of every page of PDF and office documents.
You can see them in /var/lib/fess/thumbnails/

This is useless because only the first page is supposed to be shown in the thumbnail results. It also kills the server: imagine a 300-page document. In fact, gs kills the server with this task:

gs -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 ..........

As a solution, you should add the flag

-e PageRange=1-1

to the command line

unoconv -o $TMP_FILE -f pdf $TARGET_FILE
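
A minimal sketch of the suggested change, assuming unoconv accepts the export option exactly as written above (`-e PageRange=1-1`). The helper name `office_to_first_page_pdf` is hypothetical and is not taken from the Fess script:

```shell
# Convert an office document to a PDF that contains only page 1.
# -e PageRange=1-1 is the export option suggested in this comment.
# office_to_first_page_pdf is a hypothetical helper name.
office_to_first_page_pdf() {
    target_file=$1   # input office document
    tmp_file=$2      # output PDF path
    unoconv -e PageRange=1-1 -o "$tmp_file" -f pdf "$target_file"
}
```

It could be called as `office_to_first_page_pdf "$TARGET_FILE" "$TMP_FILE"`, so later steps only ever see a one-page PDF.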

More details at

http://bernaerts.dyndns.org/linux/76-gnome/325-gnome-shell-generate-msoffice-thumbnail-nautilus

Another suggested feature is to add thumbnails for image files as well. You could use

mogrify for jpg, png and other formats:

https://superuser.com/questions/844912/generate-thumbnail-with-imagemagick
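
A rough sketch of what that could look like with ImageMagick's mogrify. The helper name `image_thumbnail` and the default size of 100x100 are assumptions for illustration, not values used by Fess:

```shell
# Create a PNG thumbnail of an image file with ImageMagick's mogrify.
# image_thumbnail is a hypothetical helper; 100x100 is an assumed default size.
image_thumbnail() {
    src=$1               # source image (jpg, png, ...)
    out_dir=$2           # directory that receives <name>.png
    size=${3:-100x100}   # bounding box for the thumbnail
    # -path writes the result into out_dir instead of overwriting src.
    mogrify -path "$out_dir" -format png -thumbnail "$size" "$src"
}
```

For example, `image_thumbnail photo.jpg /tmp/thumbs` would be expected to leave `photo.png` in `/tmp/thumbs`, with the original file untouched.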

@marevol
Contributor

marevol commented Jul 17, 2017

Thank you for the info.
I'll add PageRange.

@freestyle68
Author

freestyle68 commented Jul 17, 2017

Hi,

Even after fix #1168, PNGs are still generated for every document page in /var/lib/fess/thumbnails/

I tried the script

/usr/share/fess/bin/generate-thumbnail msoffice <MS Office file> <Output Path>

for office and PDF files, and this correctly generates a single PDF or PNG of the first page.

But during Fess crawling, this process starts:

gs -sstdout=%stderr -dQUIET .... -sOutputfile/tmp-magik-.......

This is a 100% CPU process that lasts a long time.
Then this process starts:

convert -thumbnail

and then gs again, and so on, back and forth.
It takes a long time for a few dozen files.
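
One way to avoid rendering every page is to ask ImageMagick for the first page only, using its `[0]` index on the input file. This is a sketch under that assumption and is not necessarily what the Fess script does or what a later fix changed. The helper name `pdf_first_page_thumbnail` and the 100x100 default are hypothetical:

```shell
# Render only the first page of a PDF to a PNG thumbnail.
# pdf_first_page_thumbnail is a hypothetical helper; 100x100 is an assumed size.
pdf_first_page_thumbnail() {
    pdf=$1               # input PDF
    png=$2               # output PNG path
    size=${3:-100x100}   # bounding box for the thumbnail
    # "[0]" selects page index 0, so the remaining pages are not converted.
    convert "${pdf}[0]" -thumbnail "$size" "$png"
}
```

Calling `pdf_first_page_thumbnail in.pdf out.png` on a large document and comparing the run time with a plain `convert in.pdf ...` should show whether the per-page rendering is the bottleneck.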

Also, after this procedure only a single thumbnail is visible in the search results (out of about 60 files crawled).
If I use the Chrome developer tools to analyze the missing thumbnails, the result is the following:

<div class="thumbnailBox media-left hidden-xs-down">
<a class="link" href="file://dati/few/064/064452.pdf" data-uri="file://dati/few/064/064452.pdf" data-id="22d2c3a975f049728042aacc6bfe9bb7" data-order="0">
<img src="/images/noimage.png" data-src="/thumbnail/?docId=22d2c3a975f049728042aacc6bfe9bb7&amp;queryId=8919116e193244ae8120a7cd365784fb" class="thumbnail" style="background-image: url(&quot;/images/loading.gif&quot;);">

so the img src is not pointing to an image in /var/lib/fess/thumbnails/

In /var/log/fess there are a lot of similar lines:

[CommandGeneratorDestoryTimer-1500312534655] WARN CommandGenerator is timed out: [/usr/share/fess/bin/generate-thumbnail, pdf, /var/tmp/fess/thumbnail_3764418062551063326, /var/lib/fess/thumbnails/22d2c/3a975/f0497/28042/aacc6/bfe9b/b7.png]
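
To get a feel for how often this happens, the warnings can be counted with grep. This is a small sketch: the helper name `count_thumbnail_timeouts` is hypothetical, and the log file to inspect is passed as an argument because the exact file name under /var/log/fess is not given here:

```shell
# Count "CommandGenerator is timed out" warnings in a log file.
# count_thumbnail_timeouts is a hypothetical helper name.
count_thumbnail_timeouts() {
    log_file=$1
    # -F matches the message as a fixed string, -c prints the number of matching lines.
    grep -cF 'CommandGenerator is timed out' "$log_file"
}
```

A count close to the number of crawled documents would suggest that almost every thumbnail command is hitting the timeout.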

@marevol
Contributor

marevol commented Jul 17, 2017

What is the server hardware spec?

@freestyle68
Author

Xeon with 16 GB RAM. Debian 9.
Also tried with CentOS 7, same results.

The gs process pushes a single thread to 100% CPU, not all threads.

I have now crawled 27 files (docs and PDFs), and after 15 minutes the Thumbnail generator job is still active. No thumbnails are visible in the search results.

Same errors in /var/log/fess

I also tried a new VPS set up from scratch (Debian 9, OpenJDK, 16 GB), but the problem is the same.

@marevol
Contributor

marevol commented Jul 20, 2017

Thank you for the info.
I think it's improved by #1173.

@freestyle68
Author

freestyle68 commented Jul 20, 2017

YES, now the generate-thumbnail task is very fast and the thumbnails are visible in the search results.

But only for PDF files.
For office files, thumbnails aren't visible.

This is not due to a missing thumbnail but to a web config problem: with office docs I cannot see the blank image (img src="/images/noimage.png") that is usually visible before the thumbnail generation job runs.
With PDF files, this image is visible before the thumbnail job runs.

@freestyle68
Author

Perfect! The fix #1175 solved the problem.

I tried crawling about 6000 docs and had no problems.
But I changed the timeout from 10 to 30 seconds in

src/main/java/org/codelibs/fess/thumbnail/impl/CommandGenerator.java

because of a lot of

[CommandGeneratorDestoryTimer-1500312534655] WARN CommandGenerator is timed out

errors (slow disk). These caused the thumbnail job to abort and restart a few minutes later.
Thank you very much.
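
Before changing the Java source, it may help to check by hand whether a given document finishes within a candidate limit. The sketch below assumes GNU coreutils `timeout` is installed, which exits with status 124 when the limit is reached. The helper name `run_with_timeout` is hypothetical and this is not how CommandGenerator itself enforces its timeout:

```shell
# Run a command under a time limit and report when the limit is hit.
# run_with_timeout is a hypothetical helper; it relies on GNU coreutils timeout.
run_with_timeout() {
    seconds=$1
    shift
    timeout "$seconds" "$@"
    status=$?
    # GNU timeout exits with 124 when it had to stop the command.
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${seconds}s: $*" >&2
    fi
    return "$status"
}
```

For example, `run_with_timeout 10 /usr/share/fess/bin/generate-thumbnail pdf <input file> <output path>` followed by the same call with 30 would show whether a slow disk pushes a document past the 10 second limit.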
