Change PDF generation engine to either wkhtmltopdf or phantom.js #1470

Closed
lwchkg opened this issue Jul 31, 2016 · 18 comments
@lwchkg

lwchkg commented Jul 31, 2016

The current PDF generation engine, calibre, has too many limitations, e.g.

  • It is unable to add page numbers to table of contents;
  • It ignores a majority of CSS code. Worst of all, we never know what is filtered out by Calibre. (I tried PDF generation for one day, and the CSS part was a major failure.)

Proposed solutions

  • wkhtmltopdf - older engine, but appears to support a table of contents out of the box. I'm not sure whether any CSS 3 features are unavailable.
  • phantom.js - newer, but I don't know of any table of contents generation. That means we would need to generate one ourselves by reading the PDF bookmarks (if that is possible at all).

I've also heard of a new product called WeasyPrint, but it does not support @font-face yet, so it is a turn-off.

Please discuss. Thanks!

@lwchkg
Author

lwchkg commented Aug 9, 2016

Since I wanted this badly, I've coded a PDF generator with wkhtmltopdf myself. Now it has a table of contents and some nice header/footer.

Code and sample: vbnet_intro.zip
Generated PDF: book_wk.pdf
Comparison - PDF by GitBook + Calibre: book.pdf

Feel free to discuss and use the code if you're interested.

@GeorgBraunHM

@lwchkg
Thank you so much for providing vbnet_intro.zip. I would like to give it a try. I downloaded and extracted the zip, then ran npm install and gitbook install.
Next, I ran node gen_pdf_wk.js. But this resulted in the following error in line 153:

    Promise.all([...assetMap].map(([src, dest]) => {
                                   ^

SyntaxError: Unexpected token [
    at exports.runInThisContext (vm.js:53:16)
    at Module._compile (module.js:387:25)
    at Object.Module._extensions..js (module.js:422:10)
    at Module.load (module.js:357:32)
    at Function.Module._load (module.js:314:12)
    at Function.Module.runMain (module.js:447:10)
    at startup (node.js:148:18)
    at node.js:405:3

If I run genpdf.bat, I get the same error.
What would be the right way to run your PDF conversion?

I am on Windows 8.1 with node v5.12.0 and npm v3.8.6.

Many thanks and best regards,
Georg

@lwchkg
Author

lwchkg commented Sep 3, 2016

@GeorgBraunHM Oh. I made the program with node 6. Maybe you can upgrade to node 6 and try again.
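The `SyntaxError: Unexpected token [` above comes from destructuring in arrow-function parameters, which Node 5 does not parse. One way to turn that into a clearer message is a small version guard, sketched below. This is only an illustration and not part of the posted script: the names `MIN_MAJOR`, `parseMajor`, `isSupported` and `assertSupportedNode` are hypothetical, and the minimum of Node 6 is taken from the reply above.

```javascript
// Hypothetical startup guard (not part of the posted script).
// Written in ES5 syntax on purpose so that old Node versions can parse it.
var MIN_MAJOR = 6; // assumption: the script needs Node 6 or later

function parseMajor(version) {
  // Accepts "5.12.0" or "v5.12.0" and returns the major version number.
  return parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
}

function isSupported(version) {
  return parseMajor(version) >= MIN_MAJOR;
}

function assertSupportedNode() {
  if (!isSupported(process.versions.node)) {
    console.error('Node ' + MIN_MAJOR + '+ is required, found ' + process.versions.node);
    process.exit(1);
  }
}
```

Because the syntax error is raised while the file is being parsed, a guard like this would have to live in a separate entry file that calls `assertSupportedNode()` before it `require()`s the main script.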

@lwchkg
Author

lwchkg commented Sep 3, 2016

Here are the updated sample and the generated PDF, followed by the installation instructions:
Code and sample book - vbnet_intro.zip
Generated PDF - book_wk.pdf

npm install -g svgexport
npm install
gitbook install

and then run genpdf (Windows only) or node gen_pdf_wk.js.
The most important change in the script is to allow more time for JavaScript code to run, which is needed for the header to show properly. If the page headers do not show up, add the argument --javascript-delay 5000 (the unit is ms).
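For readers who want to see where such a delay ends up, here is a rough sketch of how a Node script might assemble a wkhtmltopdf command line. It is not the code from the zip. The helper names `buildWkArgs` and `runWk` and the shape of `opts` are made up for illustration; only the wkhtmltopdf options themselves (`cover`, `toc`, `--xsl-style-sheet`, `--javascript-delay`, `--header-html`) are real.

```javascript
const childProcess = require('child_process');

// Builds a wkhtmltopdf argument list. The option names are real wkhtmltopdf
// options; the function and the shape of `opts` are hypothetical.
function buildWkArgs(opts) {
  const args = [];
  if (opts.cover) args.push('cover', opts.cover);
  if (opts.tocXsl) args.push('toc', '--xsl-style-sheet', opts.tocXsl);
  for (const page of opts.pages) {
    args.push(page);
    // Page options apply to the page object that precedes them.
    if (opts.javascriptDelayMs != null) {
      args.push('--javascript-delay', String(opts.javascriptDelayMs));
    }
    if (opts.headerHtml) args.push('--header-html', opts.headerHtml);
  }
  args.push(opts.output); // the output file always comes last
  return args;
}

// Not called here: launching requires wkhtmltopdf to be installed.
function runWk(opts) {
  return childProcess.spawn('wkhtmltopdf', buildWkArgs(opts), { stdio: 'inherit' });
}
```

With `javascriptDelayMs: 5000`, every page object is followed by `--javascript-delay 5000`, which gives the header script five seconds to finish before the page is rendered.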

As you may notice the SVGs in the above PDF are defective, and a few icons are missing. Personally I run with a modded GitBook myself to render the svgs (we don't need svg->png conversion) and added a real cover page. Here is the final product. :-)
vb2015 part 1.pdf

@GeorgBraunHM

@lwchkg
Thanks a lot for the update. I upgraded to nodejs v6.5.0 (64-bit version for Windows) and ran genpdf on your latest zip (from Sept. 3). I got a nice-looking PDF including a TOC with page numbers. Taking a closer look, some chapters include a header, others don't. Therefore I ran genpdf --javascript-delay 5000, and now the headers are included for all chapters. The PDF really looks exactly like yours at https://github.com/GitbookIO/gitbook/files/453486/book_wk.pdf. Awesome!

I have a few more questions, if you don't mind:

  1. I currently use wkhtmltopdf 0.12.3.1 (with patched qt), 32-bit. It seems that your pdf was created with wkhtmltopdf version 0.12.3.2. Are you using the 32-bit or 64-bit edition?
  2. My pdf has the same svg flaws as yours. How did you patch gitbook to get to your pdf at https://github.com/GitbookIO/gitbook/files/453488/vb2015.part.1.pdf?
  3. How did you add the title page on https://github.com/GitbookIO/gitbook/files/453488/vb2015.part.1.pdf?

Many thanks and best regards,
Georg

@jonahfang

No pdf file generated:

root@c8d99232f630:/gitbook# node gen_pdf_wk.js %*
Running GitBook:
info: 11 plugins are installed
info: 8 explicitly listed
info: loading plugin "sunlight-highlighter"... OK
info: loading plugin "include-codeblock"... OK
info: loading plugin "styles-less"... OK
info: loading plugin "katex"... OK
info: loading plugin "search"... OK
info: loading plugin "lunr"... OK
info: loading plugin "sharing"... OK
info: loading plugin "theme-default"... OK
info: found 21 pages
info: found 23 asset files
info: compile less file:  styles/website.less
warn: "options" property is deprecated, use config.get(key) instead
warn: "options.output" property is deprecated, use "output.root()" instead
info: compile less file:  styles/pdf.less
info: compile less file:  styles/epub.less
info: compile less file:  styles/mobi.less
info: compile less file:  styles/ebook.less
warn: "this.generator" property is deprecated, use "this.output.name" instead
warn: "navigation" property is deprecated
warn: "book" property is deprecated, use "this" directly instead
info: >> generation finished with success in 34.1s !
Processing LESS asset: src/styles/wk_headerfooter.less => _ebook/wk_headerfooter.less
Processing LESS asset: src/styles/wk_toc.less => _ebook/wk_toc.less
Processing XSL asset: src/styles/wk_toc.xsl => _ebook/wk_toc.xsl
Copying asset: src/cover.html => _ebook/cover.html
Copying asset: src/styles/wk_header.html => _ebook/wk_header.html
Launching wkhtmltopdf:

I use the latest code (from Sept. 3), but run gitbook from a Docker container:

docker run \
 --rm \
 -it \
 -v $PWD:/gitbook \
 fangzx/gitbook:2.0 /bin/bash

then:

node gen_pdf_wk.js %*

My Docker file looks like this:

FROM node:6

RUN apt-get update && \
    apt-get install -y unzip  && \
    npm install gitbook-cli -g && \
    npm install svgexport -g && \
    apt-get clean && \
    rm -rf /var/cache/apt/* /var/lib/apt/lists/*

RUN apt-get update && apt-get install -y fonts-arphic-gbsn00lp

# install gitbook versions
RUN gitbook fetch 3.2.0

ENV BOOKDIR /gitbook

VOLUME $BOOKDIR

EXPOSE 4000

WORKDIR $BOOKDIR

CMD ["gitbook", "--help"]

#EOP

@lwchkg
Author

lwchkg commented Sep 6, 2016

@GeorgBraunHM

  1. 64-bit edition of wkhtmltopdf 0.12.3.2 for Windows. (I tried Linux also, but PT Mono rendered poorly there. It appears that the web font from Paratype works poorly, while web fonts prepared by SquirrelFonts work well.)
  2. Assume you're using GitBook 3.2.0. The file you want to change is
    C:\Users\[your user name]\.gitbook\versions\3.2.0\lib\output\ebook\
    Change the body of function onPage(output, page) to
    return WebsiteGenerator.onPage(output, page);
  3. The cover page? Use Illustrator (or any program) to draw a cover, save as PDF. Then use a PDF software (e.g. http://angusj.com/pdftkb/ ) to join the PDFs together.

@jonahfang It appears that you've forgotten to install wkhtmltopdf. It should exist in your PATH. (Just a note: different people want to install different versions of wkhtmltopdf, because none of them is really stable.)
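The log above stops silently at "Launching wkhtmltopdf:", so a check before launching could make this failure more obvious. The following is a minimal sketch of such a check, assuming nothing about how gen_pdf_wk.js actually launches the program; `findOnPath` is a hypothetical helper, not something from the script.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the full path of `name` if it is found in one of the directories
// listed in `pathVar`, otherwise null. Hypothetical helper for illustration.
function findOnPath(name, pathVar, exts) {
  const dirs = (pathVar || '').split(path.delimiter).filter(Boolean);
  // On Windows, executables usually carry an extension such as ".exe".
  const suffixes = exts ||
    (process.platform === 'win32' ? ['', '.exe', '.cmd', '.bat'] : ['']);
  for (const dir of dirs) {
    for (const ext of suffixes) {
      const candidate = path.join(dir, name + ext);
      try {
        if (fs.statSync(candidate).isFile()) return candidate;
      } catch (err) {
        // Not present in this directory; keep looking.
      }
    }
  }
  return null;
}
```

A script could call `findOnPath('wkhtmltopdf', process.env.PATH)` before launching and print an installation hint when the result is `null`.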

@jonahfang

jonahfang commented Sep 6, 2016

@lwchkg , thank you very much, it works.

@GeorgBraunHM

@lwchkg
thanks for your answers.

I will give the return WebsiteGenerator.onPage(output, page); change a try somewhat later. For many (local) books, I am still on GitBook 2.6.7 (starting with 3.x, I cannot view the books via file:///... any longer; gitbook serve works, but I am providing my students with the HTML book via a simple file server, so I have to stick with file:///...).

For GitBook 2.6.7, there is a flag this.convertImages = true; within the file C:\Users\UserName\.gitbook\versions\2.6.7\lib\generators\ebook.js. Maybe setting this to false might help (I haven't tried it yet).

In the meantime, I have applied your genpdf.bat to a book that does not sit in a separate root folder (like src in your example). If I remove the line "root": "src", from book.json, the script fails. I have fixed this by changing line 218 in file gen_pdf_wk.js from config.root = rawConfig.root; to config.root = rawConfig.root ? rawConfig.root : "./";
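The fallback described above can be written as a tiny helper. This is just a sketch of the same idea; `resolveRoot` is a hypothetical name, and the extra guard for a missing config object is an addition that goes beyond the one-line fix.

```javascript
// Hypothetical helper mirroring the one-line fix: fall back to the current
// directory when book.json has no "root" entry.
function resolveRoot(rawConfig) {
  return rawConfig && rawConfig.root ? rawConfig.root : './';
}
```

A book.json containing `"root": "src"` then yields `src`, and one without that line yields `./`.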

@xuv

xuv commented Oct 24, 2016

For GitBook 2.6.7, there is a flag this.convertImages = true; within the file C:\Users\UserName\.gitbook\versions\2.6.7\lib\generators\ebook.js. Maybe setting this to false might help (I haven't tried it yet).

I tried this option and it indeed works as intended. The SVGs are crisp and part of the PDF (still tested with ebook-convert from Calibre).

@jonathanpberger

@lwchkg thanks so much for doing this! I FOUND AN NPM-INSTALLABLE VERSION HERE: https://github.com/lwchkg/gitbook-pdfgen

@oxFilla

oxFilla commented Apr 8, 2017

I'm getting this error when using gitbook-pdfgen:

$ gitbook-pdfgen --help

/usr/local/lib/node_modules/gitbook-pdfgen/gen_pdf_wk.js:4
const childProcess = require('child_process');
^^^^^
SyntaxError: Use of const in strict mode.
    at Module._compile (module.js:439:25)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:902:3

@lwchkg
Author

lwchkg commented Apr 8, 2017

@oxid-filla You're likely running a very old version of node.js. Please update to a recent version. The error message indicates that your node.js installation doesn't support ES6.

@oxFilla

oxFilla commented Apr 10, 2017

OK, now it runs, but there are two new problems.
In my project, and also with your sample code, the header with the page number appears only on the first page after the TOC and never again.
With your styles and book.json in my project, I only get a TOC when I remove this line from book.json:

"tocXsl": "styles/wk_toc.xsl",

I'm using an ordinary SUMMARY.md.

@MuyNooB

MuyNooB commented Nov 17, 2017

Thanks for your zip.
The zip works great, but I have a question: does the generator only support adoc? I tested my md book and got blank pages except for the summary. How can I change the code to support an md book?
(Sorry for my bad English, I hope you can read it, haha.)

@lwchkg
Author

lwchkg commented Nov 17, 2017

@MuyNooB Are you referring to me? Anyway, the generator only recognizes ".html" files in the output, so it shouldn't really matter whether your content is ".adoc" or ".md". If you don't mind, you can send me the book so I can try to reproduce the error.

BTW, this is the issue tracker of the official GitBook repository. If you're talking about my plugin, it's better to post an issue on https://github.com/lwchkg/gitbook-pdfgen/issues instead.
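To illustrate why the source format should not matter, here is a small sketch of picking pages by their output extension. It is not the plugin's actual code; `htmlPages` and the sample file names are hypothetical.

```javascript
const path = require('path');

// Hypothetical filter: keep only the HTML pages from a list of output files,
// regardless of whether they were written as .md or .adoc sources.
function htmlPages(files) {
  return files.filter((file) => path.extname(file).toLowerCase() === '.html');
}
```

Given a list such as `index.html`, `a.md`, `ch1/intro.html` and `style.css`, only the two `.html` entries survive, so a book whose pages render to blank HTML would have to be debugged at the GitBook output stage instead.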

@OVGav74

OVGav74 commented Dec 18, 2017

@lwchkg
For those of us using gitbook.com, and with no coding background, can what you've done be published as a GitBook plugin so we can use it too?

@bbinet

bbinet commented Nov 13, 2018

FYI WeasyPrint now supports font-face and table of contents with page numbers, but I've not tried to use it yet.
