Switch eBook PDF generation from WeasyPrint to Prince (#833)
* Move to EJS templates for consistency and to fix missing author pictures

* Add Prince Support (inc fixing figure aria labels)

* Add YouTube fallback

* Fix table in Japanese chapter

* move from WeasyPrint to Prince

* Update README.md

* Remove unnecessary change

* Update statement for PDF accessibility

* Regen PDFs

* Fix typos

* Fix spacing

* Review feedback

* Review feedback

* Review feedback

* Misc fixes

* Make contributors line up

* Fix text wrapping issue in Methodology section

* Prince supports SVG social media icons

* Add print support in case we ever want to print a physical copy

* Remove link underlines in print mode

* Add footnotes and left/right pages

* Fix typos

* templates/base/2019/base_ebook.html

* Avoid hanging headers

* Review feedback
tunetheweb authored May 20, 2020
1 parent f7f369a commit 9fe5729
Showing 21 changed files with 5,400 additions and 3,110 deletions.
24 changes: 17 additions & 7 deletions src/README.md
@@ -58,13 +58,7 @@ npm install
npm run generate
```

3. For generating PDFs of the ebook, WeasyPrint will need some additional libraries:

```
brew install cairo
brew install pango
brew install gdk-pixbuf
```
3. For generating PDFs of the ebook, you need to install Prince. Follow the instructions on [the Prince Website](https://www.princexml.com/).

4. To actually generate the ebooks, start your local server, then run the following:

@@ -75,6 +69,22 @@ npm run ebook_2019_ja

(TODO: make this a script to handle all languages and years at some point)
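One way to approach the TODO above is a small Python script. This is a minimal sketch, assuming the local server is running at `http://127.0.0.1:8080` (as the npm scripts expect) and that `prince` is on the `PATH`. The `EBOOKS` list and the helper names `prince_args` and `build_all` are hypothetical. Each command mirrors the `ebook_2019_en` and `ebook_2019_ja` scripts in `package.json`:

```python
import subprocess
from typing import List, Tuple

# Hypothetical list of (language, year) ebooks to build; package.json
# currently has scripts for exactly these two.
EBOOKS: List[Tuple[str, str]] = [("en", "2019"), ("ja", "2019")]

# Local dev server address used by the existing npm scripts.
BASE_URL = "http://127.0.0.1:8080"


def prince_args(lang: str, year: str) -> List[str]:
    """Return the argv for one ebook, mirroring the package.json scripts."""
    url = f"{BASE_URL}/{lang}/{year}/ebook?print"
    output = f"static/pdfs/web_almanac_{year}_{lang}.pdf"
    # Passed as a list element, PDF/UA-1 needs no shell quoting.
    return ["prince", url, "-o", output, "--pdf-profile=PDF/UA-1"]


def build_all(dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute the Prince command for every entry in EBOOKS."""
    commands = [prince_args(lang, year) for lang, year in EBOOKS]
    for args in commands:
        if dry_run:
            print(" ".join(args))
        else:
            # check=True raises CalledProcessError if Prince exits non-zero.
            subprocess.run(args, check=True)
    return commands


if __name__ == "__main__":
    build_all(dry_run=True)
```

Running it with `dry_run=True` only prints the commands, so you can check them before calling `build_all(dry_run=False)` to actually generate the PDFs.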

It is also possible to generate the ebook from the website, with some optional params (e.g. to print it!)

```
prince "https://almanac.httparchive.org/en/2019/ebook?print&inside-margin=4cm" -o web_almanac_2019_en.pdf --pdf-profile='PDF/UA-1'
```

Note that `--pdf-profile='PDF/UA-1'` may not be needed if you just intend to print.

Params accepted are:

- print - this adds left/right pages and footnotes, and sets Roman numerals for front-matter page numbers. It is used by default when running `npm run ebook_2019_en` and `npm run ebook_2019_ja`, but we could change that if we prefer a less print-like ebook.
- page-size - this allows you to override the default page size of A4
- inside-margin - this allows you to set an inside margin for binding (e.g. on the right for left-hand pages and vice versa)

You can also download the HTML and override the inline styles there if you want to customise something we haven't exposed as a param.
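The params above can be combined into an ebook URL programmatically. This is a minimal sketch: the helper name `ebook_url` is hypothetical. The format of `page-size` values (for example `A5`) is an assumption, since this README only says the param overrides the A4 default. Values are percent-encoded with `urllib.parse.quote`. The params are emitted in a fixed order, which is assumed not to matter to the server:

```python
from typing import Optional
from urllib.parse import quote

# Public site used in the prince example above; a local server URL also works.
DEFAULT_BASE = "https://almanac.httparchive.org"


def ebook_url(lang: str, year: str, print_mode: bool = True,
              page_size: Optional[str] = None,
              inside_margin: Optional[str] = None,
              base: str = DEFAULT_BASE) -> str:
    """Build an ebook URL with the optional print, page-size and inside-margin params."""
    params = []
    if print_mode:
        # `print` is a bare flag with no value, as in ?print&inside-margin=4cm
        params.append("print")
    if page_size is not None:
        params.append("page-size=" + quote(page_size))
    if inside_margin is not None:
        params.append("inside-margin=" + quote(inside_margin))
    url = f"{base}/{lang}/{year}/ebook"
    return f"{url}?{'&'.join(params)}" if params else url
```

For example, `ebook_url("en", "2019", inside_margin="4cm")` produces the same URL as the `prince` command shown earlier in this section.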

## Deploying changes

If you've been added to the "App Engine Deployers" role in the GCP project, you're able to push code changes to the production website.
18 changes: 18 additions & 0 deletions src/main.py
@@ -130,21 +130,39 @@ def get_ebook_methodology(lang, year):
        return False

    methodology_maincontent = methodology_maincontent.group(1)
    # Replace methodology links to full anchor link (e.g. #introduction -> #methodology-introduction)
    methodology_maincontent = re.sub('href="#', 'href="#methodology-', methodology_maincontent)
    # Replace header ids to full id (e.g. <h2 id="introduction"> -> <h2 id="methodology-introduction">)
    methodology_maincontent = re.sub('<h([0-6]) id="', '<h\\1 id="methodology-', methodology_maincontent)
    # Replace home-relative URLs generated by url_for (e.g. /en/2019/ -> #)
    methodology_maincontent = re.sub('href="\/%s\/%s\/' % (lang, year), 'href="#', methodology_maincontent)
    # For external links add footnote span
    methodology_maincontent = re.sub('href="http(.*?)"(.*?)>(.*?)<\/a>', 'href="http\\1"\\2>\\3<span class="fn">http\\1</span></a>', methodology_maincontent)
    # Replace figure image links to full site, to avoid 127.0.0.1:8080 links
    methodology_maincontent = re.sub('href="\/', 'href="https://almanac.httparchive.org/', methodology_maincontent)
    # Replace other chapter references with hash to anchor link (e.g. ./javascript#fig-1 -> #javascript-fig-1)
    methodology_maincontent = re.sub('href="./([a-z0-9-]*)#', 'href="#\\1-', methodology_maincontent)
    # Replace other chapter references to anchor link (e.g. ./javascript -> #javascript)
    methodology_maincontent = re.sub('href="\.\/', 'href="#', methodology_maincontent)
    # Replace double-hashed URLs (e.g. #contributors#patrickhulce -> #contributors-patrickhulce)
    methodology_maincontent = re.sub('href="#([a-z0-9-]*)#', 'href="#\\1-', methodology_maincontent)
    # Remove lazy-loading attributes
    methodology_maincontent = re.sub(' loading="lazy"', '', methodology_maincontent)
    return methodology_maincontent


# This function takes a string and adds the footnote links for printing
def add_footnote_links(html):
    return re.sub('href="http(.*?)"(.*?)>(.*?)<\/a>', 'href="http\\1"\\2>\\3<span class="fn">http\\1</span></a>', html)


# Make these functions available in templates.
app.jinja_env.globals['get_view_args'] = get_view_args
app.jinja_env.globals['chapter_lang_exists'] = chapter_lang_exists
app.jinja_env.globals['ebook_exists'] = ebook_exists
app.jinja_env.globals['HTTP_STATUS_CODES'] = HTTP_STATUS_CODES
app.jinja_env.globals['get_ebook_methodology'] = get_ebook_methodology
app.jinja_env.globals['add_footnote_links'] = add_footnote_links


@app.route('/<lang>/<year>/')
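The substitutions in `get_ebook_methodology` are easiest to follow on sample input. This is a minimal sketch that replays them in the same order on made-up HTML fragments. The name `rewrite_methodology_links` is hypothetical. The patterns are written as raw strings, because sequences like `\/` in ordinary string literals can trigger invalid-escape warnings in recent Python versions. The `.` in `./` is escaped here, whereas the pattern in the diff matches any character in that position:

```python
import re


def add_footnote_links(html: str) -> str:
    # Same pattern as main.py, written as a raw string: external links get
    # their URL repeated in a <span class="fn"> for print footnotes.
    return re.sub(r'href="http(.*?)"(.*?)>(.*?)</a>',
                  r'href="http\1"\2>\3<span class="fn">http\1</span></a>', html)


def rewrite_methodology_links(html: str, lang: str, year: str) -> str:
    """Replay the methodology link rewrites, in order, on an HTML string."""
    # In-page anchors get a methodology- prefix.
    html = re.sub(r'href="#', 'href="#methodology-', html)
    # Heading ids get the same prefix so the anchors still resolve.
    html = re.sub(r'<h([0-6]) id="', r'<h\1 id="methodology-', html)
    # Home-relative links such as /en/2019/javascript become #javascript.
    html = re.sub(r'href="/%s/%s/' % (lang, year), 'href="#', html)
    # External links get a footnote span.
    html = add_footnote_links(html)
    # Remaining site-relative links point at the live site.
    html = re.sub(r'href="/', 'href="https://almanac.httparchive.org/', html)
    # ./chapter#fig-1 becomes #chapter-fig-1 (literal dot, unlike the diff).
    html = re.sub(r'href="\./([a-z0-9-]*)#', r'href="#\1-', html)
    # ./chapter becomes #chapter.
    html = re.sub(r'href="\./', 'href="#', html)
    # #contributors#name becomes #contributors-name.
    html = re.sub(r'href="#([a-z0-9-]*)#', r'href="#\1-', html)
    # Drop lazy-loading so every image is present when Prince renders.
    html = re.sub(r' loading="lazy"', '', html)
    return html
```

For example, `rewrite_methodology_links('<a href="/en/2019/contributors#patrickhulce">P</a>', "en", "2019")` returns `<a href="#contributors-patrickhulce">P</a>`: the home-relative rewrite produces `#contributors#patrickhulce`, and the double-hash rewrite then joins the two parts.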
4 changes: 2 additions & 2 deletions src/package.json
@@ -15,8 +15,8 @@
  "homepage": "https://github.com/HTTPArchive/almanac.httparchive.org#readme",
  "scripts": {
    "generate": "node ./tools/generate",
    "ebook_2019_en": "weasyprint http://127.0.0.1:8080/en/2019/ebook static/pdfs/web_almanac_2019_en.pdf",
    "ebook_2019_ja": "weasyprint http://127.0.0.1:8080/ja/2019/ebook static/pdfs/web_almanac_2019_ja.pdf",
    "ebook_2019_en": "prince http://127.0.0.1:8080/en/2019/ebook?print -o static/pdfs/web_almanac_2019_en.pdf --pdf-profile='PDF/UA-1'",
    "ebook_2019_ja": "prince http://127.0.0.1:8080/ja/2019/ebook?print -o static/pdfs/web_almanac_2019_ja.pdf --pdf-profile='PDF/UA-1'",
    "deploy": "echo \"Y\" | gcloud app deploy --project webalmanac --stop-previous-version"
  },
  "devDependencies": {