Switch eBook PDF generation from WeasyPrint to Prince #833

Merged: 29 commits merged on May 20, 2020
29 commits

- 2a84d8e Move to EJS templates for consistency and to fix missing author pictures (tunetheweb, May 16, 2020)
- 89814de Add Prince Support (inc fixing figure aria labels) (tunetheweb, May 16, 2020)
- 351b8cc Merge branch 'master' into ebook_fixes (tunetheweb, May 16, 2020)
- 0b40c94 Add YouTube fallback (tunetheweb, May 16, 2020)
- c0feed7 Fix table in Japanese chapter (tunetheweb, May 16, 2020)
- 73ff7bc move from WeasyPrint to Prince (tunetheweb, May 16, 2020)
- f32d99d Update README.md (tunetheweb, May 16, 2020)
- 7d6bafb Remove unnecessary change (tunetheweb, May 16, 2020)
- 4a02e9c Update statement for PDF accessibility (tunetheweb, May 16, 2020)
- c262297 Regen PDFs (tunetheweb, May 16, 2020)
- d1b0dcd Fix typos (tunetheweb, May 16, 2020)
- 2094c78 Fix spacing (tunetheweb, May 16, 2020)
- 9a20b99 Review feedback (tunetheweb, May 16, 2020)
- fd8494d Review feedback (tunetheweb, May 16, 2020)
- f1ecaf9 Review feedback (tunetheweb, May 16, 2020)
- 9afd4f6 Misc fixes (tunetheweb, May 16, 2020)
- 70db42f Make contributors line up (tunetheweb, May 16, 2020)
- 3b7ceb3 Fix text wrapping issue in Methodology section (tunetheweb, May 17, 2020)
- 94310b7 Prince supports SVG social media icons (tunetheweb, May 17, 2020)
- 608da1a Add print support in case we ever want to print a physical copy (tunetheweb, May 18, 2020)
- 35aab23 Remove link underlines in print mode (tunetheweb, May 18, 2020)
- 66519e8 Add footnotes and left/right pages (tunetheweb, May 18, 2020)
- ee28b22 Fix typos (tunetheweb, May 18, 2020)
- 19889a3 templates/base/2019/base_ebook.html (tunetheweb, May 19, 2020)
- 59abf31 Merge branch 'master' into ebook_fixes (tunetheweb, May 19, 2020)
- 1fd7711 Avoid hanging headers (tunetheweb, May 19, 2020)
- 63f16ff Merge branch 'master' into ebook_fixes (tunetheweb, May 19, 2020)
- 5c5c75f Review feedback (tunetheweb, May 20, 2020)
- ef9018f Merge branch 'master' into ebook_fixes (tunetheweb, May 20, 2020)

24 changes: 17 additions & 7 deletions src/README.md
@@ -58,13 +58,7 @@ npm install
 npm run generate
 ```
 
-3. For generating PDFs of the ebook, WeasyPrint will need some additional libraries:
-
-```
-brew install cairo
-brew install pango
-brew install gdk-pixbuf
-```
+3. For generating PDFs of the ebook, you need to install Prince. Follow the instructions on [the Prince Website](https://www.princexml.com/).
 
 4. To actually generate the ebooks, start your local server, then run the following:
 
@@ -75,6 +69,22 @@ npm run ebook_2019_ja
 
 (TODO: make this a script to handle all languages and years at some point)
 
+It is also possible to generate the ebook from the website, with some optional params (e.g. to print it!):
+
+```
+prince "https://almanac.httparchive.org/en/2019/ebook?print&inside-margin=4cm" -o web_almanac_2019_en.pdf --pdf-profile='PDF/UA-1'
+```
+
+Note: `--pdf-profile='PDF/UA-1'` may not be needed if you just intend to print.
+
+The accepted params are:
+
+- `print` - this adds left and right pages and footnotes, and uses roman numerals for the front matter page numbers. It is used by default when running `npm run ebook_2019_en`, but we could change that if we prefer a less print-like ebook.
+- `page-size` - this allows you to override the default page size of A4
+- `inside-margin` - this allows you to set an inside margin for binding (i.e. on the right for left-hand pages and vice versa)
+
+You can also download the HTML and override the inline styles there if you want to customise it for something we haven't exposed as a param.
+
 ## Deploying changes
 
 If you've been added to the "App Engine Deployers" role in the GCP project, you're able to push code changes to the production website.
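As a rough illustration of how the optional params combine, the Python sketch below builds the ebook URL and a matching `prince` argument list. The helpers `build_ebook_url` and `build_prince_command` are hypothetical (they are not part of this PR); they only assume the URL shape, the bare `print` flag, and the `-o` / `--pdf-profile` options shown in the README example above.

```python
from typing import List, Optional
from urllib.parse import quote


def build_ebook_url(base: str, lang: str, year: int, print_mode: bool = False,
                    page_size: Optional[str] = None,
                    inside_margin: Optional[str] = None) -> str:
    """Build the ebook URL with the optional query params from the README.

    Hypothetical helper: `print` is emitted as a bare flag (no `=value`),
    matching the README example `?print&inside-margin=4cm`.
    """
    params: List[str] = []
    if print_mode:
        params.append("print")
    if page_size is not None:
        params.append("page-size=" + quote(page_size))
    if inside_margin is not None:
        params.append("inside-margin=" + quote(inside_margin))
    url = f"{base.rstrip('/')}/{lang}/{year}/ebook"
    return url + ("?" + "&".join(params) if params else "")


def build_prince_command(url: str, output: str, pdf_ua: bool = True) -> List[str]:
    """Return an argv list for subprocess.run, so no shell quoting is needed."""
    cmd = ["prince", url, "-o", output]
    if pdf_ua:
        # PDF/UA-1 is the accessibility profile; it may be skipped for print-only output.
        cmd.append("--pdf-profile=PDF/UA-1")
    return cmd


if __name__ == "__main__":
    url = build_ebook_url("https://almanac.httparchive.org", "en", 2019,
                          print_mode=True, inside_margin="4cm")
    print(url)
    print(build_prince_command(url, "web_almanac_2019_en.pdf"))
    # To actually run it (requires Prince on PATH):
    # import subprocess
    # subprocess.run(build_prince_command(url, "web_almanac_2019_en.pdf"), check=True)
```

Passing an argument list rather than a shell string sidesteps the quoting around `?`, `&` and `'PDF/UA-1'` that the README command needs in a shell.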
18 changes: 18 additions & 0 deletions src/main.py
@@ -130,21 +130,39 @@ def get_ebook_methodology(lang, year):
        return False

    methodology_maincontent = methodology_maincontent.group(1)
    # Replace methodology links to full anchor link (e.g. #introduction -> #methodology-introduction)
    methodology_maincontent = re.sub('href="#', 'href="#methodology-', methodology_maincontent)
    # Replace header ids to full id (e.g. <h2 id="introduction"> -> <h2 id="methodology-introduction">)
    methodology_maincontent = re.sub('<h([0-6]) id="', '<h\\1 id="methodology-', methodology_maincontent)
    # Replace home-relative URLS generated by url_for (e.g. /en/2019/ -> #)
    methodology_maincontent = re.sub('href="\/%s\/%s\/' % (lang, year), 'href="#', methodology_maincontent)
    # For external links add footnote span
    methodology_maincontent = re.sub('href="http(.*?)"(.*?)>(.*?)<\/a>', 'href="http\\1"\\2>\\3<span class="fn">http\\1</span></a>', methodology_maincontent)
    # Replace figure image links to full site, to avoid 127.0.0.1:8080 links
    methodology_maincontent = re.sub('href="\/', 'href="https://almanac.httparchive.org/', methodology_maincontent)
    # Replace other chapter references with hash to anchor link (e.g. ./javascript#fig-1 -> #javascript-fig-1)
    methodology_maincontent = re.sub('href="./([a-z0-9-]*)#', 'href="#\\1-', methodology_maincontent)
    # Replace other chapter references to anchor link (e.g. ./javascript -> #javascript)
    methodology_maincontent = re.sub('href="\.\/', 'href="#', methodology_maincontent)
    # Replace double-hashed URLs (e.g. #contributors#patrickhulce -> #contributors-patrickhulce)
    methodology_maincontent = re.sub('href="#([a-z0-9-]*)#', 'href="#\\1-', methodology_maincontent)
    # Remove lazy-loading attributes
    methodology_maincontent = re.sub(' loading="lazy"', '', methodology_maincontent)
    return methodology_maincontent


# This function takes a string and adds the footnote links for printing
def add_footnote_links(html):
    return re.sub('href="http(.*?)"(.*?)>(.*?)<\/a>', 'href="http\\1"\\2>\\3<span class="fn">http\\1</span></a>', html)


# Make these functions available in templates.
app.jinja_env.globals['get_view_args'] = get_view_args
app.jinja_env.globals['chapter_lang_exists'] = chapter_lang_exists
app.jinja_env.globals['ebook_exists'] = ebook_exists
app.jinja_env.globals['HTTP_STATUS_CODES'] = HTTP_STATUS_CODES
app.jinja_env.globals['get_ebook_methodology'] = get_ebook_methodology
app.jinja_env.globals['add_footnote_links'] = add_footnote_links


@app.route('/<lang>/<year>/')
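The first two substitutions in `get_ebook_methodology` namespace the methodology page's fragment links and heading ids, presumably so they do not collide with other chapters once everything is concatenated into the single ebook page. A minimal standalone sketch of just those two steps, generalised to an arbitrary prefix; `prefix_anchors` and the sample HTML are hypothetical, and it assumes headings are written exactly as `<hN id="...">` and that the prefix is a plain slug with no backslashes.

```python
import re


def prefix_anchors(html: str, prefix: str) -> str:
    """Prefix in-page fragment links and heading ids with `prefix-`.

    Mirrors the first two re.sub calls in get_ebook_methodology
    (hypothetical helper, not from the PR).
    """
    # #introduction -> #methodology-introduction
    html = re.sub(r'href="#', f'href="#{prefix}-', html)
    # <h2 id="introduction"> -> <h2 id="methodology-introduction">
    html = re.sub(r'<h([0-6]) id="', rf'<h\1 id="{prefix}-', html)
    return html


if __name__ == "__main__":
    sample = '<h2 id="introduction">Intro</h2><p><a href="#introduction">back</a></p>'
    print(prefix_anchors(sample, "methodology"))
```

Because both the link and the heading id receive the same prefix, in-page links keep pointing at their targets after the rewrite; external links are untouched since they never start with `href="#`.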
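To show what `add_footnote_links` produces, here is the same regular expression run on a made-up snippet. The pattern is copied from the diff but written as raw strings (`</a>` in place of `<\/a>`, which matches the same text). Each absolute `http(s)` link gets its URL appended inside a `<span class="fn">`, presumably for the print stylesheet to render as a footnote (Prince supports this via `float: footnote`, though that CSS is not part of this diff); relative and fragment links are left alone.

```python
import re

# Pattern and replacement from add_footnote_links in the diff, as raw strings.
FOOTNOTE_PATTERN = r'href="http(.*?)"(.*?)>(.*?)</a>'
FOOTNOTE_REPLACEMENT = r'href="http\1"\2>\3<span class="fn">http\1</span></a>'


def add_footnote_links(html: str) -> str:
    """Append each external link's URL, wrapped in span.fn, inside the link."""
    return re.sub(FOOTNOTE_PATTERN, FOOTNOTE_REPLACEMENT, html)


sample = ('<p>See <a href="https://www.princexml.com/">Prince</a> '
          'and the <a href="#methodology">methodology</a>.</p>')
print(add_footnote_links(sample))
# <p>See <a href="https://www.princexml.com/">Prince<span class="fn">https://www.princexml.com/</span></a> and the <a href="#methodology">methodology</a>.</p>
```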
4 changes: 2 additions & 2 deletions src/package.json
@@ -15,8 +15,8 @@
   "homepage": "https://github.com/HTTPArchive/almanac.httparchive.org#readme",
   "scripts": {
     "generate": "node ./tools/generate",
-    "ebook_2019_en": "weasyprint http://127.0.0.1:8080/en/2019/ebook static/pdfs/web_almanac_2019_en.pdf",
-    "ebook_2019_ja": "weasyprint http://127.0.0.1:8080/ja/2019/ebook static/pdfs/web_almanac_2019_ja.pdf",
+    "ebook_2019_en": "prince http://127.0.0.1:8080/en/2019/ebook?print -o static/pdfs/web_almanac_2019_en.pdf --pdf-profile='PDF/UA-1'",
+    "ebook_2019_ja": "prince http://127.0.0.1:8080/ja/2019/ebook?print -o static/pdfs/web_almanac_2019_ja.pdf --pdf-profile='PDF/UA-1'",
     "deploy": "echo \"Y\" | gcloud app deploy --project webalmanac --stop-previous-version"
   },
   "devDependencies": {
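The README's TODO about handling all languages and years could be approached by generating these script entries instead of hand-writing one per language. A hypothetical Python sketch: the function name `ebook_scripts` and the language/year lists are assumptions, and only the command format is taken from the two `prince` scripts in this diff.

```python
import json
from typing import Dict, Iterable


def ebook_scripts(langs: Iterable[str], years: Iterable[int],
                  host: str = "http://127.0.0.1:8080") -> Dict[str, str]:
    """Build npm script entries in the same shape as ebook_2019_en / ebook_2019_ja.

    Hypothetical generator: only the command format comes from package.json.
    """
    lang_list = list(langs)  # materialise so a generator is reusable for every year
    scripts: Dict[str, str] = {}
    for year in years:
        for lang in lang_list:
            scripts[f"ebook_{year}_{lang}"] = (
                f"prince {host}/{lang}/{year}/ebook?print "
                f"-o static/pdfs/web_almanac_{year}_{lang}.pdf "
                "--pdf-profile='PDF/UA-1'"
            )
    return scripts


if __name__ == "__main__":
    # Print a JSON fragment that could be pasted into the "scripts" object.
    print(json.dumps(ebook_scripts(["en", "ja"], [2019]), indent=2))
```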