PDF'ing with Jupyter Notebook #163

eddy-ojb · 2017-04-29T17:16:26Z

Hello,

At the end of this message there is JSON code. Please save it to a text file with the .ipynb extension and open it in Jupyter to see an illustration of my main issue with Jupyter, which is that exporting to PDF is not usable.

Jupyter is superb in a lot of respects, but there is real demand in the data science community for presenting documents properly with Jupyter. We end up having to translate all the work we've done in Jupyter to another editor so that the documents can be pdf'ed and printed. This is tedious and prone to mistakes.

Unfortunately, most of the world uses PDFs for viewing and printing professional documents. It would be handy to produce work with figures generated by code in the background, selectively hide inputs or outputs, and export to PDF efficiently for professional presentation. This is especially handy when documents are printed and annotated by hand (we all do it).

If Jupyter tries to appear like everything is laid out on a page and allows the use of Markdown, HTML and LaTeX editing, why not actually try to achieve the real thing instead of presenting the illusion?

Ed

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"toc": "true"
},
"source": [
"# Table of Contents\n",
"

<div class="lev1 toc-item"><a href="#Some-code:" data-toc-modified-id="Some-code:-1"><span class="toc-item-num">1 Some code:<div class="lev1 toc-item"><a href="#Test-pdf'ing-of-bullet-points" data-toc-modified-id="Test-pdf'ing-of-bullet-points-2"><span class="toc-item-num">2 Test pdf'ing of bullet points<div class="lev1 toc-item"><a href="#Line-breaks" data-toc-modified-id="Line-breaks-3"><span class="toc-item-num">3 Line breaks<div class="lev1 toc-item"><a href="#More-HTML-imcompatibility:" data-toc-modified-id="More-HTML-imcompatibility:-4"><span class="toc-item-num">4 More HTML imcompatibility:<div class="lev1 toc-item"><a href="#Page-Break" data-toc-modified-id="Page-Break-5"><span class="toc-item-num">5 Page Break<div class="lev1 toc-item"><a href="#Margins" data-toc-modified-id="Margins-6"><span class="toc-item-num">6 Margins<div class="lev1 toc-item"><a href="#Referencing" data-toc-modified-id="Referencing-7"><span class="toc-item-num">7 Referencing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Some code:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"

\n",
"<table border="1" class="dataframe">\n",
" \n",
" <tr style="text-align: right;">\n",
" \n",
" HomeID\n",
" Price\n",
" SqFt\n",
" Bedrooms\n",
" Bathrooms\n",
" Offers\n",
" Brick\n",
" Neighborhood\n",
" \n",
" \n",
" \n",
" \n",
" 0\n",
" 1\n",
" 114300\n",
" 1790\n",
" 2\n",
" 2\n",
" 2\n",
" No\n",
" East\n",
" \n",
" \n",
"\n",
"

"
],
"text/plain": [
" HomeID Price SqFt Bedrooms Bathrooms Offers Brick Neighborhood\n",
"0 1 114300 1790 2 2 2 No East"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv('house_prices.csv')\n",
"\n",
"df.head(n=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test pdf'ing of bullet points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"

Lorem ipsum dolor sit amet

Phasellus iaculis neque

Purus sodales ultricies

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Line breaks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a line break because I want to start a new paragraph.
\n",
"\n",
"Now here are some example quotes:\n",
"\n",
"\n",
"1. “If you're not failing every now and again, it's a sign you're not doing anything very innovative.”\n",
"2. “Confidence is what you have before you understand the problem.”"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why does Jupyter give the appearence that everything is on pages.
\n",
"We can alter the justification of the text but this doesn't translate to pdf:
\n",
"\n",
"<div style="text-align: center"> some text"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# More HTML imcompatibility:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class="alert alert-block alert-info">\n",
"\n",
"Hello"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Page Break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Did the page break work?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Margins"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No clear way of achieving this"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Referencing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, no clear way of achieving this"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda root]",
"language": "python",
"name": "conda-root-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"navigate_num": "#000000",
"navigate_text": "#333333",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700",
"sidebar_border": "#EEEEEE",
"wrapper_background": "#FFFFFF"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "141px",
"width": "252px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": false,
"threshold": 4,
"toc_cell": true,
"toc_section_display": "block",
"toc_window_display": false,
"widenNotebook": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}

takluyver · 2017-05-02T13:00:45Z

We're working on some bits of this (cc @mpacer), but our PDF export is based on LaTeX, and HTML/Markdown doesn't translate easily to LaTeX code. LaTeX is both a more semantic way to describe a document and a Turing-complete programming language, so it's not the easiest environment to target.

You may have more luck exporting to HTML and then 'printing' the resulting document to a PDF from a web browser. It would be possible to write an nbconvert exporter which automated this using a headless browser such as wkhtmltopdf.
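A rough sketch of that HTML-then-PDF route, driven from Python. It assumes both the `jupyter nbconvert` command and the `wkhtmltopdf` binary are installed and on your PATH; the function names `build_commands` and `notebook_to_pdf` are made up for illustration and are not part of nbconvert.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_commands(notebook: Path) -> List[List[str]]:
    """Return the two commands: notebook -> HTML, then HTML -> PDF."""
    html = notebook.with_suffix(".html")
    pdf = notebook.with_suffix(".pdf")
    return [
        # Step 1: let nbconvert render the notebook to HTML next to the notebook.
        ["jupyter", "nbconvert", "--to", "html",
         "--output-dir", str(notebook.parent), str(notebook)],
        # Step 2: let wkhtmltopdf "print" that HTML file to a PDF.
        ["wkhtmltopdf", str(html), str(pdf)],
    ]


def notebook_to_pdf(notebook: Path) -> Path:
    """Run both steps and return the PDF path (hypothetical helper)."""
    for cmd in build_commands(notebook):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} was not found on PATH")
        subprocess.run(cmd, check=True)
    return notebook.with_suffix(".pdf")


# Example call (not executed here): notebook_to_pdf(Path("report.ipynb"))
```

Because the PDF comes from the rendered HTML, things like centred text and alert boxes should look much closer to what the notebook shows, though how faithfully the styling survives depends on the tool doing the printing.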

A couple of specific points:

- Your Markdown 'page break' (`-----`) produces a line across the document (an HTML `<hr>` tag), not a page break. So I think the horizontal line in the PDF is what I'd expect.
- Referencing: I have a side project, cite2c, to add references, though the citations do not yet show up on export (see takluyver/cite2c#7, "Work out some story for exporting citations to Latex").
- If you want to show a notebook, rather than pasting the raw JSON in an issue, you can put it in a gist and then use nbviewer to get a link that shows the content as HTML.
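On the page break point: since the PDF export goes through LaTeX, one possible workaround is to put a real LaTeX `\newpage` in a raw cell instead of a Markdown rule. The sketch below only edits the notebook JSON; `raw_latex_cell` and `insert_page_break` are hypothetical helper names, and it assumes that raw cells tagged with the `text/latex` mimetype are passed through unchanged by the LaTeX exporter, which is worth verifying against your nbconvert version.

```python
import json


def raw_latex_cell(latex: str) -> dict:
    """Build a raw notebook cell intended for LaTeX-based exports only."""
    return {
        "cell_type": "raw",
        # Assumption: this metadata key is what the exporter checks.
        "metadata": {"raw_mimetype": "text/latex"},
        "source": [latex],
    }


def insert_page_break(notebook: dict, index: int) -> dict:
    """Return a copy of the notebook with a \\newpage raw cell at `index`."""
    cells = list(notebook["cells"])
    cells.insert(index, raw_latex_cell("\\newpage"))
    return {**notebook, "cells": cells}


# Example: replace the '-----' idea with a real page break between two cells.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Page Break"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["Did the page break work?"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 1,
}
with_break = insert_page_break(notebook, 1)
notebook_json = json.dumps(with_break, indent=1)  # ready to save as .ipynb
```

The raw cell would not show up as a page break in the HTML view, so this only helps for the LaTeX/PDF export.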

eddy-ojb · 2017-05-03T08:54:40Z

Thanks takluyver.

Cite2c looks awesome!

This post:

jupyter/notebook#2458 (comment)

is a duplicate of this one. I wrote this one first, but when I came back later in the day after my pdf'ing nightmare I couldn't find it. That was because I was searching the wrong thread, so I assumed I hadn't submitted it properly. Are you OK if I close this thread and append your cite2c solution to the other thread?

Thanks for the pointers.

Best

Ed

takluyver · 2017-05-03T10:12:58Z

Yep, let's close this so the conversation can be in one place.

takluyver closed this as completed May 3, 2017
