Conversion of ipynb to pdf fails because of ANSI escape codes in stacktrace #5633

wstomv · 2019-07-05T17:51:26Z

This issue concerns the conversion of a Jupyter notebook (-f ipynb) to PDF (-t latex -o *.pdf), where the notebook contains a Python stack trace (example attached), being the result of a runtime error in a code cell. Such a stack trace includes ANSI escape sequences to color some of the output. However, LaTeX chokes on the result that pandoc produces. Message with pdflatex as engine:

Error producing PDF.
! Package inputenc Error: Unicode character ^^[ (U+001B)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.102 \end{verbatim}

Try running pandoc with --pdf-engine=xelatex.

With xelatex as engine (lualatex responds similarly):

Error producing PDF.
! Text line contains an invalid character.
l.96 ^^[

These ANSI escape sequences are encoded in plain ASCII in the Jupyter notebook (inside a JSON string), but appear as real escape sequences in the produced LaTeX source file, inside a verbatim environment.
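To illustrate the point, here is a minimal sketch using Python's standard json module (not pandoc's own code). It shows how the six ASCII characters `\u001b` in the notebook file become a single ESC control character (U+001B) once the JSON string is decoded. The sample string is taken from the traceback in the attached notebook.

```python
import json

# A fragment of the traceback as stored in the .ipynb file: plain ASCII only.
raw = r'"\u001b[0;31mZeroDivisionError\u001b[0m: division by zero"'
raw_is_ascii = raw.isascii() and "\x1b" not in raw

# After JSON decoding, each \u001b is a real ESC control character (U+001B).
decoded = json.loads(raw)
esc_count = decoded.count("\x1b")

print(raw_is_ascii, esc_count, repr(decoded))
```

Any tool that decodes the JSON and copies the string unchanged into a verbatim environment will therefore emit raw ESC bytes.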

Further details:

Version of pandoc:

$ pandoc -v
pandoc 2.7.3
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2.2, skylighting 0.8.1

Command line:

$ pandoc -f ipynb -t latex --pdf-engine=xelatex -o Stacktrace.pdf Stacktrace.ipynb

Stacktrace.ipynb is attached (zipped, because *.ipynb are not supported).
Stacktrace.ipynb.zip

A workaround would be nice to have.

(One option is to strip the ASCII encoded ANSI escape sequences from the notebook before conversion with pandoc (e.g. using sed). Alternatively, the LaTeX source generated by pandoc can be stripped of ANSI escape sequences (e.g. using ansifilter) and then pulled through a LaTeX-to-PDF engine separately. None of these options is very appealing.)

jgm · 2019-07-06T21:23:23Z

For convenience, this is what's in the ipynb:

   "outputs": [
    {
     "ename": "ZeroDivisionError",
     "evalue": "division by zero",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mZeroDivisionError\u001b[0m                         Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-1-9e1622b385b6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;36m1\u001b[0m\u001b[0;34m/\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mZeroDivisionError\u001b[0m: division by zero"
     ]
    }
   ],

and this is how pandoc parses it:

jgm@macbook-air-3:~/src/pandoc % pandoc -t native ~/Downloads/Stacktrace.ipynb
[Div ("",["cell","code"],[("execution_count","1"),("ExecuteTime","{\"start_time\":\"2019-07-05T17:24:16.423114Z\",\"end_time\":\"2019-07-05T17:24:16.477717Z\"}")])
 [CodeBlock ("",["python"],[]) "1/0"
 ,Div ("",["output","error"],[("ename","ZeroDivisionError"),("evalue","division by zero")])
  [CodeBlock ("",[],[]) "\ESC[0;31m---------------------------------------------------------------------------\ESC[0m\n\ESC[0;31mZeroDivisionError\ESC[0m                         Traceback (most recent call last)\n\ESC[0;32m<ipython-input-1-9e1622b385b6>\ESC[0m in \ESC[0;36m<module>\ESC[0;34m\ESC[0m\n\ESC[0;32m----> 1\ESC[0;31m \ESC[0;36m1\ESC[0m\ESC[0;34m/\ESC[0m\ESC[0;36m0\ESC[0m\ESC[0;34m\ESC[0m\ESC[0;34m\ESC[0m\ESC[0m\n\ESC[0m\n\ESC[0;31mZeroDivisionError\ESC[0m: division by zero\n"]]]

Here's a hexdump of the latex output:

0000000   \   b   e   g   i   n   {   S   h   a   d   e   d   }  \n   \
0000010   b   e   g   i   n   {   H   i   g   h   l   i   g   h   t   i
0000020   n   g   }   [   ]  \n   \   D   e   c   V   a   l   T   o   k
0000030   {   1   }   \   O   p   e   r   a   t   o   r   T   o   k   {
0000040   /   }   \   D   e   c   V   a   l   T   o   k   {   0   }  \n
0000050   \   e   n   d   {   H   i   g   h   l   i   g   h   t   i   n
0000060   g   }  \n   \   e   n   d   {   S   h   a   d   e   d   }  \n
0000070  \n   \   b   e   g   i   n   {   v   e   r   b   a   t   i   m
0000080   }  \n 033   [   0   ;   3   1   m   -   -   -   -   -   -   -
0000090   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
*
00000d0   -   -   -   - 033   [   0   m  \n 033   [   0   ;   3   1   m
00000e0   Z   e   r   o   D   i   v   i   s   i   o   n   E   r   r   o
00000f0   r 033   [   0   m                                            
0000100                                                           T   r
0000110   a   c   e   b   a   c   k       (   m   o   s   t       r   e
0000120   c   e   n   t       c   a   l   l       l   a   s   t   )  \n
0000130 033   [   0   ;   3   2   m   <   i   p   y   t   h   o   n   -
0000140   i   n   p   u   t   -   1   -   9   e   1   6   2   2   b   3
0000150   8   5   b   6   > 033   [   0   m       i   n     033   [   0
0000160   ;   3   6   m   <   m   o   d   u   l   e   > 033   [   0   ;
0000170   3   4   m 033   [   0   m  \n 033   [   0   ;   3   2   m   -
0000180   -   -   -   >       1 033   [   0   ;   3   1   m     033   [
0000190   0   ;   3   6   m   1 033   [   0   m 033   [   0   ;   3   4
00001a0   m   / 033   [   0   m 033   [   0   ;   3   6   m   0 033   [
00001b0   0   m 033   [   0   ;   3   4   m 033   [   0   m 033   [   0
00001c0   ;   3   4   m 033   [   0   m 033   [   0   m  \n 033   [   0
00001d0   m  \n 033   [   0   ;   3   1   m   Z   e   r   o   D   i   v
00001e0   i   s   i   o   n   E   r   r   o   r 033   [   0   m   :    
00001f0   d   i   v   i   s   i   o   n       b   y       z   e   r   o
0000200  \n   \   e   n   d   {   v   e   r   b   a   t   i   m   }  \n

jgm · 2019-07-06T21:25:04Z

How do you think pandoc should deal with this? We could easily modify the latex writer to strip out ANSI escape sequences. Is that a good solution?

wstomv · 2019-07-06T21:59:01Z

The simplest solution for now is to strip those escape sequences, so that conversion to PDF works. Currently, I pull the notebook through sed before feeding it to pandoc. Note also that output of a code cell written to stderr is shown in red.

wstomv · 2019-07-08T08:33:54Z

Just in case someone needs it, here is the sed command that I use to strip the ANSI color codes:

sed -E 's/\\u001b[^m]*m//g' file.ipynb | pandoc ...
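For anyone who prefers not to edit the raw JSON text with a regular expression, the same idea can be sketched in Python by parsing the notebook first. This is a minimal sketch under some assumptions: the names `ANSI_CSI`, `strip_ansi`, and `strip_notebook` are made up for illustration, only CSI sequences (ESC followed by `[`) are removed, and every string in the notebook is cleaned, not just cell outputs.

```python
import json
import re

# CSI sequences: ESC [ then parameter bytes, intermediate bytes, and a final byte.
# This covers color codes such as ESC[0;31m and ESC[0m.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(value):
    """Recursively remove ANSI CSI sequences from every string in a JSON value."""
    if isinstance(value, str):
        return ANSI_CSI.sub("", value)
    if isinstance(value, list):
        return [strip_ansi(item) for item in value]
    if isinstance(value, dict):
        return {key: strip_ansi(item) for key, item in value.items()}
    return value  # numbers, booleans, None are left untouched


def strip_notebook(text):
    """Take notebook JSON text and return it with ANSI sequences removed."""
    return json.dumps(strip_ansi(json.loads(text)), indent=1)


# Tiny stand-in for a notebook; a real .ipynb has many more fields.
sample = (
    '{"cells": [{"outputs": [{"traceback": '
    '["\\u001b[0;31mZeroDivisionError\\u001b[0m: division by zero"]}]}]}'
)
cleaned = strip_notebook(sample)
print(cleaned)
```

Reading the notebook from standard input and writing the result of `strip_notebook` to standard output would let such a script take the place of sed in the pipeline above.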

jgm · 2019-07-11T01:38:15Z

We could either modify the ipynb reader to strip out ANSI escape sequences in code, or modify the latex writer to strip them out. The former approach seems more sensible since we'd get similar problems in other output formats (HTML, docx?). However, this has the drawback that ipynb would not round trip the escape sequences when going ipynb -> ipynb. Maybe that's not an issue?

panisson · 2019-07-16T16:09:01Z

I've found the same problem when using the question mark to access the documentation of an object.

Very simple example:

s = "a"
s.strip?

produces an output like this:

   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[0;31mSignature:\u001b[0m \u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mchars\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m/\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
       "\u001b[0;31mDocstring:\u001b[0m\n",
       "Return a copy of the string with leading and trailing whitespace remove.\n",
       "\n",
       "If chars is given and not None, remove characters in chars instead.\n",
       "\u001b[0;31mType:\u001b[0m      builtin_function_or_method\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],

Running XeTeX on the resulting LaTeX code then produces the error "Text line contains an invalid character."

jgm · 2019-07-16T16:30:35Z

I came up with a good solution that strips them in most cases but still allows for round-trip.

aarchiba · 2020-02-10T14:26:20Z

Could this be fixed by jupyter/nbconvert#1181 which converts the escape sequences into colors?

maegul · 2020-09-01T13:12:15Z

> Could this be fixed by jupyter/nbconvert#1181 which converts the escape sequences into colors?

Hmm, it seems that you've addressed this in 2.10.1 (judging from the changelog). Would it be possible to have the option of stripping the ANSI escape sequences without using --ipynb-output=best? I find that notebooks with lots of images can take some time to convert when using --ipynb-output=best.

ickc · 2022-04-11T06:19:52Z

Hi, I encountered this problem in my custom workflow, using pandoc 2.18.

Is it true that the ANSI sequences are stripped only if the input is ipynb?

My workflow has two steps:

1. Combine many ipynb files into a single file in native format (each was converted from ipynb to native with --ipynb-output=all, as we want to keep everything at this step).
2. Convert the native file to pdf/tex. Applying --ipynb-output=best at this step doesn't help remove the ANSI sequences.

I can provide MWE and open a new issue if needed.
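One possible workaround for a native-to-PDF step like the one above is to clean the document's JSON AST between the two conversions. The following is a minimal, untested-against-this-workflow sketch: it assumes pandoc's JSON representation, where `CodeBlock` and `Code` elements carry `[attributes, text]` in their `"c"` field, and the names `ANSI_CSI` and `strip_ansi_in_ast` are hypothetical.

```python
import re

# CSI sequences: ESC [ then parameter bytes, intermediate bytes, and a final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi_in_ast(node):
    """Walk a pandoc JSON AST in place, cleaning CodeBlock and Code text."""
    if isinstance(node, list):
        for child in node:
            strip_ansi_in_ast(child)
    elif isinstance(node, dict):
        if node.get("t") in ("CodeBlock", "Code"):
            attr, text = node["c"]
            node["c"] = [attr, ANSI_CSI.sub("", text)]
        else:
            for child in node.values():
                strip_ansi_in_ast(child)
    return node


# Hand-written stand-in for the output of `pandoc -t json`.
doc = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {
            "t": "CodeBlock",
            "c": [["", [], []], "\x1b[0;31mZeroDivisionError\x1b[0m: division by zero"],
        }
    ],
}
cleaned_doc = strip_ansi_in_ast(doc)
print(cleaned_doc["blocks"][0]["c"][1])
```

To use it as a filter, the script would read the JSON document from standard input with `json.load`, apply `strip_ansi_in_ast`, and write the result to standard output with `json.dump`, so that it can be passed to pandoc via `--filter`.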

jgm · 2022-04-11T16:07:12Z

> I can provide MWE and open a new issue if needed.

Yes please.

jgm added the format:LaTeX and writer labels on Jul 6, 2019

jgm closed this as completed in 5454aad Jul 16, 2019

This was referenced Aug 24, 2020

ENH: Enable auto-configuration to build all content pages as individual PDF files jupyter-book/jupyter-book#687

Closed

Support for ANSI color sequences in Jupyter cell output jupyter-book/jupyter-book#762

Closed
