PDF build issue #35

Closed
JulienPalard opened this issue Apr 13, 2022 · 41 comments

@JulienPalard
Member

Since #34 and #31, I have had trouble building PDFs on docs.python.org. It can easily be reproduced using https://github.com/python/docsbuild-scripts/ as follows:

./build_docs.py --build-root ./build_root --www-root ./www --log-directory ./logs --group $(id -g) --skip-cache-invalidation --language ja --branch 3.9

(you can easily try other branches by changing the --branch argument)
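
For anyone who wants to try several branches in one go, here is a minimal sketch that assembles the same command line once per branch. The flags are exactly the ones shown above; the helper name `build_command` and the branch list are only illustrative, and nothing is executed here.

```python
import os


def build_command(branch, language="ja"):
    """Return the argv list for one build_docs.py run (same flags as above)."""
    return [
        "./build_docs.py",
        "--build-root", "./build_root",
        "--www-root", "./www",
        "--log-directory", "./logs",
        "--group", str(os.getegid()),  # stands in for $(id -g)
        "--skip-cache-invalidation",
        "--language", language,
        "--branch", branch,
    ]


for branch in ("3.9", "3.10", "3.11"):
    # Hand the list to subprocess.run(..., check=True) to actually build.
    print(" ".join(build_command(branch)))
```

Keeping the arguments as a list avoids any shell quoting when the command is eventually run.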

@JulienPalard
Member Author

This appears to completely block updates of docs.python.org/ja/: since the make invocation fails, docsbuild-scripts does not rsync the output.

See #35

@take6

take6 commented Jan 9, 2023

I reproduced the error in a Docker container based on Ubuntu 22.04. Here are the contents of the Dockerfile. The essential part for reproducing the issue is the list of packages installed by apt-get.

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /pydoc

RUN apt-get update \
    && apt-get install -y python3.11 python3-pip python3.11-venv git rsync zip \
    latexmk xindy texinfo \
    texlive-xetex texlive-latex-recommended texlive-fonts-extra texlive-lang-japanese \
    && apt-get clean
RUN python3.11 -m venv /pydoc/pydoc-venv

@take6

take6 commented Jan 9, 2023

It turned out that U+C4CF is a Korean character, 쓏.

http://www.unicode-symbol.com/u/C4CF.html
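
The same lookup can be done locally. This is a small sketch using only Python's standard `unicodedata` module; it checks which block the code point belongs to and shows the UTF-8 bytes that would appear in a generated .tex file.

```python
import unicodedata

cp = 0xC4CF
ch = chr(cp)

# Official Unicode name of the character.
print(f"U+{cp:04X}", unicodedata.name(ch))

# The Hangul Syllables block spans U+AC00..U+D7A3.
print("Hangul syllable:", 0xAC00 <= cp <= 0xD7A3)

# UTF-8 byte sequence of the character.
print(ch.encode("utf-8"))
```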

So I tried the kotex package by editing docsbuild-scripts/build_docs.py. This is quite an ad hoc approach, but it appears to work.

PLATEX_DEFAULT = (
    "-D latex_engine=platex",
    "-D latex_elements.inputenc=",
    "-D latex_elements.fontenc=",
    r"-D latex_elements.preamble=\\usepackage{kotex}",
)

However, the build in my environment then failed with another error, "TeX capacity exceeded, sorry [input stack size=5000].", which may indicate insufficient memory. Maybe I can open a pull request against the docsbuild-scripts repo later so that anyone can try this (awkward) fix.

@methane
Member

methane commented Jan 9, 2023

Curiously, there is no "\uC4CF" in the Japanese HTML docs.
I don't know why LaTeX is looking for this character.

@m-aciek
Contributor

m-aciek commented Jan 9, 2023

For reference, this also has been discussed here: texjporg/platex#84

@take6

take6 commented Jan 9, 2023

Curiously, there is no "\uC4CF" in the Japanese HTML docs.
I don't know why LaTeX is looking for this character.

That is true. This is why I call my fix "ad hoc": it just avoids the error instead of fixing the underlying problem.

@take6

take6 commented Jan 9, 2023

Pushed branch.

https://github.com/take6/docsbuild-scripts/tree/fix-japanese-doc-build-error

Could anyone check whether it works?

@take6

take6 commented Jan 9, 2023

Created a pull request.

@methane
Member

methane commented Jan 9, 2023

I failed to build the PDF; it stops with this error:

LaTeX Warning: Hyper reference `library/socket:module-socket' on page 4 undefin
ed on input line 167.

! TeX capacity exceeded, sorry [parameter stack size=10000].
\@inmathwarn #1->
                 \ifmmode \@latex@warning {Command \protect #1 invalid in ma...
l.170 ...tml\textgreater{}}{Emscripten Networking}
                                                  ^^M
If you really absolutely need more capacity,
you can ask a wizard to enlarge me.

@atsuoishimoto
Collaborator

Building the 3.9/3.10 branch (./build_docs.py ... --branch 3.9 or 3.10)

This causes the following error. I confirmed it in c-api and library, but it might happen in other documents too.

"Improper discretionary list" error when creating divs

Building 3.11 branch

  • Error on dvi->pdf conversion while building the following files (the actual Unicode character may vary): howto-unicode.pdf, howto-regex.pdf, whatsnew.pdf

    ! LaTeX Error: Unicode character 顛 (U+C4CF)
    not set up for use with LaTeX

  • Error on dvi->pdf conversion while building library.pdf

    library.pdf ! TeX capacity exceeded, sorry [input stack size=5000].

@atsuoishimoto
Collaborator

atsuoishimoto commented Jan 15, 2023

The Unicode character error in howto-regex is caused by non-ASCII, non-Japanese letters in the IGNORECASE section (https://docs.python.org/3/howto/regex.html#compilation-flags).

These Unicode letters were introduced in 2017 (python/cpython@cd195e2).

I wonder why the build started failing. Have the build procedures changed?
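
To find such letters ahead of a build, a rough scan like the following may help. It is only a sketch: the ranges treated as "Japanese" below are an assumption made for illustration (CJK punctuation and kana, unified ideographs, full-width forms), not a statement about what any particular font covers, and the helper name is hypothetical. The sample string uses two characters that come up in this thread.

```python
import unicodedata

# Assumed, simplified definition of "Japanese" code point ranges.
JAPANESE_RANGES = (
    (0x3000, 0x30FF),  # CJK symbols and punctuation, hiragana, katakana
    (0x4E00, 0x9FFF),  # CJK unified ideographs
    (0xFF00, 0xFFEF),  # half-width and full-width forms
)


def suspicious_chars(text):
    """Return sorted non-ASCII characters that fall outside JAPANESE_RANGES."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            continue  # plain ASCII
        if any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES):
            continue  # assumed to be covered by the Japanese font
        found.add(ch)
    return sorted(found)


# U+017F (LATIN SMALL LETTER LONG S) and U+212A (KELVIN SIGN)
sample = "こんにちは \u017f \u212a"
for ch in suspicious_chars(sample):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```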

@atsuoishimoto
Collaborator

With python/docsbuild-scripts#145, I managed to build PDFs other than library.pdf.

To build library.pdf, I had to remove the two occurrences of the � character (U+FFFD, the official REPLACEMENT CHARACTER) from codecs.rst.

@atsuoishimoto
Collaborator

Error in codecs.rst

! String contains an invalid utf-8 sequence.
l.13748 decoding, use \sphinxcode{\sphinxupquote{
                                               �}} (U+FFFD, the official
?
! Emergency stop.

reST source:

 decoding, use ``�`` (U+FFFD, the official

Generated TeX source:

decoding, use \\sphinxcode{\\sphinxupquote{\xef\xbf\xbd}} (U+FFFD, the official

@atsuoishimoto
Collaborator

atsuoishimoto commented Jan 17, 2023

@methane
Member

methane commented Jan 17, 2023

To build library.pdf, I had to remove the two occurrences of the � character (U+FFFD, the official REPLACEMENT CHARACTER) from codecs.rst.

I think we should remove the character from the official doc.

@JulienPalard
Member Author

U+FFFD is used twice in codecs.rst:

$ git grep $'\xef\xbf\xbd'
Doc/library/codecs.rst:|                         | decoding, use ``�`` (U+FFFD, the official     |
Doc/library/codecs.rst:   Substitutes ``?`` (ASCII character) for encoding errors or ``�`` (U+FFFD,
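
For environments without git at hand, roughly the same search can be sketched in Python. The function name is hypothetical, and the restriction to `*.rst` files is an assumption made for this example.

```python
from pathlib import Path

REPLACEMENT = "\ufffd"  # encoded in UTF-8 as b"\xef\xbf\xbd"


def find_replacement_chars(root):
    """Yield (path, line_number, line) for each .rst line containing U+FFFD."""
    for path in sorted(Path(root).rglob("*.rst")):
        # Strict decoding on purpose: errors="replace" would itself insert
        # U+FFFD and produce false positives.
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if REPLACEMENT in line:
                yield path, lineno, line
```

Calling `find_replacement_chars("Doc")` from a CPython checkout and printing each tuple should give output comparable to the `git grep` above.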

@methane
Member

methane commented Jan 17, 2023

All other languages can show U+FFFD, and most fonts have a glyph for it.
So it seems to be just a Japanese LaTeX issue. There is no strong reason to prohibit it in the Python docs.

@atsuoishimoto
Collaborator

atsuoishimoto commented Jan 20, 2023

Here's a minimal example of what we should be able to build for a Japanese PDF document.

TeX source: sample.tex

\documentclass[a4paper,10pt,dvipdfmx]{ujreport}
\usepackage[T1]{fontenc}

\usepackage[noto-otc]{pxchfon}

\usepackage[utf8]{inputenc}
\usepackage[german]{babel}

\begin{document}

こんにちは

ſ:  (U+017F, LATIN SMALL LETTER LONG S) <- LaTeX Error: Unicode character ſ (U+017F) not set up for use with LaTeX.

�:  (U+FFFD, REPLACEMENT CHARACTER). <- Undefined control sequence

K: (U+212A, KELVIN SIGN) <- No error in uplatex, but dvipdfmx show warning
[1
dvipdfmx:warning: No character mapping available.
 CMap name: NotoSerifCJK-Regular.ttc:0:jp90-UCS4-H
  input str: <0000212a>
  ]

\end{document}

Build command:

$ uplatex sample.tex
$ dvipdfmx sample.dvi

@Daku-on

Daku-on commented Jan 20, 2023

Hope this helps. (I can generate the DVI file but cannot open it. I'll look into that.)
https://tex.stackexchange.com/questions/448465/unicode-character-%C5%BF-u17f-in-lyx-2-3

@atsuoishimoto
Collaborator

atsuoishimoto commented Jan 20, 2023

Hope this helps. (I can generate the DVI file but cannot open it. I'll look into that.)

Thank you very much! With that, the ſ is rendered in the PDF!

@Daku-on

Daku-on commented Jan 20, 2023

Great!
I have to go now, but if some problems remain when I get back home, I'll look into them later.

@atsuoishimoto
Collaborator

atsuoishimoto commented Feb 5, 2023

Progress report:

Although I managed to build the PDFs with LuaTeX, the following two issues remain.

  1. I need to specify the following preamble, but that is difficult with the current docsbuild-scripts (a newline is required after luacode).

    \usepackage[noto-otf]{luatexja-preset}
    \usepackage{newunicodechar,luacode}
    \begin{luacode*}
     luatexbase.add_to_callback('process_input_buffer', function (s)
      if s:match('\xef\xbf\xbd') then
        return s:gsub('\xef\xbf\xbd', '\xef\xa3\xbd')
      end
     end, 'hedge_fffd')
    \end{luacode*}
    \newunicodechar{^^^^212a}{K}
    \newfontface{\fRepC}{DejaVu Sans Mono}
    \newunicodechar{^^^^f8fd}{{\fRepC\ltjalchar"FFFD}}
    
  2. LuaTeX fails to build library.pdf with an "Arithmetic overflow" error. I created a TeX file that reproduces the error at https://github.com/atsuoishimoto/lualatex-arithmetic-overflow
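
To make the byte strings in the Lua callback of item 1 easier to read: they are the UTF-8 encodings of U+FFFD and of U+F8FD, a private-use code point used as a stand-in. The sketch below mirrors the substitution in Python purely for illustration; the function name is borrowed from the callback label, and it is not part of the build.

```python
FFFD = "\ufffd"  # REPLACEMENT CHARACTER
PUA = "\uf8fd"   # private-use stand-in that the preamble maps to a glyph

# The escapes in the Lua code are just the UTF-8 bytes of these characters.
assert FFFD.encode("utf-8") == b"\xef\xbf\xbd"
assert PUA.encode("utf-8") == b"\xef\xa3\xbd"


def hedge_fffd(line):
    """Mirror the Lua callback: swap U+FFFD for U+F8FD in one input line."""
    return line.replace(FFFD, PUA)


src = "decoding, use \\sphinxcode{\\sphinxupquote{\ufffd}} (U+FFFD, the official"
out = hedge_fffd(src)
print(FFFD in out, PUA in out)  # prints: False True
```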

@atsuoishimoto
Collaborator

Hello @JulienPalard, I want to specify the following string as a preamble in build_docs.py, but have had no luck. Do you have any idea how to do it?

\usepackage[noto-otf]{luatexja-preset}
\usepackage{newunicodechar,luacode}
\begin{luacode*}
 luatexbase.add_to_callback('process_input_buffer', function (s)
  if s:match('\xef\xbf\xbd') then
    return s:gsub('\xef\xbf\xbd', '\xef\xa3\xbd')
  end
 end, 'hedge_fffd')
\end{luacode*}
\newunicodechar{^^^^212a}{K}
\newfontface{\fRepC}{DejaVu Sans Mono}
\newunicodechar{^^^^f8fd}{{\fRepC\ltjalchar"FFFD}}

@jfbu

jfbu commented Feb 7, 2023

It appears most current problems are with Unicode characters which are not supported by the fonts.

So, in order to contribute: what are the constraints on the fonts for your builds?

Related: I made a comment and another one on the numpy project regarding an analogous issue there with missing Chinese ideograms. Their project uses xelatex, but for the lualatex-based approach, which I understand is being experimented with here (rather than upLaTeX), the same method would work. With upLaTeX I am not sure, as I don't read Japanese and don't know how that would go; not tested yet.

With xelatex/lualatex I know that Unicode characters are a single TeX token from the user point of view so syntax is simple to do the catcode activation and let the character pick up a suitable (OpenType/TrueType) font. With uplatex I don't know which kind of fonts it supports...

@atsuoishimoto
Collaborator

@jfbu Thank you for the examples and comment.

We are using the default NotoSerifCJK-Regular.ttc font with uplatex. Currently, some characters in Python documents that are listed here are causing problems with uplatex. Unicode characters will be used more in the future, so I think lualatex will be the way to go.

@jfbu

jfbu commented Feb 8, 2023

@atsuoishimoto I agree that using lualatex should simplify Unicode support. In fact I wanted to help with uplatex, but unfortunately I know too little about it and do not know how to use an OpenType/TrueType font with it, although I understand from pxchfon.sty that it is able to do that. I can see, using albatross (a fontconfig Java wrapper shipped with TeX Live, recently at release 0.5.0), that some old-fashioned prepared-for-TeX fonts do support the needed characters (fonts with extension .pfb in some type1/ hierarchy). But one then has to find out whether a suitable LaTeX font encoding exists for those characters (such as TS1, LGR, X2, ...) and whether the font has the corresponding "fd" support files, which is a very laborious process.

With lualatex/xelatex on the other hand one can concentrate on OpenType/TrueType fonts only. So for example I did:

albatross -b0 -t -d ſ K | grep texlive

and identified that there are quite a few available fonts in TeXLive. With the newer albatross release, the tacit "and" for search works and it is simpler indeed to find a single font for all problematic characters. For example among them DejaVuSans.ttf, FreeSerif.otf (and .ttf), NotoSans-Regular.ttf. And the FreeSerif, FreeSans, FreeMono is already default font for Sphinx using Lualatex (see below).

The model for adding special handling of problematic characters to a lualatex document that otherwise uses other fonts would look like this:

\usepackage{fontspec}% a priori already inserted by default by Sphinx for Lualatex
\newfontfamily\DejaVuSans{DejaVuSans}[Extension=.ttf]
\catcode`ſ\active\defſ{{\DejaVuSans\stringſ}}

One may need perhaps to add in the \newfontfamily line some extra Script= within the brackets in some cases, I am simply too ignorant but see fontspec documentation section OpenType scripts and languages.

The newunicodechar package only adds syntactic sugar over the \catcode line; internally it does (about) exactly the same thing, and you still have to enter the suitable font change explicitly as above, so there is no real reason to use newunicodechar rather than the core TeX \catcode syntax directly. (The newunicodechar documentation says the native syntax is scary, but in truth its real usefulness is with pdflatex, not with lualatex/xelatex.)

In passing I noticed that with lualatex and the ltjsbook class the character K seemed to render to PDF out-of-the-box: it seems ltjsbook does some font configuration already and the K U+212A in source ends up as a K U+004B in PDF, at least if copied from evince, and trying to apply to it a recipe like the above in fact has nil effect, I don't know why.

As for the ſ, it is available in FreeSerif, so with the default Sphinx font set-up for lualatex it works fine too, without any special set-up such as the above.

The U+FFFD on the other hand is very problematic in LuaLaTeX and must be removed from source as you know already and have a workaround at python/docsbuild-scripts#145.

This being said, if you go to check https://github.com/sphinx-doc/sphinx/blob/master/sphinx/builders/latex/constants.py you will see that for Japanese, normally Sphinx uses neither polyglossia nor the FreeSerif, FreeSans, FreeMono fonts.

I thus believe if you use the ltjsbook class, it may not be needed nor even recommended to use polyglossia. Perhaps you could experiment doing builds with

latex_elements = {
    'polyglossia': '',
   #  'fontpkg': '',
}

(make sure to do make clean or to clean the latex build directory if you do this change on existing project)

in the preamble. You will then have a possibly more natural usage of the ltjsbook class, as it is already Japanese native. The 'fontpkg' key loads FreeSerif, FreeSans and FreeMono for Latin and other scripts. With them the ſ works out of the box. There is also a 'fontenc' key which loads fontspec, which provides \setmainfont et al. as needed by the default 'fontpkg' for Sphinx lualatex. It also lets fontspec not activate some TeX ligatures which, at least with xelatex (not sure with lualatex), would cause -- in a code-block to render as a long dash, if I recall correctly, or perhaps it was a problem with curly quotes.

Sorry for long comment but basically the message is that if you do switch to Lualatex, maybe you don't want to keep the Sphinx usage of polyglossia, as you already use a Japanese dedicated class in your configuration (ltjsbook). You may however keep using the default 'fontenc' + 'fontpkg' configuration which load fontspec and the GNU FreeFont families. (without them I expect ltjsbook will end up using Latin Modern for Latin characters).
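
Put together, the suggestions in this comment could look like the following conf.py fragment. This is a hypothetical sketch assembled from the snippets above, not the configuration actually used for docs.python.org; in particular, the per-character recipe for ſ is included only to show where such lines would go, and may be unnecessary with the default FreeSerif set-up.

```python
# Hypothetical conf.py fragment, assembled from the discussion above.
latex_engine = "lualatex"

latex_elements = {
    # Empty string: do not load polyglossia.
    "polyglossia": "",
    # 'fontenc' and 'fontpkg' are deliberately left at their Sphinx defaults.
    # Extra preamble lines; newlines are unproblematic inside a Python string.
    "preamble": r"""
\newfontfamily\DejaVuSans{DejaVuSans}[Extension=.ttf]
\catcode`ſ\active\defſ{{\DejaVuSans\stringſ}}
""",
}
```

As noted above, clean the latex build directory after changing these keys on an existing project.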

@JulienPalard
Member Author

Hello @JulienPalard, I want to specify the following string as a preamble in build_docs.py, but have had no luck. Do you have any idea how to do it?

I just tried. No luck either.

$ cat Makefile 
.PHONY: test
test:
	@printf "%s\n" $(DEFINE)

$ make DEFINE=$'a\nb'
a
make: b: No such file or directory
make: *** [Makefile:3 : test] Error 127

$ cat Makefile 
.PHONY: test
test:
	@printf "%s\n" "$(DEFINE)"

$ make DEFINE=$'a\nb'
/bin/sh: 1: Syntax error: Unterminated quoted string
make: *** [Makefile:3 : test] Error 2

The \n completely messes things up, with make trying hard to keep newlines away from us. It tries to run:

execve("/bin/sh", ["/bin/sh", "-c", "printf \"%s\\n\" \"a"], ...

The "b" part was completely dropped, so there's a big problem for us here anyway.

Unescaping the strace string repr, it looks like:

printf "%s\n" "a

So yes, "Unterminated quoted string" it is...
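
The traces above suggest the newline is lost in the make and /bin/sh layers rather than by the operating system itself. As a small check of that assumption, the sketch below passes a two-line string as a single argv entry to a child process, with no make and no shell in between. It is only an illustration (the helper name is made up) and not a proposed change to build_docs.py.

```python
import subprocess
import sys


def echo_arg(value):
    """Run a child process with `value` as one argv entry; return what it saw."""
    child = "import sys; sys.stdout.write(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child, value],  # list form: no shell involved
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(repr(echo_arg("a\nb")))
```

If the child reports both lines, the value survives an argv hand-off intact, which points at the Makefile variable expansion as the place where it breaks.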

@JulienPalard
Member Author

OK I'm back. I tried:

The error changed, so it looks like we're making progress. We're now hitting:

[140
! error:  (pdf backend): 'endlink' ended up in different nesting level than 'sta
rtlink'
!  ==> Fatal error occurred, no output PDF file produced!

This happens in reference.log. Here is the offending reference.tex if someone wants to read it (I artificially added .txt to get past GitHub's filters).

@atsuoishimoto
Collaborator

atsuoishimoto commented Mar 14, 2023

Hello @JulienPalard. Thank you for your work!

I succeeded in building reference.pdf with the following Docker image, so the error is probably due to different versions of the relevant packages. Could I see a list of the package versions installed in the environment you use for the build?

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /pydoc
RUN apt-get update \
    && apt-get install -y python3.11 python3-pip python3.11-venv git rsync zip curl \
build-essential \
fonts-freefont-otf \
fonts-noto \
mercurial \
latexmk \
texinfo \
texlive \
texlive-latex-extra \
texlive-latex-recommended \
texlive-fonts-recommended \
texlive-lang-all \
texlive-xetex \
xindy


RUN python3.11 -m venv .venv

ENV PATH="/pydoc/.venv/bin:${PATH}"
COPY requirements.txt .
RUN pip install -r requirements.txt
WORKDIR /builddir

@atsuoishimoto
Collaborator

@jfbu I'm sorry for my slow response, and I sincerely appreciate your excellent commentary. Thanks to the tools and settings you taught me, I will be able to solve the problems by myself in the future.

Thanks also for the information about polyglossia. I was able to remove polyglossia, and I am now free from a ton of warning messages!

@JulienPalard
Member Author

@atsuoishimoto the server is running Ubuntu 20.04.5 LTS and currently has:

$ apt-cache policy fonts-freefont-otf fonts-noto latexmk texinfo texlive texlive-latex-extra texlive-latex-recommended texlive-fonts-recommended texlive-lang-all texlive-xetex xindy | grep '^[^ ]\|Installed'
fonts-freefont-otf:
  Installed: 20120503-10
fonts-noto:
  Installed: 20200323-1build1~ubuntu20.04.1
latexmk:
  Installed: 1:4.67-0.1
texinfo:
  Installed: 6.7.0.dfsg.2-5
texlive:
  Installed: 2019.20200218-1
texlive-latex-extra:
  Installed: 2019.202000218-1
texlive-latex-recommended:
  Installed: 2019.20200218-1
texlive-fonts-recommended:
  Installed: 2019.20200218-1
texlive-lang-all:
  Installed: 2019.20200218-1
texlive-xetex:
  Installed: 2019.20200218-1
xindy:
  Installed: 2.5.1.20160104-8

@JulienPalard
Member Author

Oh wait. It works for 3.10, 3.11 and 3.12! I'm digging further...

@JulienPalard
Member Author

Yes, I can reproduce it on docs.python.org: 3.9 fails, 3.10 succeeds.

@atsuoishimoto
Collaborator

atsuoishimoto commented Mar 18, 2023

@JulienPalard Thank you for the info.
The error is not a Japanese-specific issue; it happens when a link is split across a page boundary. In this case, the link to "async for" was split between pages, with "async" and "for" ending up on different pages.

I updated the translation to adjust the page break, so the build will succeed once the translation is merged.

@atsuoishimoto
Collaborator

Oh, for the old version, Transifex edits are not reflected in the repository. Please merge #42 to avoid the error.

@atsuoishimoto
Collaborator

I think the error in the reference.tex for 3.9 is now resolved, so we can close this issue.

@atsuoishimoto
Collaborator

@JulienPalard We should have resolved the Python 3.9 issue a few days ago, but the Python 3.9 Documentation and Downloads page has not been updated since February 10. Can you check if the error is still there?

@JulienPalard
Member Author

3.9, being in "security only" mode, is no longer being built. I just ran a build manually for you.

@atsuoishimoto
Collaborator

@JulienPalard Sorry for bothering you. I confirmed the 3.9 documents are now built successfully.

I'm closing this issue. I sincerely thank everyone involved!

@JulienPalard
Member Author

I sincerely thank everyone involved too!
