PDF build issue #35
Comments
This appears to completely block the update of See #35 |
I reproduced the error with a Docker container based on Ubuntu 22.04. Here are the contents of the Dockerfile. The essential part for reproducing the issue is the list of packages installed by
|
It turned out that http://www.unicode-symbol.com/u/C4CF.html So, I attempted to use package
However, the build in my environment failed due to another error, "TeX capacity exceeded, sorry [input stack size=5000].", which may indicate insufficient memory. Maybe I can open a pull request to the docsbuild-scripts repo later so that anyone can try this (awkward) fix. |
Curiously, there is no "\uC4CF" in the Japanese HTML doc. |
For reference, this also has been discussed here: texjporg/platex#84 |
That is true. This is why I call my fix an "ad hoc way": it just avoids the error instead of fixing the underlying problem. |
Pushed a branch. https://github.com/take6/docsbuild-scripts/tree/fix-japanese-doc-build-error Could anyone check whether it works? |
Created pull request. |
I failed to build PDF with this error.
|
Building the 3.9/3.10 branch (./build_docs.py ... --branch 3.9 or 3.10) causes the following error. I confirmed the error in c-api and library, but it might happen in other documents too.
Building 3.11 branch
|
The Unicode character error in howto-regex is caused by non-ASCII/non-Japanese letters in the IGNORECASE section (https://docs.python.org/3/howto/regex.html#compilation-flags). These Unicode letters were introduced in 2017 (python/cpython@cd195e2). I wonder why the build has started failing. Have the build procedures changed? |
With python/docsbuild-scripts#145, I managed to build PDFs other than library.pdf. To build library.pdf, I had to remove two occurrences of |
Error in codecs.rst
reST source:
Generated TeX source:
|
I think we should remove the character from the official doc. |
U+FFFD is used twice in codecs.rst:
$ git grep $'\xef\xbf\xbd'
Doc/library/codecs.rst:| | decoding, use ``�`` (U+FFFD, the official |
Doc/library/codecs.rst: Substitutes ``?`` (ASCII character) for encoding errors or ``�`` (U+FFFD, |
All other languages can show U+FFFD, and most fonts have a glyph for it. |
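To locate such characters before a build, a small scan along the following lines might help. This is only a sketch: the `SUSPECTS` set and the `find_suspects` helper are hypothetical names introduced for illustration, and the set contains just the three code points named in this thread (U+017F, U+212A, U+FFFD), not a complete list of what uplatex rejects.

```python
import unicodedata

# Code points named in this thread as troublesome for the uplatex build.
# This set is an assumption for illustration, not an exhaustive list.
SUSPECTS = {"\u017f", "\u212a", "\ufffd"}


def find_suspects(text, suspects=SUSPECTS):
    """Return (line, column, code point, Unicode name) for each suspect character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char in suspects:
                code_point = f"U+{ord(char):04X}"
                name = unicodedata.name(char, "UNKNOWN")
                hits.append((line_no, column, code_point, name))
    return hits


# A tiny stand-in for a reST source file.
sample = "decoding, use ``\ufffd`` (U+FFFD)\nplain ASCII line\n\u017f and \u212a\n"
hits = find_suspects(sample)
for hit in hits:
    print(hit)
```

Running the same function over the text of each file under Doc/ would give a list similar to the `git grep` output above, but for several characters at once.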
Here's a minimal example that we should be able to build for a Japanese PDF document. TeX source (sample.tex):
\documentclass[a4paper,10pt,dvipdfmx]{ujreport}
\usepackage[T1]{fontenc}
\usepackage[noto-otc]{pxchfon}
\usepackage[utf8]{inputenc}
\usepackage[german]{babel}
\begin{document}
こんにちは
ſ: (U+017F, LATIN SMALL LETTER LONG S) <- LaTeX Error: Unicode character ſ (U+017F) not set up for use with LaTeX.
�: (U+FFFD, REPLACEMENT CHARACTER). <- Undefined control sequence
K: (U+212A, KELVIN SIGN) <- No error in uplatex, but dvipdfmx show warning
[1
dvipdfmx:warning: No character mapping available.
CMap name: NotoSerifCJK-Regular.ttc:0:jp90-UCS4-H
input str: <0000212a>
]
\end{document}
Build command:
$ uplatex sample.tex
$ dvipdfmx sample.dvi |
Hope this helps. (I can generate the DVI file but cannot open it. I'll look into the situation.) |
Thank you very much! It makes to render the |
Great! |
Progress report: Although I managed to build the PDF with LuaTeX, the following two issues remain.
|
Hello @JulienPalard, I want to specify the following string as a preamble in build_docs.py, but have had no luck. Do you have any idea how to do it?
\usepackage[noto-otf]{luatexja-preset}
\usepackage{newunicodechar,luacode}
\begin{luacode*}
luatexbase.add_to_callback('process_input_buffer', function (s)
if s:match('\xef\xbf\xbd') then
return s:gsub('\xef\xbf\xbd', '\xef\xa3\xbd')
end
end, 'hedge_fffd')
\end{luacode*}
\newunicodechar{^^^^212a}{K}
\newfontface{\fRepC}{DejaVu Sans Mono}
\newunicodechar{^^^^f8fd}{{\fRepC\ltjalchar"FFFD}} |
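For readers less familiar with Lua, the byte substitution done by the callback above can be sketched in Python. This only illustrates the mapping, it is not part of the build: the function name mirrors the callback label `hedge_fffd`, and the constant names are made up for this example. The point is that the bytes EF BF BD are the UTF-8 form of U+FFFD, and EF A3 BD are the UTF-8 form of U+F8FD, a Private Use Area code point that the preamble then maps back to a real U+FFFD glyph.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER
STAND_IN = "\uf8fd"     # U+F8FD, a Private Use Area code point used as a stand-in


def hedge_fffd(line):
    """Swap every U+FFFD in an input line for the stand-in code point."""
    return line.replace(REPLACEMENT, STAND_IN)


# The byte sequences match the escapes used in the Lua pattern and replacement.
print(REPLACEMENT.encode("utf-8"))  # UTF-8 bytes searched for by the callback
print(STAND_IN.encode("utf-8"))     # UTF-8 bytes written in their place

converted = hedge_fffd("decoding, use ``\ufffd`` (U+FFFD)")
print(converted.encode("utf-8"))
```

Lines without U+FFFD come back unchanged, which matches the Lua callback leaving such lines alone.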
It appears most current problems are with Unicode characters that are not supported by the fonts. So, in order to contribute: what are the constraints on the fonts for your builds? Related: I made a comment and another one on the numpy project regarding an analogous issue there of missing Chinese ideograms. Their project uses xelatex, but for the lualatex-based approach, which I understand is being experimented with here (rather than upLaTeX), the same method would work. With upLaTeX I am not sure how that would go, as I don't read Japanese and have not tested it yet. With xelatex/lualatex I know that Unicode characters are a single TeX token from the user's point of view, so the syntax is simple: do the catcode activation and let the character pick up a suitable (OpenType/TrueType) font. With uplatex I don't know which kinds of fonts are supported... |
@jfbu Thank you for the examples and comment. We are using the default NotoSerifCJK-Regular.ttc font with uplatex. Currently, some characters in Python documents that are listed here are causing problems with uplatex. Unicode characters will be used more in the future, so I think lualatex will be the way to go. |
@atsuoishimoto I agree that using lualatex should simplify support of Unicode. In fact I wanted to help with uplatex but unfortunately I am too ignorant with it and do not know how to use an OpenType/TrueType font with it, although I understand from With lualatex/xelatex on the other hand one can concentrate on OpenType/TrueType fonts only. So for example I did:
and identified that there are quite a few available fonts in TeXLive. With the newer albatross release, the tacit "and" for search works and it is simpler indeed to find a single font for all problematic characters. For example among them The model to add to some lualatex document using other fonts a special handling of problematic characters would be like this
One may need perhaps to add in the The package In passing I noticed that with As for the f it is available in FreeSerif so with the default Sphinx font set-up for lualatex it works fine too, without any special set-up such as above. The U+FFFD on the other hand is very problematic in LuaLaTeX and must be removed from source as you know already and have a workaround at python/docsbuild-scripts#145. This being said, if you go to check https://github.com/sphinx-doc/sphinx/blob/master/sphinx/builders/latex/constants.py you will see that for Japanese, normally Sphinx uses neither polyglossia nor the FreeSerif, FreeSans, FreeMono fonts. I thus believe if you use the
(make sure to do in the preamble. You will then have a possibly more natural usage of Sorry for long comment but basically the message is that if you do switch to Lualatex, maybe you don't want to keep the Sphinx usage of polyglossia, as you already use a Japanese dedicated class in your configuration ( |
I just tried. No luck either.
The
The "b" part was completely dropped, so in any case there's a big problem for us here. Unescaping the strace string repr, it looks like:
So yes, "Unterminated quoted string" it is... |
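As a general illustration of how an "Unterminated quoted string" can arise when a multi-line preamble passes through a shell, here is a sketch using Python's `shlex`. Everything in it is an assumption made for the example: the preamble text, the `sphinx-build -D latex_elements.preamble=...` command line, and the variable names are hypothetical, and it does not reproduce what build_docs.py actually runs.

```python
import shlex

# Hypothetical preamble containing backslashes, newlines and a single quote.
preamble = (
    "\\usepackage{newunicodechar}\n"
    "\\newunicodechar{^^^^212a}{K}\n"
    "% it's a comment with a quote"
)

# Naive quoting: the single quote inside the preamble ends the quoted string early.
naive_command = f"sphinx-build -D latex_elements.preamble='{preamble}'"
try:
    shlex.split(naive_command)
    naive_failed = False
except ValueError as error:
    naive_failed = True
    print("naive quoting fails:", error)

# shlex.quote escapes the value so that a POSIX shell sees a single argument.
safe_command = "sphinx-build -D latex_elements.preamble=" + shlex.quote(preamble)
arguments = shlex.split(safe_command)
print(arguments[-1] == "latex_elements.preamble=" + preamble)
```

Passing the arguments as a list to `subprocess.run` without `shell=True` avoids the quoting step altogether, which is often the simpler choice when the value is long or contains quotes.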
OK I'm back. I tried:
The error changed, so it looks like we're moving forward. We're now hitting:
This happens in |
Hello @JulienPalard. Thank you for your work! I succeeded in building reference.pdf with the following Docker image, so the error is probably due to different versions of the relevant packages. Can I see a list of the versions of the packages installed in the environment you are using for the build?
|
@jfbu I'm sorry for my slow response, and I sincerely appreciate your excellent commentary. Thanks to the tools and settings you taught me, I will be able to solve the problems by myself in the future. Thanks also for the information about polyglossia, I was able to remove polyglossia, and I am now free from a ton of warning messages! |
@atsuoishimoto the server is running an
|
Oh wait. It works for 3.10, 3.11 and 3.12! I'm digging further... |
Yes, I can reproduce it on docs.python.org: 3.9 fails, 3.10 succeeds. |
@JulienPalard Thank you for the info. I updated the translation to adjust a page break, so the build will succeed once the translation is merged. |
Oh, the old version doesn't reflect Transifex edits to the repository. Please merge #42 to avoid the error. |
I think the error in the reference.tex for 3.9 is now resolved, so we can close this issue. |
@JulienPalard We should have resolved the Python 3.9 issue a few days ago, but the Python 3.9 Documentation and Downloads page has not been updated since February 10. Can you check if the error is still there? |
3.9, being in "security only" mode, is no longer being built. I just ran a build manually for you. |
@JulienPalard Sorry for bothering you. I confirmed the 3.9 documents are now built successfully. I'm closing this issue. I sincerely thank everyone involved! |
I sincerely thank everyone involved too! |
Since #34 and #31 I have had issues building PDFs on docs.python.org. It can easily be reproduced using https://github.com/python/docsbuild-scripts/ as:
(you can easily try other branches by changing the --branch argument)