doc2tex: generate docs to Latex #17997

Merged
merged 2 commits into from
May 14, 2021
Contributor

@a-mr a-mr commented May 11, 2021

We have the rst2tex command, which converts *.rst files to Latex, but it had no counterpart for documentation generation.

Here is an example of pdf file generated by nim doc2tex and pdflatex: os.nim -> os.tex -> os.pdf:

image

Implementation comment:

  1. renderVerbatim.renderNimCode did not handle whitespace correctly for Latex, so it was unified with the rstgen version. Hence most of the changes in the Html tests: whitespace is displayed more concisely, shown literally without a span (which did not have a dedicated Whitespace class in nimdoc.cfg anyway).
  2. most problems were caused by escaping:
  • the version of escaping used in rstgen.nim did not escape the single apostrophe ', but I found that this is actually the correct behavior (proof, w.r.t. xmltree escaping). Hence the 2nd part of the Html test changes.
  • the existing escaping scheme did not work for the Latex TOC, e.g. { represented as \symbol{123} was shown as just 123 in the TOC. So I dug into Latex escaping basics.

Tested only manually with os.nim, rst.nim, highlite.nim.

cc @narimiran @timotheecour

handleDocOutputOptions graph.config
graph.config.setErrorMaxHighMaybe
semanticPasses(graph)
if json: registerPass(graph, docgen2JsonPass)
else: registerPass(graph, docgen2Pass)
if ext == TexExt:
Member

how about:

case ext
of TexExt
of JsonExt
of HtmlExt
else: doAssert false, $ext

Contributor Author

I like this style, done, thanks.
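
For readers unfamiliar with the suggested style, here is a minimal standalone sketch of dispatching on the output extension with an exhaustive `case` and a failing `else`. The constant values and the pass names returned by the hypothetical `passFor` proc are illustrative assumptions, not the code that was merged.

```nim
# Illustrative stand-ins: the real constants live in the compiler sources,
# and their values here are assumptions.
const
  HtmlExt = "html"
  TexExt = "tex"
  JsonExt = "json"

proc passFor(ext: string): string =
  ## Hypothetical helper: returns a label for the pass that would be
  ## registered for the given output extension.
  case ext
  of TexExt: result = "texPass"
  of JsonExt: result = "jsonPass"
  of HtmlExt: result = "htmlPass"
  else: doAssert false, "unsupported extension: " & ext

echo passFor(TexExt)
```

The `else` branch makes an unexpected extension fail loudly instead of silently falling through to the HTML path.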

@@ -299,18 +304,21 @@ proc mainCommand*(graph: ModuleGraph) =
else:
loadConfigs(DocConfig, cache, conf, graph.idgen)
commandRst2Html(cache, conf)
-of cmdRst2tex:
+of cmdRst2tex, cmdDoc2tex:
Member

@timotheecour timotheecour May 11, 2021

my intuition is that cmdDoc2tex should be more similar to cmdDoc2 as far as this part is concerned and hence should be grouped with it instead, eg so that the other statements are taken into account; wdyt?

Contributor Author

I had the same intuition initially. In practice it was a bit easier to insert it into cmdRst2tex section.
And there is no really significant impact, it can be written both ways.

Member

doc2tex generates a ton of LockLevel warnings, for example, which the code path for cmdDoc2 handles; it makes sense: the frontend for cmdDoc2 and cmdDoc2tex is the same.

this block is not relevant for doc2tex but is executed:

    for warn in [warnRedefinitionOfLabel, warnUnknownSubstitutionX,
                 warnLanguageXNotSupported,
                 warnFieldXNotSupported, warnRstStyle]:
      conf.setNoteDefaults(warn, true)

this block is relevant for doc2tex but is not executed:

      conf.setNoteDefaults(warnLockLevel, false) # issue #13218
      conf.setNoteDefaults(warnRedefinitionOfLabel, false) # issue #13218
        # because currently generates lots of false positives due to conflation
        # of labels links in doc comments, e.g. for random.rand:
        #  ## * `rand proc<#rand,Rand,Natural>`_ that returns an integer
        #  ## * `rand proc<#rand,Rand,range[]>`_ that returns a float

see what i mean? ;-)

a viable alternative is to use an auxiliary template/proc to call the shared part

Contributor Author

this block is not relevant for doc2tex but is executed:

why is it not relevant for doc2tex? Maybe it is the other way around and it should be enabled for cmdDoc2 as well? We need RST/Markdown warnings for nim files too, am I wrong?..

Member

@a-mr just saw your response, yes, it looks like it should be relevant for both cmdDoc2 and cmdDoc2tex

(but this block is relevant for doc2tex but is not executed: is still relevant)
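
A rough sketch of the "auxiliary template/proc" idea, using a mock configuration type so that it runs on its own. `MockConf`, `setupDocWarnings`, and this local `setNoteDefaults` are hypothetical stand-ins; the real compiler API takes a `ConfigRef` and may differ. The two quoted blocks disagree on `warnRedefinitionOfLabel`; the sketch assumes the doc2 behaviour (disabled, per issue #13218) wins.

```nim
type
  Note = enum
    warnLockLevel, warnRedefinitionOfLabel, warnUnknownSubstitutionX,
    warnLanguageXNotSupported, warnFieldXNotSupported, warnRstStyle
  MockConf = object        # hypothetical stand-in for the compiler's config
    notes: set[Note]

proc setNoteDefaults(conf: var MockConf; note: Note; enabled: bool) =
  ## Mock of the compiler proc of the same name: toggles one warning.
  if enabled: conf.notes.incl(note)
  else: conf.notes.excl(note)

proc setupDocWarnings(conf: var MockConf) =
  ## Shared part that both doc2 and doc2tex could call.
  # RST/Markdown warnings, agreed in the thread to be relevant for both
  for warn in [warnUnknownSubstitutionX, warnLanguageXNotSupported,
               warnFieldXNotSupported, warnRstStyle]:
    conf.setNoteDefaults(warn, true)
  # doc-comment specific defaults (issue #13218)
  conf.setNoteDefaults(warnLockLevel, false)
  conf.setNoteDefaults(warnRedefinitionOfLabel, false)

var conf = MockConf(notes: {warnLockLevel, warnRedefinitionOfLabel})
setupDocWarnings(conf)
echo conf.notes
```

With a helper like this, each command branch would only add what is specific to its output format.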

@@ -8,22 +8,24 @@ split.item.toc = "20"
# after this number of characters

doc.section = """
-\chapter{$sectionTitle}\label{$sectionID}
-\begin{description}
+\rsthA{$sectionTitle}\label{$sectionID}
Member

@timotheecour timotheecour May 11, 2021

what's \rsthA
(found it:

config/nimdoc.tex.cfg:96:14:\newcommand{\rsthA}[1]{\section{#1}}

)

and can you explain the removal of description?

Contributor Author

doc.section never worked (and \chapter AFAIK is available only for Latex book class or similar).

description got in the way because its labels (between square brackets in \item[label] description) had problems with some symbols, I did not dig deeper — because IMHO description does not add any useful formatting (or TOC items or any other value) here and it's convenient to just drop it.

@@ -38,6 +40,10 @@ $moduledesc
$content
"""

# $1 - number of listing in document, $2 - language (e.g. langNim), $3 - anchor
Member

i don't understand this comment; where is $1, $2, $3 ?

Contributor Author

these are variables that a user can put into the following config variable if they decide to customize it. The same as in nimdoc.cfg.
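
To illustrate how such positional placeholders behave, here is a small sketch that assumes the substitution works like the `%` operator from `std/strutils`. The template string is made up for the example; only the meaning of `$1`, `$2`, and `$3` comes from the config comment.

```nim
import std/strutils

# Hypothetical template: a user-customized value could mention any of
# $1 (listing number), $2 (language), $3 (anchor), or none of them.
const listingTemplate = "listing $1, language $2, anchor $3"

# Placeholders that a template does not mention are simply not used.
echo listingTemplate % ["7", "langNim", "os-example"]
```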

@@ -195,26 +197,26 @@ proc addRtfChar(dest: var string, c: char) =
else: add(dest, c)

proc addTexChar(dest: var string, c: char) =
# Escapes 10 special Latex characters. Note that [, ], and ` are not
# considered as such. TODO: neither is @, am I wrong?
Member

can you find a reference and link it here?

@@ -195,26 +197,26 @@ proc addRtfChar(dest: var string, c: char) =
else: add(dest, c)

proc addTexChar(dest: var string, c: char) =
# Escapes 10 special Latex characters. Note that [, ], and ` are not
# considered as such. TODO: neither is @, am I wrong?
case c
of '_': add(dest, "\\_")
Member

factor everything that can be factored via:

of '_', '{', ...: add(dest, "\\" & c)

Contributor Author

ok, done.
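
A minimal sketch of what the factored form could look like, covering the ten special Latex characters mentioned in the comment. The proc names `addTexCharSketch` and `escapeTex` are hypothetical, and the replacements chosen for `\`, `^`, and `~` are standard Latex text commands picked for the example; the code that was actually merged may differ.

```nim
proc addTexCharSketch(dest: var string, c: char) =
  ## Appends `c` to `dest`, escaping the 10 special Latex characters.
  case c
  of '_', '{', '}', '$', '&', '#', '%':
    # these seven only need a backslash prefix
    dest.add("\\" & c)
  of '\\': dest.add("\\textbackslash{}")
  of '^': dest.add("\\textasciicircum{}")
  of '~': dest.add("\\textasciitilde{}")
  else: dest.add(c)   # '[', ']', '`' and everything else pass through

proc escapeTex(s: string): string =
  ## Convenience wrapper (hypothetical) that escapes a whole string.
  for c in s: result.addTexCharSketch(c)

echo escapeTex("rand_int{T}")
```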

@@ -169,14 +169,17 @@ proc getOutFile2(conf: ConfigRef; filename: RelativeFile,
else:
result = getOutFile(conf, filename, ext)

proc newDocumentor*(filename: AbsoluteFile; cache: IdentCache; conf: ConfigRef, outExt: string = HtmlExt, module: PSym = nil): PDoc =
proc isLatexCmd(conf: ConfigRef): bool = conf.cmd in {cmdRst2tex, cmdDoc2tex}
Member

needs nim --fullhelp entry

@@ -169,14 +169,17 @@ proc getOutFile2(conf: ConfigRef; filename: RelativeFile,
else:
result = getOutFile(conf, filename, ext)

proc newDocumentor*(filename: AbsoluteFile; cache: IdentCache; conf: ConfigRef, outExt: string = HtmlExt, module: PSym = nil): PDoc =
proc isLatexCmd(conf: ConfigRef): bool = conf.cmd in {cmdRst2tex, cmdDoc2tex}
Member

need changelog entry under Compiler changes

@@ -444,6 +444,7 @@ proc parseCommand*(command: string): Command =
of "e": cmdNimscript
of "doc0": cmdDoc0
of "doc2", "doc": cmdDoc2
of "doc2tex": cmdDoc2tex
Member

in future work we could add doc2pdf and use nimcache for the artifacts produced by pdflatex, e.g. to avoid polluting cwd
(assumes user has pdflatex in PATH)
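
As a rough idea of how a future doc2pdf could keep pdflatex artifacts out of the current directory, the sketch below only builds the command line and does not run anything. `pdflatexCommand` is a hypothetical helper; `-output-directory` is an existing pdflatex option, and the directory name passed in is just an example.

```nim
import std/os

proc pdflatexCommand(texFile, artifactDir: string): string =
  ## Hypothetical helper: builds a pdflatex invocation that writes the
  ## .aux/.log/.pdf files into `artifactDir` instead of the cwd.
  quoteShellCommand(["pdflatex", "-output-directory=" & artifactDir, texFile])

echo pdflatexCommand("os.tex", "nimcache")
```

The resulting string could then be passed to a process-execution proc, assuming pdflatex is in PATH as noted above.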

@@ -169,14 +169,17 @@ proc getOutFile2(conf: ConfigRef; filename: RelativeFile,
else:
result = getOutFile(conf, filename, ext)

proc newDocumentor*(filename: AbsoluteFile; cache: IdentCache; conf: ConfigRef, outExt: string = HtmlExt, module: PSym = nil): PDoc =
proc isLatexCmd(conf: ConfigRef): bool = conf.cmd in {cmdRst2tex, cmdDoc2tex}

Member

nimdoc.out.css is generated but should not be
(ok to defer to future work)

@@ -285,7 +290,7 @@ proc mainCommand*(graph: ModuleGraph) =
# of labels links in doc comments, e.g. for random.rand:
# ## * `rand proc<#rand,Rand,Natural>`_ that returns an integer
# ## * `rand proc<#rand,Rand,range[]>`_ that returns a float
-commandDoc2(graph, false)
+commandDoc2(graph, HtmlExt)
if optGenIndex in conf.globalOptions and optWholeProject in conf.globalOptions:
Member

nim doc2tex --project lib/std/jsonutils.nim seems to do the right thing, but it generates idx files which don't make sense (IIRC) unless something like theindex.tex (from commandBuildIndex) is also generated

I think generating theindex.tex (which would end up, say, in a global TOC) would make sense to generate.

(ok to defer to future work)

@timotheecour timotheecour mentioned this pull request May 12, 2021
5 tasks
@Araq
Member

Araq commented May 12, 2021

I know this is gonna be controversial but I found that in practice "print to PDF" for a website works much better than LaTex for Nim's documentation. We should use something like https://stackoverflow.com/questions/46077392/additional-options-in-chrome-headless-print-to-pdf

Latex cannot create documents that respect the margin. Yes, I know, I know, there are settings for that. They don't work, esp not for generated code that is generated from documentation that isn't aware of Latex's quirks.

@timotheecour
Member

"print to PDF" for a website works much better than LaTex for Nim's documentation

see also this thread: timotheecour#727 (comment)

nim doc --outdir:/tmp/d05 --doccmd:skip lib/std/jsonutils.nim
pandoc /tmp/d05/jsonutils.html --pdf-engine /Library/TeX/texbin/pdflatex -t latex -o /tmp/d05/jsonutils.pdf

which seems more direct than going through chrome; but the margin is, as you noted, not respected; I would hope this is fixable though, with either approaches.

@Araq
Member

Araq commented May 12, 2021

with either approaches.

Maybe but then we have to maintain the LaTeX feature which was added by me before "chrome headless" was a thing (or maybe I simply wasn't aware of it). I'm not aware of any benefits that LaTeX offers in 2021.

@timotheecour
Member

timotheecour commented May 12, 2021

I'm not aware of any benefits that LaTeX offers in 2021.

At least most research papers still use latex, when you need to write math (eg: machine learning, computer vision, math etc); for all of latex flaws, it's still a better language to write math formulas in compared to alternatives (macro system etc). But that's a different topic from nim docs, admittedly.

@a-mr
Contributor Author

a-mr commented May 13, 2021

Usually when I tried Chromium printing in Linux it turned to be pretty buggy, e.g. lines are broken horizontally in half. Selecting smaller page size works weirdly like it just downscales size of font. One can not really say that there is no overflow because it always uses ragged right edge (without full alignment). Firefox printing is on the same level.

Latex cannot create documents that respect the margin. Yes, I know, I know, there are settings for that. They don't work

@Araq You mean overflows in paragraphs (especially when code is inserted)? Then there is a setting for that in Latex. It allows any amount of whitespace stretching inside a line. I will send it in one of the next PRs (the current setting in nimdoc.tex.cfg is its "lite" version). It does not work in the verbatim env, aka code blocks (so one would need to observe the 80-character limit or downsize fonts), and I still need to check it for tables.

I'm not aware of any benefits that LaTeX offers in 2021.

Advantages of Latex that I know are largely unrelated to automatic doc generation:

  1. It's a markup language purported for human editing in contrast to HTML/Xml with their ugly closing tags. This also makes it easier to fine-tune things in Latex. If people had ignored ergonomics they would have used something like Lisp instead of Nim.
  2. AFAIK latex is better at both-sides text alignment (justification); I believe that's why it is turned off by default in HTML. Though Latex's eagerness by default often causes overflows, yes.
  3. latex allows multi-column layout
  4. many things are customized easier in Latex: by redefining commands. And there is much more "batteries inside" — predefined packages/classes. In HTML you need to comb the Internet for recipes.

If you believe that Web technologies will continue to grow fast then it's logical they will deprecate Latex one day. Personally I doubt the bright future of that bloatware (though I may be biased because for half a year I experienced severe unresponsiveness in new versions of Chromium on my Debian 10; I even had to switch back to Firefox after a few years with Chromium).

@Araq
Member

Araq commented May 13, 2021

Usually when I tried Chromium printing in Linux it turned to be pretty buggy, e.g. lines are broken horizontally in half.

I tested Nim's manual with it and the result looked better than Latex's PDF.

It's a markup language purported for human editing in contrast to HTML/Xml with their ugly closing tags.

These markup languages (which you don't have to write by hand btw, there are decent editors for them available) at least preserve the tree structure of the math formula, Latex doesn't... It's simply a bad tool, I keep running into PDF files produced by Latex that cannot respect the margin. See https://thrift.apache.org/static/files/thrift-20070401.pdf page 7 for a recent example.

If you believe that Web technologies will continue to grow fast then it's logical they will deprecate Latex one day.

I have an unreasonable hate towards Latex and want to remove the support for it.

@a-mr
Contributor Author

a-mr commented May 13, 2021

See https://thrift.apache.org/static/files/thrift-20070401.pdf page 7 for a recent example.

OK, see how we can fix it with the setting I mentioned above.

\documentclass{article}

\usepackage[a6paper]{geometry}

\begin{document}

\textbf{Before}:

Though the Thrift transport interfaces map more directly to a blocking I/O 
model, we have  implemented a  high performance \texttt{TNonBlockingServer} in 
C++ based on \texttt{libevent} and the \texttt{TFramedTransport}.

\setlength\emergencystretch{\hsize}\hbadness=10000

\textbf{After}:

Though the Thrift transport interfaces map more directly to a blocking I/O 
model, we have  implemented a  high performance \texttt{TNonBlockingServer} in 
C++ based on \texttt{libevent} and the \texttt{TFramedTransport}.

\end{document}

image

@a-mr
Contributor Author

a-mr commented May 13, 2021

I have an unreasonable hate towards Latex and want to remove the support for it.

I've been using Latex since 2001 and I have unreasonable love for it, and what?

If you don't want to support it then just don't. I will.

@Araq Araq merged commit 97970d9 into nim-lang:devel May 14, 2021
@kaushalmodi
Contributor

@a-mr Can you mention this in the Changelog? This is a pretty big feature.

@timotheecour
Member

changelog + nim --fullhelp, refs #17997 (comment) + a few other points that are not yet addressed

@a-mr
Contributor Author

a-mr commented Jun 1, 2021

@timotheecour yes, i haven't addressed many of your comments, i'll try to within a month. Especially sorting out warnings stuff.

@kaushalmodi , sure. I need to update changelog and docgen.rst.

@timotheecour timotheecour added the TODO: followup needed remove tag once fixed or tracked elsewhere label Jun 1, 2021
PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022
* `doc2tex`: generate docs to Latex

* address some comments