
Handle \def better in LaTeX #2888

Closed
ickc opened this issue Apr 29, 2016 · 10 comments

Comments

@ickc
Contributor

ickc commented Apr 29, 2016

This problem involves the following LaTeX macros, which are put in the Markdown source (I'm going to attach them below as files too):

LaTeX macros in markdown source

\def\BDpos{}
\def\BDneg{-}
\def\BDplus{+}
\def\BDminus{-}
\def\thetasigmamuthetadagger{\theta\sigma^\mu\theta^\dagger}
\def\thetasigmamuloweredthetadagger{\theta\sigma_\mu\theta^\dagger}
\newcommand{\dagg}[1]{#1^\dagger}
\newcommand{\smallnegspacedagger}{\hspace{-0.1pt}}
\newcommand{\thdthd}{\theta^\dagger\hspace{-1pt}\theta^\dagger}
\newcommand{\nablasubmu}{\nabla\hspace{-2pt}{}_\mu}
\def\beq{\begin{align}}
\def\eeq{\end{align}}
\def\bea{\begin{align*}}
\def\eea{\end{align*}}
\def\Baryon{{\rm B}}
\def\Lepton{{\rm L}}
\def\sbar{\overline}
\def\stilde{\widetilde}
\def\sst{\scriptscriptstyle}
\def\vac{|0\rangle}
\def\antivac{\langle 0|}
\def\G{\stilde G}
\def\Wmess{W_{\rm mess}}
\def\NI{\stilde N_1}
\def\nmess{N_5}
\def\lagr{{\cal L}}
\def\drbar{\overline{\rm DR}}
\def\msbar{\overline{\rm MS}}
\def\conj{{{\rm c.c.}}}
\def\Et{{\slashchar{E}_T}}
\def\Etot{{\slashchar{E}}}
\def\MPlanck{M_{\rm P}}
\def\cbeta{c_{\beta}}
\def\sbeta{s_{\beta}}
\def\cW{c_{W}}
\def\sW{s_{W}}
\def\deltaeps{\delta}
\def\sigmabar{\overline\sigma}
\def\epsilonbar{\overline\epsilon}
\def\half{{1\over 2}}
\def\FX{F}
\def\Branching{{\rm Br}}
\def\Splus{S_+}
\def\Sminus{S_-}
\def\mAMSB{F_\phi}
\def\Dcon{\overline D}
\def\centeron#1#2{{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1>\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0>\wd1
\kern.5\wd0\kern-.5\wd1\fi}}
\def\ltap{\;\centeron{\raise.35ex\hbox{$<$}}{\lower.65ex\hbox{$\sim$}}\;}
\def\gtap{\;\centeron{\raise.35ex\hbox{$>$}}{\lower.65ex\hbox{$\sim$}}\;}
\def\gsim{\mathrel{\gtap}}
\def\lsim{\mathrel{\ltap}}

Test

As a test, I created test.md and ran it with pandoc -s -o test.tex test.md.

In the result, I spotted at least two of these LaTeX commands being parsed and escaped:

Before

...
\newcommand{\dagg}[1]{#1^\dagger}
...
\def\centeron#1#2{{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1>\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0>\wd1
\kern.5\wd0\kern-.5\wd1\fi}}
...

After

...
\newcommand{\dagg}{[}1{]}\{\#1\^{}\dagger\}
...
\def\centeron\#1\#2\{\{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1\textgreater{}\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0\textgreater{}\wd1
\kern.5\wd0\kern-.5\wd1\fi\}\}
...

Another small problem I noticed: comparing the Markdown source to the generated TeX, the line breaks and spacing have changed. Why aren't they left as is?

By the way, these \def macros were not written by me.

@mb21
Collaborator

mb21 commented Apr 29, 2016

You should probably put such definitions in the template instead...

@ickc
Contributor Author

ickc commented Apr 29, 2016

What you suggest is a workaround, and I do have other workarounds. But I think this is supposed to work, so isn't it a bug?

Edit: note that among that sea of \def macros, only the two I showed in Before & After have problems.

@jgm
Owner

jgm commented Apr 29, 2016

Pandoc will parse pretty standard LaTeX macro definitions
with \newcommand. Once you start using TeX primitives like \def,
all bets are off. So I'd suggest putting these in a
template, or using another workaround.

I was surprised by what you reported with the \newcommand,
though. I get a different result:

pandoc -t latex
\newcommand{\dagg}[1]{#1^\dagger}
^D
\newcommand{\dagg}[1]{#1^\dagger}

Most likely either you're using an older pandoc, or the
parser got mixed up with some of the primitive TeX
definitions that come before this \newcommand.

Anyway, I think this issue should just be closed.


@ickc
Contributor Author

ickc commented Apr 29, 2016

Yes. I did try to build an MWE with those commands alone, but it didn't show the bug; it only shows up when a particular combination of things is put together. That's why I uploaded an MWE.

Before you close the issue, I want to emphasize that I am putting raw LaTeX code in the Markdown source and converting it to LaTeX; that's why I said in my other reply that it is supposed to work. If I were converting to HTML, it would be my fault (by the way, for HTML I do enclose the \def and \newcommand definitions in $$ so that pandoc leaves them as is and MathJax can parse them successfully). But I am talking about raw LaTeX for LaTeX output, so I was hoping that embedding raw LaTeX code could be more robust, since I'm going to do it a lot.

Edit: that's also why I mentioned the extra spaces and line breaks: I suppose they should be treated as raw LaTeX, which pandoc shouldn't touch in LaTeX output. This is related to another issue: currently there is no way (except with a filter) to explicitly mark a section as raw LaTeX/HTML; it is left for pandoc to decide. I actually prefer this (to, say, MultiMarkdown's approach) because it keeps the source clean, but it relies heavily on whether pandoc recognizes the code as raw LaTeX.

@jgm
Owner

jgm commented Apr 29, 2016


I understand that. But passing through raw LaTeX requires
identifying the bits that are LaTeX and separating them from
the bits that are Markdown. That's not always easy.

If you stick to \newcommand, pandoc will do this reliably.
If you use TeX primitives, especially things like

\def\centeron#1#2{{\setbox0=\hbox{#1}\setbox1=\hbox{#2}\ifdim
\wd1>\wd0\kern.5\wd1\kern-.5\wd0\fi
\copy0\kern-.5\wd0\kern-.5\wd1\copy1\ifdim\wd0>\wd1
\kern.5\wd0\kern-.5\wd1\fi}}

things are more difficult for pandoc. It will recognize
\def as a LaTeX command, and \centeron, but when it
hits #1#2 it treats this as regular text. (And then
everything that comes after is screwed up.)
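To make this concrete, here is a minimal Python sketch of a TeX-style tokenizer. It is hypothetical: pandoc's real reader is written in Haskell and handles much more (catcodes, comments, active characters). The key assumption is that `#` followed by a digit is read as a single parameter token, so the #1#2 in \def\centeron#1#2 stays part of the definition instead of falling out as plain text.

```python
import re
from typing import List, Tuple

# Token kinds used in this sketch (hypothetical, not pandoc's TokType):
#   "cs"    control sequence such as \def or \centeron
#   "param" macro parameter such as #1
#   "open" / "close"  braces
#   "char"  any other single character
Token = Tuple[str, str]

_TOKEN_RE = re.compile(
    r"(?P<cs>\\(?:[A-Za-z]+|.))"  # \name, or a one-character control symbol
    r"|(?P<param>#[1-9])"         # parameter token #1..#9
    r"|(?P<open>\{)"
    r"|(?P<close>\})"
    r"|(?P<char>.)",
    re.DOTALL,
)


def tokenize(src: str) -> List[Token]:
    """Split TeX source into (kind, text) tokens."""
    return [(m.lastgroup, m.group()) for m in _TOKEN_RE.finditer(src)]
```

For example, tokenize(r"\def\centeron#1#2{x}") yields control-sequence tokens for \def and \centeron, then the parameter tokens #1 and #2, then the brace group, so a parser working on these tokens can see where the parameter text ends and the body begins.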

We could probably do better in this particular case, but
I think there's no hope of making it always work for arbitrary
TeX.

Note that you could easily rewrite the above using
\newcommand.
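As a sketch of that rewrite, the hypothetical helper below (`def_to_newcommand` is not part of pandoc) mechanically turns a simple \def, one with undelimited parameters #1...#9 numbered in order and a single braced body, into \newcommand form. It returns None for anything else, such as delimited parameters.

```python
import re
from typing import Optional

# "\def\name", then undelimited parameters #1#2..., then the opening brace.
_DEF_HEAD = re.compile(r"\\def\s*(\\[A-Za-z]+)((?:#[1-9])*)\s*\{")


def def_to_newcommand(src: str) -> Optional[str]:
    """Rewrite one simple \\def into \\newcommand form, or return None."""
    text = src.strip()
    m = _DEF_HEAD.match(text)
    if m is None:
        return None  # not a simple \def (e.g. delimited parameters)
    name, params = m.group(1), m.group(2)
    nparams = len(params) // 2
    # Parameters must be numbered consecutively: #1#2...#n.
    if params != "".join(f"#{i}" for i in range(1, nparams + 1)):
        return None
    # Find the brace closing the body, counting nested braces and
    # skipping escaped characters such as \{ and \}.
    depth, i = 0, m.end() - 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        i += 1
    else:
        return None  # unbalanced braces
    if text[i + 1:].strip():
        return None  # trailing material after the body
    body = text[m.end():i]
    arity = f"[{nparams}]" if nparams else ""
    return f"\\newcommand{{{name}}}{arity}{{{body}}}"
```

Applied to \def\centeron#1#2{...}, this produces \newcommand{\centeron}[2]{...} with the body unchanged. Note that \newcommand, unlike \def, raises an error if the command is already defined, so a converted macro that clashes with an existing one would need \renewcommand instead.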

@ickc
Contributor Author

ickc commented May 10, 2016

@jgm

But passing through raw LaTeX requires
identifying the bits that are LaTeX and separating them from
the bits that are Markdown. That's not always easy.

Agreed. I actually ran into this problem quite a while ago but didn't file an issue because I knew \def is not supported. Later, though, I started wondering how dependable it is to put raw LaTeX code in pandoc input, so I filed this issue using the problem I had encountered before.

I don't know exactly how this should be solved. One idea I'm brainstorming is to provide an optional way to explicitly declare a section as raw LaTeX (or even HTML). @bpj has a filter that does something like this: Pandoc filter to insert arbitrary raw output markup as Code/CodeBlocks with an attribute raw=.. Perhaps a solution like that should make it into official pandoc to deal with this kind of situation (to guarantee that pandoc doesn't break LaTeX code in the Markdown source when outputting LaTeX).
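A minimal sketch of that kind of filter, assuming pandoc's JSON AST as produced by pandoc -t json; it is not @bpj's actual filter. Any Code or CodeBlock element carrying a raw=FORMAT attribute is replaced by a RawInline or RawBlock of that format, which pandoc then passes through untouched. The filename raw_attr.py used below is hypothetical.

```python
import json
import sys
from typing import Any, Optional


def _raw_format(attr: list) -> Optional[str]:
    """Return the value of a raw=... key in a pandoc Attr, or None."""
    _ident, _classes, keyvals = attr
    for key, value in keyvals:
        if key == "raw":
            return value
    return None


def walk(node: Any) -> Any:
    """Recursively replace Code/CodeBlock elements that carry raw=FORMAT."""
    if isinstance(node, list):
        return [walk(item) for item in node]
    if isinstance(node, dict):
        tag = node.get("t")
        if tag in ("CodeBlock", "Code"):
            attr, text = node["c"]
            fmt = _raw_format(attr)
            if fmt is not None:
                new_tag = "RawBlock" if tag == "CodeBlock" else "RawInline"
                return {"t": new_tag, "c": [fmt, text]}
        return {key: walk(value) for key, value in node.items()}
    return node


# pandoc runs a JSON filter with the target format as argv[1];
# only read the document from stdin in that case.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(walk(json.load(sys.stdin)), sys.stdout)
```

It would be used as pandoc --filter ./raw_attr.py -o test.tex test.md, with definitions written in a code block that has the attribute raw=latex. Later pandoc releases added a built-in raw_attribute extension (fenced blocks marked {=latex}) that serves the same purpose.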

As a side note, MathJax handles those macros fine (it supports \let, \def, \(re)newcommand and \(re)newenvironment). One problem is that MathJax requires the definitions to be enclosed in an extra pair of math delimiters. I'm still thinking about how to write Markdown source that is compatible with both HTML and LaTeX output.

@ickc
Contributor Author

ickc commented May 10, 2016

Bug report for pandoc parsing LaTeX macros

Although the nature of this issue is different, the following bug is an example of the code I was talking about initially, so I'm posting it here rather than in a new issue:

In LaTeX source:

\newcommand{\sbar}{\overline}
$-\lambda_f H \sbar f
f$

using the following pandoc command:

pandoc -s -o test.md test.tex

will result in:

$-\lambda_f H {\overline}f
f$

But the expected result is

$-\lambda_f H \overline f
f$

MWE attached here.
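Why token-level expansion gives the expected result: a minimal Python sketch, assuming a simplified tokenizer (hypothetical, not pandoc's reader) in which a parameterless macro is expanded by splicing its body tokens into the stream in place of the macro token. Because the replacement is the bare \overline token and not a brace group, the following f becomes the argument of \overline, as in TeX. Inserting the body as a brace group instead gives the {\overline}f output reported above, where \overline has no argument.

```python
import re
from typing import Dict, List

# A control word (\name), a control symbol (\x), or any other character.
# Unlike real TeX, spaces are kept as tokens so the source round-trips.
_TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|.", re.DOTALL)


def tokenize(src: str) -> List[str]:
    return _TOKEN_RE.findall(src)


def expand(src: str, macros: Dict[str, str]) -> str:
    """Splice each parameterless macro's body tokens into the token stream.

    Single pass only: a body that itself uses macros is not re-expanded.
    """
    out: List[str] = []
    for tok in tokenize(src):
        if tok in macros:
            out.extend(tokenize(macros[tok]))  # no braces added around the body
        else:
            out.append(tok)
    return "".join(out)


macros = {r"\sbar": r"\overline"}  # from \newcommand{\sbar}{\overline}
print(expand("$-\\lambda_f H \\sbar f\nf$", macros))
```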

@jgm
Owner

jgm commented May 10, 2016

This is a known issue: see #1390


@jgm
Owner

jgm commented May 10, 2016

It might be worth adding code to handle \def better, even
if this doesn't solve the problem in full generality.

jgm changed the title from "Bug report: pandoc to LaTeX: LaTeX command being parsed in a certain situation" to "Handle \def better in LaTeX" on Dec 7, 2016
jgm added a commit that referenced this issue Jul 5, 2017
This rewrite is primarily motivated by the need to
get macros working properly (#982, #934, #3779, #3236,
 #1390, #2888, #2118).

We now tokenize the input text, then parse the token stream.
Macros modify the token stream, so they should now be effective in any
context, including math. (Thus, we no longer need the clunky macro
processing capacities of texmath.)

A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.
jgm added a commit that referenced this issue Jul 6, 2017
This rewrite is primarily motivated by the need to
get macros working properly (#982, #934, #3779, #3236,
 #1390, #2888, #2118).  A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).

We now tokenize the input text, then parse the token stream.

Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capacities of texmath.

A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.

* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
  Exports Macro, Tok, TokType, Line, Column.  [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
  so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
  Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
  rawLaTeXInline and rawLaTeXBlock.  (Both now return a String,
  and they are polymorphic in state.)
* Added orgMacros field to OrgState.  [API change]
* Removed readerApplyMacros from ReaderOptions.
  Now we just check the `latex_macros` reader extension.
@jgm
Owner

jgm commented Aug 7, 2017

Closed by c806ef1 which adds support for simple \def macros.
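To illustrate what expanding a simple \def with undelimited parameters involves, here is a minimal sketch with hypothetical names (it is not the code from c806ef1). Each argument is either the next single token or a brace group whose outer braces are stripped, and #n in the body is replaced by the n-th argument. The example uses the \dagg macro from the top of this issue, written as \def\dagg#1{#1^\dagger}.

```python
import re
from typing import Dict, List, Tuple

# Control word, control symbol, parameter token #1..#9, or any other character.
_TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|#[1-9]|.", re.DOTALL)


def tokenize(src: str) -> List[str]:
    return _TOKEN_RE.findall(src)


def read_arg(toks: List[str], i: int) -> Tuple[List[str], int]:
    """Read one undelimited argument at toks[i]; return (arg, next index).

    Assumes well-formed input (an argument exists and braces balance).
    """
    while toks[i].isspace():
        i += 1  # TeX skips spaces before an undelimited argument
    if toks[i] != "{":
        return [toks[i]], i + 1
    depth, j = 0, i
    while True:
        if toks[j] == "{":
            depth += 1
        elif toks[j] == "}":
            depth -= 1
            if depth == 0:
                return toks[i + 1:j], j + 1  # strip the outer braces
        j += 1


def expand(src: str, macros: Dict[str, Tuple[int, str]]) -> str:
    """One pass of expansion for macros given as name -> (arity, body)."""
    toks, out, i = tokenize(src), [], 0
    while i < len(toks):
        tok = toks[i]
        i += 1
        if tok not in macros:
            out.append(tok)
            continue
        arity, body = macros[tok]
        args = []
        for _ in range(arity):
            arg, i = read_arg(toks, i)
            args.append(arg)
        for btok in tokenize(body):
            if re.fullmatch(r"#[1-9]", btok):
                out.extend(args[int(btok[1]) - 1])
            else:
                out.append(btok)
    return "".join(out)


macros = {r"\dagg": (1, r"#1^\dagger")}  # \def\dagg#1{#1^\dagger}
print(expand(r"\dagg\theta + \dagg{\bar\psi}", macros))
```

Delimited parameters (as in \def\foo#1.{...}) and recursive expansion are deliberately left out, which is roughly what "simple" means here.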

jgm closed this as completed Aug 7, 2017