Backslashes preceded by spaces wrongly doubled when converting from Dokuwiki to reStructuredText #8178

Closed
dbitouze opened this issue Jul 12, 2022 · 15 comments
@dbitouze

Explain the problem.
In the output of:

pandoc -f dokuwiki -t rst <<< "<latex>$\sum_{\substack{(i,j) \in I^2 \i \neq j}}$</latex>"

backslashes preceded by spaces are wrongly doubled:

<latex>$\sum_{\substack{(i,j) \\in I^2 \\i \\neq j}}$</latex>

BTW, no such issue when the string is converted from dokuwiki to org:

$ pandoc -f dokuwiki -t org <<< "<latex>$\sum_{\substack{(i,j)\in I^2\i\neq j}}$</latex>"
<latex>$\sum_{\substack{(i,j)\in I^2\i\neq j}}$</latex>

Pandoc version?

  • pandoc: v2.18
  • OS: Linux Mageia 8
@dbitouze dbitouze added the bug label Jul 12, 2022
@jgm
Owner

jgm commented Jul 12, 2022

Simpler demonstration of the phenomenon:

% pandoc -f native -t rst
[ Str "a\\x", Space, Str "\\y" ]
a\x \\y

@jgm
Owner

jgm commented Jul 12, 2022

Here's what I see with docutils' rst2html:

a\\x \\y

gets converted to

<p>a\x \y</p>

and

a\x \y

to

<p>ax y</p>

Finally,

a\x \\y

gets converted to

<p>ax \y</p>

So it seems we should be doubling all literal escape characters. I no longer remember why the code does something fancier:

  escapeString' firstChar opts (c:cs) =
    case c of
         _    | c `elemText` "\\`*_|" &&
                (firstChar || null cs) -> '\\':c:escapeString' False opts cs
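
The three rst2html results above are consistent with a simple model of RST escaping: a backslash followed by a non-whitespace character yields that character, and a backslash followed by whitespace yields nothing. The following is a minimal Python sketch of that model, not docutils' actual implementation; the names `rst_unescape` and `escape_backslashes` are hypothetical, and the handling of a trailing backslash is an assumption. It suggests that doubling every literal backslash, regardless of position, is enough to round-trip the text.

```python
def rst_unescape(text: str) -> str:
    """Approximate how RST resolves backslash escapes in plain text.

    Assumed model: a backslash followed by a non-whitespace character
    yields that character; a backslash followed by whitespace (or at the
    end of the input) yields nothing.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt and not nxt.isspace():
                out.append(nxt)
            i += 2  # consume the backslash and the character it escapes
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def escape_backslashes(text: str) -> str:
    """Double every literal backslash, whatever precedes or follows it."""
    return text.replace("\\", "\\\\")


# The three inputs tried with rst2html in this comment.
print(rst_unescape(r"a\\x \\y"))  # a\x \y
print(rst_unescape(r"a\x \y"))    # ax y
print(rst_unescape(r"a\x \\y"))   # ax \y

# Doubling everything round-trips the literal text under this model.
literal = r"a\x \y"
assert rst_unescape(escape_backslashes(literal)) == literal
```

Under this model, escaping only some backslashes (as in `a\x \\y`) silently drops the unescaped ones, which matches the third result above.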

@jgm
Owner

jgm commented Jul 12, 2022

My guess is that you want it to be single backslash both times, because (I assume) this is a representation of LaTeX math in dokuwiki? So maybe a second issue is that the dokuwiki reader should be taught to recognize it as such. Can you link to official documentation of this syntax?

jgm added a commit that referenced this issue Jul 12, 2022
Previously we didn't escape it when it is word-internal,
but that seems wrong.  See #8178.
@dbitouze
Author

Here's what I see with docutils' rst2html:

In fact, I'm using RST for a Sphinx-doc LaTeX FAQ, and the doubled backslashes are harmful there.

@dbitouze
Author

dbitouze commented Jul 12, 2022

My guess is that you want it to be single backslash both times, because (I assume) this is a representation of LaTeX math in dokuwiki?

Indeed.

So maybe a second issue is that the dokuwiki reader should be taught to recognize it as such.

Would be nice.

Can you link to official documentation of this syntax?

I'm not sure I understand what you want: official documentation for the Dokuwiki side or for the reStructuredText side?

  • About Dokuwiki, there are LaTeX plugins (e.g. this one) and a MathJax plugin but, since I'm not the administrator of the Dokuwiki site that the LaTeX FAQ is being converted from, I don't know which one(s) are used.
  • About reStructuredText, the official syntax is here.

@jgm
Owner

jgm commented Jul 14, 2022

Yes, for dokuwiki. Sounds like this syntax is from an optional plugin, not part of the core syntax? That may be why it's not supported.

@dbitouze
Author

Yes, for dokuwiki. Sounds like this syntax is from an optional plugin, not part of the core syntax?

Indeed.

That may be why it's not supported.

I can understand.

But, anyway, why are backslashes treated differently depending on whether they are preceded by spaces (doubled) or not (not doubled)?

@jgm
Owner

jgm commented Jul 15, 2022

But, anyway, why are backslashes treated differently depending on whether they are preceded by spaces (doubled) or not (not doubled)?

As noted above, that's an issue with the RST writer. However, it has been fixed in the commit linked above.

@jgm
Owner

jgm commented Jul 15, 2022

If this is a widely used plugin, I'm open to supporting it, because I don't think this syntax would be likely to appear with another meaning.

@dbitouze
Author

dbitouze commented Jul 15, 2022

If this is a widely used plugin,

I don't know if the one used in the Dokuwiki site I'm trying to convert to Sphinx-doc is widely used and I must admit it is rather old (last updated on 2011-04-29).

I'm open to supporting it,

Thanks! But see above.

because I don't think this syntax would be likely to appear with another meaning.

Indeed.

@dbitouze
Author

But, anyway, why are backslashes treated differently depending on whether they are preceded by spaces (doubled) or not (not doubled)?

As noted above, that's an issue with the RST writer. However, it has been fixed in the commit linked above.

Just to clear up one doubt: does this fix make the RST writer:

  • never double,
  • or always double,

the backslashes?

@jgm
Owner

jgm commented Jul 16, 2022

always double (see above)

@dbitouze
Author

always double (see above)

Sigh... This will be very harmful when converting files that contain TeX commands to RST, since such commands always start with a backslash, very often preceded by a space.

@jgm
Owner

jgm commented Jul 16, 2022

As I explained, this requires that the dokuwiki reader recognize the special TeX math contexts in which backslashes behave differently. Currently the reader does not, because it's not part of core dokuwiki syntax.

@dbitouze
Author

OK, thanks for the clarification.

AFAICS, the only LaTeX plugin for Dokuwiki is LaTeX Plugin, old as it is. Considering its syntax, the ideal would be to not double backslashes:

  • that are in <latex>…</latex> tags: not a difficult part I guess,
  • that are in math introduced by the other delimiters ($…$, $$…$$, \begin{displaymath}…\end{displaymath}, \begin{eqnarray}…\end{eqnarray}, \begin{eqnarray*}…\end{eqnarray*}, \begin{equation}…\end{equation}, \begin{equation*}…\end{equation*}): more difficult as it would be nice to put them in a math RST directive or role but only when they are not in code blocks or snippets.
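
The first case in the list above (leaving `<latex>…</latex>` spans untouched) can be sketched as follows. This is only an illustration in Python, not how pandoc's dokuwiki reader or RST writer works; the function name `escape_outside_latex` and the regular expression are assumptions, and the sketch ignores the other delimiters and the code-block exception mentioned in the second case.

```python
import re

# Hypothetical pattern: a non-greedy match of one <latex>...</latex> span,
# allowed to run across line breaks.
LATEX_SPAN = re.compile(r"<latex>.*?</latex>", re.DOTALL)


def escape_outside_latex(text: str) -> str:
    """Double backslashes everywhere except inside <latex>...</latex> spans."""
    out = []
    pos = 0
    for m in LATEX_SPAN.finditer(text):
        # Ordinary text before the span: backslashes are doubled.
        out.append(text[pos:m.start()].replace("\\", "\\\\"))
        # The span itself is copied through unchanged.
        out.append(m.group(0))
        pos = m.end()
    # Ordinary text after the last span.
    out.append(text[pos:].replace("\\", "\\\\"))
    return "".join(out)


sample = r"see C:\dir and <latex>$\sum_{i \neq j}$</latex>"
print(escape_outside_latex(sample))
# see C:\\dir and <latex>$\sum_{i \neq j}$</latex>
```

Handling the other delimiters would need a real parser state (to know whether the text is inside a code block), which is why the second case is described as more difficult.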

@jgm jgm closed this as completed in 5c3423f Jul 28, 2022