issue with emphasis when followed by Czech left double quotes #2331

Closed
wilx opened this issue Jul 26, 2015 · 5 comments

Comments

@wilx
Contributor

wilx commented Jul 26, 2015

I have started using proper Czech quote characters in Markdown recently and I have noticed that LaTeX output does not use emphasis and shows the literal asterisk instead.

Here is the test. First, conversion to native shows the emphasis:

echo "Tyto dvě citace ze *„SČÍTÁNÍ LIDU, DOMŮ A BYTŮ 2011---Výsledky sčítání bezdomovců“* hovoří za vše:" | pandoc -f markdown -t native
[Para [Str "Tyto",Space,Str "dv\283",Space,Str "citace",Space,Str "ze",Space,Emph [Str "\8222S\268\205T\193N\205",Space,Str "LIDU,",Space,Str "DOM\366",Space,Str "A",Space,Str "BYT\366",Space,Str "2011---V\253sledky",Space,Str "s\269\237t\225n\237",Space,Str "bezdomovc\367\8220"],Space,Str "hovo\345\237",Space,Str "za",Space,Str "v\353e:"]]

Second, the same input converted to LaTeX:

echo "Tyto dvě citace ze *„SČÍTÁNÍ LIDU, DOMŮ A BYTŮ 2011---Výsledky sčítání bezdomovců“* hovoří za vše:" | pandoc -f markdown -t latex
Tyto dvě citace ze *„SČÍTÁNÍ LIDU, DOMŮ A BYTŮ 2011---Výsledky sčítání
bezdomovců``* hovoří za vše:

However, if I replace the Czech double quotes with ordinary ASCII quotes, it works:

echo "Tyto dvě citace ze *"SČÍTÁNÍ LIDU, DOMŮ A BYTŮ 2011---Výsledky sčítání bezdomovců"* hovoří za vše:" | pandoc -f markdown -t latex
Tyto dvě citace ze \emph{SČÍTÁNÍ LIDU, DOMŮ A BYTŮ 2011---Výsledky
sčítání bezdomovců} hovoří za vše:
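
One thing worth checking in this last test, offered as a sketch rather than a diagnosis: the inner ASCII quotes close and reopen the shell's double-quoted string, so the shell most likely removes them before pandoc ever sees the text. That would also fit the \emph output above, which contains no quotation marks. The shortened sample text and the variable names below are made up for illustration.

```shell
# Inner double quotes end and restart the shell string, so they are not passed on.
unescaped=$(echo "ze *"LIDU A BYTŮ"* hovoří")

# Backslash-escaping keeps them as literal characters in the text.
escaped=$(echo "ze *\"LIDU A BYTŮ\"* hovoří")

echo "$unescaped"
echo "$escaped"
```

Only the second form hands ASCII quotes to whatever command reads the text, so it is the one to pipe into pandoc when comparing against the Czech quotes.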
@jgm
Owner

jgm commented Jul 26, 2015

The problem is that LaTeX output enables --smart by default (see pandoc.hs lines 1242-3), and --smart treats “ as a "double quote start" character. That's obviously not what you want on an input like this. (Try with -t native --smart and you'll see that --smart is the culprit.)

Unfortunately, there's currently no way to turn OFF the "smart" default for LaTeX. You can use --no-tex-ligatures, and you'll get the emphasis, but then --- won't be interpreted as an em-dash, so this probably isn't right for you either unless you are willing to give up the dash ligatures.

What's the best way forward? I could remove the "default to smart" behavior for LaTeX and ConTeXt, but this might break some existing document workflows. (Still, this seems the most principled solution.) Or I could add a --no-smart option, but I think that leads to a confusing situation with different defaults for different writers.


@wilx
Contributor Author

wilx commented Jul 27, 2015

I have no idea what, if anything, should be changed. I am not sure I understand why the quotes are not considered part of the emphasized bit, but I can live with it if it is a feature rather than a bug. :) I realise that Markdown is supposed to be a simple format and this might be pushing it a bit too far. I can remove the emphasis, put it inside the quotes, or remove the quotes.

Thank you for the explanation.

@brainchild0

See #5812 for other pitfalls with the smart feature in LaTeX output. The accumulated examples suggest that enabling smart by default causes problems in LaTeX output that could otherwise be avoided. Asking users to disable the feature manually for most or all of their LaTeX output conflicts with the principle that the base case should just work correctly, with further options serving mainly for customization rather than for restoring correctness.

If smart were disabled by default for LaTeX output, users could still get it through an option that forces smart behavior regardless of writer defaults (e.g. --force-smart). Alternatively, though this may conflict with current design considerations or limitations, similar flexibility could come from a writer extension (e.g. +allow-smart).

Personally, I would like to see a comprehensive list, or indeed any list, of genuine advantages of using the smart feature for LaTeX output, if one exists or could be created.

@brainchild0

brainchild0 commented May 1, 2020

The original report notes that ASCII quotation marks, but not localized ones (e.g. Czech), give the desired behavior. Is the fundamental limitation at present that LaTeX, but not pandoc, is equipped to handle localized typographic transformations of input text?

@tarleb
Collaborator

tarleb commented May 21, 2021

It seems that this has been fixed in the years since.
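
For anyone re-testing on a current release: since pandoc 2.0 the --smart option has been replaced by the smart extension, which can be switched off per format with a -smart suffix. The following is a minimal sketch, assuming pandoc 2.0 or later is on PATH; the shortened sample sentence and the variable name are only for illustration, and no particular output is claimed here.

```shell
# Assumes pandoc 2.0 or later; older releases used the --smart flag instead.
input='Tyto dvě citace ze *„SČÍTÁNÍ LIDU“* hovoří za vše:'

# Default conversion: the smart extension is enabled for markdown input.
printf '%s\n' "$input" | pandoc -f markdown -t latex

# Same conversion with the smart extension switched off on the reader.
printf '%s\n' "$input" | pandoc -f markdown-smart -t latex
```

Comparing the two outputs shows whether the smart handling of quotes still affects the emphasis on the installed version.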

@tarleb tarleb closed this as completed May 21, 2021