Babel's shorthand option makes some characters to be skipped #6817

Closed
lygamac opened this issue Nov 7, 2020 · 19 comments

@lygamac

lygamac commented Nov 7, 2020

When setting lang:es in the header, the . decimal separator is not rendered at all (it's supposed to be transformed to a comma). French still renders the decimal separator though.

image


pandoc version:
pandoc.exe 2.11.1
Compiled with pandoc-types 1.22, texmath 0.12.0.3, skylighting 0.10.0.3,
citeproc 0.1.0.3, ipynb 0.1

@lygamac lygamac changed the title Figure and longtable uses unneeded spaces, and is decimal separator not rendered in Spanish Figure and longtable use unneeded spaces, and decimal separator is not rendered in Spanish Nov 7, 2020
@jgm
Owner

jgm commented Nov 7, 2020

Figure placement is done by LaTeX -- I'm not sure there's anything we can do about that in pandoc.

> Second, when rendered from markdown directly to pdf, the column width of the longtable seems to be the length of the longest string, including spaces:

See the manual for information about how relative column widths are computed from the markdown source. (It depends which kind of markdown table you are using, but this is all documented. Looks like you have a pipe table, so you can adjust these widths by changing the widths of the lines under the headers.)

> And last, when setting lang:es in the header, the . decimal separator is not rendered at all (it's supposed to be transformed to a comma). French still renders the decimal separator though.

I don't know why this is happening, but it seems to be something LaTeX is doing. Pandoc is just passing through the math verbatim. I'd ask about this on a LaTeX forum, using a simple pure tex example.

@jgm jgm closed this as completed Nov 7, 2020
@lygamac
Author

lygamac commented Nov 8, 2020

I see, the problem seems to be the default latex template. I'll try to modify that to suit my use.

I found the comma problem. From the default latex template:

\ifxetex
  % Load polyglossia as late as possible: uses bidi with RTL langages (e.g. Hebrew, Arabic)
  \usepackage{polyglossia}
  \setmainlanguage[]{spanish}
\else
  \usepackage[shorthands=off,main=spanish]{babel}
\fi

The decimal separator is not rendered because of \usepackage[shorthands=off,main=spanish]{babel} when using pdflatex. Without shorthands=off the decimal separator is rendered as a comma; with the XeLaTeX branch above it is rendered as a point.

(The decimal separator is not rendered with shorthand=on either. I don't know why, but you have to remove that option entirely for it to work.)

It seems that when the dot . is a shorthand character (Galician and Spanish, according to babel's documentation), the decimal separator is dropped. @jgm, if you don't mind, could you add something like this to the default template?

\ifxetex
  % Load polyglossia as late as possible: uses bidi with RTL langages (e.g. Hebrew, Arabic)
  \usepackage{polyglossia}
  \setmainlanguage[]{lang}
\else
  \ifdotshorthand
    \usepackage[main=lang]{babel}
  \else
    \usepackage[shorthands=off,main=lang]{babel}
  \fi
\fi

You had a typo on the comment btw! ;)

@lygamac
Author

lygamac commented Nov 8, 2020

For the figure float, I figured out the problem.

In the default template the figure float placement is defined as htbp, where p allows a special page that contains only floats. Specifying htb! instead, which rules out float-only pages, fixed the float problem.

@jgm jgm reopened this Nov 8, 2020
@jgm
Owner

jgm commented Nov 8, 2020

Thanks for this -- I'll reopen so we can consider possible changes to the default latex template.

  1. consider adding the conditional on \ifdotshorthand
  2. consider changing the default float position to htb!.

I'm not sure that I understand yet the implications of either.

What exactly is a "shorthand," anyway?

And what would be the effect of htb!? Does it mean that special pages cannot be used, even if the alternative is putting the figure on a page with, say, one line? I'd welcome feedback from texperts about what is best here.

@lygamac
Author

lygamac commented Nov 8, 2020

Although I'm not a LaTeX professional, as far as I understand, in short:

  • A shorthand replaces characters with TeX code.
  • htb! prevents LaTeX from creating pages that contain only figures. And yes, if the figure does not fill the page and a line of text still fits, the result is a page with one figure followed by one line.
    • Raising \floatpagefraction seems to be a better choice (the default is 0.5: when a figure occupies 50% of the page, LaTeX will create a special page on which only figures can be placed).

In more detail, a shorthand is defined as:

A shorthand is a sequence of one or two characters that expands to arbitrary TeX code.

In the case of the babel package, shorthands seem to be used for localization, in order to minimize the changes needed in the tex file. The Spanish and Galician packages redefine the . character as TeX code such that it is replaced by a comma in math mode and remains a dot in text mode -- the formal convention in these languages is to write the decimal separator as a comma instead of a dot.
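
As a rough illustration of "one or two characters that expand to arbitrary TeX code", here is a minimal sketch based on babel's user-shorthand commands \useshorthands and \defineshorthand. The document class, the language option, and the sample sentence are assumptions chosen only for the example; this is not what the Spanish or Galician language files do internally.

```latex
\documentclass{article}
\usepackage[english]{babel}

% Declare " as a user shorthand character. The starred form is meant
% to keep the shorthand active when the language changes.
\useshorthands*{"}

% Two-character shorthands that expand to babel hyphenation commands.
\defineshorthand{"*}{\babelhyphen{soft}}  % "* : soft (discretionary) hyphen
\defineshorthand{"-}{\babelhyphen{hard}}  % "- : explicit hard hyphen

\begin{document}
A well"-known word that may be hyphen"*ated on demand.
\end{document}
```

The Spanish decimal handling works along similar lines: the . character is made active so that it can expand to different code depending on the mode.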

(If there's no need to remove the shorthands defined by babel, I'll remove that part from the code.)

htb! forces LaTeX to choose the float position among h (here, where the figure appears in the LaTeX source), t (top of the page), and b (bottom of the page). Without !, LaTeX will still put large figures on a special page when it judges that to be the best fit.

Anyway, one can always redefine \floatpagefraction using --include-in-header.

EDIT: For \floatpagefraction to work properly, the float specifier has to be forced with !: htbp!. Then, with \renewcommand{\floatpagefraction}{0.8}, a figure only gets a special page without text when it occupies more than 80% of the page.
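
A minimal sketch of such a header file follows. The file name header.tex is hypothetical, and the \fps@figure override assumes that the template sets the default figure placement through that macro before the header is read; if that assumption does not hold, only the \floatpagefraction line applies.

```latex
% header.tex (hypothetical file name), passed to pandoc with
% --include-in-header=header.tex

% Require floats to fill at least 80% of a page before LaTeX gives
% them a float-only page (the value discussed in this thread).
\renewcommand{\floatpagefraction}{0.8}

% Assumption: the default figure placement is stored in \fps@figure.
% The @ in the name requires \makeatletter ... \makeatother.
\makeatletter
\def\fps@figure{htbp!}
\makeatother
```

It could then be used as pandoc input.md -o output.pdf --include-in-header=header.tex, where input.md and output.pdf are placeholder names.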

@jgm
Owner

jgm commented Nov 8, 2020

I'm really confused about this. My understanding is that shorthands=off should eliminate all the "shorthands"...but then, . should remain as ., not be removed or ignored. Can anyone explain this?

I like the idea of adding

\renewcommand{\floatpagefraction}{0.8}

and changing the htbp to htbp!. Can anyone think of drawbacks? @tarleb @mb21 @adunning

@lygamac
Author

lygamac commented Nov 8, 2020

According to this, even with shorthands=off the babel package still sets some parameters for the shorthand characters.

The community considers it a babel bug: latex3/babel/issues/38. However, since Galician babel showed the same problem when I tested it earlier, my best guess is that it was done on purpose and is not a bug.

Although passing shorthands=off does not work properly, redefining the shorthand command does. Adding:

\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}

after the babel package would disable all the shorthands while the decimal separator is rendered correctly in both Spanish and Galician (. remains as .).
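
For context, a minimal sketch of where those two lines would sit in a complete document. The document class and the sample sentence are assumptions, and the result is as reported in this comment rather than independently verified.

```latex
\documentclass{article}
\usepackage[main=spanish]{babel}

% Workaround from the comment above: keep a copy of the original
% command, then make \languageshorthands ignore its argument so that
% no language-specific shorthands get switched on.
\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}

\begin{document}
% Per the report above, the dot should stay a dot here.
The value is $3.14$.
\end{document}
```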

@lygamac
Author

lygamac commented Nov 8, 2020

Another thing, related to the default template: the page number on the title page is always centered. The user's page number position only takes effect from the second page onward.

For my own documents I disabled page numbering on the title page, so that numbering starts after the table of contents. That style won't suit people who start the document body immediately after the title, though.

As a result, it might be a good idea to add something like this before \maketitle:

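\makeatletter
\@ifpackageloaded{fancyhdr}{%
  % redefine the plain page style (used on the title page) through fancyhdr
  \fancypagestyle{plain}{}%
}{}% empty false branch: do nothing when fancyhdr is not loaded
\makeatother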

So the page number position is always defined by the user in the header files.
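
A sketch of what such a user header file might look like with fancyhdr. The file name and the bottom-right placement are assumptions chosen for illustration; redefining the plain style explicitly makes the title page follow the same layout.

```latex
% header.tex (hypothetical file name), passed via --include-in-header
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhf{}                          % clear all header and footer fields
\fancyfoot[R]{\thepage}             % page number at the bottom right
\renewcommand{\headrulewidth}{0pt}  % no rule under the header

% Give the plain style (used on the title page) the same layout.
\fancypagestyle{plain}{%
  \fancyhf{}%
  \fancyfoot[R]{\thepage}%
  \renewcommand{\headrulewidth}{0pt}%
}
```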

@jgm
Owner

jgm commented Nov 9, 2020

The post you linked to recommends simply using the es-nodecimaldot option. Did you try that? Maybe we should use shorthands=off,es-nodecimaldot. That seems simpler than redefining commands.
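
A minimal sketch of that option combination, assuming es-nodecimaldot is passed as a babel package option next to shorthands=off. The document class and sample sentence are assumptions, and the output was not verified here.

```latex
\documentclass{article}
% es-nodecimaldot asks the Spanish language definition to leave the
% decimal dot alone; it only affects Spanish.
\usepackage[shorthands=off,es-nodecimaldot,main=spanish]{babel}

\begin{document}
The value is $3.14$.
\end{document}
```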

As for the first page number, why don't you open a separate issue for that?

@lygamac
Author

lygamac commented Nov 9, 2020

es-nodecimaldot is a solution only for Spanish babel. I'm afraid there are more characters and languages (for example Galician) that are not rendered correctly for the same reason.

Redefining the command achieves the expected behavior for all languages.

> why don't you open a separate issue for that?

On my way

@jgm
Owner

jgm commented Nov 9, 2020

I'm hesitant to add this kind of low-level workaround to the default template, if it's really a bug in babel. Maybe it's the thing to do, though.

Query: are you sure this is caused by babel and not by the additional content pandoc inserts in the template slot babel-newcommands? Looking at the source of the LaTeX writer, we have

        $ defField "babel-newcommands" (vcat $
           map (\(poly, babel) -> literal $
            -- \textspanish and \textgalician are already used by babel
            -- save them as \oritext... and let babel use that
            if poly `elem` ["spanish", "galician"]
               then "\\let\\oritext" <> poly <> "\\text" <> poly <> "\n" <>
                    "\\AddBabelHook{" <> poly <> "}{beforeextras}" <>
                      "{\\renewcommand{\\text" <> poly <> "}{\\oritext"
                      <> poly <> "}}\n" <>
                    "\\AddBabelHook{" <> poly <> "}{afterextras}" <>
                      "{\\renewcommand{\\text" <> poly <> "}[2][]{\\foreignlanguage{"
                      <> poly <> "}{##2}}}"
               else (if poly == "latin" -- see #4161
                        then "\\providecommand{\\textlatin}{}\n\\renewcommand"
                        else "\\newcommand") <> "{\\text" <> poly <>
                    "}[2][]{\\foreignlanguage{" <> babel <> "}{#2}}\n" <>
                    "\\newenvironment{" <> poly <>
                    "}[2][]{\\begin{otherlanguage}{" <>
                    babel <> "}}{\\end{otherlanguage}}"

This affects precisely spanish and galician -- might it be interfering with babel's shorthands=off setting somehow?
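
To make the effect of that Haskell fragment easier to follow, here is the LaTeX it would emit, derived by hand from the string literals above. The first group assumes poly is "spanish"; the second assumes a language outside the special cases, with poly and babel both equal to "french" (an assumed pair, used only for illustration).

```latex
% Spanish/Galician branch (poly = "spanish"):
\let\oritextspanish\textspanish
\AddBabelHook{spanish}{beforeextras}{\renewcommand{\textspanish}{\oritextspanish}}
\AddBabelHook{spanish}{afterextras}{\renewcommand{\textspanish}[2][]{\foreignlanguage{spanish}{##2}}}

% Generic branch (assumed poly = "french", babel = "french"):
\newcommand{\textfrench}[2][]{\foreignlanguage{french}{#2}}
\newenvironment{french}[2][]{\begin{otherlanguage}{french}}{\end{otherlanguage}}
```

Note that the Spanish/Galician branch only touches \textspanish or \textgalician; unlike the generic branch, it emits no \newenvironment.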

Looks like it was added by @mb21 in 9328f4c?
Maybe he can comment.

@jgm
Owner

jgm commented Nov 9, 2020

Oddly, I don't see this code appearing in latex results, with either pdflatex or xelatex as the pdf-engine. [EDIT: it appears that's because the list to which this map is applied is empty. Apparently this is just a list of languages that are used in the document, other than the main language. @mb21 is that as intended? I'd like to understand what's going on here a bit better. Note that if you add a fenced div to the document with {lang=es}, then the list is nonempty and this code gets added. However, and this I think is a separate bug, doing this causes an error with pdflatex or lualatex: "Environment spanish undefined."]

Reminder to self: as noted above, the . disappears with lang=es and --pdf-engine=pdflatex or --pdf-engine=lualatex (babel), but it is retained with xelatex (polyglossia).

@jgm
Owner

jgm commented Nov 9, 2020

It looks as if this code is designed to avoid a conflict between babel and polyglossia, but the conflict shouldn't arise, since we use one or the other, right?
Or perhaps the worry is that some documentclasses might automatically load babel?

@lygamac
Author

lygamac commented Nov 9, 2020

> are you sure this is caused by babel and not by the additional content pandoc inserts in the template slot babel-newcommands?

Pretty sure, I have tried with a blank tex file where only babel is included.
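
A minimal sketch of such a test file, using the babel line from the default template quoted earlier in this thread. The document class and the sample sentence are assumptions. Per the reports here, compiling it with pdflatex on an affected babel version should show the decimal dot missing.

```latex
\documentclass{article}
% Same babel options as in the default template for lang: es
\usepackage[shorthands=off,main=spanish]{babel}

\begin{document}
% Reported symptom: the . in math mode is not printed at all.
The value is $3.14$.
\end{document}
```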

@jgm
Owner

jgm commented Nov 19, 2020

@mb21 did you see the query above? I'm wondering if we should remove some of this code.

@mb21
Collaborator

mb21 commented Nov 20, 2020

Thanks for the ping, didn't see it before...

so the reasoning/discussion for:

            -- \textspanish and \textgalician are already used by babel
            -- save them as \oritext... and let babel use that
            if poly `elem` ["spanish", "galician"]

is #895 (comment) (and subsequent comments).
Maybe we don't need to support the babel in TeX Live 2015 and below anymore?

Yes, the otherlangs variable is an array of languages that the document contains on spans and divs – it is not the document language. From the manual:

Use native pandoc Divs and Spans with the lang attribute to switch the language:

@jgm
Owner

jgm commented Nov 20, 2020

> Maybe we don't need to support the babel in TeX Live 2015 and below anymore?

I don't think so.

@mb21
Collaborator

mb21 commented Nov 20, 2020

Ah, I misread the comments I linked to. From #895 (comment)

I’m afraid \textspanish and \textgalician are still in texlive 2015 (just not in the babel manual): in
/usr/local/texlive/2015/texmf-dist/tex/generic/babel-spanish/spanish.ldf and
/usr/local/texlive/2015/texmf-dist/tex/generic/babel-galician/galician.ldf

And the 2017 version I've installed has it as well in:

/usr/local/texlive/2017basic/texmf-dist/tex/generic/babel-spanish/spanish.ldf

And seems even the newest version has them:

So it seems we need to keep that hack for the moment... If we have a problem with those lines, we could ask the person who gave me that tip over at https://tex.stackexchange.com/questions/273512/renewcommand-textspanish ?

@lygamac lygamac changed the title Figure and longtable use unneeded spaces, and decimal separator is not rendered in Spanish Babel's shorthand option makes some characters to be skipped Nov 25, 2020
@jgm jgm closed this as completed in e26d31d Nov 25, 2020
@jgm
Owner

jgm commented May 29, 2021

Apparently this bug has now been fixed in babel.

After a suitable delay, I'd like to remove the hackish code we currently include in the template to disable shorthands.
So, reopening this to track it.
