pandoc md to pdf - text of footnote becomes superscript #5878

Closed
cjns1989 opened this issue Nov 5, 2019 · 23 comments

Comments

@cjns1989

cjns1989 commented Nov 5, 2019

In a 19th-century book that I am reproducing using md-to-pdf conversion via pandoc, I ran into a rather bizarre issue.

Here is my original code:

Philarète ^[Autre correspondant de Démocrite [à vérifier].], Sergius ^[Alchimiste grec.], d'autres encore, sans compter Sophé l'Égyptien ^[Auteur supposé d’un Livre de vérité attribué à Zozime ou Démocrite.] et l'astrologue Pétosiris ^[Astrologue égyptien qui vécut au iv^e^ s. avant J.-C. La Sphère de Petosiris est destinée à pronostiquer l’issue des maladies en s’appuyant sur certaines combinaisons numériques.]. Sans aucune aide, il avait résolu l'énigme de la Sibylle ; et les deux Hollandais ^[Jean Isaac et Isaac, dits les Hollandais, père et fils ayant apparemment vécu au xv^e^ siècle. Les ouvrages qui portent le nom d’Isaac le Hollandais renferment la description d'un très grand nombre de procédés de chimie, qui, bien que dirigés d’après des vues alchimiques, sont restés dans la science comme la suite des travaux de Geber. Il paraît que Isaac le Hollandais a été un habile fabricant d’émaux et de pierres gemmes artificielles et il a décrit sans arrière pensée ses procédés pour la préparation de ces produits artificiels.],

Now if I compile this via the following command:

pandoc x.md --pdf-engine=xelatex -o x.pdf

I find that pandoc has generated an extra-long superscript containing the text of my footnote ('Autre correspondant…' etc.) for Philarète, instead of the expected superscript '¹' together with the corresponding footnote '1. Autre correspondant de Démocrite…' at the bottom of the page.

After a bit of twiddling, I found that the problem does not occur if, without changing anything else, I put every word that has a footnote on a separate line together with the text of its footnote.

I verified this by extracting the lines that had the problem sticking a carriage return before and after the word+footnote tuple… running pandoc and the problem went away.

I proceeded to reflow the x.md file (in vim I use the gwap command)… saved the file under a different name… and Bam…! I was back with the initial problem.

Here are the two files:

  1. the t0.md 'reflowed' file that demonstrates the problem:

https://drive.google.com/file/d/1MdR7e0r8oEjqhmNha4wLIcIKNmHs4A3E/view?usp=sharing

  2. the less readable t9.md 'workaround' file:

https://drive.google.com/file/d/1zxf86TOEwpUKh9AnOR2sSAlXF1rm_THu/view?usp=sharing

I did this with pandoc 2.2.1-3, which is the version installed on Debian 10 (stable), in case this matters.

Thanks,

CJ

N.B. Obviously, I didn't reflow a file that worked in order to run into this annoyance (spent half a day finding a workaround :( )… the actual scenario is that I was chugging along adding footnotes to this medium-sized book and noticed that while the footnotes worked just about everywhere else, I had this one occurrence where they did not, and I couldn't figure out my error. So I reduced the size of the file… one step at a time and twiddled it until the problem went away. That's when I noticed that in order to recreate the problem I just had to reflow the contents of t9.md and create t0.md. You obviously do not need to use vim or have any knowledge of the workings of vim paragraph reflowing to look into this. I'm sure you could do the same with emacs (pardon my French…). I did take a look at the intermediate LaTeX file and noticed some odd coloring with the vim syntax highlighting but couldn't make much sense of it.

All the same I thought it wouldn't hurt to report this rather minor issue.

@jgm
Owner

jgm commented Nov 5, 2019

I can't reproduce this. With the input you provide, current pandoc (master) works fine; there is no extra-long superscript. Perhaps this issue is already fixed.

@cjns1989
Author

cjns1989 commented Nov 6, 2019

Good news! I'll give the current pandoc (master) a shot tomorrow, hopefully I'll be able to install it on debian stable. Thanks.

@jgm
Owner

jgm commented Nov 6, 2019

Try a release candidate (see Actions tab), go to Release Candidate, then Artifacts.

@jgm jgm closed this as completed Nov 6, 2019
@cjns1989
Author

cjns1989 commented Nov 6, 2019

In the meantime I gave it another shot with the 2.5 version of pandoc that ships with Ubuntu 19.10 and I still had the problem.

The reason being that I had other business pending on that system… but also that I have no clue what I am supposed to do after I go to Artifacts.

Should I click on 'fork' and hopefully create a mirrored copy of the release candidate on my system and install?

I don't use github to speak of… I don't know how to use it aside from downloading the current zip files and downloading what's there once in a while.

Sorry to bother you with this.

Thanks,

@cjns1989
Author

cjns1989 commented Nov 7, 2019

Update. I installed the latest 2.7.3 release from the tarball and I am still getting the same error with the t0.md file.

@jgm
Owner

jgm commented Nov 7, 2019

go here:
https://github.com/jgm/pandoc/commit/8f3b3afc70d163afe5e103abec77a4ffafb995bd/checks?check_suite_id=295084985
Click Artifacts then download the one appropriate for your OS and install it.

@cjns1989
Author

cjns1989 commented Nov 7, 2019

The problem is fixed with candidate release 2.8.

Thanks.

@cjns1989
Author

cjns1989 commented Nov 8, 2019

Hmm… I take that back. I recompiled the entire document with pandoc 2.8 and I am still getting the same error, something that didn't happen with my MWE.

So I started over and eventually came up with another MWE of just one line:

Philarète ^[Autre correspondant de Démocrite.], Sergius ^[Alchimiste grec.],

Twiddling it a bit I found something that may be of interest.

In order to ensure that the superscripts '¹' '²' etc. do not end up on the next line, I had gotten into the habit of typing a non-breaking thinspace between e.g. Philarète, Sergius and the ensuing '^[' that starts the footnote.

I thought… could this have any bearing on our case…? So I replaced them with plain 0x20 spaces… ran pandoc and the problem went away…

I also tried replacing them with 0xa0 regular-width non-breakable spaces and the problem came back. Then I tried using the &nbsp; HTML entity and ran into the same issue.

So this would turn out to be an error on my part: I am using the 'inline notes' syntax… 'word_with_note ^[inline note]' there is nothing in the pandoc manual that says fancy spaces are allowed between the word_with_note and the '^['.
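For anyone hitting the same thing, here is a minimal sketch of how the suspicious spots could be located before running pandoc. It only looks for the two characters mentioned above (U+00A0 and U+202F) directly in front of `^[`; the function name and the choice of characters are assumptions made for illustration, not anything pandoc provides.

```python
import re
import unicodedata

# Assumption: only the two no-break characters discussed in this thread.
NO_BREAK_SPACES = "\u00a0\u202f"

# A no-break space immediately followed by the inline-note opener "^[".
PATTERN = re.compile("([" + NO_BREAK_SPACES + r"])\^\[")


def find_no_break_before_notes(text):
    """Return (line, column, character name) for each suspicious spot."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            name = unicodedata.name(match.group(1))
            hits.append((line_no, match.start() + 1, name))
    return hits


if __name__ == "__main__":
    # First note is preceded by U+202F, second by a plain space.
    sample = ("Philarète\u202f^[Autre correspondant de Démocrite.], "
              "Sergius ^[Alchimiste grec.],")
    for line_no, col, name in find_no_break_before_notes(sample):
        print(f"line {line_no}, column {col}: {name} before '^['")
```

Only the first note in the sample is reported, since the second one is preceded by a plain 0x20 space.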

Just curious… but did you try to recreate with the actual file I linked or did you copy/paste the MWE I had at the beginning of the issue… because that might have caused my 0x202f thin spaces to be translated to regular spaces…?

Now why would this happen with this particular passage and not with the other 100+ footnotes I have elsewhere in the book…? I have no idea… they all look the same to me…

Thanks,

CJ

P.S. From what I have read, French typesetters of the old school appear to have loved using thin spaces: they are compulsory between text and all punctuation comprising two elements such as ';', ':', '!', '?'… On the other hand, punctuation such as commas or periods (one element) should immediately follow the last letter of the preceding word with no intervening space. I don't know about superscripts referring to footnotes, but I suppose I'll just have to change all my thin spaces to regular spaces and forget about the whole thing… and cross my fingers and hope LaTeX will make sure that no footnote superscripts are left hanging all by themselves on the next line.
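A rough sketch of that clean-up, assuming only the no-break spaces directly in front of `^[` need to change (so the thin spaces before ';', ':', '!' and '?' stay as they are). The function name and the `replacement` parameter are made up for this example.

```python
import re

# Assumption: U+00A0 and U+202F are the only "fancy" spaces of interest,
# and only when they sit directly in front of an inline note "^[".
_NO_BREAK_BEFORE_NOTE = re.compile(r"[\u00a0\u202f]+(?=\^\[)")


def normalize_note_spacing(text, replacement=" "):
    """Replace no-break spaces that directly precede '^[' with `replacement`."""
    return _NO_BREAK_BEFORE_NOTE.sub(replacement, text)


if __name__ == "__main__":
    line = "Philarète\u202f^[Autre correspondant de Démocrite.]\u202f; Sergius"
    # The space before "^[" becomes a plain space; the one before ";" is kept.
    print(repr(normalize_note_spacing(line)))
```

Passing `replacement=""` instead would attach the note directly to the word, which may be preferable if a line break before the footnote mark is the main worry.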

@jgm
Owner

jgm commented Nov 8, 2019

It's an interaction with the superscript syntax (which you could simply turn off with -f markdown-superscript).

 ^[Autre correspondant de Démocrite.], Sergius^

is a valid superscript! Note that superscripts can't contain spaces (at the top level), to avoid false positives like this, but you've defeated this safety measure by replacing the space with a nonbreaking space (which is actually what we recommend using when you actually do want a space in a superscript).
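As a sketch of what the full invocation might look like with the extension turned off, the snippet below just assembles the command line from the flags already quoted in this thread (`-f markdown-superscript`, `--pdf-engine=xelatex`, `-o`). The helper name `build_command` is hypothetical, and running the result assumes pandoc and xelatex are installed.

```python
import shlex


def build_command(source, output, engine="xelatex"):
    """Build a pandoc argument list with the superscript extension disabled."""
    return ["pandoc", "-f", "markdown-superscript", source,
            f"--pdf-engine={engine}", "-o", output]


if __name__ == "__main__":
    cmd = build_command("x.md", "x.pdf")
    # Print the command rather than running it; to execute it, pass `cmd`
    # to subprocess.run(cmd, check=True) on a machine with pandoc installed.
    print(shlex.join(cmd))
```

Keep in mind that disabling the extension also turns off intended superscripts such as the `iv^e^` in the sample text, so it is a trade-off rather than a universal fix.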

@cjns1989
Author

cjns1989 commented Nov 8, 2019

Thanks for the update… and I see your point with 'valid superscripts'…

I followed your recommendation and saw that if I compile my original MWE specifying -f markdown-superscript the problem goes away.

As an aside… I do not see this option/format documented either in the candidate release man page or in the online pandoc user guide?

CJ

@cjns1989
Author

cjns1989 commented Nov 8, 2019

I'll have to remember that pandoc extensions may provide flexibility/solutions for certain types of problems.

Here's my last shot of this book for small/medium size screens for you to take a peek at:

https://drive.google.com/file/d/1Rl1GsqHcm9XXzrDo21-7d_Rlb_ZHwdAw/view?usp=sharing

All in all the final product is not worse than the scanned pdf found online that I worked with.

Thanks,

CJ

@cjns1989
Author

To document this further this is another snippet that recreates the problem:

Zozime^[], Synésius^[], Isidore^[], Philarète^[], Sergius^[],

@cjns1989
Author

… if you add a carriage return between 'Philarète^[],' and 'Sergius^[],' to place Sergius… on a separate line… the problem occurs…

… if you remove the carriage return and everything lives on the same line it does not.

The intermediate latex file contains:

Zozime\footnote{}, Synésius\footnote{}, Isidore\footnote{}, Philarète\textsuperscript{{[}{]}, Sergius}{[}{]},

So it looks like, under certain circumstances, the inline footnote markup '^[' clashes with the '^…^' superscript markup.

This was created with the 2.8 release candidate.
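In case it helps with checking a longer document, here is a small heuristic sketch that scans the intermediate LaTeX for the pattern seen above, i.e. a `\textsuperscript{` whose content starts with an escaped `[`. This is only a guess at a useful warning sign based on the one output line shown here, not a description of how pandoc works, and the function name is invented for the example.

```python
import re

# Heuristic (an assumption): a superscript whose content begins with the
# escaped bracket "{[}" probably started out as an inline note "^[".
_SUSPECT = re.compile(r"\\textsuperscript\{\{\[\}")


def suspicious_superscripts(latex):
    """Return 1-based line numbers that contain a suspect superscript."""
    return [n for n, line in enumerate(latex.splitlines(), start=1)
            if _SUSPECT.search(line)]


if __name__ == "__main__":
    tex = (r"Zozime\footnote{}, Synésius\footnote{}, Isidore\footnote{}, "
           r"Philarète\textsuperscript{{[}{]}, Sergius}{[}{]},")
    print(suspicious_superscripts(tex))
```

An ordinary superscript such as `\textsuperscript{e}` is not flagged, so the check should stay quiet on the `iv^e^` style ordinals.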

@jgm jgm reopened this Nov 11, 2019
@jgm
Owner

jgm commented Nov 11, 2019

ok, that's helpful. A newline should defeat the superscript just as a space does, so something should be changed.

@cjns1989
Author

FYI… same behavior … unsurprisingly, I guess… in md to epub conversions.

I realize that one major quality of markdown is the readability of the source due to the terse syntax. In other words you can read md files in an editor or in notepad? as if they were meant to be read in their unformatted form.

But with md files where the text is interspersed with numerous, sometimes lengthy inline footnotes, the terseness of the '^[…]' actually makes the source less readable: out of the 118 such notes in this particular text I made one mistake… this resulted in epubcheck failing with a 'duplicate footnote id=' or such. Well… I spent about half an hour finding the error.

I would be tempted to suggest creating an alternative more readable inline footnote syntax that does not use the superscript '^' markup character… something that you can easily spot at a glance and that might also be easier to handle by syntax highlighting files (couldn't find one for vim & pandoc markup btw) … such as perhaps '[[[ … ]]]' or such like… making the current '^[…]' format legacy and keeping it alive for backward compatibility purposes.

But then Joe User here does not have a magic wand to help you with that. :)

@jgm
Owner

jgm commented Nov 12, 2019

Note: I've fixed the issue in master.

@cjns1989
Author

Hmm… I take that back… I did find a pandoc.vim syntax file on github after all.

https://github.com/vim-pandoc/vim-pandoc-syntax

@cjns1989
Author

The pandoc.vim syntax file does a great job of highlighting footnotes.

As to the fix on master… how do I install & test it?

@jgm
Owner

jgm commented Nov 12, 2019

It will be in the next release, but you can also try a nightly from the Actions tab.

@cjns1989
Author

Fixed indeed in nightly!

Not sure how I can tidy up my system now… I have the 'release candidate' in ~/local/share/bin/pandoc… today's 'nightly' in ~/bin/pandoc and the original Debian pandoc 2.2 version in /usr/bin/pandoc. Does the .deb in official releases provide a statically linked standalone version of pandoc… and if so would I be able to just do a 'dpkg -i pandoc.deb' so as to replace the Debian stable 2.2 version with version 2.8 without running into dependency problems? Would that work, and if so, when is the next release due out?

Thanks,

CJ

@jgm
Owner

jgm commented Nov 14, 2019

Yes, both the rc and the official release debs should have a statically linked pandoc, so this should work.

@cjns1989
Author

Thanks.
