ODT and figure/table numbering #5474
Comments
This was done in #4944 -- see that issue for motivation. cc'ing @pyssling I agree that there's an issue about consistency; this could be handled, of course, by adding the translated The bigger issue is the bad interaction with pandoc-crossref, which I unfortunately hadn't considered in merging that PR. Although ultimately I'd like to have good support for cross-references in pandoc itself, this addition doesn't give you enough for cross-refs in ODT, so pandoc-crossref is still needed and can't be used because of this change. I agree that this is a major issue. Some possible options:
I think (2) is more consistent with existing behavior (e.g. |
IIRC, syntax extensions almost exclusively relate to Markdown syntax, so unless you want to somehow rework those to be reader/writer-dependent, I don't think using syntax extensions makes that much sense. That said, after a quick look at the code base, it seems that syntax extensions are slowly creeping into other readers/writers, so this might be a good idea anyway in the long run -- not entirely sure about that. (2) is certainly more straightforward. |
To answer the First question: Hope that clears up what I was trying to accomplish. I wouldn't mind having options to control these, but it's good to be aware that not enabling them breaks Libreoffice expectations, which should be documented for poor users (of which I was once one) who can't figure out why Libreoffice doesn't include their tables and figures in its indexes. I also had a quick look at pandoc-crossref, and it's worth mentioning that it is very much latex aware. Making it ODT aware might not be that bad? |
A comment on this issue: it would be helpful if adding "Figure X" was optional for ODT, as with the (2) suggestion above. In particular, org-mode already includes "Figure X" when exporting to ODT, so I end up with "Figure 1: Figure 1: [caption]", and I can't currently see a way to get pandoc or orgmode to not do this, other than downgrade pandoc. |
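As a stopgap for the doubled label described above, a post-processing step could drop one copy of a repeated prefix. This is only a minimal sketch: `collapse_duplicate_label` is a hypothetical helper, not part of pandoc or org-mode, and it assumes English labels of the form "Figure N:" or "Table N:".

```python
import re

# Hypothetical pattern: a leading "Figure 3: " or "Table 3: " style label.
# English-only by assumption; localized labels would need their own patterns.
_LABEL = re.compile(r"^(?P<label>(?:Figure|Table)\s+\d+)\s*[:.]\s*")


def collapse_duplicate_label(caption: str) -> str:
    """Drop one copy of a label that appears twice in a row at the start."""
    first = _LABEL.match(caption)
    if not first:
        return caption
    rest = caption[first.end():]
    second = _LABEL.match(rest)
    # Only collapse when both labels are identical, e.g. "Figure 1: Figure 1: ...".
    if second and second.group("label") == first.group("label"):
        return rest
    return caption
```

A caption with a single label, or with two different labels, is returned unchanged, so the helper does nothing on output that is already correct.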
Maybe better to have two options, e.g. |
IMO, it's not really a reasonable expectation that whatever pandoc writes into any output format would play all that nicely with that format's internal references (which a list of figures/tables basically is). I mean, I don't mind hacking those in, but I really don't think that's pandoc's "primary mission" so to speak. I will concede that LaTeX is a bit of a special case, sure. But LaTeX is 100% hackable via custom pandoc templates if needed. This is not the case with ODT, so my first expectation is to get the least-embellished human-readable output possible while keeping semantic information. At least by default.
Actually, I've been trying to get away from that. Turns out, reimplementing non-negligible parts of pandoc's LaTeX writer in a filter is not a sustainable strategy in the long run. Basically currently pandoc-crossref is two filters disguised as one; "LaTeX mode" can't do everything "normal mode" can, but in some particular cases it, conversely, works better than "normal mode". It's all rather painful in practice, one basically has to prepare two different documents, one for LaTeX and another for everything else. The next major release won't have much of that mess left.
For one, see above: hacking direct format support into a filter equals reimplementing writers. Not really sustainable. For two, while I'm reasonably proficient in LaTeX to hack-in some raw LaTeX blocks where needed, I can't say I could do the same for XML-based formats like ODT or docx without spending weeks untangling their specs. Furthermore, IIRC, counters and especially references in those formats (well, certainly docx) are considerably more complicated than could be achieved with simply inserting raw blocks (which is the limit of what a filter can do). So, no, unless someone else is willing to write and maintain all the ODT-specific code in pandoc-crossref, this is not going to happen, even if technically feasible, which I'm not sure it is. |
As a reminder, we also have code blocks. Which one could also reasonably want to be numbered. So this potentially turns into at least three separate options already. And if we also at some point decide to treat display math in a similar way, this all gets a bit overwhelming, don't you think? An idea that comes to mind is to have the ability to specify which counters to enable in writer as arguments to the option, instead of having multiple options. For example (hope someone thinks of a better name): P.S. Side note, I would like to see these options also affect LaTeX/PDF output using the default template. FWIW, implementation should be rather straightforward: set some template variables and add a couple lines boiling down to |
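The "one option taking a list of counters" idea above could be sketched roughly as follows. The counter names and the `parse_counters` helper are hypothetical, chosen only for illustration; no such pandoc option is being described here.

```python
# Hypothetical counter names; the thread mentions figures, tables,
# code blocks, and possibly display math as candidates.
KNOWN_COUNTERS = frozenset({"figures", "tables", "codeblocks", "equations"})


def parse_counters(value: str) -> set[str]:
    """Parse a comma-separated option value into a set of counter names."""
    names = {part.strip().lower() for part in value.split(",") if part.strip()}
    unknown = names - KNOWN_COUNTERS
    if unknown:
        raise ValueError(f"unknown counter(s): {', '.join(sorted(unknown))}")
    return names
```

An empty value yields an empty set, which would correspond to "no native counters", so a single option covers both the on/off question and the per-object question.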
In my not so humble opinion I'm asking myself: if pandoc's mission is NOT to play nice with a format's internal references, then what is pandoc's mission? @jgm, any thoughts? I sort of naively assumed we were trying to do the best we can for any given output format.
ODT is very hackable via custom templates, and it would be quite simple to make it even more so. You can almost overwrite any style using these. With a little more work and post-processing you can actually achieve extremely solid professional output, such as technical manuals.
I understand that. I haven't looked at pandoc-crossref before. Our toolchain is based on asciidoc, which is transformed into Docbook using asciidoctor, which pandoc can then nicely turn into ODT with an advanced stylesheet. I then call libreoffice in batch mode to update table of contents and other indexes which were inserted using the template. I don't mind making figure and table numbering optional, we anyway call pandoc through a wrapper. However, I don't think it makes sense to disable them by default if pandoc's mission is to provide as useful output as possible to end-users who aren't using additional tooling. Finally, for me it sounds like pandoc-crossref should work on integrating itself into pandoc if it is painful to duplicate output code for latex for example. But that's just an uninformed opinion. :-) |
This would be wildly inconsistent across different output formats, sadly. That doesn't necessarily mean that output has to be brought to the lowest common denominator, but getting overly fancy isn't really something I'd do lightly either, at least not by default. In my opinion, pandoc is at its greatest when you need to write a thing and then turn it into multiple output formats. Having different output formats behave at least somewhat consistently helps with this use-case a lot. For a single output format, I'd personally rather just write the thing in that format to begin with (except perhaps docx, because as a Linux user, I hate Word with a passion).
Yet there's no way to add/remove those fancy "Figure
Here are a few reasons besides consistency with most other output formats: It would be extremely inconvenient for pandoc-crossref users targeting ODT if it was enabled by default. @scottmartincampbell above makes another case where having it enabled by default leads to at least surprising and at worst annoying and confusing results. Another reason to disable those by default is that re-reading ODT produced by the current writer yields less-than-stellar results. Namely, "Figure N" gets read verbatim, which, if nothing else, is rather surprising behaviour when, e.g., you need to round-trip a document to ODT and back for whatever reason. And I'm not sure a simple fix exists here, since "Figure N" is just good ol' text with little semantic markup in ODT. That said, ODT reader is frankly anaemic, so it's not like this is the only problem it has, but still.
Granted. But that in itself would be extremely painful. Pandoc-crossref makes quite a few compromises to make things work, which wouldn't really be acceptable in pandoc itself. And doing this properly is, well, for one, a lot of work, for which few of us have nearly enough time, and for two, we're kinda stuck on the design phase, see discussion in #813 for insight. Honestly, the reason pandoc-crossref exists in the first place is I got tired of looking at the discussion in #813 going nowhere. |
I'm not sure I follow. Do you mean that it would be wildly inconsistent to do the best we can, or the output would be wildly inconsistent, or something else?
Unfortunately no. I agree that they should definitely be optional, but I'm not sure that default off is what you want for the sake of naive users, just because of the side-effects in Libreoffice of not including them. That's really just a judgement call though.
Why? Surely if you're already starting to do advanced filtering, then adding a few options isn't going to do much harm. But ok, it's more options.
Regarding org-mode: @scottmartincampbell, does org-mode add proper XML tags for the numbers in the "Figure X", or does it just add verbatim "Figure X" to the output? Maybe org-mode could be updated for newer versions of pandoc? I actually find this a pretty good argument for having this enabled by default, as other tools are apparently working around deficiencies in old versions of the ODT writer.
This isn't actually true; the "Figure N" is not just good old text but contains a number of XML tags, specifically: Making the ODT reader understand this shouldn't be that hard and is anyway needed to meaningfully process existing ODTs containing captions created by Libreoffice, as these "Figure N" labels (unless the author intentionally deletes them) will always be present when adding captions.
Sorry to hear that, but I still don't find that a compelling reason to disable features in the ODT writer by default. It seems to me that this would be holding back even more development in pandoc due to a tangentially related issue. I.e., what you're saying is that we can't integrate pandoc-crossref into pandoc because of unconcluded discussions, and therefore we should disable by default features that interfere with a non-integrated pandoc-crossref. Is that about right? |
Unless by "best we can" you mean "the most faithful reproduction of the semantic meaning of input in the output", the "look and feel" of output would be wildly inconsistent between different formats. And it doesn't seem to be what you meant, because, strictly speaking, figure/table numbering is not in the input, and neither are lists of objects.
Again, inconsistency. It's counter-intuitive that one would have to add specific command line options to the pandoc invocation for one specific output format. Also, bear in mind that the only real requirement for using pandoc-crossref is to invoke pandoc with
Well, apart from the number, which is an ODF counter, I mean. "Figure " part is just good ol' plain text.
Using brittle heuristics and ignoring i18n woes, sure, probably not that hard. But doing it well doesn't seem trivial, unless I'm missing something obvious (which I might be).
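To make the disagreement above concrete, here is a rough sketch of what a reader would have to work with. It assumes the counter is serialized as an ODF `text:sequence` element inside the caption paragraph; the sample XML and its attribute values are illustrative assumptions, not output copied from pandoc or LibreOffice, and `split_caption` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ODF text namespace.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Illustrative caption paragraph (assumed shape, simplified attributes).
SAMPLE = (
    f'<text:p xmlns:text="{TEXT_NS}">Figure '
    '<text:sequence text:name="Figure">1</text:sequence>'
    ': A cat</text:p>'
)


def split_caption(xml: str):
    """Return (label_text, counter_name, number, remainder) or None."""
    para = ET.fromstring(xml)
    seq = para.find(f"{{{TEXT_NS}}}sequence")
    if seq is None:
        return None  # no counter: the caption is plain text only
    label = para.text or ""       # the plain-text, localized "Figure " part
    name = seq.get(f"{{{TEXT_NS}}}name")
    number = seq.text             # the counter's rendered value
    remainder = seq.tail or ""    # separator plus the caption proper
    return label, name, number, remainder
```

Under these assumptions the counter itself is easy to detect, but the label before it and the separator after it are ordinary text, so stripping them reliably still depends on localization-aware heuristics, which is the i18n concern raised above.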
Never said anything to the effect you're describing. Just explained why integrating pandoc-crossref into pandoc is not something that is likely to happen soon (or at all, it's an open question whether added maintenance cost is entirely justified in this particular case).
TL;DR of my position is as follows:
Having counters inserted in ODT but not in docx or HTML or whatever is not a great user experience and breaks compatibility with certain tools and certain workflows.
Inserting counters in all output formats that remotely support something like this is, for one, unimplemented, and for two, a major behaviour change which should never be enabled by default, if one cares about compatibility at all, unless doing a major breaking release (and even then I'd think twice about that).
Also, this is personal preference, but I would strongly prefer pandoc to not modify input (like add things) unless I explicitly asked it to. So, to me, hiding this feature behind an option makes a lot more sense, at least until/unless other writers catch up.
Don't get me wrong, I can kinda see your point. Lists-of-stuffs in LibreOffice are tied into counters (which LibreOffice creates for figures by default), so no counters means no list of figures/etc. But I sincerely doubt all that many users either need or expect to have, say, list-of-figures working in pandoc-generated ODT. And if they actually do need that, chances are they also need a similar thing in some other output format, which doesn't support that natively (e.g. HTML) -- at which point they'd want to use pandoc-crossref or equivalent, which wouldn't work properly with ODT by default, so additional hoop-jumping would be required.
So, for a small subset of users who need list-of-figures/etc in ODT working out of the box, and who don't need that in any other output format (except perhaps LaTeX) -- for those users having this feature enabled by default makes sense. For everyone else, it would be at best inconsequential (if they don't care about ODT) and at worst annoyingly confusing. Hopefully this makes my point clear.
P.S. We really need to wind down on this discussion somewhat. Anyone who reads this later probably won't be happy about these walls of text. |
Here's my short view: a text-processing tool like pandoc should not modify content unnecessarily by default, or make assumptions about intent of the writer. Adding "Figure N" to the caption does that. An option that exists to better manage structural features like an index or table of contents is worthwhile, but not a good default if there are visible changes to the content. To answer the above question: I believe org-mode just adds "Figure N" to the visible text, without any tagging. I don't like this either, but there are ways to alter the output (see https://orgmode.org/manual/Labels-and-captions-in-ODT-export.html#Labels-and-captions-in-ODT-export). |
We already had inconsistency before this: LaTeX/PDF adds "Figure 1"; other formats don't. That's why I thought of this change as a reasonable one (and ditto for similar, not yet implemented changes to other output formats like docx). It hadn't been done up to now because of the localization issue, but that was solved by the Translations API. That said, I think that the incompatibility with pandoc-crossref is a serious issue. pandoc-crossref is widely used, and since pandoc doesn't provide adequate cross-reference capacities, we should try not to break the pandoc-crossref workflow. Ultimately it would be desirable to build cross-referencing and counters into pandoc itself (see #813), but this is a big issue, and there are aspects to the design of pandoc-crossref that I've been hesitant to bring into pandoc (specifically the use of English word-fragments to mark different counters). I think we need to consider that a long-term improvement (and give it more emphasis), while considering here what to do in the immediate future. |
Another note on this: in LaTeX, the numbering is really essential because of the way figures and tables "float." In formats where the figures/tables are guaranteed to appear in a particular place in the text, it's not quite as important. However, when you're targeting multiple output formats, this lack of uniformity is a problem. For LaTeX/PDF output, you can't just say "as the following figure shows..." because the figure might appear somewhere else in the text. You need "as Figure 1 shows..." But with unaltered pandoc you can't achieve that in output formats other than LaTeX/PDF (for which you could use raw tex). The only current way to solve this problem is by using pandoc-crossref. |
Well, I really don't know one way or the other. My preference is for ODT documents looking as if they were created by Libreoffice. That is my view of the best we can do. This also plays nicely with other Libreoffice functions like the index. I don't really see the point of trying to produce "generic" documents where formatting normally present in these documents is omitted; HTML and ODT won't look the same, no matter what we do, and neither will LaTeX/PDF. So why not go for what the user would get had they written the document in Libreoffice? That being said, I understand it's quite inconvenient for a lot of people, especially those that have written tools that assume the output is a stable interface. |
Here's a practical suggestion.
Figures will look like
or
The opendocument/odt writer will get to the AST after pandoc-crossref is done with it. [EDIT: I admit, it seems a bit hackish to make pandoc's behavior depend on a third-party filter in this way. And someone might use a [EDIT: replaced 'pandoc-citeproc' with 'pandoc-crossref'] |
Another idea along similar lines: we could set a variable |
This was added in pandoc 2.7.2, but it makes it impossible to use pandoc-crossref. So this has been rolled back for now, until we find a good solution to make this behavior optional (or a creative way to let pandoc-crossref and this feature coexist). See #5474.
For now I'm just going to roll this back; I think it's important to be able to use pandoc-crossref, since there's no other way currently to get cross-referencing. But leaving this issue open, so we can think about the best way to add this feature as an option. I've left all the needed code in place (behind |
@jgm I'd like to revisit this issue now as I need this functionality. For now we're sticking with an older version of pandoc which doesn't disable figure numbering. What do you think is the best way forward? Should we add a command-line switch to enable figure/table/other enumeration? Or one to disable it? Or some specific detection of pandoc-crossref? I'm afraid the current state is a bit sad, as it makes it impossible to generate a table of figures and a table of tables in Libreoffice, which are really needed for technical publishing. |
I'll put it on the 2.8 milestone just so I think about it again, but no guarantees for this release; we might take it off again. |
I think it's a good candidate for handling with extensions. Probably need to add an extension specifically for this. The end result will look like My vote is for "disabled by default" for the sake of backwards-compatibility (rule of the least surprise -- in general, things should behave differently only when the user does something differently, unless we're talking really major updates). |
Ok, that is, a command line switch extension of the "to" command line option. I see that some extensions are already present on other formats. I'll try to follow one of those examples. +native_numbering sounds good. That's pretty much what this is. |
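Since the proposal is an extension on the "to" format that stays off unless requested, a wrapper script could opt in explicitly. The sketch below only builds the argument list and does not run pandoc; `writer_spec` and `pandoc_args` are hypothetical helper names, and `native_numbering` is the extension name suggested above.

```python
def writer_spec(fmt, enable=(), disable=()):
    """Build a pandoc format string such as 'odt+native_numbering'."""
    return (
        fmt
        + "".join(f"+{ext}" for ext in enable)
        + "".join(f"-{ext}" for ext in disable)
    )


def pandoc_args(src, dest, native_numbering=False):
    """Argument list for an ODT conversion; numbering is opt-in."""
    enable = ("native_numbering",) if native_numbering else ()
    return ["pandoc", src, "-t", writer_spec("odt", enable=enable), "-o", dest]
```

Passing the result to `subprocess.run` would perform the conversion. With the default `native_numbering=False` the writer is plain `odt`, which keeps a pandoc-crossref workflow untouched.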
Ok, I created a pull request: #5765. Hope that's what you intended, more or less. |
ODT now apparently forcibly adds "Figure " and "Table " to figures and tables. While cool and all, this is an issue for me. For one, there seems to be no way to disable this behaviour (or am I missing something?). For two, as far as I can tell, no other format except LaTeX does this, which seems to go against the "write once export everywhere" idea. For three, pandoc-crossref handles numbering and cross-referencing arguably better, but having this thingamajig in ODT breaks everything (because pandoc-crossref can't know about what ODT writer does, and vice versa)
So, I have a couple questions to ask.
First: Why is it there in the first place, considering it's virtually nowhere else? Frankly, it feels tacked-on and weirdly inconsistent with the rest of pandoc.
Second: Do you suppose a way to explicitly enable/disable the feature would be in order?
Third: Do you think this option should be disabled or enabled by default? I strongly lean to "disabled by default" for the sake of consistency with docx/html/etc.
Thanks.
P.S. I created a thread on pandoc-discuss a few days ago, but not much's going on there.
P.P.S. Side note, I've run into another weird inconsistency with the ODT writer wrt figures in table cells -- there just aren't any. If a table cell contains only a Para, it's unconditionally turned into a paragraph, even if it should by all accounts be a figure. Should I create another issue/pr for that?