Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenDocument writer: Allow references for internal links #6774

Merged

Conversation

pyssling
Copy link
Contributor

This commit adds two extensions to the OpenDocument writer,
references_over_links and number_prefix_references.

The first extension, references_over_links, substitutes document
internal links for references to headers, figures and tables.
Text in references is kept consistent with the referenced header,
table or figure which is an improvement if the document is edited
after being generated by pandoc.

The second extension number_prefix_references will prefix the
header references with the number according to the style of the
referenced heading in the final document. As noted in the MANUAL.txt
the document will need to have indexes updated for these numbers
to be generated - similarly to table of contents in OpenDocument.
Figure and table references are not number prefixed as the numbers
for those are inline in the caption.

@tarleb
Copy link
Collaborator

tarleb commented Nov 3, 2020

Thank you for opening this PR, that seems to be a useful pair of features.

In order to make this a good experience for everyone, I'd like to learn a bit about your motivation and expectation for this PR: Which statements fits best?

  1. "This PR doubles as a welcome opportunity to learn more about Haskell. A detailed code review would be welcome."
  2. "The feature is the only thing that matters. Honestly don't care too much about the details, it would be great to get this merged ASAP."

Both options work for us and will cost us similar amounts of time, so feel free to indicate your choice.

@pyssling
Copy link
Contributor Author

pyssling commented Nov 3, 2020

Hi Albert,

I am in situation 1. with regards to the code. This code is already in use - but I am very open to improving it. My total Haskell experience (and knowledge) constitutes one course at university 19 years ago - and now a number of patches to pandoc. I'm sure the code is very crude from an experienced Haskell programmers perspective.

Thank you for your time.

@tarleb
Copy link
Collaborator

tarleb commented Nov 3, 2020

Great, thanks for the feedback! I'll do an in-depth review within the next few days.

Copy link
Collaborator

@tarleb tarleb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a joke in the Haskell community that there are no advanced Haskellers, as everybody is intermediate forever. Hope my comments are helpful regardless.

@pyssling pyssling force-pushed the opendocument-writer-references-for-internal-links branch 4 times, most recently from 7e15ce2 to 8697ccd Compare November 7, 2020 16:31
Copy link
Collaborator

@tarleb tarleb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pyssling pyssling force-pushed the opendocument-writer-references-for-internal-links branch from 8697ccd to 5f8f16c Compare November 7, 2020 20:22
@pyssling pyssling force-pushed the opendocument-writer-references-for-internal-links branch 2 times, most recently from a76e73e to 287784c Compare November 21, 2020 23:07
@pyssling
Copy link
Contributor Author

@jgm any chance you could give this one a review as well?

@jgm
Copy link
Owner

jgm commented Nov 22, 2020

I'm not sure I understand what these extensions do. Can you explain a bit more, maybe with an example?

@pyssling
Copy link
Contributor Author

Well, the first option changes from links to references - in OpenDocument a reference is something that updates with the target text, so a heading of "Introduction" will have a reference called "Introduction", and if a later editor of the generated document changes "Introduction" to "Background" then the reference will also change to "Background" which the link will not.

The second option number prefixes references to headings. So instead of in the generated text having "(see Introduction)" we would have "(see 1. Introduction)" This is especially useful for printed documents where there are no links - chapter numbers become really useful.

I've tried to describe this quite thoroughly in the manual changes.

@pyssling pyssling force-pushed the opendocument-writer-references-for-internal-links branch from 287784c to 5d98582 Compare November 23, 2020 09:28
@pyssling
Copy link
Contributor Author

@jgm I'm wondering if you've had time to consider this and whether you would like me to look at alternative approaches.

The fact is I'm not the happiest with this approach. In the original format in my case the "References" identified here are of the type "xref" which do not have a "link text". Because the Link type in pandoc expects a link text the Docbook reader constructs one, and essentially what I'm doing the OpenDocument writer is removing the Link text from identified references and turning them back into references.

This could probably be done better by creating an Xref or InternalReference or similar inline type in the pandoc-types repo and extending the AST in that way. This is a bit more work - but I suspect there are many other formats that would make better use of xrefs rather than links.

Let me know how you wish to proceed.

@jgm
Copy link
Owner

jgm commented Nov 26, 2020

So with no extension a link looks like

<text:a xlink:type="simple" xlink:href="#chapter1" office:name=""><text:span text:style-name="Definition">The
Chapter</text:span></text:a>

With references_over_links it looks like

<text:bookmark-ref text:reference-format="text" text:ref-name="chapter1">The
Chapter</text:bookmark-ref>

And with references_over_links+number_prefix_references

<text:bookmark-ref text:reference-format="number" text:ref-name="chapter1"></text:bookmark-ref><text:s /><text:bookmark-ref text:reference-format="text" text:ref-name="chapter1">The
Chapter</text:bookmark-ref>

Correct?

@pyssling
Copy link
Contributor Author

Yes - the interesting thing to keep in mind is that the contents of:

<text:a xlink:type="simple" xlink:href="#chapter1" office:name=""></text:a>

is dynamic - i.e. if it header for #chapter1 changes - the text in the reference is updated. People expect this behavior - but with Links - this isn't possible.

@jgm
Copy link
Owner

jgm commented Nov 26, 2020

So, do I understand correctly that the text provided, "The Chapter" , is just ignored in the second two cases?

@pyssling
Copy link
Contributor Author

It's a placeholder until the document is refreshed, visible but temporary. We could also skip it - and just have this empty until the document is refreshed.

@jgm
Copy link
Owner

jgm commented Nov 26, 2020

So, this means that if someone specifies a link in Markdown like

[My stuff](#mystuff)

then these extensions will ignore the description they write and substitute the actual name of the linked resource?

@pyssling
Copy link
Contributor Author

pyssling commented Nov 26, 2020

Yes - with a markdown document like this:

# Chapter My Stuffing {#mystuff}

[My stuff](#mystuff)

We end up with this output after running pandoc with -t odt+references_over_links:
image

In libreoffice we can now go the Tools menu, select Update and then Update All.

We then get this:

image

If we now update the chapter title - without doing any updates or anything we get:

image

This is the behavior we want from our documents - that cross-references in the final document update when the document is edited. Hope this makes sense.

@jgm
Copy link
Owner

jgm commented Nov 27, 2020

I think this could be made clearer in the documentation. People who understand the odt/opendocument format well will understand what you wrote, but many users won't. Saying explicitly that the link text in links to internal targets will be replaced by the text from the linked section heading (or image or?) would be helpful.

@pyssling pyssling force-pushed the opendocument-writer-references-for-internal-links branch from 5d98582 to de2b2e3 Compare November 27, 2020 08:38
@pyssling
Copy link
Contributor Author

Ok, done. Hope this is clear now.

MANUAL.txt Outdated
Comment on lines 3030 to 3066
#### Extension: `references_over_links` ####

Links to headings, figures and tables inside the document are
substituted with (cross-)references. References use the text
of the referenced document element such as the heading text,
figure or table caption. The original link text is replaced once
the generated document is refreshed.

This extension can be enabled/disabled for the following formats:

output formats
: `odt`, `opendocument`

#### Extension: `number_prefix_references` ####

Add a number prefix to references within the document that lack one.
For example a first heading of "Introduction" that is number prefixed
in the final document as "1 Introduction" will in references also be
number prefixed as "1 Introduction".

This typically requires the final document to have indexes refreshed
in a native editor such as libreoffice as pandoc is not aware of the
number style of the document.

This further assumes that references are used within the document
instead of links - see the `references_over_links` extension.

output formats
: `odt`, `opendocument`

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that a reader might think "references" refers to markdown link references or something like that. Lots of formats have things called "references."

Would using the term "crossreferences" or something like "xrefs" be clearer? Maybe the extension should be crossrefs and numbered_crossrefs? (The latter could automatically enable use of cross-references, so people wouldn't have to specify both.)

Do you think these would apply to other formats besides opendocument/docx?

It may be that these names are the best, but I think we should hash out alternatives first, because this is an API change and hard to reverse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about xrefs_for_internal_links and numbered_xrefs ? I think that should be very clear - but my perception is slanted.

This should be applicable to OpenDocument, Docx most probably, and DocBook output formats. At least.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like that better.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that occurs to me, though, is that with xrefs_for_internal_links one might expect that all internal links would become xrefs. xrefs by itself doesn't imply this and may for that reason be better (also shorter). I think you'd probably need to look in the manual to figure out what it does, but that is probably true regardless of the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - xrefs and numbered_xrefs it is. And numbered_xrefs enables xrefs. Thanks for your patience.

@pyssling pyssling force-pushed the opendocument-writer-references-for-internal-links branch from de2b2e3 to cbfa73e Compare November 28, 2020 20:48
@jgm
Copy link
Owner

jgm commented Nov 29, 2020

Just a thought: why not have the xrefs_number extension just add the number, and xrefs just add the name? Then the user could choose to have both together, or just one. The reason I mention this is that I can think of a lot of contexts where I'd like to refer back to "Section 3.3" without having the whole name of the section in the link. In fact, this is the normal case in most academic writing: this is how LaTeX's \ref works, for examle. With this revision, I'd suggest changing the extensions to something like xrefs_name and xrefs_number to make it a bit clearer what they do.

@pyssling
Copy link
Contributor Author

I guess it becomes very ODT specific then - and also a bit inconsistent.

The xrefs extension will always include name and number for figures and tables that have one. The numbered_xrefs extension only prefixes a number to the heading if heading numbers are enabled in the generated document.

This isn't quite the pick-and-choose scenario I'm afraid.
Another option would be to have 3 extensions:

xrefs - enables xrefs instead of links where supported - ODT, docx, DocBook
numbered_xrefs - ODT where there are heading numbers (and possibly Docx)
number_only_xrefs - ODT where there are heading, table and figure numbers (and possibly Docx)

This is quite hairy I'm afraid - not at all a clear cut case - ODT was not designed with this use-case in mind.

@jgm
Copy link
Owner

jgm commented Nov 30, 2020

Hm. What does the xrefs extension now return for the "name and number" for figures and tables? Is it "Figure 1" or is it "1 The caption of the figure" or what?

If the extension returns "Figure 1" for figures and the text of the section heading (e.g. "Introduction") for headings, that seems quite inconsistent. I would have expected it to work in the same way in both cases.

@pyssling pyssling force-pushed the opendocument-writer-references-for-internal-links branch from cbfa73e to f8d4584 Compare November 30, 2020 18:01
@pyssling
Copy link
Contributor Author

You were right to push on this - I've now done quite a bit of work to make this make more sense and managed to find the right XML syntax to accomplish the xrefs_name and xrefs_number extensions.

Now xrefs_name will use only the caption text for figures and tables, ignoring the Figure X: part of the caption. xrefs_number only uses the number. Enabling both will place the number before the name with a space between.

Quite a long journey - but it looks much better.

This commit adds two extensions to the OpenDocument writer,
`xrefs_name` and `xrefs_number`.

Links to headings, figures and tables inside the document are
substituted with cross-references that will use the name or caption
of the referenced item for `xrefs_name` or the number for `xrefs_number`.

For the `xrefs_number` to be useful heading numbers must be enabled
in the generated document and table and figure captions must be enabled
using for example the `native_numbering` extension.

In order for numbers and reference text to be updated the generated
document must be refreshed.
@pyssling pyssling force-pushed the opendocument-writer-references-for-internal-links branch from f8d4584 to 69f95ed Compare November 30, 2020 20:12
@pyssling
Copy link
Contributor Author

pyssling commented Dec 5, 2020

@jgm Is this good to go? Let me know if there's anything more we need.

@jgm
Copy link
Owner

jgm commented Dec 5, 2020

Looks good!

@jgm jgm merged commit c161893 into jgm:master Dec 5, 2020
@pyssling
Copy link
Contributor Author

pyssling commented Dec 5, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants