Duplicate Heading Handling Overrides Implicit Link by Heading Title #8300

Closed
ipetraka opened this issue Sep 13, 2022 · 3 comments

@ipetraka

When duplicate headings are found in the document, Pandoc will serialise heading IDs to avoid duplicate IDs in the output document. This by and large works well, but it seems to me there is a blind spot when the user takes the time to avoid duplications themselves, via explicit heading IDs. Pandoc's duplicate detection appears to ignore that condition for the purposes of implicit cross-references, if the first of these duplicate headings is left alone:

# Test Links

* Link to first [Heading].
* Link to first [Heading](#heading), again.
* Link to second [Heading](#ref1).

## Heading

Duplicate one.

## Heading {#ref1}

Duplicate two.

## Heading {#ref2}

Duplicate three.

The result, using a basic `pandoc -o test.html input.md`, is:

<h1 id="test-links">Test Links</h1>
<ul>
<li>Link to first <a href="#ref2">Heading</a>.</li>
<li>Link to first <a href="#heading">Heading</a>, again.</li>
<li>Link to second <a href="#ref1">Heading</a>.</li>
</ul>
<h2 id="heading">Heading</h2>
<p>Duplicate one.</p>
<h2 id="ref1">Heading</h2>
<p>Duplicate two.</p>
<h2 id="ref2">Heading</h2>
<p>Duplicate three.</p>

Technically, there is no duplicate for #heading because we are supplying our own IDs to all of the rest. There should not be a conceptual reason for [Heading] to point anywhere other than #heading, which does exist and is not duplicated anywhere else. But these implicit links are instead attaching to whatever explicit ID was assigned last to a visually duplicated heading (as opposed to an internal auto-generated ID clash). To me that doesn't make conceptual sense, as it seems the cross-ref code is ignoring the actual heading topology of the document. This feels like a fallback condition that might occur if there is no actual #heading anchor in the document. Here is what I would expect to see on that line, in excerpt:

<li>Link to first <a href="#heading">Heading</a>.</li>
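
For illustration, the ID assignment described in this report can be modelled roughly as below. This is a simplified sketch and not Pandoc's actual code: `slugify` and `assign_ids` are hypothetical names, the slug rules are a reduced version of the auto-identifier rules in the Pandoc manual, and the top-down handling of explicit IDs is an assumption.

```python
import re
from typing import List, Optional, Tuple


def slugify(text: str) -> str:
    """Reduced version of the manual's auto-identifier rules (assumption)."""
    text = text.lower()
    # Keep word characters, whitespace, periods and hyphens; drop the rest.
    text = re.sub(r"[^\w\s.-]", "", text)
    # Replace runs of whitespace with a single hyphen.
    text = re.sub(r"\s+", "-", text.strip())
    # Remove everything before the first letter.
    text = re.sub(r"^[\W\d_]+", "", text)
    return text or "section"


def assign_ids(headings: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Walk headings top-down; explicit IDs win, clashing auto IDs get -1, -2, ..."""
    used = set()
    ids = []
    for text, explicit in headings:
        if explicit:
            candidate = explicit
        else:
            base = slugify(text)
            candidate, n = base, 0
            while candidate in used:
                n += 1
                candidate = f"{base}-{n}"
        used.add(candidate)
        ids.append(candidate)
    return ids


# The three headings from the example input above.
print(assign_ids([("Heading", None), ("Heading", "ref1"), ("Heading", "ref2")]))
# ['heading', 'ref1', 'ref2']
```

Under this model only the first heading receives an auto-generated ID, so `#heading` never clashes with anything, which is the point made above.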

Whether this is intentional or not, it would be good to know, as our software currently handles duplicate headings for the user under certain conditions, and it does so using a method similar to the above. This used to work, but no longer does.

Since we are able to tell precisely which duplicate heading our users pointed the link to, this was an ideal solution for us (and when we put it together, I don't think Pandoc serialised IDs, which resulted in duplicate IDs if you didn't explicitly declare them, but I might be misremembering).

We could easily switch to explicitly assigning an ID to every heading that is a duplicate in a global (non-top-down) sense, but I wanted to file this report first since the current behaviour doesn't feel right. And if it is a bug in Pandoc that gets fixed, we can leave things as they are on our end. Top-down reckoning is desirable for us for the same reason it is in Pandoc: we don't have to go back and fix a prior heading when we encounter a duplicate in the heading tree.

Pandoc version?
Linux / pandoc 2.19.2

@ipetraka ipetraka added the bug label Sep 13, 2022
@jgm
Owner

jgm commented Sep 14, 2022

The manual says:

If there are multiple headings with identical text, the corresponding
reference will link to the first one only, and you will need to use explicit
links to link to the others, as described above.

That isn't the same rule you're describing, but it looks as if pandoc doesn't follow this rule either (it links to the last one rather than the first). So that's a bug (or we could change the documentation).

I wouldn't want to prevent implicit header references from linking to headers with explicitly supplied ids.
(Supplying IDs should not prevent you from using the implicit header reference feature.)
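
The gap between the documented rule and the observed behaviour can be modelled as a difference in how a lookup table from heading text to ID gets built. The sketch below is only an illustration under that assumption; `build_ref_table` is a hypothetical helper, not a Pandoc function, and case-folding the keys is likewise an assumption.

```python
from typing import Dict, List, Tuple


def build_ref_table(headings: List[Tuple[str, str]], first_wins: bool) -> Dict[str, str]:
    """Map heading text to the ID an implicit reference would resolve to."""
    table: Dict[str, str] = {}
    for text, ident in headings:
        key = text.casefold()
        if first_wins and key in table:
            # Documented rule: a later duplicate never replaces the first entry.
            continue
        # Without the guard, each duplicate overwrites the previous entry.
        table[key] = ident
    return table


# Final IDs of the three headings in the report, in document order.
headings = [("Heading", "heading"), ("Heading", "ref1"), ("Heading", "ref2")]

print(build_ref_table(headings, first_wins=False)["heading"])  # ref2
print(build_ref_table(headings, first_wins=True)["heading"])   # heading
```

In both variants headings with explicit IDs stay in the table, so supplying an ID does not remove a heading from implicit reference resolution.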

jgm added a commit that referenced this issue Sep 14, 2022
Documentation says that when more than one heading has the same text,
an implicit reference `[Heading text][]` refers to the first one.
Previously pandoc linked to the last one instead. This patch
makes pandoc conform to the documented behavior.

See #8300.
@jgm
Owner

jgm commented Sep 14, 2022

Closing this - I fixed the bug noted above. (I'm not going to change it to behave in the way you described; behavior after the bug fix is intended.)

@jgm jgm closed this as completed Sep 14, 2022
@ipetraka
Author

Thanks for the update. I will test once a build is available and have our team adjust the software if necessary to work with the newer approach.
