Markdown reader: class="uri" added for non-autolink hyperlinks #4913
Comments
Yes. When we added the class="uri" feature, links did
not have attributes, so the only way to do it was to
check for identity between the link text and the URL.
But now that links have attributes, the class could
be added directly by the Markdown reader, and that's
probably what we should do.
(The Markdown writer would have to be sensitive to this too.)
Chris Jerdonek <notifications@github.com> writes:
In the Markdown writer case, one thing you'd need to decide is whether autolinking should be done (1) only if `class="uri"` is present, or (2) opportunistically (i.e. whenever possible). Currently, it seems to be doing it opportunistically. Another possibility would be to make this behavior configurable.
If we require class="uri" in the Link element before
rendering as an autolink, then we will need to modify
some of the other readers as well to ensure that this
class is added when appropriate. (For example,
presumably the HTML reader should add it when link
text = url.) Otherwise we'll get less idiomatic
html -> markdown translations. Of course, this means
that markdown -> html -> markdown won't in general
be a round-trip.
It seems to me like the HTML reader should preserve whether
This proposal would also handle cases like
Chris Jerdonek <notifications@github.com> writes:
It seems to me like the HTML reader should preserve
whether `class="uri"` was present (and similarly for
other readers).
HTML in the wild (not produced by pandoc) is not
likely ever to have this class. So if we required it,
we'd rarely get markdown autolinks in html -> md
conversion.
Note: round tripping markdown -> markdown is not a
design goal. There are ever so many pieces of
information that are lost in converting to the
AST (e.g. whether you use `-` or `*` for a bullet
list).
I think it's probably acceptable if
`[https://pandoc.org](https://pandoc.org)` turns into an
autolink.
This can all be addressed by doing the check / auto-conversion when writing the Markdown. I was just advocating that the HTML (and Markdown) readers preserve things faithfully (and providing an option via extension to disable the auto-conversion when writing the Markdown).
Doing the auto-conversion when writing the Markdown would also result in idiomatic Markdown in a wider range of scenarios. It would also confine the code to one place (the Markdown writer), instead of having to do the check in every reader.
I'm lost. I think I misinterpreted your original issue, jumping to conclusions, so let's back up. Your original issue is this:
gets rendered by the HTML writer with a class="uri". That's clearly undesirable. We should fix that by having the HTML writer check not just for identity between the link text and the URL, but that the URL is an absolute URL. Is there anything else you think should be changed, or will that do it?
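A minimal sketch of the stricter check suggested here, written in Python rather than pandoc's Haskell. The function names are hypothetical, and treating "absolute" as "has a URL scheme" is an assumption, not necessarily the test pandoc would use.

```python
from urllib.parse import urlsplit


def is_absolute_url(url: str) -> bool:
    # Assumption: a URL counts as absolute when it carries a scheme (https:, mailto:, ...).
    return bool(urlsplit(url).scheme)


def html_writer_should_add_uri_class(link_text: str, url: str) -> bool:
    # Hypothetical helper: emit class="uri" only when the link text equals
    # the URL AND the URL is absolute.
    return link_text == url and is_absolute_url(url)


print(html_writer_should_add_uri_class("https://pandoc.org", "https://pandoc.org"))  # True
print(html_writer_should_add_uri_class("page.html", "page.html"))                    # False
print(html_writer_should_add_uri_class("Pandoc", "https://pandoc.org"))              # False
```

The second call is the case at issue: the text matches the target, but a relative target could never have been written as an autolink.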
There are two issues I mentioned in my first post. The first (for the markdown -> html case) is that for examples like this--
I don't think
The second (for the html -> markdown case) is that if
Both of the above issues could be addressed by reading faithfully so the AST is always an accurate representation, and confining the special logic of checking the link text and url to only the Markdown writer.
OK. Well, if we keep the #1501 idea of having class="uri" so that autolinks can be specially styled, then we face a decision about how the HTML writer is supposed to tell whether a
I hope that makes it clear what the options are. Of course, you might have in mind modifying the AST types so that
There is a third and I think simpler option, which is for the HTML writer to use only the presence of the "uri" class to identify autolinks. With this approach, the Markdown reader would add the uri class when it parses a link formatted as an autolink, and the HTML writer wouldn't need to do anything -- just pass the class attribute through as is. It wouldn't be necessary for the writer to check the link text or URL since the Markdown reader would only be adding the "uri" class for actual (valid) autolinks.
I'm not suggesting that the suggestion behind #1501 be reversed -- just that it be implemented differently. I take the #1501 suggestion ("autolinks should have a class when converted to html") to mean that Markdown of the form
I think the idea of rendering HTML links as autolinks whenever possible shouldn't be the default behavior because doing so loses whether the link was originally styled as an autolink.
I don't understand. The HTML writer doesn't need to identify autolinks, since there's no distinction in HTML syntax between autolinks and other links. The issue I raised under (2) above is not with the HTML writer, it's with the HTML reader. Maybe you mean the reader?
This concerns the HTML reader, presumably? If all HTML were generated by pandoc, it would make sense for the HTML reader never to add the "uri" class -- just to pass it through if it's present. But most HTML has not been generated by pandoc. For example, an autolink on GitHub will be rendered in HTML as
`<a href=http://example.com rel="nofollow">http://example.com</a>`
So on your proposal, converting this HTML to markdown would give you the unidiomatic
Sorry this has been hard to communicate about. Perhaps it's because we're coming at this from different preconceptions / ways of thinking about it.
My last comment was focused primarily on the HTML writer (AST -> HTML) step (because that's what I thought your comment was about), and in particular whether the HTML writer should ever add the "uri" class (Pandoc's way of indicating an autolink in HTML). I was suggesting that the HTML writer never add the uri class if not already present in the AST, but simply pass it through as is. That way an HTML link would only have the uri class going from Markdown to HTML if the Markdown reader added it (i.e. if it had the form of an autolink
For the other direction (HTML -> AST -> Markdown), my proposal was that the HTML reader (HTML -> AST) pass the uri class as is. And yes, this means that HTML "in the wild" would normally not have the uri class after reading to the AST. For the Markdown writer (AST -> Markdown), I was suggesting that it could render links as autolinks opportunistically by default (i.e. whenever possible), which would give you the idiomatic Markdown that you prefer. Perhaps the difference in what I'm suggesting is that the special logic checking that the link text matches the URL would be confined to the Markdown writer, and not in any of the readers.
One advantage of my proposal is that all of the autolink logic would be restricted to the Markdown reader (adding the uri class depending on how it's formatted in the Markdown) and writer (applying the link-text heuristic). All of the other readers and writers would simply pass things through as is.
Another aspect of my proposal (of lesser importance, perhaps down the road) was to allow disabling the Markdown writer's opportunistic autolinking by disabling an
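The Markdown -> HTML half of this proposal can be sketched roughly as follows. This is a toy illustration in Python, not pandoc's Haskell code: the two regular expressions, the tuple standing in for an AST link, and both function names are assumptions made for the example.

```python
import re
from html import escape

# Toy patterns for illustration only; pandoc's real Markdown grammar is richer.
AUTOLINK = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*:[^\s<>]*)>")
INLINE_LINK = re.compile(r"\[([^\]]*)\]\(([^()\s]*)\)")


def read_markdown_link(src):
    """Return (text, url, classes) for one Markdown link, or None if src is neither form."""
    m = AUTOLINK.fullmatch(src)
    if m:
        url = m.group(1)
        # The reader records the autolink *syntax* by adding the "uri" class.
        return (url, url, ["uri"])
    m = INLINE_LINK.fullmatch(src)
    if m:
        # An ordinary inline link gets no class, even if text == URL.
        return (m.group(1), m.group(2), [])
    return None


def write_html_link(text, url, classes):
    """Pass classes through unchanged; never infer "uri" from text == url."""
    class_attr = ' class="{}"'.format(escape(" ".join(classes))) if classes else ""
    return '<a href="{}"{}>{}</a>'.format(escape(url), class_attr, escape(text))


print(write_html_link(*read_markdown_link("<https://pandoc.org>")))
print(write_html_link(*read_markdown_link("[https://pandoc.org](https://pandoc.org)")))
```

In this sketch only the first link comes out with `class="uri"`, because the class reflects how the link was written rather than a comparison of its text and target.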
Okay. That means the Markdown reader needs to add this class for autolinks (currently it is added by the HTML writer).
So on your proposal, for HTML -> Markdown, these would both be rendered as autolinks:
What about something like
Presumably that shouldn't be rendered as an autolink, as this would lose information. So the Markdown writer's test could be: (a) link text matches URL, (b) URL is absolute, (c) no attribute besides possibly a "uri" class.
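The three-part test could look something like the sketch below. It is in Python, not pandoc's Haskell; the `Link` dataclass is a hypothetical stand-in for an AST link with an identifier, classes and key-value attributes, and "absolute" is assumed to mean "has a URL scheme".

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Link:
    """Hypothetical stand-in for an AST link; not pandoc's actual type."""
    text: str
    url: str
    identifier: str = ""
    classes: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


def can_write_as_autolink(link: Link) -> bool:
    # (a) link text matches URL
    text_matches = link.text == link.url
    # (b) URL is absolute (assumption: it has a scheme)
    absolute = bool(urlsplit(link.url).scheme)
    # (c) no attribute besides possibly a "uri" class
    nothing_to_lose = (
        not link.identifier
        and not link.attributes
        and set(link.classes) <= {"uri"}
    )
    return text_matches and absolute and nothing_to_lose


plain = Link("https://pandoc.org", "https://pandoc.org")
styled = Link("https://pandoc.org", "https://pandoc.org", classes=["external"])
print(can_write_as_autolink(plain))   # True
print(can_write_as_autolink(styled))  # False
```

Condition (c) is what keeps the writer from silently dropping an identifier, an extra class or a key-value attribute when it picks the shorter syntax.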
With the most recent Pandoc (version 2.3), when converting a hyperlink from Pandoc Markdown to HTML, Pandoc can add `class="uri"` even when the Markdown link isn't an autolink (or isn't even allowed to be). For example ("try it" link):
becomes
You can see the effect this has on round-tripping below. With the above example--
you get--
See the related issue #1501 (where the feature was first added) and #4615 (re: email addresses).