Mention that link reference definitions are constructed from paragraphs #605

Open
wooorm opened this issue Sep 11, 2019 · 9 comments

@wooorm
Contributor

wooorm commented Sep 11, 2019

Problem

According to the spec text:

[
␠␠␠␠# a
␠␠␠␠b
␠␠␠␠]:
␠␠␠␠example.com '
␠␠␠␠line1
␠␠␠␠...
␠␠␠␠'

...is fine: it’s a proper link reference definition. This led me to believe that true streaming, as noted in § Appendix ¶ Phase 1, wouldn’t work: if the last apostrophe weren’t there, we’d need to backtrack. We’d have to go back to the start, because the opening apostrophe is on the same line as the destination. If the opening apostrophe were on its own line, the definition would still be valid, but we’d need to backtrack to parse the title lines again.

To my surprise, the following is not a link reference definition (note that there is one fewer space before # a):

[
␠␠␠# a
␠␠␠␠b
␠␠␠␠]:
␠␠␠␠example.com '
␠␠␠␠line1
␠␠␠␠...
␠␠␠␠'

...# a is now a heading! Only then did I see that the Appendix contains:

Reference link definitions are detected when a paragraph is closed; the accumulated text lines are parsed to see if they begin with one or more reference link definitions. Any remainder becomes a normal paragraph.
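
The strategy quoted above can be sketched roughly as follows. This is a minimal illustration in Python, not the reference implementation: the `close_paragraph` name and the `DEFINITION` pattern are hypothetical, and the pattern deliberately recognizes only single-line definitions without titles (no multi-line labels, no angle-bracket destinations).

```python
import re

# Hypothetical, simplified pattern: "[label]: destination" on a single line,
# indented by at most three spaces. Titles and multi-line forms are ignored.
DEFINITION = re.compile(r"^ {0,3}\[([^\[\]]+)\]:[ \t]*(\S+)[ \t]*$")


def close_paragraph(lines):
    """Split accumulated paragraph lines into (definitions, remaining lines).

    Leading lines that look like definitions are consumed; scanning stops at
    the first line that does not, and the remainder stays a normal paragraph.
    """
    definitions = {}
    index = 0
    while index < len(lines):
        match = DEFINITION.match(lines[index])
        if match is None:
            break
        # Approximate label normalization: collapse whitespace, fold case.
        label = " ".join(match.group(1).split()).casefold()
        if not label:
            break  # a label needs at least one non-whitespace character
        # When a label is defined more than once, the first definition wins.
        definitions.setdefault(label, match.group(2))
        index += 1
    return definitions, lines[index:]


definitions, remainder = close_paragraph(["[Foo]: /url", "bravo"])
```

Because definitions are only looked for once the paragraph has closed, anything that can interrupt a paragraph (such as an ATX heading) ends the candidate definition before this step ever sees it.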

Solution

I think it would be good to mention in the main text that link reference definitions are created from paragraphs, and to include a test for it. I’m not entirely sure how to describe this, though. It would also help prevent the blank lines that are currently possible in labels (GH-586).

Extra

As paragraph lines are made into actual paragraphs and definitions, setext heading lines come into play, so relating to GH-395, I think the following may also be interesting to expand upon:

[foo]: /url
'alpha
=
bravo'

[foo]

Dingus:

<h1>'alpha</h1>
<p>bravo'</p>
<p><a href="/url">foo</a></p>
@jgm
Member

jgm commented Sep 11, 2019

The reference parser does construct these from paragraphs (similarly setext headers). That's an implementation detail, though. If we didn't care about efficiency, we could simply have a separate block parser for these and backtrack.

@wooorm
Contributor Author

wooorm commented Sep 11, 2019

Agreed, implementation details belong in the appendix. My issue is more that nothing in the spec explains why, to take a perhaps clearer example, the following:

[
# alpha
]: https://example.com

[# alpha][]

Yields a heading.
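
A minimal sketch of why the indentation matters here and in the four-space versus three-space examples earlier in the thread. It assumes that an ATX heading may be indented by at most three spaces and may interrupt an open paragraph, while a line indented four or more spaces cannot. The `interrupts_paragraph` helper and its pattern are hypothetical simplifications that ignore every other block construct (block quotes, lists, fences, thematic breaks).

```python
import re

# One to six "#" characters, indented by at most three spaces, followed by
# spaces or tabs or by the end of the line.
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}(?:[ \t]+|$)")


def interrupts_paragraph(line):
    """Return True if this line would start an ATX heading mid-paragraph."""
    return ATX_HEADING.match(line) is not None


unindented = interrupts_paragraph("# alpha")    # heading: ends the open paragraph
three_spaces = interrupts_paragraph("   # a")   # still a heading
four_spaces = interrupts_paragraph("    # a")   # too deep: stays paragraph text
```

Under this reading, the label only survives when every line inside the brackets fails such a check, which is why the four-space variant parses as a definition and the three-space variant does not.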

@jgm
Member

jgm commented Sep 12, 2019

Yes, I agree that more needs to be said about reference link definitions.
I'm just not sure talking about "paragraphs" is the best way to do it.

@wooorm
Contributor Author

wooorm commented Sep 12, 2019

I can’t see an easy solution.

One way would be to use “interrupting content” instead of “interrupting paragraphs”:

An indented code block cannot interrupt a content line [was: a paragraph]. (This allows hanging indents and the like.)

ATX headings need not be separated from surrounding content by blank lines, and they can interrupt content lines [was: paragraphs]:

...and then both definition “lines” and paragraphs fall into that category? 🤔

@jgm
Member

jgm commented Sep 12, 2019

Another alternative would be just to say "interrupt a paragraph or a link reference definition."

@wooorm
Contributor Author

wooorm commented Oct 1, 2019

Yeah, maybe that’s good!
I’m not so sure about the word “paragraph”: setext headings are made from that construct, but since they are headings, they aren’t really paragraphs.

@jgm
Member

jgm commented Oct 1, 2019

setext headings are made from that construct

That's just how they're handled in the reference implementation (for parsing efficiency). As far as the spec goes, they have nothing to do with paragraphs.

@wooorm
Contributor Author

wooorm commented Jul 4, 2020

Another point of confusion for me: I don’t understand the interplay between paragraphs, setext headings, and definitions.

E.g.:

[a]: b
    content?

a
=
    content?

Yields:

content?

a

content?

Why can there be code after a setext heading, but not after a definition? I was expecting both content?s to be paragraphs.
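
One possible reading of this behavior, assuming the paragraph-based strategy described in the Appendix: the indented line after the definition arrives while a paragraph is still open, so it is absorbed as a continuation line, whereas the setext underline closes its paragraph, so the next indented line starts a code block. The `scan` function below is a hypothetical toy that covers only blank lines, setext underlines, indented code, and paragraph text. It is a sketch of that reading, not the reference parser.

```python
import re

SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def scan(lines):
    """Return (kind, lines) pairs for a tiny subset of block structure."""
    blocks = []     # finished blocks, in document order
    paragraph = []  # lines of the currently open paragraph, if any

    def close():
        if paragraph:
            blocks.append(("paragraph", list(paragraph)))
            paragraph.clear()

    for line in lines:
        if not line.strip():
            close()  # a blank line closes the open paragraph
        elif paragraph and SETEXT_UNDERLINE.match(line):
            # The underline turns the open paragraph into a heading.
            blocks.append(("heading", list(paragraph)))
            paragraph.clear()
        elif line.startswith("    ") and not paragraph:
            # Indented code can only start when no paragraph is open.
            blocks.append(("code", [line[4:]]))
        else:
            # Anything else, including an indented line, continues the paragraph.
            paragraph.append(line.strip())
    close()
    return blocks


blocks = scan(["[a]: b", "    content?", "", "a", "=", "    content?"])
```

In this sketch the first block is still a paragraph holding both `[a]: b` and `content?`. Definition extraction would then run on that closed paragraph and leave `content?` behind as ordinary paragraph text.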

@vassudanagunta

vassudanagunta commented Jul 6, 2020

It seems to me the discussion above assumes that the CommonMark.js / Dingus behavior is the spec, and thus that the spec needs to be updated to conform to that behavior. I would suggest that this is the wrong way to look at it (with the one exception of backward compatibility, which should be maintained since it is a CommonMark spec goal).

For example, I'm working on an implementation of the CommonMark spec. It passes all the tests, yet does NOT treat the # alpha in @wooorm's example as a heading. It interprets it as the label of a link ref def.

As far as the spec goes, they have nothing to do with paragraphs.

The reference parser does construct these from paragraphs (similarly setext headers). That's an implementation detail, though. If we didn't care about efficiency, we could simply have a separate block parser for these and backtrack.

This is what my implementation does.

Given Markdown's principles (reader oriented), to me the way one decides is by asking: What does the following look like to most readers?

[
# alpha
]: https://example.com

[# alpha][]

Though at the end of the day, it's an unimportant corner case. If the author of the above Markdown cared about the reader, they would not write something so unnecessary! The line breaks serve no purpose.

But also, by the same token, any inefficiency resulting from rules that would require backtracking (is look-ahead considered backtracking?) would only affect such corner cases.
