Link ref. def. + backslash + EOL/EOF #583

Open
mity opened this issue May 2, 2019 · 6 comments

Comments

@mity

mity commented May 2, 2019

(Distilled from pulldown-cmark/pulldown-cmark#287)

If the following is followed by \n or EOF, is it a valid link reference definition, or not?

[foo]: url\

That raises two questions:

  1. Does \ followed by EOF form a hard break, as it does when followed by \n?
    (If yes, see (2). If no, it should be a link ref. def., and \ is a literal character in the URL.)

  2. Can a link ref. def. end with a hard break instead of a soft break?
    (Here my intuition says "no" but...)

...imho neither of these questions is covered by the CommonMark spec, as of 0.29.

@mity mity changed the title Link ref. def. + slash + EOF Link ref. def. + slash + EOL/EOF May 2, 2019
@mity mity changed the title Link ref. def. + slash + EOL/EOF Link ref. def. + backslash + EOL/EOF May 2, 2019
@jgm
Member

jgm commented May 3, 2019

The syntax \ + NEWLINE for a hard line break is only in place for parsing inlines. In this context we're not parsing inlines; we're parsing a link destination and/or title (which is regular string content).

So yes, it's a URL with a literal backslash.

I'd say this is covered by the spec, insofar as the hard line break syntax is defined in the inline parsing section. There's therefore no more reason to expect that \ + NEWLINE will create a hard break in this context than to expect that *foo* will create italics in

[bar]: *foo*
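
To make this concrete, here is a minimal Python sketch assuming a deliberately simplified, single-line definition syntax. The regex and the names `parse_ref_def_line` and `href` are hypothetical and not taken from any CommonMark implementation. The destination is captured as a raw string, so a trailing backslash stays a literal character and `*foo*` is not emphasis. Percent-encoding that backslash gives `u%5C`, the same href cmark produces in the example further down this thread.

```python
import re
from urllib.parse import quote

# Hypothetical, simplified matcher for a single-line link reference definition:
# up to three spaces of indentation, a [label], a colon, optional spaces/tabs,
# and a destination containing no whitespace. Titles, <...> destinations,
# backslash escapes and multi-line definitions are deliberately not handled.
REF_DEF = re.compile(r"^ {0,3}\[([^\]]+)\]:[ \t]*(\S+)[ \t]*$")


def parse_ref_def_line(line):
    """Return (label, destination), or None if the line is not a definition.

    The destination is kept as a raw string: no inline parsing happens here,
    so a trailing backslash is just a character and *foo* is not emphasis.
    """
    m = REF_DEF.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)


def href(destination):
    # Percent-encode for an href attribute; a literal backslash becomes %5C.
    return quote(destination)


print(parse_ref_def_line("[foo]: url\\"))  # ('foo', 'url\\')
print(parse_ref_def_line("[bar]: *foo*"))  # ('bar', '*foo*')
print(href("u\\"))                         # u%5C
```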

@mity
Author

mity commented May 3, 2019

I would agree. What made me think yesterday that it might be a hard break is that:

  1. A paragraph may start with a hard break (at least cmark currently allows it):

    $ printf '\\\nfoo' | ./cmark
    <p><br />
    foo</p>
    
  2. I was wondering whether it is good that the backslash is interpreted as part of the URL or not depending on what's on the following line(s).

But maybe we should rather ban the hard break at the beginning of a paragraph? cmark does not allow it at the end of a paragraph either (and if we allowed that, the backslash in the original example would become a sequence of inlines consisting of nothing but the hard break).

Also I've just noticed this inconsistency:

$ printf '\\\nfoo' | ./cmark
<p><br />
foo</p>

$ printf '[a]: u\\\nfoo\n\n[a]' | ./cmark
<p>foo</p>
<p><a href="u%5C">a</a></p>

@mity
Author

mity commented May 3, 2019

And also this inconsistency:

$ printf '\\\nfoo' | ./cmark
<p><br />
foo</p>

$ printf '  \nfoo' | ./cmark
<p>foo</p>

@jgm
Member

jgm commented May 3, 2019

I don't really understand why you think these are inconsistencies.
The spec for link reference definitions says:

A link reference definition consists of a link label, indented up to three spaces, followed by a colon (:), optional whitespace (including up to one line ending), a link destination, optional whitespace (including up to one line ending), and an optional link title, which if it is present must be separated from the link destination by whitespace. No further non-whitespace characters may occur on the line.

In (numbering the lines)

[a]: u\
foo

[a]

the first line meets the conditions for a link reference definition, so it is parsed as such.
The second line is a paragraph. The fourth line is another paragraph with a link. All that is just as the spec prescribes.

In

\
foo

you don't meet the conditions for any other kind of block, so it's parsed as a paragraph with inline content. The first line is a hard break, the rest a string.
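
The key point above is that block structure is decided first, and inline syntax (including the backslash hard break) applies only to paragraph content afterwards. Below is a heavily simplified Python sketch of that two-phase idea. The names `split_blocks` and `render`, the regexes, and the shortcut-only link resolution are illustrative assumptions, not cmark's implementation; it ignores titles, label case-folding, indentation rules and much else.

```python
import re
from urllib.parse import quote

# Hypothetical, heavily simplified two-phase renderer (not cmark's code).
REF_DEF = re.compile(r"^ {0,3}\[([^\]]+)\]:[ \t]*(\S+)[ \t]*$")
SHORTCUT_LINK = re.compile(r"\[([^\]]+)\]")


def split_blocks(text):
    """Block phase, part 1: a line containing only spaces/tabs is blank
    and separates paragraphs."""
    blocks, current = [], []
    for line in text.split("\n"):
        if line.strip(" \t") == "":
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def render(text):
    defs, paragraphs = {}, []
    # Block phase, part 2: definitions at the start of a paragraph are
    # consumed; their destinations stay plain strings (no inline parsing).
    for lines in split_blocks(text):
        while lines and (m := REF_DEF.match(lines[0])):
            defs.setdefault(m.group(1), m.group(2))  # first definition wins
            lines = lines[1:]
        if lines:
            paragraphs.append("\n".join(lines))

    def inline(content):
        # Inline phase: only here does backslash + line ending become <br />.
        content = content.replace("\\\n", "<br />\n")

        def link(m):
            label = m.group(1)
            if label not in defs:
                return m.group(0)
            return f'<a href="{quote(defs[label])}">{label}</a>'

        return SHORTCUT_LINK.sub(link, content)

    return "\n".join(f"<p>{inline(p)}</p>" for p in paragraphs)
```

Under these assumptions, `render("\\\nfoo")` gives `<p><br />\nfoo</p>`, and `render("[a]: u\\\nfoo\n\n[a]")` gives the same two paragraphs as the cmark run above. `render("  \nfoo")` gives `<p>foo</p>` because, per the spec, a line containing only spaces is a blank line rather than the start of a paragraph. A paragraph ending in a backslash keeps it literally, since no line ending follows it inside the paragraph content.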

@mity
Author

mity commented May 3, 2019

Why is the hard break allowed at the beginning of a paragraph but not at its end? Why is the hard break disallowed at the beginning of a paragraph if a link ref. def. precedes it? Why does only the backslash-encoded hard break work at the beginning of a paragraph, but not the double-space one?

I'm not saying it is necessarily against the current spec, but it is definitely peculiar.

Imho, stating that a hard break can only occur inside a paragraph would remove all these strange situations. And it might make sense: what is a hard break at the beginning or end of a paragraph good for?

@jgm
Member

jgm commented May 3, 2019 via email
