move processing to output function #1227

Merged: 5 commits, Apr 18, 2018
80 changes: 16 additions & 64 deletions lib/marked.js
@@ -554,68 +554,9 @@ inline.normal = merge({}, inline);
 inline.pedantic = merge({}, inline.normal, {
   strong: /^__(?=\S)([\s\S]*?\S)__(?!_)|^\*\*(?=\S)([\s\S]*?\S)\*\*(?!\*)/,
   em: /^_(?=\S)([\s\S]*?\S)_(?!_)|^\*(?=\S)([\s\S]*?\S)\*(?!\*)/,
-  /* Original link re: /^!?\[(label)\]\(\s*<?([\s\S]*?)>?(?:\s+(['"][\s\S]*?['"]))?\s*\)/
-   * This captures the spec reasonably well but is vulnerable to REDOS.
-   * Instead we use a custom parser that follows the RegExp.exec semantics. */
-  link: {
-    exec: function (s) {
-      // [TEXT](DESTINATION)
-      var generalLinkRe = edit(/^!?\[(label)\]\((.*?)\)/)
-        .replace('label', inline._label)
-        .getRegex();
-
-      // destination: DESTINATION from generalLinkRe
-      // returns [destination, title]: no angle-brackets on destination, no quotes on title
-      function splitIntoDestinationAndTitle (destination) {
-        function unwrapAngleBrackets (str) {
-          if (str.match(/^<.*>$/)) {
-            str = str.slice(1, -1);
-          }
-          return str;
-        }
-
-        // Valid DESTINATIONs, in decreasing specificity.
-        var destinationAndTitleRe = /^([^'"(]*[^\s])\s+(['"(].*['")])/;
-        var destinationRe = /^(<?[\s\S]*>?)/;
-        var parsingRegexes = [destinationAndTitleRe, destinationRe];
-
-        var match = false;
-        for (var i = 0; i < parsingRegexes.length; i++) {
-          match = parsingRegexes[i].exec(destination);
-          if (match) {
-            break;
-          }
-        }
-
-        if (!match) {
-          return null;
-        }
-
-        var dest = match[1];
-        var title = match[2] || ''; // Not all parsingRegexes have 2 groups.
-
-        // Format dest.
-        dest = dest.trim();
-        dest = unwrapAngleBrackets(dest);
-
-        return [dest, title];
-      }
-
-      var fullMatch = generalLinkRe.exec(s);
-      if (!fullMatch) {
-        return null;
-      }
-
-      var text = fullMatch[1];
-      var destination = fullMatch[2];
-
-      var destinationAndTitle = splitIntoDestinationAndTitle(destination);
-      if (!destinationAndTitle) {
-        return null;
-      }
-      return [fullMatch[0], text, destinationAndTitle[0], destinationAndTitle[1]];
-    }
-  },
+  link: edit(/^!?\[(label)\]\((.*?)\)/)
+    .replace('label', inline._label)
+    .getRegex(),
   reflink: edit(/^!?\[(label)\]\s*\[([^\]]*)\]/)
     .replace('label', inline._label)
     .getRegex()
@@ -762,8 +703,19 @@ InlineLexer.prototype.output = function(src) {
       src = src.substring(cap[0].length);
       this.inLink = true;
       href = cap[2];
-      href = href[0] === '<' ? href.substring(1, href.length - 1) : href;
-      title = cap[3] ? cap[3].substring(1, cap[3].length - 1) : cap[3];
+      if (this.options.pedantic) {
Contributor

We don't test for pedantic much, and I don't see it well documented anywhere. The only docs I could find are in the man page.

Can you explain when I should test for pedantic in the future?

Contributor

If I understand this code right, without pedantic we will set href but not title. If my understanding is correct, why is this behavior desirable?

Member @styfle, Apr 17, 2018

The pedantic flag means: follow the original 2004 Markdown spec from John Gruber (Daring Fireball).

Member Author

  1. The regexes are grouped based on the options. The link regex that you fixed was in the pedantic group:

    inline.pedantic = merge({}, inline.normal, {

  2. The title is set in the else clause. The regex for a non-pedantic link is /^!?\[(label)\]\(href(?:\s+(title))?\s*\)/, which is extremely complex once constructed and probably also error-prone.
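
To illustrate how grouping the regexes by option plays out, here is a minimal sketch. The rule tables are hypothetical and heavily simplified (the real ones in lib/marked.js are far larger), `Object.assign` stands in for marked's own `merge` helper, and `selectRules` is an illustrative name, not a function from the library.

```javascript
// Hypothetical, simplified rule tables (NOT the real marked rules).
var normalRules = {
  // Captures label, href and an optional quoted title as separate groups.
  link: /^!?\[([^\]]*)\]\(([^)\s]*)(?:\s+("[^"]*"))?\s*\)/,
  strong: /^\*\*([\s\S]+?)\*\*/
};

// Stand-in for merge({}, inline.normal, {...}): later keys override earlier ones.
var pedanticRules = Object.assign({}, normalRules, {
  // Captures label and everything between the parentheses as one group.
  link: /^!?\[([^\]]*)\]\((.*?)\)/
});

// Illustrative helper: choose one rule group from the options.
function selectRules(options) {
  return options.pedantic ? pedanticRules : normalRules;
}

var input = '[a](b "c")';
var normalCap = selectRules({ pedantic: false }).link.exec(input);
var pedanticCap = selectRules({ pedantic: true }).link.exec(input);

console.log(normalCap[2], normalCap[3]); // href and title arrive separately
console.log(pedanticCap[2]);             // href and title arrive as one string
```

Because the pedantic rule hands back the destination and title as a single capture, the output function has to split them itself, which is why the pedantic check appears there.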

Member Author

Here is the actual non-pedantic link regex:

/^!?\[((?:\[[^\[\]]*\]|\\[\[\]]?|`[^`]*`|[^\[\]\\])*?)\]\(\s*(<(?:\\[<>]?|[^\s<>\\])*>|(?:\\[()]?|\([^\s\x00-\x1f()\\]*\)|[^\s\x00-\x1f()\\])*?)(?:\s+("(?:\\"?|[^"\\])*"|'(?:\\'?|[^'\\])*'|\((?:\\\)?|[^)\\])*\)))?\s*\)/
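
As a rough check of what that regex captures, the sketch below runs it against two sample links and applies the same post-processing as the non-pedantic branch of the diff. The sample inputs and the helper name `hrefAndTitle` are assumptions made for illustration.

```javascript
// The non-pedantic link regex, copied from the comment above.
var linkRe = /^!?\[((?:\[[^\[\]]*\]|\\[\[\]]?|`[^`]*`|[^\[\]\\])*?)\]\(\s*(<(?:\\[<>]?|[^\s<>\\])*>|(?:\\[()]?|\([^\s\x00-\x1f()\\]*\)|[^\s\x00-\x1f()\\])*?)(?:\s+("(?:\\"?|[^"\\])*"|'(?:\\'?|[^'\\])*'|\((?:\\\)?|[^)\\])*\)))?\s*\)/;

// Hypothetical helper mirroring the non-pedantic branch of the diff:
// strip the title's delimiters (or default to ''), then unwrap <...> on the href.
function hrefAndTitle(cap) {
  return {
    href: cap[2].trim().replace(/^<([\s\S]*)>$/, '$1'),
    title: cap[3] ? cap[3].slice(1, -1) : ''
  };
}

var withTitle = hrefAndTitle(linkRe.exec('[text](http://example.com "title")'));
var bare = hrefAndTitle(linkRe.exec('[text](<http://example.com>)'));

console.log(withTitle); // { href: 'http://example.com', title: 'title' }
console.log(bare);      // { href: 'http://example.com', title: '' }
```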

Member Author

The link regex is checking for a title. If no title is found then the href is already set to the whole string inside the parentheses and there is no title.

Member Author

The regex that you used before when there was no title (/^(<?[\s\S]*>?)/) literally matches any string and since href is already set to the whole string there is no need to change it.
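
A quick way to see this is to run the old fallback regex over a few arbitrary strings and check that the first capture group is always the entire input. The sample strings below are made up for illustration.

```javascript
// The old fallback regex from the removed parser.
var destinationRe = /^(<?[\s\S]*>?)/;

// Arbitrary sample inputs, including empty and multi-line strings.
var samples = ['', 'http://example.com', '<test', 'a "title"', 'line1\nline2', 'a>'];

// True only if every sample matches and group 1 equals the whole input.
var allMatchWhole = samples.every(function (s) {
  var m = destinationRe.exec(s);
  return m !== null && m[1] === s;
});

console.log(allMatchWhole); // true
```

Since every part of the pattern is optional or unbounded, the match can never fail, so using it as a fallback adds nothing beyond keeping href as it already is.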

Contributor

Further down we then set the title field in the returned object based on the (undefined?) value of title. Do we want to set a default value for title, e.g. ''?

Member

My two cents would be an empty value of the expected type. Avoid null and undefined checks and possibilities.

Member Author

I see what you are saying. Yes, title should probably be defined as null or ''.

It looks like CommonMark doesn't differentiate between the title being an empty string and there being no title (see the demo), so I will set it to an empty string.

+        link = /^([^'"]*[^\s])\s+(['"])(.*)\2/.exec(href);
+
+        if (link) {
+          href = link[1];
+          title = link[3];
+        } else {
+          title = '';
+        }
+      } else {
+        title = cap[3] ? cap[3].slice(1, -1) : '';
+      }
+      href = href.trim().replace(/^<([\s\S]*)>$/, '$1');
       out += this.outputLink(cap, {
         href: InlineLexer.escapes(href),
         title: InlineLexer.escapes(title)
1 change: 1 addition & 0 deletions test/new/link_lt.html
@@ -0,0 +1 @@
+<p><a href="%3Ctest">URL</a></p>
1 change: 1 addition & 0 deletions test/new/link_lt.md
@@ -0,0 +1 @@
+[URL](<test)
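
The fixture pair can be traced with a small standalone sketch of the pedantic branch from the diff. The function name `splitPedanticHref` is hypothetical. The percent-encoding step is also an assumption: the diff does not show where `<` becomes `%3C`, and `encodeURI` is used here only as a plausible stand-in for whatever the renderer does.

```javascript
// Hypothetical standalone version of the pedantic branch in the diff above.
function splitPedanticHref(raw) {
  var href = raw;
  var title = ''; // default agreed on in the review: empty string, not undefined

  // Split 'dest "title"' (or single-quoted) into destination and title.
  var link = /^([^'"]*[^\s])\s+(['"])(.*)\2/.exec(href);
  if (link) {
    href = link[1];
    title = link[3];
  }

  // Unwrap <...> only when both angle brackets are present.
  href = href.trim().replace(/^<([\s\S]*)>$/, '$1');
  return { href: href, title: title };
}

var titled = splitPedanticHref('http://example.com "Example"');
var wrapped = splitPedanticHref('<http://example.com>');
var fixture = splitPedanticHref('<test'); // destination from test/new/link_lt.md

// Assumption: the renderer percent-encodes the href; encodeURI is a stand-in.
var encoded = encodeURI(fixture.href);

console.log(titled);  // { href: 'http://example.com', title: 'Example' }
console.log(wrapped); // { href: 'http://example.com', title: '' }
console.log(encoded); // %3Ctest
```

The unbalanced `<test` keeps its leading bracket because the unwrap pattern requires a closing `>`, which is consistent with the expected `href="%3Ctest"` in the HTML fixture.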