rewrote the escape function to escape all markdown characters #242
Thanks for your contribution! 💫 I love the use of reduce!
I have made quite a few comments, so I hope this doesn't come across as ungrateful…
Overall, I like this approach. As I mentioned before, I think the library should escape aggressively to ensure that valid markdown is outputted. However, I don't think it is necessary to escape all markdown characters in every context (sorry, I should've mentioned this before). Escaping *, _, and ` in all contexts makes sense, but for the other characters, I think we can be a little less aggressive so that the output isn't overloaded with \.
As I said, I hope the comments aren't too overwhelming. Let me know what you think, and thanks again!
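The "escape a small set of characters everywhere" idea described above can be sketched as a list of pattern/replacement pairs folded over the input with reduce. This is only an illustration under stated assumptions: the names alwaysEscape and escapeInline are hypothetical and are not taken from the PR or from Turndown.

```javascript
// Hypothetical sketch: characters that are escaped in every context.
// The backslash rule runs first so that backslashes added by later
// rules are not escaped a second time.
var alwaysEscape = [
  [/\\/g, '\\\\'],
  [/\*/g, '\\*'],
  [/_/g, '\\_'],
  [/`/g, '\\`']
]

// Apply each [pattern, replacement] pair in order.
function escapeInline (string) {
  return alwaysEscape.reduce(function (accumulator, escape) {
    return accumulator.replace(escape[0], escape[1])
  }, string)
}

console.log(escapeInline('1.5 * 3 = 4.5')) // 1.5 \* 3 = 4.5
```

Characters such as = and > pass through untouched here, which is the "less aggressive" behaviour the review asks for outside of *, _, and `.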
src/turndown.js
Outdated
@@ -6,6 +6,24 @@ import Node from './node'
var reduce = Array.prototype.reduce
var leadingNewLinesRegExp = /^\n*/
var trailingNewLinesRegExp = /\n*$/
const markdownReplacements = [
Perhaps var escapes = [
done
src/turndown.js
Outdated
[/-/g, '\\-'],
[/\+/g, '\\+'],
[/=/g, '\\='],
[/#/g, '\\#'],
# is only used for ATX-style headings, so we can be more specific and reduce the number of escapes. Something like:
[/^(\W* {0,3})(#{1,6} )/gm, '$1\\$2']
Simplified this to
[/^(#{1,6})/g, '\\$1']
which produces an escaped headline like this:
### headline
= \### headline
Yes, a single backslash is sufficient: https://spec.commonmark.org/dingus/?text=%5C%23%23%23%20Not%20a%20heading%0A
src/turndown.js
Outdated
[/=/g, '\\='],
[/#/g, '\\#'],
[/`/g, '\\`'],
[/~/g, '\\~'],
~ can only be used for code blocks, so to reduce the number of escapes, something like:
[/^(\W* {0,3})~~~/gm, '$1\\~~~']
It can also be used for a strikethrough effect:
~~strikethrough~~
= strikethrough
In this case shall we stick with the aggressive escaping for ~?
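The trade-off raised here can be seen by running both tilde rules over a fence and a strikethrough. The fence-only pattern is the one suggested in the review and the aggressive pattern is the one from the original diff; the names fenceOnly, aggressive, and escapeWith are hypothetical.

```javascript
// Fence-only rule from the review comment, and the aggressive rule from the diff.
var fenceOnly = [/^(\W* {0,3})~~~/gm, '$1\\~~~']
var aggressive = [/~/g, '\\~']

// Apply a single [pattern, replacement] pair to a string.
function escapeWith (rule, string) {
  return string.replace(rule[0], rule[1])
}

console.log(escapeWith(fenceOnly, '~~~js')) // \~~~js
console.log(escapeWith(fenceOnly, '~~strikethrough~~')) // ~~strikethrough~~
console.log(escapeWith(aggressive, '~~strikethrough~~')) // \~\~strikethrough\~\~
```

The fence-only rule leaves the double tildes intact, so a flavour that supports strikethrough would still render them, whereas the aggressive rule neutralises both uses at the cost of more backslashes.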
src/turndown.js
Outdated
[/#/g, '\\#'],
[/`/g, '\\`'],
[/~/g, '\\~'],
[/\|/g, '\\|'],
As pipes (|) are used for tables, this should probably be part of turndown-plugin-gfm, and so can be removed
src/turndown.js
Outdated
[/~/g, '\\~'],
[/\|/g, '\\|'],
[/\(/g, '\\('],
[/\)/g, '\\)'],
I don't think ( or ) need to be escaped if [ and ] are escaped, so we can remove these two lines
done
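A small sketch of why the parentheses can be left alone: inline link syntax needs an unescaped [text] part before the (url) part, so escaping only the brackets is enough to stop the text being read as a link. The names bracketEscapes and escapeBrackets are hypothetical, and the URL is a placeholder.

```javascript
// Hypothetical sketch: escape square brackets only.
var bracketEscapes = [
  [/\[/g, '\\['],
  [/\]/g, '\\]']
]

// Apply each [pattern, replacement] pair in order.
function escapeBrackets (string) {
  return bracketEscapes.reduce(function (accumulator, escape) {
    return accumulator.replace(escape[0], escape[1])
  }, string)
}

console.log(escapeBrackets('[not a link](https://example.com)'))
// \[not a link\](https://example.com)
```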
test/index.html
Outdated
<pre class="expected">You can use * for multiplication: 1.5 * 3 = 4.5</pre>
<div class="case" data-name="escaping =">
<div class="input">A sentence containing =</div>
<pre class="expected">A sentence containing \=</pre>
</div>
This test case can probably be removed
test/index.html
Outdated
<div class="input">42 > 1</div>
<pre class="expected">42 > 1</pre>
<pre class="expected">42 \> 1</pre>
This test case can be reverted
test/index.html
Outdated
<div class="case" data-name="escaping parentheses">
<div class="input">(A sentence containing)</div>
<pre class="expected">\(A sentence containing\)</pre>
</div>
This test case can be removed
test/index.html
Outdated
<div class="case" data-name="escaping |">
<div class="input">A sentence containing |</div>
<pre class="expected">A sentence containing \|</pre>
</div>
This test case can be removed
test/index.html
Outdated
<div class="case" data-name="escaping ~">
<div class="input">A sentence containing ~</div>
<pre class="expected">A sentence containing \~</pre>
</div>
This should be altered to only test a case where a ~ could be converted to markdown
Hi @domchristie, thanks for the comments. Hopefully I will get a chance to look at this PR at some point this week and make changes. 👍
Hi @domchristie. I work with @ayusaf1992 and I've been looking through the PR. Your feedback seems to be mostly around not escaping certain characters so aggressively where they are only valid markdown in very specific circumstances and that makes sense to me. I've left a few replies here and there for further clarification but broadly speaking I think we'll try to find some time to make these changes this week :-). Thanks for all your help!
…n valid markdown.
src/turndown.js
Outdated
[/\\/g, '\\\\'],
[/\*/g, '\\*'],
[/-/g, '\\-'],
[/\+/g, '\\+'],
done
Hi @domchristie. In my recent commit I've made most of the changes discussed above. Outstanding issues are:
We're keen to get this work merged as we have other priorities demanding our time. For that reason I'm thinking that the approach here to deal with That just leaves Cheers.
Hi @Paul-Ladyman, thanks so much for your work on this. Sorry it has taken a couple of days to get back to you (I have been busy working as well as enjoying the rather nice weather!) Please see my comments below:
By default, I don't think that Turndown should escape non-commonmark characters, so it shouldn't escape pipes. Tildes are an interesting case due to the way they are used in other markdown flavours. I think the ultimate solution to this would be to add an "escape" API (
Yes, I think this is fine for now. I can work on implementing function replacements later.
I'm not keen on
As I mentioned, I think we can improve the escaping feature, but in the meantime, you may want to override the escape method:
var oldEscape = TurndownService.prototype.escape
TurndownService.prototype.escape = function (string) {
  var escapes = [
    // only escape ~ that have not already been escaped; $1 keeps the character before the ~
    [/(^|[^\\])~/g, '$1\\~'],
    [/\|/g, '\\|']
  ]
  string = string.replace(/_/g, '_')
  string = oldEscape(string)
  return escapes.reduce(function (accumulator, escape) {
    return accumulator.replace(escape[0], escape[1])
  }, string)
}
- Don't escape | here as it's not commonmark
- Use regular escaping for _ rather than HTML entities
Hi @domchristie, Thanks again for your feedback! Thinking more about pipes, the gfm table syntax is so specific that I don't think we (my colleagues and I) should be immediately worried about our users recreating it. And I think it's a good idea to keep the gfm stuff to the plugin so I've removed it from this PR. Updating the plugin may not be something we do ourselves in the short term however. I didn't realise that the use of tilde for strikethrough was part of gfm. I've updated the regex to I've also changed the Think that about covers it :-). Cheers!
Thank you! I really appreciate the time you and your colleagues (@ayusaf1992 & @olih) have put into this. Out of interest, are you using it for anything interesting? Let me know if you are able to do so :)
👍
👍
👍 The changes will be in the next version which will be released soon. Thanks again!
Released in v5.0.0
Hi @domchristie, Nice to see this got released. Sorry I didn't answer your question before. @ayusaf1992, @olih and I work for the BBC. We're implementing a new internal article editor that journalists working in, for example, BBC Sport or BBC News will use to write and publish articles to their respective sites. The editor uses SlateJS and we use Markdown to represent Slate's rich text "marks" in a more common format. We're actually thinking of moving away from Markdown as we have some requirements that don't seem to be a standard part of any spec, for example representing the difference between text written in a left-to-right language and text in a right-to-left language. In any case we were happy to help out with this bug fix and pleased it ended up getting released :-). Cheers,
in order to fix issue #233