This repository has been archived by the owner on Dec 15, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 107
Refactor grammar in order to improve readability and address a bug in syntax highlighting of Atom and GitHub #240
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
One of the problems at GitHub that we saw when rending a markdown file with some code blocks is that when there was an unclosed block, it will highlight the rest of the remaining text as bold, overriding any highlights that should take effect. This test change is based on a refactoring that was done to group together the patterns and also change the regex for bold to match the one used in `burodepeper/language-markdown` which is also an atom package for syntax highlighting
…reedy One of the problems at GitHub that we saw when rending a markdown file with some code blocks is that when there was an unclosed block, it will highlight the rest of the remaining text as bold, overriding any highlights that should take effect. This test change is based on a refactoring that was done to group together the patterns and also change the regex for bold to match the one used in `burodepeper/language-markdown` which is also an atom package for syntax highlighting
…greedy One of the problems at GitHub that we saw when rending a markdown file with some code blocks is that when there was an unclosed block, it will highlight the rest of the remaining text as italic, overriding any highlights that should take effect. This test change is based on a refactoring that was done to group together the patterns and also change the regex for bold to match the one used in `burodepeper/language-markdown` which is also an atom package for syntax highlighting
…greedy One of the problems at GitHub that we saw when rending a markdown file with some code blocks is that when there was an unclosed block, it will highlight the rest of the remaining text as italic, overriding any highlights that should take effect. This test change is based on a refactoring that was done to group together the patterns and also change the regex for bold to match the one used in `burodepeper/language-markdown` which is also an atom package for syntax highlighting This test change also has a little conflicting test because `is*italic*` stylizes the italic word while `not_italic_` does not. I think this is expected behavior given the other uses of `_`. This is also ready the behavior in GitHub production right now. ![prod-format](https://user-images.githubusercontent.com/1518902/83576959-d4a1de00-a4e7-11ea-802f-0ff7064a9309.png)
The refactor to group together patterns has shown that it will glob the text after the `>` indicator as one token. The test now reflects that change
The refactor of the grammar now expects a punctuation scope for an italic marker match. This just adds that to the expectation of the test
The refactor of the grammar now expects a punctuation scope for a bold marker match. This just adds that to the expectation of the test
This adds additional context to the text that is parsed that matches bold italics in HTML. This comes after the refactor of the grammar into categories
We are removing this file because we will be replacing it with a JSON file that is refactored into groups. This will allow for better reading of the the file and additions if needed. This refactor is not purely cosmetic, the refactor addresses bug that was raised by a GitHub customer about syntax highlighting not working for a specific language
This change groups patterns into categories. It then composes these into patterns that can be mixed and matched. This is not purely as asthetic refactor, this addresses a limitation that was found by a GitHub customer. They reported that if they had a `__` (double underscore), which is a valid character in objective c,inside a code block, it will override the rest of the file and highlight it as bold. This change carries over some regular expressions for bold characters from the `burodepeper/language-markdown` grammar which is also an atom package. This new regular expression is not as _greedy_ as the previous one so it does not override the rest of the file. Please see the description of this PR for a more detailed explantion of the process.
maxbrunsfeld
approved these changes
Jun 3, 2020
It looks like you've tested this very thoroughly, and this fixes a really bad bug. I think at this point we should 🚢 . |
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Context
It was brought to the attention of GitHub in issue#144037 that a customer was seeing that syntax highlighting did not work for a specific language.
Here we see that the
__
is overriding the syntax of the rest of the file by making everything after this bold. At first we thought this was a problem in theobjective c
language however, after more testing in both Atom and GitHub we see that is actually a problem with the markdown syntax highlighting.The screenshot shows us that if we have an unclosed delimiter for bold or italic it will greedily find the closing delimiter and if it does not, it will scope the tokens that it see as bold or italic thus overriding the syntax highlighting.
Description of the Change
Prior to this change, the grammar file was a
CSON
file that had all the patterns to match in one list; making it very difficult to parse through and identify where the problem was.During my investigation, I stumbled on two things
They both used the grouped collection of patterns that I am introducing in the change. So my first change was to do just a cosmetic refactor. I separated the patterns into groups influenced by the groups that were used in the two packages/extensions that I mentioned above.
After doing that and making sure the tests still pass (unchanged) I then made changes to fix the behavior that was reported. In order to fix that behavior I introduced/replaced the patterns for bold and italic. These were mostly patterned off the
language-markdown
package since it was already supported in Atom 🎉Testing
I was able to test both in Atom and GitHub
Atom
With the new grammar, I overrode the definition of the package and reloaded the editor.
GitHub
GitHub uses linguist to highlight the files that you see online. Linguist uses the
language-gfm
package for markdown parsing and syntax highlighting. There is an an app (Lighshow) that one can use to paste in the grammar for testing and it will highlight the block of text that you pass in.We see the comparison of using the old grammar and new grammar
Markdown and GitHub Flavored Markdown
I wanted to make sure that we still correctly highlight basic markdown. So I made a quick test using Lightshow and used the examples found in Mastering Markdown
Body of the text I used
This is an
tag
This is an
tag
This is an
tag
This text will be italic
This will also be italic
This text will be bold
This will also be bold
This will be bold italic
This is also bold italic
™ ™ ³
™ ™ ³
isitalic
not_italic_
bold
and bold
You can combine them
Format:
http://github.com - automatic!
GitHub
As Kanye West said:
I think you should use an
<addr>
element here instead.def foo():
if not bar:
return True
tagssupported16c999e8c71134401a78d4d46435517b2271d6ac
mojombo@16c999e8c71134401a78d4d46435517b2271d6ac
mojombo/github-flavored-markdown@16c999e
#1
mojombo#1
mojombo/github-flavored-markdown#1
Mention @nickborromeo
http://www.github.com/
thisthatThis a screen recording of the comparison, on the left is the new grammar and on the right is GitHub in production right now.
Alternate Designs
An alternative is to keep the
CSON
file and just stick in the new rules/patterns to fix the behavior. This would be the quicker option however, will make the file even less readable in my opinion.Benefits
The grammar is now easier to follow. In editors that support folding, it is quiet easy now to navigate through the blocks since they are grouped and have an indicator of what the patterns in the group are for.
Possible Drawbacks
The tests that are not very exhaustive. I am not 100 percent sure that we have feature parity, at least for the ones that count. However, I think this out weighs the current behavior we have and we can always iterate on as issue come up (if any).
Applicable Issues
Acknowledgments
I would like to thank @maxbrunsfeld who paired with me while investigating this and help me with questions I had to better understand the processes and services used. 🙇