Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Improving the italic and bold styles #89

Closed
wants to merge 2 commits into from

Conversation

IonicaBizau
Copy link
Contributor

This will fix #84. Since I'm not a regex guru I got some inspiration from the marked package.

  • Italic style fix

  • Bold style fix

  • Combined (?)

  • Handle styles on multiple lines. Example:

    *Lorem
    ipsum
    dolor
    sit
    amet*

Before

image

After

image

Waiting for feedback if the direction is good, then I will continue. 😄

/cc @izuzak

@mnquintana
Copy link
Contributor

This is a good start! Eventually you'll want to add specs for this.

(Also, 💄 is deprecated in favor of 🎨 for code structure / formatting changes, and this isn't really a code structure / formatting change – it's really a 🐛 )

@IonicaBizau
Copy link
Contributor Author

Indeed, the art is much better than a lipstick. 😄

OK! Will make sure that the tests are passing and I will also try to add the specs for this.

@tpoisot
Copy link
Contributor

tpoisot commented Apr 15, 2015

Really cool. I think that having the italic/bold marks with a different syntax style than the italic/bold content is useful.

@IonicaBizau
Copy link
Contributor Author

I fixed the bold and bold + italic styles as well:

image

Is it possible to pass the regex flags in this format (would need to pass m for multiline)? Or how to fix the multiline thing (the last two lines from the screenshot are supposed to have bold style)?

Also, I'm not sure where to start fixing the failing tests. 💚

@tpoisot
Copy link
Contributor

tpoisot commented Apr 15, 2015

I think you should capture the * and _ too, so that they can be separately styled.

And the tests are in the spec/folder

@IonicaBizau
Copy link
Contributor Author

I think you should capture the * and _ too, so that they can be separately styled.

That's already done, but not separately, but I was asking how can I apply bold style on the last two rows? I guess it's related to the m (multiline) regex flag.

I will try to fix the failing tests and then writing the new specs on this issue.

@IonicaBizau
Copy link
Contributor Author

Is it possible to log only the first failing test when running apm test?

@izuzak
Copy link
Contributor

izuzak commented Apr 17, 2015

Is it possible to log only the first failing test when running apm test?

I don't know if that's possible (see jasmine/jasmine#414), but you can focus on a specific describe or it block by prefixing it with "f". See https://github.com/atom/jasmine-focused#using. Does that help perhaps?

@IonicaBizau
Copy link
Contributor Author

Yes, that's a good solution. Thanks! 👍

Also, is it possible to merge repetitive snippets like the following ones, into one?

{
  'match': '...',
  'captures': 
    '2': 
       'name': 'markup.bold.italic.gfm'
    '3': 
       'name': 'markup.bold.italic.gfm'
    '4': 
       'name': 'markup.bold.italic.gfm'
}

...maybe something like this:

{
  'match': '...',
  'captures': 
    '2': 
    '3': 
    '4': 
       'name': 'markup.bold.italic.gfm'
}

I'm not very familiar with the cson format.

@izuzak
Copy link
Contributor

izuzak commented Apr 17, 2015

I don't think that's possible.

'begin': '(?<=^|[^\\w\\d_])__(?!$|_|\\s)'
'end': '(?<!^|\\s)__*_(?=$|[^\\w|\\d])'
'name': 'markup.bold.gfm'
'match': '^___([\\s\\S]+?)___(?!_)|^\\*\\*\\*([\\s\\S]+?)\\*\\*\\*(?!\\*)'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently have 'match': '(?<=^|[^\\w\\d\\*])(\\*\\*\\*)([\\s\\S]+?)(\\*\\*\\*)(?=$|[^\\w|\\d])', and I the tokens are:

[
    {
        "value": "this is ",
        "scopes": [
            "source.gfm"
        ],
        "bufferDelta": 8,
        "hasPairedCharacter": false,
        "screenDelta": 8
    },
    {
        "value": "***bold italic***",
        "scopes": [
            "source.gfm",
            "markup.bold.italic.gfm"
        ],
        "bufferDelta": 17,
        "hasPairedCharacter": false,
        "screenDelta": 17
    },
    {
        "value": " text",
        "scopes": [
            "source.gfm"
        ],
        "bufferDelta": 5,
        "hasPairedCharacter": false,
        "screenDelta": 5
    }
]

I expected that by doing (\\*\\*\\*) these will be grouped in another element ({ value: "***" }). The regex101 tester works fine in this direction:

image

How can I catch these characters into a separate group when using apm test? Also, what tool is used for parsing this regex?

@leipert
Copy link

leipert commented Apr 19, 2015

The main problem I found: mixed emphasis are handled badly, as well as some cases of escaped * and _.

I created this test markdown, below you can see how github renders it. As you can see some github rendered stuff is not consistent.

### Emphasis

#### Bold `_`

+ __a__
+ __ab__
+ __abc__
+ __bbc___
+ __x + y + z__

NOT:

+ as __danach__fasd
+ as __danach____fasd
+ davor__asd__ as
+ in__a__word
+ in 1__8__99 numbers
+ __ abc __
+ __ abc__
+ __abc __

MULTILINE:
__sadsad
asdsadasd
__

#### Italic `_`

+ _a_
+ _ab_
+ _abc_
+ _bbc__
+ _x + y + z_

NOT:

+ as _danach_fasd
+ as _danach___fasd
+ davor_asd_ as
+ in_a_word
+ in 1_8_99 numbers
+ _ abc _
+ _ abc_
+ _abc _

#### Bold `*`

+ **a**
+ **ab**
+ **\\**
+ in**a**word
+ in 1**8**99 numbers
+ as **danach**fasd
+ davor**asd** as
+ **abc**
+ **bbc***
+ **x + y + z**

NOT:

+ **\**
+ ** **, *****, **abc*\*
+ ** abc **
+ ** abc**
+ **abc **

#### Italic `*`

+ *a*
+ *ab*
+ *\\*
+ in*a*word
+ as *danach*fasd
+ davor*asd* as
+ in 1*8*99 numbers
+ *abc*
+ *bbc**
+ *x + y + z*

NOT:

+ *\*, * *, ***, *abc*
+ * abc *
+ * abc*
+ *abc *

#### Mixed

Works:

this is ***bold italic*** text

***BoldItalic***
___BoldItalic___
**Bold *BoldItalic* Bold**
__Bold *BoldItalic* Bold__
_Italic **ItalicBold** Italic_
**Bold _BoldItalic_ Bold**
*Italic __ItalicBold__ Italic*
__Bold _BoldItalic_ Bold__

Works not:

*Italic **ItalicBold** Italic*

_Italic __ItalicBold__ Italic_



Should not work:

_\_, _ _, ____
__\__, __ __, _____

__asd \*neither is this\* asd__

**asd \_neither is this\_ asd**

__asd \**neither is this*\* asd__

__asd \_neither is this\_ asd__

#### Rest

This text is _emphasized with underscores_, and this
is *emphasized with asterisks*

This is **strong emphasis** and __with underscores__.

Working:
***bold italic***

Not Working:
*italic **bold italic** italic*
**bold *bold italic* bold**

This is * not emphasized *, and \*neither is this\*.

Emphasis

Bold _

  • a
  • ab
  • abc
  • bbc_
  • x + y + z

NOT:

  • as __danach__fasd
  • as __danach____fasd
  • davor__asd__ as
  • in__a__word
  • in 1__8__99 numbers
  • __ abc __
  • __ abc__
  • __abc __

MULTILINE:
__sadsad
asdsadasd
__

Italic _

  • a
  • ab
  • abc
  • bbc_
  • x + y + z

NOT:

  • as _danach_fasd
  • as _danach___fasd
  • davor_asd_ as
  • in_a_word
  • in 1_8_99 numbers
  • _ abc _
  • _ abc_
  • _abc _

Bold *

  • a
  • ab
  • **
  • inaword
  • in 1899 numbers
  • as danachfasd
  • davorasd as
  • abc
  • bbc*
  • x + y + z

NOT:

  • **
  • ** **, ****_, *abc_
  • ** abc **
  • * abc*
  • *abc *

Italic *

  • a
  • ab
  • __
  • in_a_word
  • as _danach_fasd
  • davor_asd_ as
  • in 1_8_99 numbers
  • abc
  • bbc*
  • x + y + z

NOT:

  • __, * , *__, *abc
  • * abc *
  • * abc*
  • *abc *

Mixed

Works:

this is _bold italic_ text

_BoldItalic_
_BoldItalic_
Bold BoldItalic Bold
Bold BoldItalic Bold
Italic ItalicBold Italic
Bold BoldItalic Bold
Italic ItalicBold Italic
Bold BoldItalic Bold

Works not:

_Italic _ItalicBold* Italic*

Italic ItalicBold Italic

Should not work:

__, _ _, ____
****, __ , ___

asd neither is this asd

asd neither is this asd

asd neither is this asd

asd neither is this asd

Rest

This text is emphasized with underscores, and this
is emphasized with asterisks

This is strong emphasis and with underscores.

Working:
_bold italic_

Not Working:
_italic _bold italic* italic*
bold bold italic bold

This is * not emphasized _, and _neither is this*.

@leipert
Copy link

leipert commented Jun 11, 2015

Please have a look at my comment in #44. There is simply no satisfying way to implement multiline emphasized/bold text with the current grammar, as far as I can see it.

@burodepeper
Copy link

Hey @leipert (and others),

I've been looking into this as well (see #120, different thing, but got me thinking about Markdown), and I think multiline emphasis is just not part of the philosophy behind Markdown. From that same philosophy, I believe headings, for example, shouldn't be able to contain inline markup. Markdown is about simplicity. If you want complex markup, use HTML.

I haven't looked at all your examples above, but how about we write specs for how we believe the simple version of Markup should be implemented, and ignore the weird edge cases?

@leipert
Copy link

leipert commented Aug 30, 2015

@burodepeper You are right, markdown is all about simplicity, but it is also really expressive. That it is the reason why you are able to compile it to nice looking HTML, PDF or even Powerpoint with tools like pandoc.

Even the original MD definition by John Gruber is allowing you advanced stuff like inline HTML https://daringfireball.net/projects/markdown/syntax. The original spec does not forbid you to use something like this:

# This is a *fancy* header

This is the reason why most markdown renderers allow you to write inline emphasis.

Furthermore the syntax highlighting should reflect the actual render. It makes no sense, that the language-gfm package shows you another syntax than the markdown-preview output.

On a side note:
After dealing several hours (something like 80) with Atoms rendering while creating language-pfm, I came to several conclusions:

  1. Syntax highlighting with Regex sucks. Especially markdown with Atoms Grammar Regex Parsers, as markdown syntax heavily relies on empty lines, previous lines and new paragraphs, which cannot be detected with the current Grammar Parser.
  2. There are too many different markdown flavors. I hope, that http://spec.commonmark.org/ will overcome this in the future.
  3. The only reliable syntax highlighting of markdown can be done with help of Syntax Trees. Hopefully this is somewhat possible in the future, as markdown is not really made for line by line parsing.

@burodepeper
Copy link

Furthermore the syntax highlighting should reflect the actual render. It makes no sense, that the language-gfm package shows you another syntax than the markdown-preview output.

I don't agree on that. I believe Markdown is about nothing more than semantics (as html is supposed to be). Aren't you supposed to be able to semantically read a Markdown document without any syntax highlighting? I've started using Markdown in my emails, and people understand marking words with asterisks, and specifying headers and <hr>s.

Syntax highlighting with Regex sucks.

That's something I do agree on! ; )

I hope, that http://spec.commonmark.org/ will overcome this in the future.

I've heard about it, but haven't really looked into it. Putting it on my list right now.


In short, I use markdown files a lot, so I don't mind spending time to improve the language-gfm package. And since it's Github Flavored, who's in charge of deciding the level of simplicity?

@leipert
Copy link

leipert commented Aug 30, 2015

I also use markdown files a lot, which brought me to participate in these three packages:

  • markdown-preview-plus. An awesome Markdown Preview (new release coming tonight)
  • language-pfm. Pandoc Flavored Markdown, which is a fork of this package and works out a lot of the quirks, while maintaing compability to this package.
  • linter-markdown. Markdown Linter

Sorry for the advertisement, if you find any useful code in language-pfm, I am happy to start writing PRs for language-gfm.

@burodepeper
Copy link

Thanks, I'm going to take a look at them.

@tpoisot
Copy link
Contributor

tpoisot commented Aug 31, 2015

@burodepeper I don't think your point on nested emphasis or emphasis in headers not being within the markdown spirit is right. In fact, if you look at the original markdown implementation, # a *b* is parsed as <h1>a <em>b</em></h1>. And emphasis is semantics!

I would also be happy if CommonMark gained more prevalence. Github flavored markdown is already more or less CommonMark compliant, by the way.

@IonicaBizau
Copy link
Contributor Author

I'm going to close this since I just feel I'm not ready to understand this regex black magic yet. 😄 🔮

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected italic style in github.com diff views
6 participants