Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markup parsers should only consider markup on word boundaries #49

Open
ixmatus opened this issue Nov 22, 2018 · 16 comments
Open

Markup parsers should only consider markup on word boundaries #49

ixmatus opened this issue Nov 22, 2018 · 16 comments

Comments

@ixmatus
Copy link
Owner

ixmatus commented Nov 22, 2018

Discovered while testing the HyperLink parser. The parser will incorrectly parse the following:

/[[https://orgmode.org/manual/Link-format.html][The Org Manual: Link format]]/

... as:

Right [Paragraph [Italic [Plain "[[https:"],Italic [Plain "orgmode.org"],Plain "manual",Italic [Plain "Link-format.html][The Org Manual: Link format]]"]]]

This should be easy to fix since formatting markup is only treated as such if the beginning sentinel character is preceded by whitespace and followed by a non-whitespace character.

@ixmatus ixmatus changed the title Recursive markup parser parses URLs as italics Markup parsers should only consider markup for a word Nov 25, 2018
@ixmatus
Copy link
Owner Author

ixmatus commented Nov 26, 2018

CC: @zhujinxuan (only CC'ing to let you know, I have a branch with a fix already in-place I haven't pushed it yet because I want to add more thorough tests).

@zhujinxuan
Copy link
Contributor

Hi, I think we shall write Hyperlink parser like the LaTeX parser. The elements inside the [] shall be considered as Text rather than Markup Text

@ixmatus
Copy link
Owner Author

ixmatus commented Nov 26, 2018

@zhujinxuan I agree with you. However this is still a problem with the markup parser as it considers some text as "marked up" that org-mode's fontification does not. I have the latter fixed and I will also implement your suggestion.

@zhujinxuan
Copy link
Contributor

@ixmatus I think we can guard that by typing. If we define

data Markup a = LaTeX Text

Then we will not need to worry about whether the content of LaTeX is parsed as markup.

@ixmatus
Copy link
Owner Author

ixmatus commented Nov 26, 2018

I don't think that's a problem. I mean that:

/http://someurl.com//

... is parsed incorrectly. Disregarding org-mode hyperlink markup syntax, we expect /http://someurl.com// to parse to an Italic [ Plain "http://someurl.com/" ] however it parses into the following:

Paragraph
[ Italic [ Plain "http:"]
, Italic [Plain "someurl.com"]
, Plain "/"
]

@ixmatus
Copy link
Owner Author

ixmatus commented Nov 26, 2018

As an example, some of the tests demonstrate incorrect behavior too, for instance:

*text *

... should not parse as Bold [ Plain "text" ] but as Plain "*text *".

@zhujinxuan
Copy link
Contributor

@ixmatus Do you have a document of orgmode markup syntax? It seems many corner cases are not documented in https://orgmode.org/manual/Markup.html

@zhujinxuan
Copy link
Contributor

@ixmatus I agree. I tested in emacs-org. I am wondering shall we consider * test * as marked?
screenshot 2018-11-26 16 59 16

@ixmatus
Copy link
Owner Author

ixmatus commented Nov 27, 2018

No, I don't think we should. I think we should follow org-mode's fontification behavior and treat * text * as plain text (that is what my stashed change does now).

@ixmatus
Copy link
Owner Author

ixmatus commented Dec 3, 2018

I'm finding lots of corner cases (by adding tests) that we didn't account for in the markup parser that I need to resolve before I can push up my work.

@zhujinxuan
Copy link
Contributor

@ixmatus Can you open up a PR with some of the tests?

@ixmatus
Copy link
Owner Author

ixmatus commented Dec 3, 2018

I will when I clean up some of my experiments :) I will probably get to it throughout the week, no stress!

@ixmatus ixmatus changed the title Markup parsers should only consider markup for a word Markup parsers should only consider markup on word boundaries Dec 4, 2018
@ixmatus
Copy link
Owner Author

ixmatus commented Dec 18, 2018

I haven't had much free-time to finish this up but I do have free-time coming up for the holidays and I will be working on this then.

@zhujinxuan
Copy link
Contributor

@ixmatus Hi, are you working on this recently? If not, I will begin the fix in next Sat (Mar 30th)

@ixmatus
Copy link
Owner Author

ixmatus commented Mar 24, 2019

@zhujinxuan my real life and job have taken an intense turn. I won't get to this until late May now so if you're able to then that would be great.

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants