Parse HTML recursively #358

Open
RobertDober opened this issue Jun 22, 2020 · 3 comments

RobertDober commented Jun 22, 2020

This would supersede #356 and is inspired by #353.

Basic idea

Let the scanner be a little more intelligent and scan the following line

<div> hello<br /> <span lang="greek">αλφα</span></div>

as

   [
     OpenTag{div, []}, Text{hello}, VoidTag{br},
     OpenTag{span, [{lang, greek}]}, Text{αλφα}, CloseTag{span},
     CloseTag{div}
   ]
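
To make the target token stream concrete, here is a minimal sketch in Python (not Elixir, and not the Earmark scanner). The names `Token`, `scan_line`, `TAG_RGX` and `ATTR_RGX` are hypothetical, and the sketch assumes double-quoted attribute values that contain no `>`.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str          # "open", "close", "void" or "text"
    value: str         # tag name, or the text content
    attrs: tuple = ()  # (name, value) pairs, only filled for open/void tags


# Same shape as the ~r{<[^>]+>} idea; the capture group keeps the tags in the split.
TAG_RGX = re.compile(r"(<[^>]+>)")
# Assumption: attributes are always written as name="value".
ATTR_RGX = re.compile(r'([\w-]+)="([^"]*)"')


def scan_line(line):
    """Split one line into tag and text tokens (hypothetical helper)."""
    tokens = []
    for chunk in TAG_RGX.split(line):
        if not chunk.strip():
            continue  # skip empty or whitespace-only pieces between tags
        if not TAG_RGX.fullmatch(chunk):
            tokens.append(Token("text", chunk.strip()))
            continue
        inner = chunk[1:-1].strip()
        if inner.startswith("/"):
            tokens.append(Token("close", inner[1:].strip()))
            continue
        void = inner.endswith("/")
        if void:
            inner = inner[:-1].strip()
        name, _, rest = inner.partition(" ")
        attrs = tuple(ATTR_RGX.findall(rest))
        tokens.append(Token("void" if void else "open", name, attrs))
    return tokens


if __name__ == "__main__":
    for token in scan_line('<div> hello<br /> <span lang="greek">αλφα</span></div>'):
        print(token)
```

Run on the example line, this yields seven tokens in the order open, text, void, open, text, close, close, matching the list above.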

RobertDober commented Jun 22, 2020

Notes:

  • Scanning will become slower, but parsing should become faster. Given that scanning is done in parallel while parsing is not, this should not have a negative impact on overall performance.

  • Right now the scanner uses some regexes that might be vulnerable to DoS, so this might be another advantage of rewriting the scanner here, as splitting into chunks by ~r{<[^>]+>} seems totally safe.

  • Failing to close a tag somewhere will totally alter the rest of the output; however, the error messages will indicate what to do.
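
The last note could look roughly like this. It is a sketch in Python under stated assumptions, not the planned implementation: `check_nesting` is a hypothetical helper, tokens are assumed to arrive as `(kind, name, line_number)` tuples, and the message wording is invented for illustration.

```python
def check_nesting(tokens):
    """Return error messages for mismatched or unclosed tags.

    tokens: iterable of (kind, name, lnb) tuples, where kind is
    "open", "close", "void" or "text" (assumed shape, for illustration).
    """
    stack = []   # open tags seen so far, as (name, lnb)
    errors = []
    for kind, name, lnb in tokens:
        if kind == "open":
            stack.append((name, lnb))
        elif kind == "close":
            if stack and stack[-1][0] == name:
                stack.pop()
            else:
                errors.append(f"line {lnb}: unexpected </{name}>")
        # "void" and "text" tokens do not affect nesting
    for name, lnb in stack:
        errors.append(f"line {lnb}: <{name}> is never closed")
    return errors


if __name__ == "__main__":
    sample = [
        ("open", "div", 1),
        ("text", "hello", 1),
        ("open", "span", 2),
        ("close", "div", 3),
    ]
    for message in check_nesting(sample):
        print(message)
```

Keeping the line number next to each open tag is what lets the message point at the tag that was left open, not just at the place where the mismatch was noticed.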

RobertDober added a commit that referenced this issue Jun 22, 2020
We'll keep it here from now, as in 1.5 this might become obsolete due to #358
RobertDober commented

Here is an example of what to strive for, the GFM output, which is:

<span><em>a</em>hello
<hr /><strong>b</strong></span>

RobertDober commented

Will be implemented with RobertDober/earmark_parser#7.
