bug parsing HTML (<pre><code>) #356

eksperimental · 2020-06-22T02:58:56Z

Two semantically equal expressions, but the second one fails.

iex(1)> string = """
...(1)> <pre><code>
...(1)> 1 & 2
...(1)> 1 > 2
...(1)> </code>
...(1)> </pre>
...(1)> """
iex(2)> Earmark.as_ast(string)
{:ok,
 [
   {"pre", [], ["<code>", "1 & 2", "1 > 2", "</code>"],
    %{meta: %{verbatim: true}}}
 ], []}

iex(4)> string = """
...(4)> <pre><code>
...(4)> 1 & 2
...(4)> 1 > 2
...(4)> </code></pre>
...(4)> """
iex(5)> Earmark.as_ast(string)
{:error,
 [
   {"pre", [], ["<code>", "1 & 2", "1 > 2", "</code></pre>"],
    %{meta: %{verbatim: true}}}
 ], [{:warning, 1, "Failed to find closing <pre>"}]}

RobertDober · 2020-06-22T07:33:33Z

Status Quo

Here is the kind of HTML Earmark supports. I will update the documentation, which is not good on this point (it was even missing lately).

One-line HTML Tags

   <tag...>{content}</tag>{suffix}

which will render

   {"tag", [], ["content"], %{verbatim: true}} # 1.4.6 format

One level of a block

<tag>
    {content}
</tag>

{"tag", [], [content], %{verbatim: true}}

where both <tag> and </tag> must each be on their own line (original definition by Dave).
However, your first example works as the result of permissive parsing, so to avoid regressions I will probably rephrase that accordingly in the documentation.

So I will take two actions:

- definitely add the above paragraph to the documentation --> 1.4.6
- investigate the second example (the rule would be: the opening tag must be at the start of a line, the closing tag must be at the end of a line)
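
The rule proposed for the second example can be sketched roughly as follows. This is only an illustration: `HtmlBlockRule` and `check/1` are hypothetical names, not part of Earmark, and the regular expression is a simplifying assumption about what counts as an opening tag.

```elixir
defmodule HtmlBlockRule do
  # Hypothetical helper, NOT part of Earmark's API.
  # Returns {:ok, tag} when the first line starts with an opening tag and
  # some later line ends with the matching closing tag; otherwise :error.
  def check([first | rest]) do
    # Assumption: a tag name is a letter followed by letters or digits.
    case Regex.run(~r/\A<([A-Za-z][A-Za-z0-9]*)[^>]*>/, first) do
      [_, tag] ->
        closing = "</" <> tag <> ">"

        # The closing tag only has to END a line, so "</code></pre>" qualifies.
        if Enum.any?(rest, &String.ends_with?(String.trim_trailing(&1), closing)) do
          {:ok, tag}
        else
          :error
        end

      nil ->
        :error
    end
  end

  def check([]), do: :error
end

lines = String.split("<pre><code>\n1 & 2\n1 > 2\n</code></pre>\n", "\n", trim: true)
HtmlBlockRule.check(lines)
# => {:ok, "pre"}
```

Under this rule both examples from the report would be accepted, since `</pre>` ends a line in each of them.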

Ok with you?

eksperimental · 2020-06-22T08:29:59Z

I just found that when testing it, and thought it would be good to report it. There are no worries about regressions.
Thanks for the info.

RobertDober · 2020-06-22T09:43:18Z

You have just named the game. I believe that all the issues you brought up are very valid, and while investigating I have some hopes of parsing HTML recursively with cleaner code, but I am not sure yet. However, this cannot go into 1.4.6, but I will try to treat HTML nicely (against my will 😉) in 1.5, simply because of GFM.

We'll keep it here from now, as in 1.5 this might become obsolete due to #358

eksperimental · 2020-06-26T02:02:16Z

Would it be possible to have an option to leave a copy of the original HTML element in the metadata whenever verbatim: true?
I think it would be useful in case we want to delegate to a specialized library, such as Floki, to deal with the HTML parsing.
Well, I'm experimenting with that idea in ExDoc.
Thank you.

RobertDober · 2020-06-26T06:48:22Z

Do you mean

   {"div", [{"class", "elixir"}], ["best code ever"], %{verbatim: true}}

--->

   {"div", [{"class", "elixir"}], ["best code ever"], %{verbatim: true, html: ~s[<div class="elixir">best code ever</div>]}}

Sure, sounds like a sound idea to me.
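
To make the shape of such an annotation concrete, here is a minimal sketch. `VerbatimAnnotation.annotate/1` is a hypothetical helper, not Earmark's API, and it assumes the 1.4.6 node format shown above. It rebuilds the HTML from the node itself, whereas a real implementation would more likely keep the source text as it was parsed.

```elixir
defmodule VerbatimAnnotation do
  # Hypothetical helper, NOT part of Earmark's API.
  # Adds an :html key to the meta map of a verbatim node. The HTML is
  # rebuilt from the node's tag, attributes and content (an assumption;
  # it is not a copy of the original source text).
  def annotate({tag, atts, content, %{verbatim: true} = meta}) do
    attrs = Enum.map_join(atts, fn {k, v} -> ~s( #{k}="#{v}") end)
    html = "<#{tag}#{attrs}>" <> Enum.join(content, "\n") <> "</#{tag}>"
    {tag, atts, content, Map.put(meta, :html, html)}
  end

  # Nodes that are not verbatim are returned unchanged.
  def annotate(node), do: node
end

VerbatimAnnotation.annotate({"div", [{"class", "elixir"}], ["best code ever"], %{verbatim: true}})
# => {"div", [{"class", "elixir"}], ["best code ever"],
#     %{html: "<div class=\"elixir\">best code ever</div>", verbatim: true}}
```

The `:html` string could then be handed to a specialized parser such as Floki, which is the use case described in the request.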

eksperimental · 2020-06-26T13:29:22Z

yes. exactly that!

RobertDober · 2020-07-01T15:30:39Z

This issue should be obsoleted by #358 (which is RobertDober/earmark_parser#7) and the Verbatim Annotation Part is implemented by RobertDober/earmark_parser#8

RobertDober mentioned this issue Jun 22, 2020

Document how HTML parsing works #357

Closed

RobertDober self-assigned this Jun 22, 2020

RobertDober added the enhancement label Jun 22, 2020

RobertDober added this to the 1.5 milestone Jun 22, 2020

RobertDober added the under investigation label Jun 22, 2020

RobertDober mentioned this issue Jun 22, 2020

Parse HTML recursively #358

Open

RobertDober added a commit that referenced this issue Jun 22, 2020

Refs: #356;

8afc39f

We'll keep it here from now, as in 1.5 this might become obsolete due to #358

RobertDober added a commit that referenced this issue Jun 22, 2020

Refs: #356;

b6606e6

We'll keep it here from now, as in 1.5 this might become obsolete due to #358

eksperimental mentioned this issue Jun 26, 2020

Introduce Markdown.AST elixir-lang/ex_doc#1196

Closed

RobertDober mentioned this issue Jul 1, 2020

Implement an Option that annotates the AST with the parsed verbatim text (HTML tags only?) RobertDober/earmark_parser#8

Closed

RobertDober added the EarmarkParserIssue label Jul 1, 2020

RobertDober mentioned this issue Jul 19, 2020

Parse HTML recursively RobertDober/earmark_parser#7

Open

Eiji7 mentioned this issue May 13, 2021

Consider adding HTML kbd styles livebook-dev/livebook#269

Closed
