bug parsing HTML (<pre><code>) #356

Open
eksperimental opened this issue Jun 22, 2020 · 7 comments

Comments

@eksperimental
Contributor

Two semantically equivalent inputs, but parsing the second one fails.

iex(1)> string = """
...(1)> <pre><code>
...(1)> 1 & 2
...(1)> 1 > 2
...(1)> </code>
...(1)> </pre>
...(1)> """
iex(2)> Earmark.as_ast(string)
{:ok,
 [
   {"pre", [], ["<code>", "1 & 2", "1 > 2", "</code>"],
    %{meta: %{verbatim: true}}}
 ], []}
iex(4)> string = """
...(4)> <pre><code>
...(4)> 1 & 2
...(4)> 1 > 2
...(4)> </code></pre>
...(4)> """
iex(5)> Earmark.as_ast(string)
{:error,
 [
   {"pre", [], ["<code>", "1 & 2", "1 > 2", "</code></pre>"],
    %{meta: %{verbatim: true}}}
 ], [{:warning, 1, "Failed to find closing <pre>"}]}
@RobertDober
Collaborator

RobertDober commented Jun 22, 2020

Status Quo

Here is the kind of HTML Earmark supports. I will also update the documentation, which is not good (it was even missing lately).

  • One-line HTML tags
   <tag...>{content}</tag>{suffix}

which will render

   {"tag", [], ["content"], %{verbatim: true}} # 1.4.6 format
  • One level of a block
   <tag>
       {content}
   </tag>

which will render

   {"tag", [], [content], %{verbatim: true}}

where both <tag> and </tag> must be on their own lines (the original definition by Dave).
However, your first example works as a result of permissive parsing, so, to avoid regressions, I may phrase the documentation accordingly.

So I will take two actions:

  • definitely add the above paragraph to the documentation (for 1.4.6)

  • investigate the second example (the rule would be: the opening tag must be at the start of a line, the closing tag must be at the end of a line)

Ok with you?
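The two rules discussed above (the original definition, where `<tag>` and `</tag>` sit alone on their own lines, and the proposed relaxed rule, where the opening tag starts the first line and the closing tag ends the last line) can be sketched as a check on the first and last lines of a candidate block. This is a hypothetical illustration only: the `HtmlBlockRule` module and its functions are not part of Earmark, and the check ignores nesting, multi-line tags, and everything else a real parser has to handle.

```elixir
defmodule HtmlBlockRule do
  # Hypothetical sketch of the two rules discussed in #356.
  # This is NOT Earmark's implementation.

  # Strict rule: the first line is only an opening tag (attributes allowed)
  # and the last line is only the matching closing tag.
  # A block needs at least two lines: an opening and a closing one.
  def strict?([first, _ | _] = lines) do
    case Regex.run(~r/\A<([a-zA-Z][a-zA-Z0-9]*)(\s[^>]*)?>\z/, String.trim(first)) do
      [_, tag | _] -> String.trim(List.last(lines)) == "</#{tag}>"
      nil -> false
    end
  end

  def strict?(_lines), do: false

  # Relaxed rule: the first line starts with an opening tag and the
  # last line ends with the matching closing tag.
  def relaxed?([first | _] = lines) do
    case Regex.run(~r/\A<([a-zA-Z][a-zA-Z0-9]*)[\s>]/, first) do
      [_, tag] -> String.ends_with?(String.trim_trailing(List.last(lines)), "</#{tag}>")
      nil -> false
    end
  end

  def relaxed?([]), do: false
end
```

Splitting both inputs from the report into lines (for example with `String.split(string, "\n", trim: true)`), `strict?/1` rejects both, because their first line is `<pre><code>` rather than a lone `<pre>`, while `relaxed?/1` accepts both, since each starts with `<pre>` and ends with `</pre>`.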

@eksperimental
Contributor Author

I just found this while testing and thought it would be good to report it. There are no worries about regressions.
Thanks for the info.

@RobertDober
Collaborator

You have just named the game. I believe all the issues you brought up are valid, and while investigating I have some hope of parsing HTML recursively with cleaner code, though I am not sure yet. This cannot go into 1.4.6, but I will try to treat HTML nicely (against my will 😉) in 1.5, simply because of GFM.

RobertDober added a commit that referenced this issue Jun 22, 2020
We'll keep it here from now, as in 1.5 this might become obsolete due to #358
@eksperimental
Contributor Author

eksperimental commented Jun 26, 2020

Would it be possible to have an option to keep a copy of the original HTML element in the metadata whenever verbatim: true?
I think it would be useful in case we want to delegate the HTML parsing to a specialized library, such as Floki.
Well, I'm experimenting with that idea in ExDoc.
Thank you.

@RobertDober
Collaborator

Do you mean

   {"div", [{"class", "elixir"}], ["best code ever"], %{verbatim: true}}

--->

   {"div", [{"class", "elixir"}], ["best code ever"], %{verbatim: true, html: ~s[<div class="elixir">best code ever</div>]}}

Sure, sounds like a sound idea to me.
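The proposed annotation can be sketched as a small transformation on a four-element AST node `{tag, attributes, content, meta}`. The `VerbatimMeta` module and its `annotate/2` function are hypothetical, and the `:html` key is taken from the example in this comment rather than from the API that was eventually implemented in earmark_parser; the sketch assumes the caller already has the element's original source text.

```elixir
defmodule VerbatimMeta do
  # Hypothetical sketch of the proposal in this thread, not Earmark's API:
  # when a node is marked `verbatim: true`, also keep the original HTML
  # in its meta map under an `:html` key.

  # Verbatim nodes get the original HTML added to their meta map.
  def annotate({tag, attrs, content, %{verbatim: true} = meta}, original_html) do
    {tag, attrs, content, Map.put(meta, :html, original_html)}
  end

  # Any other node is returned unchanged.
  def annotate(node, _original_html), do: node
end
```

Annotating `{"div", [{"class", "elixir"}], ["best code ever"], %{verbatim: true}}` with `~s[<div class="elixir">best code ever</div>]` yields the same node with `html:` added to its meta map, so a downstream tool could hand that string to a dedicated HTML parser. Non-verbatim nodes pass through untouched.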

@eksperimental
Contributor Author

Yes, exactly that!

@RobertDober
Collaborator

This issue should be made obsolete by #358 (which is RobertDober/earmark_parser#7), and the verbatim annotation part is implemented by RobertDober/earmark_parser#8.
