raw HTML blocks are not supported #8

Open
rekado opened this issue Jan 19, 2018 · 6 comments

Comments

@rekado

rekado commented Jan 19, 2018

The CommonMark Spec recognizes HTML blocks, i.e. "a group of lines that is treated as raw HTML (and will not be escaped in HTML output)." See http://spec.commonmark.org/0.26/#html-blocks.

Guile-commonmark does not seem to support these kinds of blocks.

@rekado rekado changed the title raw HTML block are not supported raw HTML blocks are not supported Jan 19, 2018
@OrangeShark
Owner

OrangeShark commented Jan 20, 2018 via email

@co-dan

co-dan commented Jun 24, 2019

Hi @OrangeShark! Is there any progress on this? I think it would be nice to support raw html in some way or another. I guess you are not very keen on avoiding sxml altogether?

@rekado
Author

rekado commented Apr 15, 2020

I suppose a simple way around the problem is to parse inline HTML and print it as text if it is invalid. This would make certain use cases for raw HTML blocks impossible (such as generating head and tail fragments to wrap some other content), but it seems like a small loss compared to not having any HTML block support.

@humanitiesNerd

guile-lib contains a "pragmatic" html parser

I wonder if it could be of help, here

It "attempts to recover structure"

"The HtmlPrag parsing behavior is permissive in that it accepts erroneous HTML, handling several classes of HTML syntax errors gracefully, without yielding a parse error."

A point of doubt for me is this one:

"Note that valid XHTML input is of course better handled by a validating XML parser like [SSAX]."

I wonder if guile-commonmark could switch between one parser and the other depending on the correctness of the material at hand

@jnschaeffer

I just ran into this myself. @OrangeShark are there any plans to add support for raw HTML? I'm happy to get involved if it's just a matter of developer time.

@davexunit

Soooo I've been working through the problems here this past week and I think I am close to solutions.

Since the CommonMark format allows embedding any arbitrary HTML, this means that the resulting AST does not reflect the shape of the HTML node tree, in the general case. So, as noted above, you cannot directly convert a CommonMark AST to SXML when block/inline HTML nodes are present. You have to serialize to HTML first and then parse that.
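
To make the shape mismatch concrete, here is a rough, language-neutral sketch in Python (not guile-commonmark code). The flat `AST` list, `to_html`, `TreeBuilder`, and `parse` are all hypothetical names made up for illustration. One HTML block opens a `<div>` and a later one closes it, so the AST holds three siblings while the parsed HTML nests the paragraph inside the `div`.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical flat AST of (node_type, text) pairs.
# This is an assumption for illustration, not guile-commonmark's real node format.
AST = [
    ("html_block", '<div class="note">'),
    ("paragraph", "Some text."),
    ("html_block", "</div>"),
]


def to_html(ast):
    """Serialize the AST: raw HTML passes through verbatim, paragraphs are escaped."""
    parts = []
    for kind, text in ast:
        if kind == "html_block":
            parts.append(text)
        else:
            parts.append("<p>" + escape(text) + "</p>")
    return "\n".join(parts)


class TreeBuilder(HTMLParser):
    """Naive tree builder producing nested [tag, attrs, children...] lists.

    It ignores void elements and mismatched end tags, so it is only
    good enough for this small, well-formed example.
    """

    def __init__(self):
        super().__init__()
        self.root = ["*TOP*", {}]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, dict(attrs)]
        self.stack[-1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text between tags
            self.stack[-1].append(data)


def parse(html_text):
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root


html_text = to_html(AST)
tree = parse(html_text)
```

The tree only takes its nested shape after the serialize-then-parse round trip; walking the three AST nodes one by one could not produce it, because neither HTML block is a complete element on its own.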

I propose the following:

  1. Add support for parsing block and inline HTML
  2. Allow conversion of CommonMark AST to HTML text by providing a commonmark->html procedure in a new (commonmark html) module
  3. For compatibility reasons, convert raw HTML nodes to simple text nodes in commonmark->sxml

I think item 3 is particularly important because it will allow guile-commonmark to continue to work as it does today, without support for embedded HTML. The new commonmark->html interface will allow users to directly serialize to HTML (which is enough for many use-cases) or use their preferred HTML parser to convert it to SXML, such as guile-lib's (htmlprag) (which is what I'd want to do in Haunt). This avoids adding dependencies to guile-commonmark and punts on the complicated subject of HTML parsing (it's a user problem!).
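
A rough sketch of the item 3 fallback, written in Python purely as a language-neutral illustration. The node types, `to_tree`, and `render` are hypothetical and do not reflect guile-commonmark's actual data structures. Raw HTML nodes become ordinary text nodes in the tree, so a tree serializer escapes them and they show up as literal markup instead of breaking the tree.

```python
from html import escape

# Hypothetical flat AST of (node_type, text) pairs, an assumption for illustration only.
AST = [
    ("html_block", "<aside>"),
    ("paragraph", "Hi & bye"),
    ("html_block", "</aside>"),
]


def to_tree(ast):
    """Convert to a nested-list tree, demoting raw HTML nodes to plain text."""
    tree = []
    for kind, text in ast:
        if kind in ("html_block", "html_inline"):
            tree.append(text)  # kept as a text node, never interpreted as markup
        elif kind == "paragraph":
            tree.append(["p", text])
    return tree


def render(tree):
    """Serialize the tree; every text node is escaped, as a tree serializer would do."""
    out = []
    for node in tree:
        if isinstance(node, str):
            out.append(escape(node))
        else:
            tag, *children = node
            out.append(f"<{tag}>{render(children)}</{tag}>")
    return "".join(out)


tree = to_tree(AST)
rendered = render(tree)
```

The tree stays well-formed whatever the raw HTML contains, at the cost of the embedded markup being displayed rather than interpreted.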

I have a WIP branch that can parse block and inline HTML that is close-but-not-quite compliant with the spec. I also have a commonmark->html serializer. I'm working on adding a bunch of test cases from the CommonMark spec and tweaking code as I find issues. I hope to open a PR soon.
