raw HTML blocks are not supported #8
One issue with supporting raw HTML as described by the CommonMark spec
is that the spec requires passing through malformed HTML, which there
is no way to convert to SXML. For example:
<foo>
<bar>
</foo>
</bar>
The spec requires this to be output exactly as written. We cannot
support this in SXML, since SXML can only represent well-formed XML.
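To make the problem concrete, here is a minimal Python sketch of the stack check that any tree representation, SXML included, implicitly imposes: every close tag must match the innermost open element. The helper name `is_well_nested` is hypothetical and not part of guile-commonmark, and the regex only handles simple tags.

```python
import re

# Matches simple tags such as <foo>, </bar> or <br/>. Attributes are skipped
# over, but comments, CDATA and void elements like <br> are not special-cased.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)[^>]*>")

def is_well_nested(html: str) -> bool:
    """Return True if the tags in `html` form a tree: every close tag matches
    the innermost open element and nothing is left open at the end."""
    stack = []
    for match in TAG_RE.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if match.group(0).endswith("/>"):
            continue  # self-closing tag: opens and closes in one step
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            # e.g. </foo> while <bar> is still open: no tree can express this
            return False
    return not stack
```

With the example above, `is_well_nested("<foo>\n<bar>\n</foo>\n</bar>")` returns `False`, because `</foo>` arrives while `<bar>` is still open. Swapping the two close tags makes it return `True`.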
When I originally started the project, I recall the spec mentioning
that HTML blocks do not need to be supported when the output is
something other than HTML. So I did not bother implementing HTML blocks
or inline HTML.
So my options are either to avoid SXML, follow the CommonMark spec,
and output HTML directly, or to go off the spec and only allow
balanced HTML nodes in HTML blocks and inline HTML. I believe I want to
pursue both options, but with more focus on the SXML output. It is on my
todo list for guile-commonmark after I update this project to the latest
version of the spec.
Hi @OrangeShark! Is there any progress on this? I think it would be nice to support raw HTML in some way or another. I guess you are not very keen on avoiding SXML altogether?
I suppose a simple way around the problem is to parse inline HTML and print it as text if it is invalid. This would make certain use cases for raw HTML blocks impossible (such as generating head and tail fragments to wrap some other content), but it seems like a small loss compared to not having any HTML block support.
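The fallback suggested here (keep well-formed inline HTML as a tree, otherwise emit it as literal text) could look roughly like the Python sketch below. It assumes the standard `xml.etree.ElementTree` parser is an acceptable stand-in for the well-formedness check and builds SXML-like nested lists. The function names and the tagged return values are hypothetical, not guile-commonmark's API.

```python
import xml.etree.ElementTree as ET

def element_to_sxml(elem):
    """Convert an ElementTree element into an SXML-like nested list."""
    node = [elem.tag]
    if elem.attrib:
        node.append(["@"] + [[name, value] for name, value in elem.attrib.items()])
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(element_to_sxml(child))
        if child.tail:
            node.append(child.tail)  # text that follows the child element
    return node

def convert_inline_html(raw):
    """Return ("nodes", [...]) for well-formed raw HTML, else ("text", raw)."""
    try:
        # A dummy wrapper lets fragments with several top-level nodes parse.
        root = ET.fromstring(f"<wrapper>{raw}</wrapper>")
    except ET.ParseError:
        # Not a tree: keep it as literal text, which the serializer escapes.
        return ("text", raw)
    return ("nodes", element_to_sxml(root)[1:])
```

For example, `convert_inline_html("<em>hi</em> there")` yields `("nodes", [["em", "hi"], " there"])`, while the mismatched `<foo><bar></foo></bar>` comes back unchanged as text. An XML parser is stricter than HTML, so under this sketch fragments such as `&nbsp;`, unquoted attribute values or an unclosed `<br>` would also fall back to text.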
guile-lib contains a "pragmatic" HTML parser; I wonder if it could be of help here. It "attempts to recover structure": "The HtmlPrag parsing behavior is permissive in that it accepts erroneous HTML, handling several classes of HTML syntax errors gracefully, without yielding a parse error." One point of doubt for me is this note: "Note that valid XHTML input is of course better handled by a validating XML parser like [SSAX]." I wonder if guile-commonmark could switch between one parser and the other depending on the correctness of the material at hand.
I just ran into this myself. @OrangeShark are there any plans to add support for raw HTML? I'm happy to get involved if it's just a matter of developer time.
Soooo I've been working through the problems here this past week and I think I am close to solutions. Since the CommonMark format allows embedding arbitrary HTML, the resulting AST does not, in the general case, reflect the shape of the HTML node tree. So, as noted above, you cannot directly convert a CommonMark AST to SXML when block/inline HTML nodes are present. You have to serialize to HTML first and then parse that. I propose the following:
I think item 3 is particularly important because it will allow guile-commonmark to continue to work as it does today, without support for embedded HTML. The new I have a WIP branch that can parse block and inline HTML that is close-but-not-quite compliant with the spec. I also have a
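The serialize-then-parse idea from the previous comment can be sketched in Python. The tuple-based AST, its node kinds (`html_block`, `paragraph`, `emph`) and all helper names are hypothetical stand-ins, not guile-commonmark's data structures. Python's standard `html.parser.HTMLParser` plays the role of a permissive HTML parser; in Guile, something like the HtmlPrag parser mentioned above might fill that role.

```python
import html
from html.parser import HTMLParser

# Hypothetical, simplified AST for the Markdown "<div>\n\n*hi*\n\n</div>".
# The raw HTML sits in separate html_block nodes, so <div> is split across
# three sibling blocks instead of containing the paragraph.
DOC = [
    ("html_block", "<div>"),
    ("paragraph", [("emph", "hi")]),
    ("html_block", "</div>"),
]

def render_inline(node):
    kind, content = node
    if kind == "emph":
        return "<em>" + html.escape(content) + "</em>"
    return html.escape(content)  # plain text

def render_html(blocks):
    """Step 1: serialize the AST to an HTML string, passing raw HTML through."""
    out = []
    for kind, content in blocks:
        if kind == "html_block":
            out.append(content)  # raw HTML is emitted unescaped
        elif kind == "paragraph":
            out.append("<p>" + "".join(render_inline(n) for n in content) + "</p>")
    return "\n".join(out)

class SxmlBuilder(HTMLParser):
    """Step 2: re-parse the HTML string into an SXML-like nested list."""

    def __init__(self):
        super().__init__()
        self.root = ["*TOP*"]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag]
        if attrs:
            node.append(["@"] + [[k, v or ""] for k, v in attrs])
        self.stack[-1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost matching element; stray close tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text between blocks
            self.stack[-1].append(data)

def html_to_sxml(text):
    builder = SxmlBuilder()
    builder.feed(text)
    builder.close()
    return builder.root
```

Here the AST has three top-level blocks, but `html_to_sxml(render_html(DOC))` produces a single `div` element containing the paragraph, which is the shape mismatch that rules out a direct AST-to-SXML conversion. Void elements such as `<br>` are not special-cased in this sketch; they simply stay open until their parent closes.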
The CommonMark Spec recognizes HTML blocks, i.e. "a group of lines that is treated as raw HTML (and will not be escaped in HTML output)." See http://spec.commonmark.org/0.26/#html-blocks.
Guile-commonmark does not seem to support these kinds of blocks.
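For reference, the HTML block rules linked above start from a set of start conditions keyed on the first characters of a line, each paired with an end condition. The following minimal Python sketch covers only conditions 1 to 5 as written in spec version 0.26. It leaves out conditions 6 and 7 and the rule that a block also ends with its containing block, and the function names are hypothetical.

```python
import re

# Start/end patterns for HTML block start conditions 1-5 (CommonMark 0.26).
# Conditions 6 and 7 (block-level tag names, and any complete open/closing
# tag), which both end at a blank line, are omitted from this sketch.
HTML_BLOCK_CONDITIONS = [
    (re.compile(r"<(script|pre|style)(\s|>|$)", re.IGNORECASE),
     re.compile(r"</(script|pre|style)>", re.IGNORECASE)),
    (re.compile(r"<!--"), re.compile(r"-->")),
    (re.compile(r"<\?"), re.compile(r"\?>")),
    (re.compile(r"<![A-Z]"), re.compile(r">")),
    (re.compile(r"<!\[CDATA\["), re.compile(r"\]\]>")),
]

def html_block_start(line):
    """Return the number (1-5) of the start condition `line` meets, or None."""
    indent = re.match(r" {0,3}", line).end()  # up to three spaces allowed
    rest = line[indent:]
    for number, (start, _end) in enumerate(HTML_BLOCK_CONDITIONS, start=1):
        if start.match(rest):
            return number
    return None

def collect_html_block(lines, i):
    """If lines[i] opens an HTML block, return (condition, end) where end is
    the index just past the block's last line; otherwise return None."""
    condition = html_block_start(lines[i])
    if condition is None:
        return None
    end_re = HTML_BLOCK_CONDITIONS[condition - 1][1]
    for j in range(i, len(lines)):
        # The first line itself may already satisfy the end condition.
        if end_re.search(lines[j]):
            return condition, j + 1
    return condition, len(lines)  # unterminated: runs to end of input
```

For instance, `collect_html_block(["<pre>", "code", "</pre>", "after"], 0)` returns `(1, 3)`. The three lines of the block would then be passed through verbatim, and that verbatim content is what an SXML backend cannot always represent.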