At Spritely, we'd really like to be able to embed arbitrary HTML in the Markdown files we use for our Haunt website. This is also a longstanding issue with guile-commonmark, first reported in 2018: #8
The fundamental difficulty, as I understand it, is that CommonMark allows embedding arbitrary HTML (even garbage), so the resulting CommonMark AST does not necessarily reflect the shape of the HTML node tree. As a result, you cannot directly convert a CommonMark AST to SXML when block or inline HTML nodes are present; you have to serialize to HTML first and then use an HTML-to-SXML parser.
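To make the mismatch concrete, here is a small, language-neutral Python sketch (not guile-commonmark code). Per the CommonMark spec, the input `<div>`, blank line, `*hello*`, blank line, `</div>` parses into three *sibling* blocks: an HTML block holding `<div>`, a paragraph, and an HTML block holding `</div>`. The tuple-based AST, its node names, and the `render` and `TreeBuilder` helpers are hypothetical and exist only for illustration; `TreeBuilder` is a deliberately naive stand-in for a real HTML-to-SXML parser such as guile-lib's `(htmlprag)`. Serializing to HTML text and re-parsing it recovers the nesting that the AST itself never encoded.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical flat AST for the CommonMark input "<div>\n\n*hello*\n\n</div>".
# The <div> is NOT a parent node: the opening and closing tags are separate
# sibling html-block nodes with an ordinary paragraph between them.
AST = [
    ("html-block", "<div>"),
    ("paragraph", [("emph", [("text", "hello")])]),
    ("html-block", "</div>"),
]


def render(node):
    """Serialize one node to HTML text; raw HTML nodes are emitted verbatim."""
    kind, content = node
    if kind == "html-block":
        return content + "\n"
    if kind == "text":
        return escape(content)
    children = "".join(render(child) for child in content)
    if kind == "paragraph":
        return "<p>" + children + "</p>\n"
    if kind == "emph":
        return "<em>" + children + "</em>"
    raise ValueError("unknown node type: " + kind)


class TreeBuilder(HTMLParser):
    """Naive HTML-to-tree parser producing nested [tag, child, ...] lists."""

    def __init__(self):
        super().__init__()
        self.root = ["root"]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        element = [tag]
        self.stack[-1].append(element)
        self.stack.append(element)

    def handle_endtag(self, tag):
        # Assumes well-formed, properly nested input (unlike a real parser).
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text between tags
            self.stack[-1].append(data)


html_text = "".join(render(node) for node in AST)
builder = TreeBuilder()
builder.feed(html_text)
builder.close()

print(html_text)     # → "<div>\n<p><em>hello</em></p>\n</div>\n"
print(builder.root)  # → ['root', ['div', ['p', ['em', 'hello']]]]
```

The AST has three top-level siblings, but the re-parsed tree has a single `div` containing the paragraph, which is why a direct AST-to-SXML walk cannot produce the right shape on its own.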
This pull request does the following:

1. Adds new `html-block` and `inline-html` node types in `(commonmark node)`.
2. Adds support for parsing block and inline HTML to `(commonmark blocks)` and `(commonmark inlines)`.
3. Adds support for direct conversion of a CommonMark AST to HTML text with a new `commonmark->html` procedure in a new `(commonmark html)` module.
4. For compatibility with existing behavior, HTML nodes are converted to simple text nodes in `commonmark->sxml`, which means they will be escaped in the output as if they weren't parsed in the first place.

I think item 4 is particularly important because it will allow guile-commonmark to continue to work as it does today, without support for embedded HTML. The new `commonmark->html` interface will allow users to directly serialize to HTML (which is enough for many use-cases) or use their preferred HTML parser to convert it to SXML, such as guile-lib's `(htmlprag)` (which is what I'd want to do with Haunt). This avoids adding dependencies to guile-commonmark and punts on the complicated subject of HTML parsing.

The test suite file I added incorporates all 64 tests of inline/block HTML included in the CommonMark specification. Additionally, I tested that my fork of guile-commonmark can successfully parse all of the existing Spritely blog posts, serialize them to HTML, and then parse them again using `html->shtml` in `(htmlprag)`.

(The test suite in general is not green, though. There are tests failing on master. I have not made the situation worse, in any case.)
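The difference between the two output paths (the escaping compatibility behavior of `commonmark->sxml` described in item 4, versus raw output from `commonmark->html`) can be sketched the same language-neutral way. This is a Python illustration, not guile-commonmark's implementation: the node names, the `to_html` function, and its `raw_html` flag are all hypothetical. In the real PR the compatibility path goes through SXML text nodes, which the SXML serializer then escapes; this sketch collapses that into direct escaping to show the end result. Note that, per CommonMark, `<b>` and `</b>` are parsed as two separate inline HTML nodes.

```python
from html import escape

# Hypothetical inline AST for the CommonMark paragraph "Hello <b>world</b>!".
# Each tag is its own inline-html node; "world" is an ordinary text node.
PARAGRAPH = [
    ("text", "Hello "),
    ("inline-html", "<b>"),
    ("text", "world"),
    ("inline-html", "</b>"),
    ("text", "!"),
]


def to_html(nodes, raw_html=True):
    """Serialize inline nodes to HTML (hypothetical helper).

    raw_html=True:  inline-html nodes are emitted verbatim (HTML output path).
    raw_html=False: inline-html nodes are treated as plain text and escaped,
                    as if the HTML had never been parsed (compatibility path).
    """
    parts = []
    for kind, content in nodes:
        if kind == "inline-html" and raw_html:
            parts.append(content)
        elif kind in ("text", "inline-html"):
            parts.append(escape(content))
        else:
            raise ValueError("unknown node type: " + kind)
    return "<p>" + "".join(parts) + "</p>"


print(to_html(PARAGRAPH, raw_html=True))   # → <p>Hello <b>world</b>!</p>
print(to_html(PARAGRAPH, raw_html=False))  # → <p>Hello &lt;b&gt;world&lt;/b&gt;!</p>
```

Keeping the escaped path as the default for SXML output is what lets existing users see no change, while the raw path is opt-in.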