chore(cmark): upgrade pulldown-cmark to 0.11 #108

waynr · 2024-07-05T20:24:10Z

Hi! I found this project when I encountered the bug described in denoland/deno#19458.

My understanding is that in order for deno fmt not to mangle frontmatter, dprint-plugin-markdown needs to update to pulldown-cmark 0.10 or later in order to incorporate the Tag::MetadataBlock variant into the ast builder.

This PR is not quite finished since I wanted to get some early feedback about if this work is even desirable and what all needs to happen for it to be merged. I also imagine that this is the kind of work that a maintainer may have already begun working on locally so I thought I might save myself some time by not duplicating any existing efforts.

Here are what I imageine the next steps will be if I do continue this work:

fix test_issue26_with_carriage_return_line_feeds, which seems to be broken by this
add corresponding variants for Tag::HtmlBlock and Tag::MetadataBlock to the Node enum (and write parsers for them)
add corresponding variants for Event::InlineMath, Event::DisplayMath, Event::InlineHtml to the Node enum
write tests to validate each of the above

It's not clear to me yet if there is anything to be done beyond merely parsing these new Tag and Event variants -- is there any special consideration needed include formatting them in the output?

CLAassistant · 2024-07-05T20:24:16Z

All committers have signed the CLA.

…ut from raw string

…d content

This reverts commit fb55388.

this prevents test cases from panicking and hanging

…print

…arker and content" This reverts commit f603645.

…marker and content" This reverts commit 0e0e148.

…ithout Paragraph

waynr · 2024-07-08T05:14:21Z

I've just pushed a bunch of commits for this, but it's still a work in progress.

The pulldown-cmark changes seem to be somewhat non-trivial. Eg. the introduction of HtmlBlock, and what seems to be a new possibility that TaskListMarker gets parsed not as a direct descendent of an Item but may actually be subordinate to a Paragraph contained by the Item.

So I keep running up against a couple different issues:

weird edge cases involving the ignore directives
- eg \n is interpreted as a
```
10..34 Start(HtmlBlock)
10..32 Html(Borrowed(""))
33..34 Html(Borrowed("\n"))
10..34 End(HtmlBlock)
```
  This is problematic because the Html(Borrowed("\n") is eventually just disappeared by gen_html since the inner string is stripped out. I believe that prior to this pulldown-cmark update this would have been interpreted as Html(Borrowed("\n")).
  This is just one example, I've spent a lot of time tinkering with gen_nodes and the indexing involved in counting newlines in the original text.
Speaking of gen_nodes, I find the logic in there that references the original text to be somewhat confusing. It feels out of place to be referencing the original text at the point of generating formatted output rather than collecting whatever is needed when parsing and stuffing it into the AST nodes that need it.

Anyway, this isn't meant as a criticism of your code which I do find very coherent and easy to navigate overall. Just describing some problems I've run into.

I do have a couple suggestions for a refactor which I would be willing to do myself if you're open to it:

Introduce generation::common::ast_nodes::Node variants to represent each of the ignore directives part.
- These would be produced when parsing Html and HtmlBlocks.
- There would likely be IgnoreFile, IgnoreBlock (with text field containing the raw text to be preserved), and IgnoreLine variants.
- I believe this would enable a generation::generate::gen_nodes to be greatly simplified.
Capture preceeding and following newline information from the original text at the parsing stage. Newlines could then be stored in their own generation::common::ast_nodes::Node type.
- This would simplify logic in the formatted output generator that involves counting newlines in some range. I have a vague notion about how generation::generate::gen_nodes could be simplified by creating a moving window iterator over nodes.
- I haven't had a chance yet to really see where the all the file_text field of the generation::gen_types::Context type is used but I think that could potentially be removed if it's not needed for logic that involves counting newlines in a given range.

If you're open to this kind of contribution, what I would likely to is open a separate PR to do the refactor without also pulling in pulldown-cmark since that really muddies the waters. My hope is that the refactor makes the dependency updates PR much easier to accomplish.

dsherret · 2024-08-02T15:54:20Z

@waynr I looked into this one a bit and got a little bit further (it's getting to the spec tests now, but a few are failing).

In this case, I think it's just easiest to parse the HtmlBlock as Html like before and I pushed a commit that does that. I think we can save any big architectural changes for later.

Would you be willing to add the missing variants? If not, I can take a look later (I'm done on this for now, just took a brief look).

dsherret

This is actually good enough.

chore(cmark): upgrade pulldown-cmark to 0.11

bb48152

refactor(format_text): remove custom yaml header parser

fb55388

waynr mentioned this pull request Jul 6, 2024

"spec" tests are nearly impossible to debug due to bespoke testrunner #109

Closed

waynr added 13 commits July 6, 2024 12:56

refactor: handle YamlHeader parsing via pulldown-cmark, generate outp…

e71786b

…ut from raw string

fix: gen_task_list_marker now needs to insert space between marker an…

f603645

…d content

fix: need separate generator for yaml/pluses generator

d13c1cc

Revert "refactor(format_text): remove custom yaml header parser"

97698fb

This reverts commit fb55388.

fix: pass full file text to parse_cmark_ast

c914478

fix: gen_html can't push newlines to be printed directly or something

643d547

this prevents test cases from panicking and hanging

fix: indexing bug in check for newline count

241e659

fix: don't add newline when there are no subsequent node elements to …

dda5cdd

…print

Revert "fix: gen_task_list_marker now needs to insert space between m…

0e0e148

…arker and content" This reverts commit f603645.

Reapply "fix: gen_task_list_marker now needs to insert space between …

dcc6e10

…marker and content" This reverts commit 0e0e148.

fix: move TaskListMarker handling from Item to Paragraph

c67b13a

fix: actually, TaskListMarker can still be subordinate to list Item w…

73c8e62

…ithout Paragraph

fix: handle HtmlBlock as a collection of nodes

0fd26f3

dsherret added 3 commits August 2, 2024 16:30

Merge branch 'main' into upgrade-dep/pulldown-cmark

1275443

Revert some changes

6449c26

parse html block as just html

dfed3d9

dsherret added 2 commits August 2, 2024 18:02

fix some tests

6c99444

all tests passing

f0e3dff

dsherret approved these changes Aug 9, 2024

View reviewed changes

dsherret merged commit d303d7d into dprint:main Aug 9, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(cmark): upgrade pulldown-cmark to 0.11 #108

chore(cmark): upgrade pulldown-cmark to 0.11 #108

waynr commented Jul 5, 2024

CLAassistant commented Jul 5, 2024 •

edited

Loading

waynr commented Jul 8, 2024

dsherret commented Aug 2, 2024

dsherret left a comment

chore(cmark): upgrade pulldown-cmark to 0.11 #108

chore(cmark): upgrade pulldown-cmark to 0.11 #108

Conversation

waynr commented Jul 5, 2024

CLAassistant commented Jul 5, 2024 • edited Loading

waynr commented Jul 8, 2024

dsherret commented Aug 2, 2024

dsherret left a comment

Choose a reason for hiding this comment

CLAassistant commented Jul 5, 2024 •

edited

Loading