Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(cmark): upgrade pulldown-cmark to 0.11 #108

Merged
merged 20 commits into from
Aug 9, 2024

Conversation

waynr
Copy link
Contributor

@waynr waynr commented Jul 5, 2024

Hi! I found this project when I encountered the bug described in denoland/deno#19458.

My understanding is that in order for deno fmt not to mangle frontmatter, dprint-plugin-markdown needs to update to pulldown-cmark 0.10 or later in order to incorporate the Tag::MetadataBlock variant into the ast builder.

This PR is not quite finished since I wanted to get some early feedback about if this work is even desirable and what all needs to happen for it to be merged. I also imagine that this is the kind of work that a maintainer may have already begun working on locally so I thought I might save myself some time by not duplicating any existing efforts.

Here are what I imageine the next steps will be if I do continue this work:

  • fix test_issue26_with_carriage_return_line_feeds, which seems to be broken by this
  • add corresponding variants for Tag::HtmlBlock and Tag::MetadataBlock to the Node enum (and write parsers for them)
  • add corresponding variants for Event::InlineMath, Event::DisplayMath, Event::InlineHtml to the Node enum
  • write tests to validate each of the above

It's not clear to me yet if there is anything to be done beyond merely parsing these new Tag and Event variants -- is there any special consideration needed include formatting them in the output?

@CLAassistant
Copy link

CLAassistant commented Jul 5, 2024

CLA assistant check
All committers have signed the CLA.

@waynr
Copy link
Contributor Author

waynr commented Jul 8, 2024

I've just pushed a bunch of commits for this, but it's still a work in progress.

The pulldown-cmark changes seem to be somewhat non-trivial. Eg. the introduction of HtmlBlock, and what seems to be a new possibility that TaskListMarker gets parsed not as a direct descendent of an Item but may actually be subordinate to a Paragraph contained by the Item.

So I keep running up against a couple different issues:

  • weird edge cases involving the ignore directives
    • eg <!-- dprint-ignore -->\n is interpreted as a
      10..34 Start(HtmlBlock)
      10..32 Html(Borrowed("<!-- dprint-ignore -->"))
      33..34 Html(Borrowed("\n"))
      10..34 End(HtmlBlock)
      
      This is problematic because the Html(Borrowed("\n") is eventually just disappeared by gen_html since the inner string is stripped out. I believe that prior to this pulldown-cmark update this would have been interpreted as Html(Borrowed("<!-- dprint-ignore -->\n")).
      This is just one example, I've spent a lot of time tinkering with gen_nodes and the indexing involved in counting newlines in the original text.
  • Speaking of gen_nodes, I find the logic in there that references the original text to be somewhat confusing. It feels out of place to be referencing the original text at the point of generating formatted output rather than collecting whatever is needed when parsing and stuffing it into the AST nodes that need it.

Anyway, this isn't meant as a criticism of your code which I do find very coherent and easy to navigate overall. Just describing some problems I've run into.

I do have a couple suggestions for a refactor which I would be willing to do myself if you're open to it:

  • Introduce generation::common::ast_nodes::Node variants to represent each of the ignore directives part.
    • These would be produced when parsing Html and HtmlBlocks.
    • There would likely be IgnoreFile, IgnoreBlock (with text field containing the raw text to be preserved), and IgnoreLine variants.
    • I believe this would enable a generation::generate::gen_nodes to be greatly simplified.
  • Capture preceeding and following newline information from the original text at the parsing stage. Newlines could then be stored in their own generation::common::ast_nodes::Node type.
    • This would simplify logic in the formatted output generator that involves counting newlines in some range. I have a vague notion about how generation::generate::gen_nodes could be simplified by creating a moving window iterator over nodes.
    • I haven't had a chance yet to really see where the all the file_text field of the generation::gen_types::Context type is used but I think that could potentially be removed if it's not needed for logic that involves counting newlines in a given range.

If you're open to this kind of contribution, what I would likely to is open a separate PR to do the refactor without also pulling in pulldown-cmark since that really muddies the waters. My hope is that the refactor makes the dependency updates PR much easier to accomplish.

@dsherret
Copy link
Member

dsherret commented Aug 2, 2024

@waynr I looked into this one a bit and got a little bit further (it's getting to the spec tests now, but a few are failing).

In this case, I think it's just easiest to parse the HtmlBlock as Html like before and I pushed a commit that does that. I think we can save any big architectural changes for later.

Would you be willing to add the missing variants? If not, I can take a look later (I'm done on this for now, just took a brief look).

Copy link
Member

@dsherret dsherret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually good enough.

@dsherret dsherret merged commit d303d7d into dprint:main Aug 9, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants