Replace pegdown with modern markdown parser #81

sirthias · 2016-12-13T13:54:21Z

paradox currently builds upon pegdown as the underlying markdown parser. Although pegdown has a number of serious issues it comes with quite a large feature-set and has long been one of the few "go-to" markdown-parsing solutions on the JVM.

Unfortunately, pegdown is essentially unmaintained, with crucial bugs not being fixed. Its parsing performance is also relatively bad. Parser runtime can sometimes even become exponential, which means that the parser either appears to "hang" completely or aborts processing after a time-out.

These deficiencies, along with the availability of newer, more modern and better maintained alternatives, should be sufficient motivation to consider a switch to another underlying parser solution.

I'd recommend we look at commonmark-java and flexmark-java and evaluate which one works better.
From his involvement with pegdown I know @vsch (the author of flexmark-java) as being very friendly and responsive, so I'd trust him with responsible maintenance for the foreseeable future.
commonmark-java on the other hand is maintained by Atlassian, which is certainly not bad either.

My impression is that either alternative will provide a much better foundation for all future work on paradox than pegdown.


jonas · 2016-12-13T14:17:42Z

Any thoughts on Laika, @sirthias? I saw you played with it at one point. Its directive format is a bit different, but it has a PDF renderer.

Also it looks like flexmark-java requires Java 8. Right now Paradox only requires Java 7 AFAIK.

vsch · 2016-12-13T14:23:32Z

@jonas, I am the author of flexmark-java and in discussion with @sirthias about pegdown end of life maintenance.

Java 8 language level features will be removed from flexmark-java in the next few days since it also prevents flexmark-java from supporting android. So this is not going to be an issue.

If you do decide to migrate to flexmark-java and are interested I can offer active help in the migration of paradox to flexmark-java since I had to make this trip for my plugin: https://github.com/vsch/idea-multimarkdown, which was intimately dependent on pegdown's every quirk and feature.

sirthias · 2016-12-13T14:30:08Z

Yes, I've played with Laika. It's not a bad project either, but it has a lot fewer users than commonmark-java or flexmark-java.

For example, flexmark-java is the basis of the Markdown Navigator plugin for JetBrains IDEs, with a significant number of commercial users and almost 2 million downloads last year.
So it's bound to be quite a bit more mature than Laika, and with better prospects for stable maintenance going forward.

Also, quite importantly, flexmark matches pegdown in terms of feature set (tables, footnotes, etc.), which Laika doesn't.

jonas · 2016-12-13T14:39:17Z

@vsch I don't know if the Java version matters, but I noted it because of Paradox's sbt plugin. Related to #14, do you have any thoughts on the complexity of implementing a LaTeX or PDF renderer for flexmark-java?

@sirthias Makes sense.

vsch · 2016-12-13T14:46:13Z

@jonas, the Java version matters for Android, which is a show stopper for some pegdown users who are experiencing issues with the ASM dependency in pegdown. So I have to address this. Lambda use can easily be replaced with anonymous classes, and in IntelliJ IDEA this carries no extra cost in typing or source code real estate, since the IDE collapses them to lambda-looking code for display.
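A minimal sketch of the lambda-to-anonymous-class rewrite described above. The class and method names are hypothetical and are not taken from flexmark-java; the point is only that the two forms behave identically, while the second one also compiles at Java 7 language level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class; not part of flexmark-java.
class LambdaDowngradeSketch {
    // Java 8 form: the comparator is written as a lambda.
    static Comparator<String> byLengthLambda() {
        return (a, b) -> Integer.compare(a.length(), b.length());
    }

    // Java 7 form: the same comparator as an anonymous class.
    static Comparator<String> byLengthAnonymous() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        };
    }

    // Returns a sorted copy so the input list is left untouched.
    static List<String> sorted(List<String> input, Comparator<String> order) {
        List<String> copy = new ArrayList<String>(input);
        Collections.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ccc", "a", "bb");
        System.out.println(sorted(words, byLengthLambda()));
        System.out.println(sorted(words, byLengthAnonymous()));
    }
}
```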

I haven't looked at the difficulty of rendering PDF or LaTeX, but I don't see an issue, since it is a matter of walking the AST and generating the required output. The flexmark AST is source-based like pegdown's, but unlike pegdown's it is linked-list based, with each node holding a link to its parent. If anything, doing it for flexmark will be an order of magnitude simpler, plus it would be a very nice addition to its feature set.
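To illustrate what "walking the AST and generating the required output" could look like, here is a minimal sketch over a hypothetical node type with parent, first-child and next-sibling links. `AstNode` and `LatexSketchRenderer` are invented for this example; flexmark-java's real node classes have a much richer API, and a real renderer would also need to escape LaTeX special characters.

```java
// Hypothetical renderer: walks sibling links and emits a tiny LaTeX subset.
class LatexSketchRenderer {
    static String render(AstNode node) {
        StringBuilder out = new StringBuilder();
        renderInto(node, out);
        return out.toString();
    }

    private static void renderChildren(AstNode node, StringBuilder out) {
        // Children form a singly linked list via the "next" pointer.
        for (AstNode child = node.firstChild; child != null; child = child.next) {
            renderInto(child, out);
        }
    }

    private static void renderInto(AstNode node, StringBuilder out) {
        switch (node.type) {
            case "text":
                out.append(node.text); // no LaTeX escaping in this sketch
                break;
            case "emphasis":
                out.append("\\emph{");
                renderChildren(node, out);
                out.append("}");
                break;
            case "heading":
                out.append("\\section{");
                renderChildren(node, out);
                out.append("}\n\n");
                break;
            case "paragraph":
                renderChildren(node, out);
                out.append("\n\n");
                break;
            default: // e.g. "document": just render the children
                renderChildren(node, out);
        }
    }

    public static void main(String[] args) {
        AstNode doc = new AstNode("document", null)
            .append(new AstNode("heading", null).append(new AstNode("text", "Intro")))
            .append(new AstNode("paragraph", null).append(new AstNode("text", "Hello")));
        System.out.print(render(doc));
    }
}

// Hypothetical AST node with parent and sibling links.
class AstNode {
    final String type; // "document", "heading", "paragraph", "emphasis" or "text"
    final String text; // only set for "text" nodes
    AstNode parent, firstChild, lastChild, next;

    AstNode(String type, String text) {
        this.type = type;
        this.text = text;
    }

    // Appends a child, maintaining the parent and sibling links.
    AstNode append(AstNode child) {
        child.parent = this;
        if (lastChild == null) firstChild = child; else lastChild.next = child;
        lastChild = child;
        return this;
    }
}
```

The parent link is not needed for top-down rendering, but it makes in-place edits (removing or replacing a node) cheap, which matters for post-processing passes.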

Are there any existing open source implementations of Markdown to PDF and LaTeX converters that I can use as a stepping stone, since I have not worked with either format before?

sirthias · 2016-12-13T14:49:30Z

@vsch Laika implements a PDF from MD generator whose implementation might be interesting. It's pure Scala though.

vsch · 2016-12-13T14:51:05Z

@sirthias, thanks, I will take a look at it. I am no expert in Scala but learned to be conversant with it out of necessity.

jonas · 2016-12-13T15:15:28Z

I'd be interested in helping out with PDF support if Paradox makes the move. Laika uses https://xmlgraphics.apache.org/ which should in theory keep everything JVM-based, although I've never had much luck with XSL-FO. I know that @eed3si9n has used Pandoc to generate PDFs via LaTeX by simply concatenating all Markdown files together. This, however, requires that each file is written with this in mind. In any case I don't think PDF support is a blocker for moving to a more modern parser.

@vsch One last question: I don't know if flexmark-java has support for custom directives, but in any case would it be possible to extend the parser to support the current directive syntax?

@ref:[Inline directive](b.md)

@@ snip [Leaf block directive](../scala/Obj.scala)

@@@ note
Container block directive
@@@
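As a rough sketch of what recognizing these three forms involves, the regular expressions below classify a single line as an inline, leaf block or container block directive. Everything here (`DirectiveSketch`, the patterns, the returned labels) is hypothetical and is neither Paradox's nor flexmark-java's implementation; it assumes optional whitespace after the `@@` and `@@@` markers and ignores directive attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical classifier for the three directive forms shown above.
class DirectiveSketch {
    static final Pattern CONTAINER_CLOSE = Pattern.compile("^@@@\\s*$");
    static final Pattern CONTAINER_OPEN = Pattern.compile("^@@@\\s*(\\w+)\\s*$");
    static final Pattern LEAF =
        Pattern.compile("^@@\\s*(\\w+)\\s*\\[([^\\]]*)\\]\\(([^)]*)\\)\\s*$");
    static final Pattern INLINE =
        Pattern.compile("@(\\w+):\\[([^\\]]*)\\]\\(([^)]*)\\)");

    // Returns a label describing the first directive found on the line.
    static String classify(String line) {
        String s = line.trim();
        if (CONTAINER_CLOSE.matcher(s).matches()) return "container-close";
        Matcher m = CONTAINER_OPEN.matcher(s);
        if (m.matches()) return "container-open:" + m.group(1);
        m = LEAF.matcher(s);
        if (m.matches()) return "leaf:" + m.group(1) + ":" + m.group(3);
        m = INLINE.matcher(s); // inline directives may appear mid-sentence
        if (m.find()) return "inline:" + m.group(1) + ":" + m.group(3);
        return "text";
    }

    public static void main(String[] args) {
        System.out.println(classify("@ref:[Inline directive](b.md)"));
        System.out.println(classify("@@ snip [Leaf block directive](../scala/Obj.scala)"));
        System.out.println(classify("@@@ note"));
    }
}
```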

vsch · 2016-12-13T15:35:20Z

@jonas, any help will be greatly appreciated. It has been just me and the crickets behind me, working on this like mad, for the last year. 😄

flexmark does not have custom directives but has a very flexible extension API. I don't see it as being hard to add. If you can tell me what you need, along with any constraints that exist for the above elements, I can add it as an extension.

I am assuming that the above examples are additions on top of markdown that do not affect markdown syntax processing. If so, they can easily be added as a node-processor: an extension that walks the AST and removes, adds, modifies, re-arranges or creates nodes based on whatever the code decides to do.

If the @@@ note can contain any markdown text spanning multiple paragraphs, then it would require a block parser extension that takes care of splitting the content out of the stream, so that its markers do not accidentally span markdown inlines or other markdown block elements. Either way, it is a standard, easy extension to implement.
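The idea of splitting container content out of the stream before inline parsing can be sketched roughly as follows. This is a line-based illustration under simplifying assumptions (no nested containers, markers on their own lines); `ContainerBlockSketch` and `Block` are hypothetical names and do not reflect flexmark-java's block parser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical splitter: separates "@@@ name ... @@@" containers from plain markdown.
class ContainerBlockSketch {
    static final class Block {
        final String name; // directive name, or null for plain markdown
        final List<String> lines = new ArrayList<String>();
        Block(String name) { this.name = name; }
    }

    static List<Block> split(List<String> input) {
        List<Block> blocks = new ArrayList<Block>();
        Block current = new Block(null);
        for (String line : input) {
            String s = line.trim();
            boolean marker = s.startsWith("@@@");
            if (marker && current.name == null && s.length() > 3) {
                // Opening marker: flush pending plain markdown, start a container.
                if (!current.lines.isEmpty()) blocks.add(current);
                current = new Block(s.substring(3).trim());
            } else if (marker && current.name != null && s.equals("@@@")) {
                // Closing marker: the container body may span several paragraphs.
                blocks.add(current);
                current = new Block(null);
            } else {
                current.lines.add(line);
            }
        }
        if (current.name != null || !current.lines.isEmpty()) blocks.add(current);
        return blocks;
    }

    public static void main(String[] args) {
        List<Block> blocks = split(Arrays.asList(
            "intro", "@@@ note", "First paragraph.", "", "Second paragraph.", "@@@", "outro"));
        for (Block b : blocks) System.out.println(b.name + " " + b.lines);
    }
}
```

Each container body could then be handed back to the markdown parser on its own, so that the markers never end up inside a paragraph or another block element.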

If there are any options you would like to see in the extension, such as rendering or parsing options, please let me know. My aim in flexmark is to offer configuration options for every extension that eliminate the need to write code for 99% of use cases.

sirthias · 2016-12-14T10:04:12Z

Just FYI here: I have just added a deprecation note to pegdown's README officially recommending flexmark-java as the best replacement.

eed3si9n · 2016-12-14T17:48:41Z

@sirthias Thanks for starting this discussion. You're the markdown parsing expert, so if you're saying flexmark-java is the way to go, I'm all for it.

vsch · 2016-12-21T03:34:14Z

@jonas, flexmark-java Java language level has been downgraded to 7.

I have also added a convenience class that converts pegdown extension flags to flexmark-java config options, which makes migration easier: https://github.com/vsch/flexmark-java#pegdown-migration-helper

If you can give me the specs for custom directives I will add the extension to the next release.

jroper · 2017-04-07T01:36:53Z

I'd just like to say, while I'm definitely not against migrating, I don't think the current scenario is that grim. We've been using pegdown for many years now serving documentation on playframework.com. All markdown is parsed and rendered on the fly on each request to the documentation. Performance has never been an issue for our needs, and issues such as hanging have never manifested as being a problem. Aside from the initial contribution I made to support extensions, I don't think we've ever had a need for upstream changes or bugfixes in pegdown that I'm aware of.

Aside from support for custom extensions, whatever library we select will need to have the following features:

The ability to walk the markdown AST for purposes other than HTML rendering. Neither Play nor Lagom has migrated to Paradox yet, but eventually we will, and one feature we will be porting is our validation code. This walks the pegdown AST and validates things like internal links (including anchor links), validates extensions (that source code snippets exist), detects orphan pages, enforces certain linking conventions (e.g. that javadoc links are to the frames version, i.e. index.html?com/example/MyClass.html) and even validates external links. This validation (excluding external links) is run by CI, and we consider it crucial to maintaining the integrity of our documentation.
The ability to customise HTML rendering, such as apply a custom heading anchor link convention, or the ability to modify links and anchor names to work when rendering all docs as a single HTML page (for conversion to PDF). Pegdown supports this by extending the ToHtmlSerializer and overriding specific methods.
GFM support, including things like its handling of line breaks.
Table support.
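To make the validation requirement concrete, here is a minimal sketch of two of the checks mentioned above (broken internal links and orphan pages), assuming the links have already been collected from the AST into a map from page path to link targets. The class, method names and data layout are hypothetical and are not the Play validation code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical checks over a map of page path -> internal link targets.
class DocValidationSketch {
    // Links whose target is not a known page, reported as "source -> target".
    static Set<String> brokenLinks(Map<String, List<String>> pages) {
        Set<String> broken = new TreeSet<String>();
        for (Map.Entry<String, List<String>> e : pages.entrySet()) {
            for (String target : e.getValue()) {
                if (!pages.containsKey(target)) broken.add(e.getKey() + " -> " + target);
            }
        }
        return broken;
    }

    // Pages that cannot be reached by following links from the root page.
    static Set<String> orphans(Map<String, List<String>> pages, String root) {
        Set<String> reachable = new HashSet<String>();
        Deque<String> todo = new ArrayDeque<String>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String page = todo.pop();
            // Skip unknown targets and pages that were already visited.
            if (!pages.containsKey(page) || !reachable.add(page)) continue;
            for (String target : pages.get(page)) todo.push(target);
        }
        Set<String> result = new TreeSet<String>(pages.keySet());
        result.removeAll(reachable);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pages = new HashMap<String, List<String>>();
        pages.put("index.md", Arrays.asList("a.md", "missing.md"));
        pages.put("a.md", Arrays.asList("index.md"));
        pages.put("lonely.md", Arrays.asList("a.md"));
        System.out.println(brokenLinks(pages));
        System.out.println(orphans(pages, "index.md"));
    }
}
```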

andyczerwonka · 2017-08-12T13:37:22Z

All, as I started hunting for solutions for #98, I quickly found myself here while researching markdown parsers. As per @jroper's comment above, the ability to support PDF is clearly impacted by which markdown parser is chosen. Shall we close #98, since I think this item covers it?

pvlugter · 2017-08-19T00:37:11Z

Switching to flexmark-java sounds good to me too. And the feature set sounds great: AST with post processing support, detailed source positions in the AST, extensible at multiple levels, faster parser.

dwijnand · 2017-09-29T08:44:41Z

Switching to a parser that allows generating a PDF would solve #14.

wsargent · 2018-05-26T15:15:25Z

I'm also interested in using code fences (aka "verbatim") with scalafiddle out of the box by specifying it as "scala scalafiddle", which is something that should be available according to the commonmark spec: https://spec.commonmark.org/0.27/#example-111

See jrblevin/markdown-mode#184 for an example of this kind of usage.

I am aware there's a FiddleDirective, I just like using tut inline blocks more, and right now I have to do the following:

<div data-scalafiddle>
<pre>
def sum(a: Int, b: Int) = a + b

println(sum(2, 2))
</pre>
</div>

and add integration.js by hand.
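For reference, a minimal sketch of how a fence's info string such as "scala scalafiddle" could be split into a language and extra flags. The CommonMark spec only says that the first word of the info string is typically used as the language; treating the remaining words as flags is an assumption of this sketch, and `InfoStringSketch` is a hypothetical helper, not part of Paradox or any parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for reading the info string of a fenced code block.
class InfoStringSketch {
    // Strips the leading fence characters and splits the rest on whitespace.
    static List<String> words(String fenceLine) {
        String s = fenceLine.trim();
        int i = 0;
        while (i < s.length() && (s.charAt(i) == '`' || s.charAt(i) == '~')) i++;
        String info = s.substring(i).trim();
        if (info.isEmpty()) return new ArrayList<String>();
        return new ArrayList<String>(Arrays.asList(info.split("\\s+")));
    }

    // First word of the info string, or "" when there is none.
    static String language(String fenceLine) {
        List<String> w = words(fenceLine);
        return w.isEmpty() ? "" : w.get(0);
    }

    // True if the flag appears after the language word.
    static boolean hasFlag(String fenceLine, String flag) {
        List<String> w = words(fenceLine);
        return w.size() > 1 && w.subList(1, w.size()).contains(flag);
    }

    public static void main(String[] args) {
        String fence = "```scala scalafiddle";
        System.out.println(language(fence) + " " + hasFlag(fence, "scalafiddle"));
    }
}
```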

jonas · 2018-08-21T17:31:06Z

I will start looking at this over the coming weeks.

Vladimir and I have discussed meeting in person if something blocks my progress. Thanks a lot @vsch for showing this level of support!

pvlugter · 2018-09-11T22:13:11Z

Awesome. Thanks @jonas and @vsch.

nafg · 2022-01-31T00:53:20Z

Any updates?

nafg · 2022-01-31T00:59:14Z

I suggest adopting Ornate's markdown handling

jonas mentioned this issue Jan 2, 2017

Merge Scala and Java documentation akka/akka-http#527

Closed

andyczerwonka mentioned this issue Aug 12, 2017

Support EPUB generation #98

Closed

xuwei-k mentioned this issue Aug 17, 2018

support JDK 10 #235

Closed

raboof added the help wanted label Aug 17, 2018

raboof mentioned this issue Oct 9, 2018

Todo List in markdown support. #261

Open

pvlugter mentioned this issue Oct 18, 2020

Sublists are not rendered as expected #452

Open
