Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creole reader #3994

Merged
merged 49 commits into from
Oct 26, 2017
Merged

Creole reader #3994

merged 49 commits into from
Oct 26, 2017

Conversation

swilde
Copy link
Contributor

@swilde swilde commented Oct 25, 2017

I wrote a reader for Creole 1.0 http://www.wikicreole.org/wiki/Creole1.0 including a bunch of tests.
It is feature complete, not very thoroughly tested yet, though.

swilde and others added 30 commits September 11, 2017 11:34
No real functionality besides preliminary bold and italics yet.
No real functionality besides preliminary bold and italics yet.
So far this covers only things the parser already supports.
Fails test, as sublists should not be embedded in a list item!
Covering ordered and unordered tests, even mixed.  Tests for
formatting in list items still missing...
Used the opportunity for a little refactoring.
@jgm
Copy link
Owner

jgm commented Oct 25, 2017

Fantastic!

Have you tested this much on real-world creole? I tried it out on the source of http://www.wikicreole.org/wiki/Creole1.0 (which I'm assuming is creole), and I noticed some
differences between the HTML pandoc generated and the page that displays on the wiki.

For example:

  • some of the code samples are showing up as inline code, where they have code blocks on the wiki
  • Sample output: "italics" (instead of actually being in italics)
  • Missing space: "ableto cross paragraphs"
  • Missing emphasis in "__blog-style vs. wiki-style__", "__must__"
  • [long discussion|Talk.Change Linebreak Markup Proposal] not linkified
  • Sample output for table doesn't show up as a table (maybe because there's not a blank line before it?)

I'd suggest running this against a number of real-world creole texts and comparing your parser's output to the output of a standard creole parser. (I note that the creole wiki's sandbox says it's not a reference implementation of creole -- so maybe the above are bugs in it rather than in your reader -- but presumably there is a reference implementation somewhere!)

What you have looks close to being ready to merge, and if some of the kinks can be ironed out soon enough, we might even be able to include this in the upcoming 2.0 release.

@jgm
Copy link
Owner

jgm commented Oct 25, 2017

PS. In addition to the highly focused tests you have, you might add some tests with multiple blocks, since it's important to test the transitions between blocks. Or you could add an "old" style reader (golden) test, like mediawiki-reader.wiki/mediawiki-reader.native (put files in test/ and add to Test.Pandoc.Old), to supplement the tests you have.

@swilde
Copy link
Contributor Author

swilde commented Oct 26, 2017

Thanks for your reply.

Have you tested this much on real-world creole?

Yes. I tried it with a few creole documents from the company I'm
working for.

I tried it out on the source of
http://www.wikicreole.org/wiki/Creole1.0 (which I'm assuming is
creole), and I noticed some differences between the HTML pandoc
generated and the page that displays on the wiki.

I'll check them out and make fixes where I think they are appropriate.
(Might take some time, this is a spare time project of mine).

I'd suggest running this against a number of real-world creole texts
and comparing your parser's output to the output of a standard
creole parser. (I note that the creole wiki's sandbox says it's not
a reference implementation of creole -- so maybe the above are bugs
in it rather than in your reader -- but presumably there is a
reference implementation somewhere!)

There is no reference implementation I'm aware of, and to my best
knowledge there is no official reference implementation. Quite
contrary: the implementation of the official wiki is known to be not
compliant to Creole 1.0.

So the only official reference for Creole 1.0 is the textual
description on http://www.wikicreole.org/wiki/Creole1.0
(which is is kind of "wobbily" as all informal specifications by
example tend to be).

As a reference test case I used creole1.0test.txt from
http://www.wikicreole.org/wiki/Creole1.0TestCases as pointed out on
the site the text file is preferred as to avoid onfusion caused by
bugs in this wiki engine.

What you have looks close to being ready to merge, and if some of
the kinks can be ironed out soon enough, we might even be able to
include this in the upcoming 2.0 release.

Thant really sounds great. Thanks!

@swilde
Copy link
Contributor Author

swilde commented Oct 26, 2017

Hi,

I looked into the problems you reported:

  • some of the code samples are showing up as inline code, where they
    have code blocks on the wiki

This is the case e.g. with the description of bold and italic. But
I'd argue that my rendering is correct.

For example, the wiki source reads

Creole:
{{{ **bold** }}}

Recommended XHTML:
{{{ <strong>bold</strong> }}}

The standard says on NoWiki (reformatted, verbatim text):

This works inline or as a block. [...] As a block, the three curly
braces should be on one line by itself to open and another line of
three curly braces should be on a line by itself to close, without
leading spaces.

So {{{ **bold** }}} is inline nowiki, as a block it would have to
be:

{{{
**bold**
}}}
  • Sample output: "italics" (instead of actually being in italics)

It's ''italics'' to be precise, and that's simply what is in the wiki
source. This is no valid Creole markup, it is the Wiki Syntax used by
JSPWiki, which is the base technology used on the Wiki.

Same is true for __bold__.

  • Missing space: "ableto cross paragraphs"

Right. I fixed this.

  • Missing emphasis in "__blog-style vs. wiki-style__", "__must__"

Once again, this is not Creole but JSPWiki markup.

  • [long discussion|Talk.Change Linebreak Markup Proposal] not linkified

Same here: Creole link would be [[foo bar|Link Name here]] (double brackets).

  • Sample output for table doesn't show up as a table (maybe because
    there's not a blank line before it?)

There are two issues:

  1. Once again it's no valid Creole, table headers are marked with |=
    not || in Creole.

  2. You are right, my parser required a blank line to separate tables
    from preceding paragraphs. I changed that now.

Regarding the additional Tests. I'll look into that the next days...

What else would be required for the reader to get merged?

cheers
sascha

@jgm jgm merged commit 66fd324 into jgm:master Oct 26, 2017
@jgm
Copy link
Owner

jgm commented Oct 26, 2017

I think it's already good to merge. More tests would be good, but we can merge those in later.

@jgm
Copy link
Owner

jgm commented Oct 26, 2017

Thank you!

@mb21 mb21 mentioned this pull request Nov 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants