Add custom Lua readers #7669

Closed
jgm opened this issue Nov 5, 2021 · 4 comments

jgm commented Nov 5, 2021

These would be similar to custom Lua writers. Possible interface design:

function Reader(input, reader_options)
  -- parse input (string) into blocks, meta
  -- use lpeg for parsing and pandoc.XXX functions to create AST elements
  return pandoc.Pandoc(blocks, meta)
end

A custom reader would be called by using the name of a lua script as the reader, e.g.

pandoc -f special.lua -t html

Readers would be expected to define a Reader function that takes a string and reader options as input and returns a Pandoc AST.
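As a rough illustration of the interface described above, here is a minimal sketch of a reader that uses no parsing library at all. It assumes `input` arrives as a string (or something `tostring` can convert), and `split_paragraphs` is a hypothetical helper name introduced only for this example. It treats blank lines as paragraph breaks and whitespace as word separators, nothing more.

```lua
-- Hypothetical helper (not part of pandoc): group the lines of `text`
-- into paragraph strings, using blank lines as separators.
local function split_paragraphs(text)
  local paragraphs, current = {}, {}
  -- Append a newline so the last line is always terminated.
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    if line:match("^%s*$") then
      -- A blank line closes the paragraph collected so far, if any.
      if #current > 0 then
        paragraphs[#paragraphs + 1] = table.concat(current, " ")
        current = {}
      end
    else
      current[#current + 1] = line
    end
  end
  if #current > 0 then
    paragraphs[#paragraphs + 1] = table.concat(current, " ")
  end
  return paragraphs
end

-- Entry point with the name proposed in this issue. The `pandoc` global
-- is only available when the script is run by pandoc itself.
function Reader(input)
  local blocks = {}
  for _, para in ipairs(split_paragraphs(tostring(input))) do
    local inlines = {}
    for word in para:gmatch("%S+") do
      if #inlines > 0 then
        inlines[#inlines + 1] = pandoc.Space()
      end
      inlines[#inlines + 1] = pandoc.Str(word)
    end
    blocks[#blocks + 1] = pandoc.Para(inlines)
  end
  return pandoc.Pandoc(blocks)
end
```

Saved as, say, `plain.lua` (a made-up file name), this would be invoked the same way as the command above, with `-f plain.lua`.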

jgm added the enhancement label Nov 5, 2021

jgm commented Nov 5, 2021

A prototype is up in the custom-reader branch.
This version takes just one parameter, input, and leaves the reader options available through a global variable, as with Lua filters.
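A sketch of how a reader might consult those options, under stated assumptions: that the prototype exposes them under the same global name Lua filters use (`PANDOC_READER_OPTIONS`), and that the table has a `tab_stop` field. Neither is confirmed in this thread, so the hypothetical helper `reader_option` falls back to a default whenever the global or the field is missing.

```lua
-- Hypothetical helper: look up a reader option by name, falling back to
-- `default` if the global is absent or the field cannot be read.
-- Assumption: options live in PANDOC_READER_OPTIONS, as in Lua filters.
local function reader_option(name, default)
  local opts = PANDOC_READER_OPTIONS
  if opts == nil then
    return default
  end
  -- pcall guards against option objects that reject unknown fields.
  local ok, value = pcall(function() return opts[name] end)
  if ok and value ~= nil then
    return value
  end
  return default
end

-- Example use: expand tabs according to the configured tab stop and
-- return the whole input as a single code block.
function Reader(input)
  local tab_stop = reader_option("tab_stop", 4)
  local text = tostring(input):gsub("\t", string.rep(" ", tab_stop))
  return pandoc.Pandoc({ pandoc.CodeBlock(text) })
end
```

The fallback keeps the script usable even if the option name differs in a given pandoc version.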

jgm commented Nov 5, 2021

OK, this is fun. Sample custom reader which works with this branch!

-- A sample custom reader for a very simple markup language.
-- This parses a document into paragraphs separated by blank lines.
-- This is _{italic} and this is *{boldface}
-- This is an escaped special character: \_, \*, \{, \}
-- == text makes a level-2 heading
-- That's it!

-- For better performance we put these functions in local variables:
local P, S, R, Cf, Cc, Ct, V, Cs, Cg, Cb, B =
  lpeg.P, lpeg.S, lpeg.R, lpeg.Cf, lpeg.Cc, lpeg.Ct, lpeg.V,
  lpeg.Cs, lpeg.Cg, lpeg.Cb, lpeg.B

local whitespacechar = S(" \t\r\n")
local specialchar = S("_*{}\\")
local escapedchar = P"\\" * specialchar
         / function (x) return string.sub(x,2) end
local wordchar = (P(1) - (whitespacechar + specialchar)) + escapedchar
local spacechar = S(" \t")
local newline = P"\r"^-1 * P"\n"
local blanklines = newline * spacechar^0 * newline^1
local endline = newline - blanklines

-- Grammar
local G = P{ "Pandoc",
  Pandoc = blanklines^-1 * Ct(V"Block"^0) / pandoc.Pandoc;
  Block = V"Header" + V"Para";
  Para = Ct(V"Inline"^1) * blanklines^-1 / pandoc.Para;
  Header = Ct(Cg(P("=")^1 / function(x) return #x end, "length")
             * spacechar^1
             * Cg(Ct(V"Inline"^0), "contents")
             * blanklines^-1) /
             function(res) return pandoc.Header(res.length, res.contents) end;
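  Inline = V"Emph" + V"Strong" + V"Str" + V"Space" + V"SoftBreak" + V"Special" ;
  Str = wordchar^1 / pandoc.Str;
  Space = spacechar^1 / pandoc.Space;
  SoftBreak = endline / pandoc.SoftBreak;
  Emph = Ct(P"_{" * Cg(Ct((V"Inline" - P"}")^1), "contents") * P"}") /
          function(res) return pandoc.Emph(res.contents) end;
  -- Strong mirrors Emph so that the *{boldface} syntax described above is parsed.
  Strong = Ct(P"*{" * Cg(Ct((V"Inline" - P"}")^1), "contents") * P"}") /
          function(res) return pandoc.Strong(res.contents) end;
  Special = specialchar / pandoc.Str;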
}

function Reader(input)
  return lpeg.match(G, input)
end

jgm commented Nov 5, 2021

See PR #7671

jgm added a commit that referenced this issue Nov 6, 2021
New module Text.Pandoc.Readers.Custom, exporting
readCustom [API change].

Users can now do `-f myreader.lua` and pandoc will treat the
script myreader.lua as a custom reader, which parses an input
string to a pandoc AST, using the pandoc module defined for
Lua filters.

A sample custom reader can be found in data/reader.lua.

Closes #7669.
jgm closed this as completed in ee2f002 Nov 6, 2021

jgm commented Nov 8, 2021

I've now added a fully functional Creole 1.0 parser as an example (data/creole.lua). It's less than 200 lines of Lua code -- and this custom reader outperforms pandoc's official creole reader!
