Lua filters #3514

Merged
merged 10 commits into from
Mar 20, 2017
tarleb
Collaborator

@tarleb tarleb commented Mar 19, 2017

Provide basic support for lua filters.

This needs more work, but is already usable in its current state.

Closes: #3395

@jgm
Owner

jgm commented Mar 19, 2017

I'm excited about this!
Looks like you need an import of Control.Applicative in one place (ghc 7.8):

src/Scripting/Lua/Aeson.hs:167:60:
    Not in scope: ‘*>’

Also, it would be good to add the --lua-filter option to MANUAL.txt and provide some brief documentation on how to write filters (with an example).

@jgm
Owner

jgm commented Mar 19, 2017

Have you done any benchmarks with a realistic filter on a realistic document, comparing python and lua, say?

@tarleb
Collaborator Author

tarleb commented Mar 19, 2017

I forgot about documentation; I'll amend the PR tomorrow.

The performance tests that I did are very rough. I haven't gotten around to implementing lazy loading of element content yet, so filtering blocks with a lot of content is rather slow. Here are my comparisons with panflute, using a real-world Markdown manuscript as input:

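| Element checked in filter | Occurrences in doc | lua filter | panflute |
|---------------------------|--------------------|------------|----------|
| Str                       | 5794               | 0.89s      | 1.1s     |
| Para                      | 88                 | 1.05s      | 1.19s    |
| Plain                     | 132                | 0.48s      | 1.15s    |
| Str and Para              | 5926               | 1.5s       | 1.2s     |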

For comparison: filtering this document with /bin/cat takes 0.55s.

Performance suffers for elements containing a lot of content (lazy serialization should fix this) and if elements are serialized repeatedly.

Converting to and from lua through the lua stack is rather slow. The performance advantage of lua filters stems from the fact that elements can be serialized selectively.
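
The point about selective serialization can be pictured with a toy model. The sketch below is plain Lua and is not pandoc's actual implementation (which lives on the Haskell side); `walk`, `doc`, and `filter` are hypothetical names used only for illustration. It assumes a flat list of elements tagged by `t`, and hands an element to the filter only when the filter defines a handler for that tag.

```lua
-- Toy model (assumption: not pandoc's real code). Only elements whose tag
-- has a handler in the filter are passed to Lua code; the rest are kept as-is.
local function walk(elements, filter)
  local visited = 0
  local out = {}
  for i, el in ipairs(elements) do
    local handler = filter[el.t]
    if handler then
      visited = visited + 1
      -- Keep the original element if the handler returns nothing.
      out[i] = handler(el) or el
    else
      out[i] = el
    end
  end
  return out, visited
end

-- Hypothetical mini document: two Str elements and one Space.
local doc = {
  { t = "Str", c = "hello" },
  { t = "Space" },
  { t = "Str", c = "world" },
}

-- A filter that only cares about Str elements.
local filter = {
  Str = function(el) return { t = "Str", c = el.c:upper() } end,
}

local result, visited = walk(doc, filter)
print(visited)  --> 2
```

In this model the `Space` element never reaches a handler, which is the kind of work a filter restricted to a few element types might save.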

@tarleb
Collaborator Author

tarleb commented Mar 19, 2017

I should also mention the things that would be nice-to-have but haven't been implemented yet:

  • Element deletion by returning nil from a function.
  • Accepting lists of elements (Inlines and Blocks) as return values.
  • Pushing element content to lua on demand only.
  • Meta-tables for all elements.
  • Nicer error messages.

I don't think any of these are blockers, but all of them impact usability.

@tarleb tarleb force-pushed the lua-filters branch 4 times, most recently from e1bf5f3 to 0f771ed Compare March 20, 2017 13:09
@jgm jgm merged commit f2f6851 into jgm:master Mar 20, 2017
@tarleb tarleb deleted the lua-filters branch March 20, 2017 14:32
@jgm
Owner

jgm commented Mar 20, 2017

Started playing with this a bit. First thing I tried to do was create a caps.lua that capitalizes strings.
First tried

return {
  { Str = function (inline)
      return pandoc.Str(string.upper(inline.c))
  end,
  }
}

This failed when I ran it on a Markdown file with non-ASCII characters.

pandoc: Cannot decode byte '\x99': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8 stream

First reaction: too bad hslua packages lua 5.1, not 5.3, which has a built-in utf8 library. (Is there a reason for that?) Second reaction: no problem, I'll just install luautf8 with luarocks and use this:

local utf8 = require("lua-utf8")
return {
  { Str = function (inline)
      return pandoc.Str(utf8.upper(inline.c))
  end,
  }
}

But this didn't work.

PANIC: unprotected error in call to Lua API ([string "caps.lua"]:1: module 'lua-utf8' not found:

If I open a lua repl, I can require("lua-utf8") with no problems. I guess this is due to a difference in the way our lua environment is set up. Is it possible to set it up in such a way that users can take advantage of libraries installed via luarocks?
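
One plausible explanation for the first failure (an assumption, not confirmed in this thread) is that `string.upper` works byte by byte through the C library, so in some locales it can alter bytes above 0x7F and break UTF-8 sequences. A minimal sketch of a workaround that needs no external library is to uppercase only the ASCII letters. The helper name `ascii_upper` and the stand-in `pandoc.Str` constructor are hypothetical and exist only so the sketch runs in a plain Lua interpreter.

```lua
-- Stand-in for pandoc's constructor so the sketch also runs outside pandoc
-- (assumption for illustration); inside pandoc the real module is used.
local pandoc = pandoc or {
  Str = function(text) return { t = "Str", c = text } end,
}

-- Uppercase only the bytes a-z. Bytes >= 0x80, which make up UTF-8
-- multi-byte sequences, pass through unchanged.
local function ascii_upper(s)
  return (s:gsub("[a-z]", function(ch)
    return string.char(ch:byte() - 32)
  end))
end

-- Same shape as the caps.lua filter above; a filter file would end with
-- `return filter`.
local filter = {
  { Str = function(inline)
      return pandoc.Str(ascii_upper(inline.c))
    end,
  }
}
```

The trade-off is that non-ASCII letters such as "é" stay lowercase, so this only avoids the decoding error; it is not a substitute for a real Unicode library.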

@tarleb
Collaborator Author

tarleb commented Mar 20, 2017

I'll need to look deeper into this. Regarding the lua version: the reason hslua ships with 5.1 is that this is the last version compatible with LuaJIT. I'd like to see a new hslua release that's using 5.3, but this would make hslua unusable for people who rely on LuaJIT. On the other hand, pandoc seems to be the only active project on hackage that is using hslua. Citing osa1:

LuaJIT is the reason why I didn't update the library to use a more recent Lua. At least in my use case, I prefer 5.1 with LuaJIT over 5.2 or even 5.3 (which IMHO has more useful additions over Lua 5.1 like integers, new division operator etc.).

I'm no longer a user though, so I don't have a vote on this. FWIW I had a branch (I can't find it right now) that allowed both 5.1 and 5.2 with a Cabal flag, but it had awful amount of CPPs. So it's certainly possible, but it'll be painful for the maintainers.

See also the previously mentioned road map proposal hslua/hslua#46.

I did a quick test requiring penlight (as I had that installed), and it worked after running eval "$(luarocks path)", so maybe external unicode libraries are still an option.
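
Until the environment question is settled, a filter could guard its `require` call so that a missing module produces a readable message instead of an unprotected error. This is a hedged sketch: `try_require` is a hypothetical helper, and it assumes that the error raised by `require` is an ordinary Lua error that `pcall` can catch.

```lua
-- Hypothetical helper: load a module if it can be found, otherwise return
-- nil plus the error message produced by require.
local function try_require(name)
  local ok, result = pcall(require, name)
  if ok then
    return result
  end
  return nil, result
end

-- Try the library used in the caps.lua attempt above.
local utf8lib, err = try_require("lua-utf8")
if not utf8lib then
  io.stderr:write("lua-utf8 is not available: ", tostring(err), "\n")
end
```

The error message lists the locations that were searched, which may help when checking whether the paths from `luarocks path` are visible to the filter.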

jgm pushed a commit that referenced this pull request Mar 21, 2017
* Add `--lua-filter` option.  This works like `--filter` but takes pathnames of special lua filters and uses the lua interpreter baked into pandoc, so that no external interpreter is needed.  Note that lua filters are all applied after regular filters, regardless of their position on the command line.
* Add Text.Pandoc.Lua, exporting `runLuaFilter`.  Add `pandoc.lua` to data files.
* Add private module Text.Pandoc.Lua.PandocModule to supply the default lua module.
* Add Tests.Lua to tests.
* Add data/pandoc.lua, the lua module pandoc imports when processing its lua filters.
* Document in MANUAL.txt.
@tarleb
Collaborator Author

tarleb commented Mar 21, 2017

Pure lua modules seem to work fine, but C modules cause problems. I was able to load slnunicode and identified some potential problems along the way:

  • Paths: both LUA_PATH and LUA_CPATH must be set.
  • The lua lib, when compiled with hslua, does not contain some symbols like lua_gettop. From what I understand, these are omitted by the linker and therefore not present in the generated lib. This can be worked around by compiling hslua with the system-lua flag, which causes the shared library provided by the distribution to be used instead.
  • The last step of the previous point might require changing the name of the Extra-Libraries entry in hslua.cabal and specifying --extra-lib-dirs and --extra-include-dirs as necessary.

I'll have to figure out a way of compiling hslua such that no symbols get lost in the process.
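
To check whether the LUA_PATH and LUA_CPATH settings mentioned above actually reach the interpreter, a filter could print the search paths it sees. This is a small diagnostic sketch: `search_entries` is a hypothetical helper, while `package.path` and `package.cpath` are the standard Lua variables that `require` consults for Lua and C modules respectively.

```lua
-- Hypothetical helper: split a ';'-separated search path into its entries.
local function search_entries(path)
  local entries = {}
  for entry in string.gmatch(path, "[^;]+") do
    entries[#entries + 1] = entry
  end
  return entries
end

-- Where require looks for pure Lua modules.
for _, entry in ipairs(search_entries(package.path)) do
  io.stderr:write("lua: ", entry, "\n")
end

-- Where require looks for C modules such as slnunicode.
for _, entry in ipairs(search_entries(package.cpath)) do
  io.stderr:write("C:   ", entry, "\n")
end
```

If the luarocks directories are missing from either list, the corresponding environment variable was probably not set in the shell that launched pandoc.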

@tarleb
Collaborator Author

tarleb commented Mar 21, 2017

The alternative might be to replicate the utf8 module using haskell. It would likely be less performant, though.

@jgm
Owner

jgm commented Mar 22, 2017 via email

@tarleb
Collaborator Author

tarleb commented Mar 22, 2017

Ok, found the compiler flags to get it working on Linux. I'm not sure if the same approach also works for Windows and OSX; I need to do more testing. I hope the fix will become available later this week.

Do you plan on distributing slnunicode as part of pandoc, compiling it into the binary, or should users get it separately?
