Lua custom writers can only access metadata variables in one place #4957

Closed
Mihara opened this issue Oct 7, 2018 · 6 comments

@Mihara

Mihara commented Oct 7, 2018

This is not a question, but rather, a feature request.

I'm working on a custom writer for an obscure format, which requires a lot of configuration variables, tweakable per-document, normally passed through metadata, because I don't have anywhere else to put them.

The problematic part about that is that the tweakables have to define the behavior of inlines, i.e. if a particular metadata variable is true, Str needs to behave in a certain way, while if it's not, it needs to behave in an entirely different way.

Based on my very limited understanding of Haskell, I am more or less convinced that by the time the first Lua function is called, the AST is complete and all the metadata variables are already known somewhere. Nevertheless, the only way to get them in the custom writer code appears to be as parameters to the Doc Lua function. I can't even intercept Meta to track the variables as they come in through the tree and save them somewhere the inlines can reach, as the documentation says I could do with a Lua-based filter, because Custom.hs does not call a function with that name.

Doc, however, runs too late in the process, after every inline has already completed its work. This necessitates a really crude solution: inlines emit placeholder markup instead of their final output, and Doc then has to rewrite those placeholders through gsub.
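For readers unfamiliar with this workaround, here is a minimal sketch of what it might look like. The marker strings and the `loud-emphasis` metadata key are hypothetical, and the `Doc(body, metadata, variables)` signature is assumed from the classic custom writer interface; this is not the author's actual writer.

```lua
-- Sketch of the placeholder workaround. The marker strings and the
-- "loud-emphasis" metadata key are hypothetical.
local OPEN, CLOSE = "\1EMPH\2", "\1/EMPH\2"

-- Inline functions run before Doc, so they cannot consult metadata yet.
function Str(s)
  return s
end

-- Emit placeholders instead of the final emphasis delimiters.
function Emph(s)
  return OPEN .. s .. CLOSE
end

-- Doc receives the rendered body plus the metadata and resolves the markers.
function Doc(body, metadata, variables)
  local delim = metadata["loud-emphasis"] and "!!" or "*"
  -- The markers contain no Lua pattern magic characters, so they can be
  -- used as gsub patterns directly.
  body = body:gsub(OPEN, delim)
  body = body:gsub(CLOSE, delim)
  return body
end
```

The cost is visible even in this small case: every metadata-dependent inline needs its own marker pair and its own gsub pass in Doc.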

I've been up and down the pandoc internals exposed to Lua code, but couldn't find a way to acquire the metadata when it's actually needed short of trying to walk the tree myself, and it doesn't seem like Custom.hs will let me do that, either.

Could something be done about that?

@tarleb
Collaborator

tarleb commented Oct 7, 2018

If you are ok with a hacky solution involving Lua filters, consider the following approach.

As you noted, the first thing converted by the custom writer will be the first block. This is a fact we can (mis)use to signal any data that is needed by the writer. Probably the best suited as a messenger is the RawBlock element, as it is usually rare and doesn't contain nested inlines. We can add a writer function like this:

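function RawBlock (format, str)
  -- A raw block with the made-up format 'meta-hack' is a signal, not content.
  if format == 'meta-hack' and str == 'yes' then
    -- Swap out the global Str writer function for all later inlines.
    _G.Str = function (s) return 'yes' end
    -- The signal itself produces no output.
    return ''
  end
  -- Any other raw block is passed through unchanged.
  return str
end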

This will replace the Str function that's defined on the top level with one which replaces all strings with just yes.

So now we just need to prepend a RawBlock with the information we want. This can be done using a Lua filter.

function Pandoc (el)
  -- Only signal the writer if the metadata field is set.
  if el.meta['meta-hack'] then
    -- Insert the signaling block before all other blocks.
    table.insert(el.blocks, 1, pandoc.RawBlock('meta-hack', 'yes'))
  end
  return el
end

This will prepend a signaling RawBlock iff the metadata contains a (truthy) value of name meta-hack.

Another method to pass information to the writer is to use environment variables, but I can't think of a good way to set them from metadata.
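A minimal sketch of the reading side of the environment-variable idea, using only the standard `os.getenv`. The variable name `MYWRITER_SHOUT` is hypothetical, and the open problem mentioned above (setting the variable from metadata) is not addressed here.

```lua
-- Read a toggle from the environment once, when the writer is loaded.
-- MYWRITER_SHOUT is a hypothetical variable name.
local shout = os.getenv("MYWRITER_SHOUT") == "1"

function Str(s)
  if shout then
    return s:upper()
  end
  return s
end
```

Under this assumption the toggle would be set by whatever invokes pandoc, for example a build script that exports the variable before calling the writer.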

@tarleb
Collaborator

tarleb commented Oct 7, 2018

As for the feature request: we could think about exposing the custom writer functionality to Lua filters, so the filter would do all of the work, i.e. filtering and creating the final output.

@jgm
Owner

jgm commented Oct 8, 2018 via email

@Mihara
Author

Mihara commented Oct 8, 2018

> If you are ok with a hacky solution involving Lua filters, consider the following approach.

That's a clever solution indeed, but in this particular case it will only be practical if I can make the custom writer and the filter coexist in the same file (for delivery reasons). I also can't just take over RawBlock: it is used in certain documents, so I think I would still need to imitate something like an HTML comment and recognize it. And even if I give it a unique format identifier, I would still need to parse the values inside the RawBlock, which could be problematic because not all of them are simple toggles... Thanks for the tip, though, I'll try to make it work.

I'm not sure what kind of a feature would solve this conclusively. An exposed getMeta of some kind would be easiest to work with, because I have no clue how a filter that has access to custom writer functionality might work at all.

@tarleb
Collaborator

tarleb commented Oct 8, 2018

Note that you have the full pandoc module at your disposal, including the read function. So if you can somehow put all the info in YAML format into a RawBlock at the beginning of your doc, then it would be easy to get structured values from that. E.g., for Markdown input

``` {=mymetadata}
---
foo: bar
---
```

Then read the block in the custom writer

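local meta

function RawBlock(format, str)
  if format == 'mymetadata' then
    -- Parse the YAML block with pandoc's reader and keep only its metadata.
    meta = pandoc.read(str).meta
    print(meta.foo)
    -- The metadata block itself produces no output.
    return ''
  end
  -- Pass every other raw block through unchanged.
  return str
end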

Adding getMeta would indeed be simple enough. Maybe we can generalize this and make the full document available somehow? I was thinking along the lines of a global constant PANDOC_DOCUMENT with lazy accessors, or a function Setup which, when defined, gets a Pandoc object as input.

@Mihara
Author

Mihara commented Oct 8, 2018

> So if you can somehow put all the info in YAML format into a RawBlock at the beginning of your doc, then it would be easy to get structured values from that.

Unfortunately, I can't get away that easily, because the current build pipeline involves multiple meta blocks -- a global one and occasional per-document blocks which override the values. (It's a book's worth of text which must render into three disparate formats, one of which is exotic...)
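The override behaviour described here (a global block, then per-document blocks that win) can be pictured as a layered table merge. The following is only an illustrative sketch in plain Lua; `merge_settings` and the setting names are hypothetical and say nothing about how the actual pipeline combines its metadata blocks.

```lua
-- Merge settings tables left to right; later tables override earlier ones.
-- merge_settings and the keys below are hypothetical names.
local function merge_settings(...)
  local result = {}
  for i = 1, select("#", ...) do
    local layer = select(i, ...)
    if layer then  -- tolerate missing (nil) layers
      for key, value in pairs(layer) do
        result[key] = value
      end
    end
  end
  return result
end

local global_settings = { shout = false, quote_style = "curly" }
local document_settings = { shout = true }
local settings = merge_settings(global_settings, document_settings)
```

Keys that the per-document table does not mention keep their global value.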

Right now, I'm trying to muck through with serializing the meta values into something I could load() in the custom writer, with limited success. It's certainly a better idea than endless gsub, though not having to do that at all would be ideal.
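One possible shape for such a serializer, as a sketch only: it handles strings, numbers, booleans and nested tables, and the `serialize` and `deserialize` names are hypothetical. It assumes the values have already been reduced to plain Lua data, and it should only be used on trusted input, since loading the text executes it as Lua code.

```lua
-- Turn plain Lua data into a Lua expression that evaluates to an equal value.
local function serialize(value)
  local kind = type(value)
  if kind == "string" then
    return string.format("%q", value)  -- %q yields a safely quoted literal
  elseif kind == "number" or kind == "boolean" then
    return tostring(value)  -- note: NaN and infinities are not handled
  elseif kind == "table" then
    local parts = {}
    for key, item in pairs(value) do
      -- Keys are serialized like values, so string and numeric keys both work.
      parts[#parts + 1] = "[" .. serialize(key) .. "]=" .. serialize(item)
    end
    return "{" .. table.concat(parts, ",") .. "}"
  end
  error("cannot serialize a " .. kind)
end

-- Evaluate the serialized text again. This executes code: trusted input only.
local function deserialize(text)
  local loader = loadstring or load  -- loadstring on Lua 5.1, load on 5.2+
  local chunk = assert(loader("return " .. text))
  return chunk()
end
```

In the filter-plus-writer setup discussed earlier, the filter side would produce the text and the writer side would evaluate it, for instance from the body of a signaling RawBlock.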

tarleb added a commit to tarleb/pandoc that referenced this issue Oct 13, 2018
Custom writers can specify an optional `Setup` function. The function
takes the full Pandoc document as input and should not return any value.
Users can use this function to configure the writer depending on the
given document's content or its metadata.

Closes: jgm#4957
@jgm jgm closed this as completed in #4967 Oct 14, 2018
jgm pushed a commit that referenced this issue Oct 14, 2018

Custom writers can specify an optional `Setup` function. The function
takes the full Pandoc document as input and should not return any value.
Users can use this function to configure the writer depending on the
given document's content or its metadata.

data/sample.lua: add sample use of Setup function.
The change makes it possible to control the image format used to encode the
image produced from dot code.

Closes #4957
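
Going only by the commit message above, a writer using the `Setup` hook might look roughly like the sketch below. The `shout` metadata key and the `settings` table are hypothetical, the assumption that `Setup` runs before any element function is inferred from the message rather than confirmed, and this thread does not establish which released pandoc versions, if any, provide the hook.

```lua
-- Writer-wide settings, filled in by Setup before elements are rendered
-- (that call order is an assumption based on the commit message).
local settings = { shout = false }

-- Setup receives the full document and returns nothing.
function Setup(doc)
  -- "shout" is a hypothetical metadata key.
  if doc.meta and doc.meta.shout then
    settings.shout = true
  end
end

-- Inline functions can now depend on metadata without any gsub in Doc.
function Str(s)
  if settings.shout then
    return s:upper()
  end
  return s
end
```

Compared with the RawBlock signaling hack, nothing has to be injected into the document and no filter is needed.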