Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pandoc support, refactor external helpers #4060

Merged
merged 1 commit into from
Nov 30, 2017

Conversation

betaveros
Copy link
Contributor

@betaveros betaveros commented Nov 7, 2017

Recognize the Pandoc format under the file extension .pandoc or .pdc,
and shell out to pandoc as an external helper to format Pandoc content.

Add a configuration option, externalHelperArguments, that can be used to
override the additional arguments passed to external helpers.

Refactor out repeated code with external helpers. Change the error
output formatting. I did not see any of the external helpers print the
string "<input>" to represent stdin as a file; just prepending the file
name to error output is more general and doesn't sacrifice that much in
terms of readability.

@bep
Copy link
Member

bep commented Nov 7, 2017

A quick question: Is it a remote possibility to do "shell injection" via os/exec so I could configure it to do

pandoc --strip-comments && rm -rf /

In the above I'm mostly worried about the people on the forum etc. that asks me questions of the type "can you clone my site and see what the problem is?" ... If I would start worrying about some malware type of situations, I would stop doing that.

Note that I would expect the golang team to protect against this. But I would like to be really sure.

@betaveros
Copy link
Contributor Author

I don't think shell injection can be achieved in exec.Command, but the broader point is well-taken; if you're worried about malicious config files we shouldn't allow the user to pass arbitrary arguments to external helpers, since these helpers might not be designed to be safe under those circumstances. For example, it looks like pandoc takes an option --filter=PROGRAM that will call an arbitrary executable (perhaps, a malicious shell script elsewhere in the repository), so an attacker with full control over pandoc's arguments could do a lot of damage even without shell injection.

It's unfortunate though, most of the allure of using pandoc is getting access to all of its options and I don't think there is a universal set that will make everybody who wants to use pandoc happy (I for one haven't even decided which of the math output formats I want for my static site yet). I could whitelist some pandoc options, but it would either be woefully incomplete or extremely bloated for an external helper that I understand probably isn't a central use case Hugo wants to support. Still, I guess an incomplete whitelist is better than nothing.

(If you're worried about malicious config files in general, I'll also note in passing that it looks like attackers can also specify arbitrary executables in the configuration option newContentEditor, so they could also cause a malicious script to be invoked on hugo new.)

@bep
Copy link
Member

bep commented Nov 7, 2017

If you're worried about malicious config files in general

I will make a note about the newContentEditor, but that command is at least not part of the main Hugo build.

@bep
Copy link
Member

bep commented Nov 7, 2017

I will think about this a little, about the cost vs value vs added support etc. This isn't "core Hugo", so any new additions in this department should come cheaply, and this currently does not look like it.

@betaveros
Copy link
Contributor Author

I dropped the options for now to keep things simple. If there's demand for external options we (or somebody else) can revisit this later.

Recognize the Pandoc format under the file extension .pandoc or .pdc,
and shell out to pandoc as an external helper to format Pandoc content.

Refactor out repeated code with external helpers. Change the error
output formatting. I did not see any of the external helpers print the
string "<input>" to represent stdin as a file; just prepending the file
name to error output is more general and doesn't sacrifice that much in
terms of readability.
@betaveros
Copy link
Contributor Author

Actually I'm going to go out on a limb and make --mathjax the single option to pandoc, since the default math option is rather lame, MathJax is somewhat officially recommended in the Hugo documentation, and the output markup looks fine for rendering with other LaTeX libraries if people want to customize that.

@betaveros
Copy link
Contributor Author

Any updates? The PR as it currently stands should be fairly cheap; it doesn't use pandoc any differently from the way it previously used asciidoc/asciidoctor/rst2html.

@betaveros
Copy link
Contributor Author

@bep any updates on the status of this PR, or anything else you'd like to be changed?

@bep
Copy link
Member

bep commented Nov 22, 2017

It looks very good, I will merge it in "a couple of days"; I will probably have to get a 0.31.1 patch release out the window, and try to keep the master branch as quiet as possible right after a release.

@shawnohare
Copy link

I stumbled upon this PR because I was interested in adding the feature as well. Thanks for putting in the effort to do this!

I wonder if the current references to "pandoc" (especially in the documentation) should be changed to more explicitly reference "pandoc markdown", since pandoc itself is not a format, per se.

@bep bep merged commit e69da7a into gohugoio:master Nov 30, 2017
@adityam
Copy link

adityam commented Jan 4, 2018

Is it possible to reconsider allowing options to pandoc (either via config.toml or via the options in each .pdc file)? In particular, I am thinking of the possibility of adding bibliography files and citation style. This will make it much easier to create course notes using hugo.

If you are reluctant to add arbitrary flags, one option is to simply add --filter pandoc-citeproc flag to pandoc.

@rdwatters
Copy link
Contributor

rdwatters commented Jan 4, 2018

@betaveros

I’m curious if you see utility in pandoc’s MD-to-Word functionality (what I admittedly have to use Pandoc for most at work) for Hugo sites. I could see some very cool COPE workflows for users who need to create PDF versions of content as well. I’ve been woefully absent in the Hugo world for the past few months. Thanks for your work on this. I might be an edge case, but I’m really excited about this feature!

@dreamcat4
Copy link

dreamcat4 commented Jan 23, 2018

Documentation here:

https://gohugo.io/content-management/formats/#additional-formats-through-external-helpers

This is a refactor to inlude panadoc support, still with default arguments - Hugo version 0.32 and higher.

@dreamcat4
Copy link

dreamcat4 commented Jan 23, 2018

Ah, it looks like you dropped this feature from the PR:

Add a configuration option, externalHelperArguments, that can be used to
override the additional arguments passed to external helpers.

Well I was actually very much looking forward to seeing that, for passing custom asciidoc default arguments. This is a valuable improvement (for asciidoc users).

I dropped the options for now to keep things simple. If there's demand for external options we (or somebody else) can revisit this later.

Yes I need this / would like to see this.

Is it possible to reconsider allowing options to pandoc (either via config.toml or via the options in each .pdc file)? In particular, I am thinking of the possibility of adding bibliography files and citation style. This will make it much easier to create course notes using hugo.

This is a similar / same reasons why asciidoctor users also need this externalHelperArguments override string in config.toml global configuration.

@betaveros do you still want to make a new PR for the externalHelperArguments feature? As a seperate PR ? @bep Can we have further discussions about this feature? Do you see the value in it?

@valouille
Copy link

valouille commented Feb 16, 2018

Hello,

For my usecase, I certainly see value in this feature, because I would like to be able to pass options to asciidoctor like -r asciidoctor-diagram for the generation of plantUML diagrams. Currently, I'm getting this error :

ERROR 2018/02/16 10:46:15 post/2018-01-012-doc.adoc: asciidoctor: WARNING: <stdin>: line 20: invalid style for literal block: plantuml

@bilke
Copy link

bilke commented Feb 22, 2018

@bep Would you allow the argument --filter pandoc-citeproc?

One had to check if the executable pandoc-citeproc is found in the path and if yes this argument can be added. I would make a PR if you agree.

cc: @adityam

@raghur
Copy link

raghur commented Aug 6, 2018

I've run into this as well - need to customize args passed to asciidoctor - attributes, plugins etc and right now there's no way to do it short of putting everything in each asciidoc file

Also, is something like PR 4310 planned? It doesn't say why it was closed (just points to this issue)

@kirubakaran
Copy link

@shawnohare Re your #4060 (comment)

I use org-mode. So I pass --from=org flag myself, as a temporary solution, as I've described here: https://kirubakaran.com/blog/org-pandoc-hugo/#update

I think that inferring from format from the filename would be a good solution. Example: foo.org.pandoc to send --from=org when calling Pandoc. Thoughts?

@ankitp94
Copy link

Is it possible to reconsider allowing options to pandoc (either via config.toml or via the options in each .pdc file)? In particular, I am thinking of the possibility of adding bibliography files and citation style. This will make it much easier to create course notes using hugo.

If you are reluctant to add arbitrary flags, one option is to simply add --filter pandoc-citeproc flag to pandoc.

This would be certainly helpful!

@adityam
Copy link

adityam commented May 30, 2019

I use the following hack to get pandoc filters to work:

In my .zshrc add:

alias hugo='PATH=$PWD/bin:$PATH hugo'

In my project directory, create a file bin/pandoc (with execute permissions):

#!/bin/sh

CSL=acm-siggraph

if [ -f /usr/bin/pandoc ]; then
  PANDOC=/usr/bin/pandoc
fi

if [ -f /usr/local/bin/pandoc ]; then
  PANDOC=/usr/local/bin/pandoc
fi

# Removed 
#  --filter pandoc-eqnos \
# because equation numbering is handled by mathjax
$PANDOC --filter pandoc-numbering \
        --bibliography=content/references.bib \
        --csl=csl/$CSL.csl \
        --mathjax \
        --metadata link-citations=true \
        --from markdown+fenced_divs+pipe_tables \
        "$@"

You can adapt the flags to pandoc as per your requirement.

@RMStoica-zivver
Copy link

A quick question: Is it a remote possibility to do "shell injection" via os/exec

Can this be averted by passing options to pandoc via the API, as opposed to shell invocation?

@SylvainGuieu
Copy link

I do not get the "pandoc format". For me the the format is markdown (or whatever) and pandoc is the renderer. Sounds for me like saying the "Safari format" when talking about rendering html.

Is there a way to setup pandoc without changing file extension or page front end for all the site ?

@rikhuijzer
Copy link

rikhuijzer commented May 9, 2020

The answer by @adityam is great. It basically overrides pandoc so that it is called with some extra flags. However, the solution also requires a global modification of hugo and the path variable which can be a bit tricky in certain situations. To avoid the global modification one can wrap the hugo executable. This can be done by adding the following two files and making them executable (chmod +x <filename>).

Set the file <project root>/hugo to

#!/bin/bash
PANDOC_ORIGINAL=$(which pandoc)
export PANDOC_ORIGINAL

# Overrides pandoc.
PATH=$PWD/bin:$PATH

hugo "$@"

and set the file <project root>/bin/pandoc to

#!/bin/bash
$PANDOC_ORIGINAL --bibliography=content/references.bib \
  "$@"

A site can now be served with the Pandoc overrides by calling ./hugo serve from <project root> instead of hugo serve.

@RMStoica-zivver
Copy link

@rikhuijzer it would be nice if we didn't have to resort to bash trickery though. it's an answer, not a solution

@rikhuijzer
Copy link

rikhuijzer commented May 11, 2020

@RMStoica-zivver Why do you aim this critique to me exactly? Actually, I get that this is not a priority. I assume that a typical user would like to build the Hugo website via CI. The default Hugo Docker image does not include pandoc [1]. So you will always have to create a build image yourself with Pandoc and Hugo. Switching to the Netlify build image [2] is okay but you don't have any Pandoc extensions there, such as pandoc-citeproc [2]. For a typical user I would say lets not use Pandoc and keep the build simple. An atypical user can copy the "bash trickery" listed above. About the "bash trickery". It is fairly straightforward, and I think that Hugo is quite properly set up that it even allows for such "trickery". They could have hard-coded Pandoc.

[1] https://hub.docker.com/r/jojomi/hugo/dockerfile
[2] https://github.com/netlify/build-image/blob/xenial/Dockerfile

@RMStoica-zivver
Copy link

@rikhuijzer it was not a critique of course, just a statement of fact - that the answer by @adytiam is a workaround, not a solution to the problem of hugo not passing arguments to pandoc or any other external renderer.
As for the discussion on typical or atypical users, I would refrain, since I do not know how many use pandoc/asciidoctor or would like to, but perhaps you know more.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.