Pandoc is an amazing universal document converter. Unfortunately, it only offers a command-line interface. This project makes Pandoc usable via a RESTful HTTP API, provides a mapping of Pandoc type identifiers to common media types, and wraps everything in a Docker container so that it can be easily used and deployed.
The API only allows POST requests. The data to be converted must be
passed in the request body. The header field Content-Type specifies
the input type and the header field Accept specifies the output type.
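As a minimal sketch of how a client might assemble such a request in JavaScript (Node 18 or later, which provides a global `fetch`): the helper names `buildConversionRequest` and `convert` and the default base URL `http://localhost:8080/` are assumptions made for illustration and are not part of this project.

```javascript
// Hypothetical helper: builds the fetch options for one conversion request.
// The API only accepts POST; Content-Type names the input format and
// Accept names the desired output format.
function buildConversionRequest(input, inputType, outputType) {
  return {
    method: "POST",
    headers: {
      "Content-Type": inputType,
      "Accept": outputType,
    },
    body: input,
  };
}

// Hypothetical wrapper: sends the request and returns the converted bytes.
// baseUrl is an assumption; point it at wherever the API is listening.
async function convert(input, inputType, outputType, baseUrl = "http://localhost:8080/") {
  const response = await fetch(baseUrl, buildConversionRequest(input, inputType, outputType));
  if (!response.ok) {
    throw new Error(`Conversion failed with HTTP status ${response.status}`);
  }
  // Return raw bytes so that binary output is not corrupted by text decoding.
  return Buffer.from(await response.arrayBuffer());
}
```

With the API running, something like `convert("<h1>My Headline</h1>", "text/html", "text/markdown").then((out) => console.log(out.toString("utf8")))` should print the converted text.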
Since Pandoc uses its own type identifiers for input and output formats, we
created a mapping between the Pandoc identifiers and
the corresponding media types. For instance, the Pandoc identifier html
maps to the media type text/html.
The mapping is incomplete because not every format supported by Pandoc
has a registered media type. Therefore, you can also use the Pandoc
identifiers directly in Content-Type and Accept, but this is not
compliant with the HTTP specification. To stay compliant, we support
application/x. as a prefix in front of a Pandoc identifier, for
example application/x.docx. This prefix is the official media type
tree for unregistered types.
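The three ways of naming a format described above can be sketched as a small resolver. This is only an illustration under stated assumptions: the function name `toPandocFormat` is hypothetical, the table contains just the three pairs mentioned in this document, and the project's actual lookup may work differently.

```javascript
// Illustrative subset of a media-type-to-Pandoc mapping (assumption: the
// project's real table is larger and may differ).
const MEDIA_TYPE_TO_PANDOC = {
  "text/html": "html",
  "text/markdown": "markdown",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
};

const UNREGISTERED_PREFIX = "application/x.";

// Hypothetical resolver: turns a Content-Type or Accept value into a
// Pandoc identifier.
function toPandocFormat(headerValue) {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const type = headerValue.split(";")[0].trim().toLowerCase();
  // 1. A known media type maps to its Pandoc identifier.
  if (Object.prototype.hasOwnProperty.call(MEDIA_TYPE_TO_PANDOC, type)) {
    return MEDIA_TYPE_TO_PANDOC[type];
  }
  // 2. The unregistered-tree prefix is stripped, leaving the identifier.
  if (type.startsWith(UNREGISTERED_PREFIX)) {
    return type.slice(UNREGISTERED_PREFIX.length);
  }
  // 3. Anything else is treated as a bare Pandoc identifier.
  return type;
}
```

Under these assumptions, `text/html`, `application/x.html`, and `html` all resolve to the same Pandoc identifier.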
To simplify the usage of this project, we wrapped everything into a docker container that can easily be deployed on any machine.
Pandoc uses LaTeX to create PDFs. Since the LaTeX dependencies add roughly 2 GB to the Docker image, we decided to create two images:
dwolters/pandoc-http:latest does not include LaTeX and is therefore unable to create PDFs (uncompressed ~700 MB, compressed ~280 MB). The :latest tag is added by default if no tag is specified.
dwolters/pandoc-http:latex includes LaTeX and can be used to create PDFs (uncompressed ~2.7 GB, compressed ~2 GB). It takes a while to build or pull this image.
You can build the image yourself:
docker build -t dwolters/pandoc-http .
Or install it via docker hub:
docker pull dwolters/pandoc-http
Afterwards, you can start the container:
docker run -d -p 8080:80 --name my-pandoc-http dwolters/pandoc-http
Within the container the HTTP API is reachable on port 80. In the command above the HTTP API is bound to port 8080 of the docker host.
You can stop and remove the container if it is not needed anymore:
docker stop my-pandoc-http
docker rm my-pandoc-http
In order to use this project without using the docker container, you first
must install Pandoc and add it to your PATH.
Alternatively, you can set the PANDOC environment variable to the location of your pandoc executable.
Afterwards, clone the repository and switch to the proper directory:
git clone https://github.com/dwolters/pandoc-http
cd pandoc-http
Install the dependencies:
npm install
And finally, you can start the HTTP API for Pandoc:
node server.js
The API can run on a different port by setting the PORT environment variable, e.g., on port 8080:
PORT=8080 node server.js
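The two environment variables mentioned above could be read along the following lines. This is a sketch, not the code in server.js: the helper name `resolveConfig` is hypothetical, and the fallback port of 80 is an assumption chosen to match the port used inside the container.

```javascript
// Hypothetical helper: derives the settings from an environment object
// such as process.env.
function resolveConfig(env) {
  return {
    // PANDOC overrides the executable location; otherwise the name
    // "pandoc" is used and resolved through PATH.
    pandocPath: env.PANDOC || "pandoc",
    // PORT overrides the listening port. The fallback of 80 is an
    // assumption of this sketch, not a documented default.
    port: Number.parseInt(env.PORT || "80", 10),
  };
}
```

Calling `resolveConfig(process.env)` after starting with `PORT=8080` would yield a `port` of 8080 and, unless PANDOC is set, a `pandocPath` of `pandoc`.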
Assuming the API listens on port 8080, you can test it by using curl. The following commands show how to convert HTML into Markdown, HTML into docx, and docx back into Markdown using our HTTP API for Pandoc:
curl -s -H "Content-Type: text/html" -H "Accept: text/markdown" --data "<h1>My Headline</h1>" http://localhost:8080/
curl -s -H "Content-Type: text/html" -H "Accept: docx" --data "<h1>My Headline</h1>" http://localhost:8080/ > file.docx
curl -s -H "Content-Type: docx" -H "Accept: text/markdown" --data-binary "@file.docx" http://localhost:8080/
Please note that the docx commands above use the Pandoc identifier docx rather than a media type. The correct media type for docx files would be application/vnd.openxmlformats-officedocument.wordprocessingml.document.
The script generate-swagger-spec.js automatically generates the Swagger description for this service based on the supported input and output formats (as listed by pandoc --list-input-formats and pandoc --list-output-formats, respectively). The Swagger description can be generated in both YAML and JSON format. The npm scripts generate-swagger-json and generate-swagger-yaml write the generated description to a file with a fixed filename (pandoc.swagger.json or pandoc.swagger.yaml, respectively). To save the description to a file with a custom filename, run:
node generate-swagger-spec.js [--json|--yaml] > your-filename-here.ext
The Dockerfile is partially based on the Dockerfile of vpetersson's pandoc container.
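To illustrate the idea behind the Swagger generation described above, the following sketch builds a minimal Swagger 2.0 description from two lists of formats. It is not the logic of generate-swagger-spec.js: the function name `buildSwaggerSpec`, the title, the version, and the response description are placeholders assumed for this example.

```javascript
// Hypothetical sketch: builds a minimal Swagger 2.0 description for a
// single POST endpoint from lists of input and output types.
function buildSwaggerSpec(inputTypes, outputTypes) {
  return {
    swagger: "2.0",
    // Title and version are placeholders, not the project's real values.
    info: { title: "Pandoc HTTP API", version: "1.0.0" },
    paths: {
      "/": {
        post: {
          // Types accepted in Content-Type.
          consumes: inputTypes,
          // Types that may be requested via Accept.
          produces: outputTypes,
          parameters: [
            { name: "body", in: "body", required: true, schema: { type: "string" } },
          ],
          responses: {
            "200": { description: "The converted document" },
          },
        },
      },
    },
  };
}
```

Passing the result to `JSON.stringify(spec, null, 2)` gives a JSON document; the format lists themselves could be taken from the output of `pandoc --list-input-formats` and `pandoc --list-output-formats`.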