pandoc 2.0
[Scroll to the end for the binary packages, or better yet, go to the pandoc 2.0.1 packages]
New features
-
New output format
ms
(groff ms). Complete support, including tables, math, syntax highlighting, and PDF bookmarks. The writer uses texmath’s new eqn writer to convert math to eqn format, so a ms file produced with this writer should be processed with groff -ms -e
if it contains math. -
New output format
jats
(Journal Article Tag Suite). This is an XML format used in archiving and publishing articles. Note that a URI-encoded CSL stylesheet (data/jats.csl
) is added automatically unless a stylesheet is specified using --css
. -
New output format
gfm
(GitHub-flavored CommonMark) (#3841). This uses bindings to GitHub’s fork of cmark, so it should parse gfm exactly as GitHub does (excepting certain postprocessing steps, involving notifications, emojis, etc.). markdown_github
has been deprecated in favor of gfm
. -
New output format
muse
(Emacs Muse) (Alexander Krotov, #3489). -
New input format
gfm
(GitHub-flavored CommonMark) (#3841). This uses bindings to GitHub’s fork of cmark. markdown_github
has been deprecated in favor of gfm
. -
New input format
muse
(Emacs Muse) reader (Alexander Krotov, #3620). -
New input format
tikiwiki
(TikiWiki markup) (rlpowell, #3800). -
New input format
vimwiki
(Vimwiki markup) (Yuchen Pei, #3705). Note that there is a new data file, data/vimwiki.css
, which can be used to display the HTML produced by this reader and pandoc’s HTML writer in the style of vimwiki’s own HTML export. -
New input format
creole
(Creole 1.0) (#3994, Sascha Wilde). -
New syntax for Divs, with
fenced_divs
extension enabled by default (#168). This gives an attractive, plain-text way to create containers for block-level content. -
Added new syntax for including raw content in any output format, enabled by the
raw_attribute
extension (which is on by default for markdown
and multimarkdown
). The syntax is the same as for fenced code blocks or code inlines, only with {=FORMAT}
for attributes, where FORMAT
is the name of the output format (e.g., ms
, html
). -
Implement multicolumn support for slide formats (#1710). The structure expected is:
    :::::::::::::: {.columns}
    ::: {.column width="40%"}
    contents...
    :::
    ::: {.column width="60%"}
    contents...
    :::
    ::::::::::::::
Support has been added for beamer and all HTML slide formats.
-
Allows line comments in templates, beginning with
$--
(#3806). (Requires doctemplates 0.2.1.) -
Add
--eol=crlf|lf|native
flag and writer option to control line endings (Stefan Dresselhaus, #3663, #2097). -
Add
--log
option to save log messages in JSON format to a file (#3392). -
Add
--request-header
option, to set request headers when pandoc makes HTTP requests to fetch external resources. For example: --request-header User-Agent:blah
. -
Added lua filters (Albert Krewinkel, #3514). The new
--lua-filter
option works like --filter
but takes pathnames of special lua filters and uses the lua interpreter baked into pandoc, so that no external interpreter is needed. Note that lua filters are all applied after regular filters, regardless of their position on the command line. For documentation of lua filters, see doc/lua-filters.md
. -
Set
PANDOC_READER_OPTIONS
in environment where filters are run. This contains a JSON representation of ReaderOptions
, so filters can access it. -
Support creation of pdf via groff
ms
and pdfroff. pandoc -t ms -o output.pdf input.txt
. -
Support for PDF generation via HTML and
weasyprint
or prince
(Mauro Bieg, #3909). pandoc -t html5 -o output.pdf --pdf-engine=prince
. -
Added
--epub-subdirectory
option (#3720). This specifies the subdirectory in the OCF container that holds the EPUB specific content. We now put all EPUB related content in an EPUB/
subdirectory by default (later this will be configurable).

    mimetype
    META-INF/
      com.apple.ibooks.display-options.xml
      container.xml
    EPUB/ <<--configurable-->>
      fonts/ <<--static-->>
        font.otf
      media/ <<--static-->>
        cover.jpg
        fig1.jpg
      styles/ <<--static-->>
        stylesheet.css
      content.opf
      toc.ncx
      text/ <<--static-->>
        ch001.xhtml
-
Added
--resource-path=SEARCHPATH
command line option (#852). SEARCHPATH is separated by the usual character, depending on OS (: on unix, ; on windows). The default resource path is just the working directory. However, the working directory must be explicitly specified if the --resource-path
option is used. -
Added --abbreviations=FILE option for custom abbreviations file (#256). Default abbreviations file (
data/abbreviations
) contains a list of strings that will be recognized by pandoc’s Markdown parser as abbreviations. (A nonbreaking space will be inserted after the period, preventing a sentence space in formats like LaTeX.) Users can override the default by putting a file abbreviations in their user data directory (~/.pandoc
on *nix). -
Allow a theme file as argument to
--highlight-style
. Also include a sample, default.theme
, in data/
. -
Allow
--syntax-definition
option for dynamic loading of syntax highlighting definitions (#3334). -
Lists in
markdown
by default now use the CommonMark variable nesting rules (#3511). The indentation required for a block-level item to be included in a list item is no longer fixed, but is determined by the first line of the list item. To be included in the list item, a block must be indented to the level of the first non-space content after the list marker. Exception: if there are 5 or more spaces after the list marker, then the content is interpreted as an indented code block, and continuation paragraphs must be indented two spaces beyond the end of the list marker. See the CommonMark spec for more details and examples.

Documents that adhere to the four-space rule should, in most cases, be parsed the same way by the new rules. Here are some examples of texts that will be parsed differently:
    - a
      - b
will be parsed as a list item with a sublist; under the four-space rule, it would be a list with two items.
    - a

          code
Here we have an indented code block under the list item, even though it is only indented six spaces from the margin, because it is four spaces past the point where a continuation paragraph could begin. With the four-space rule, this would be a regular paragraph rather than a code block.
    - a

            code
Here the code block will start with two spaces, whereas under the four-space rule, it would start with
code
. With the four-space rule, indented code under a list item always must be indented eight spaces from the margin, while the new rules require only that it be indented four spaces from the beginning of the first non-space text after the list marker (here, a
).

This change was motivated by a slew of bug reports from people who expected lists to work differently (#3125, #2367, #2575, #2210, #1990, #1137, #744, #172, #137, #128) and by the growing prevalence of CommonMark (now used by GitHub, for example). Those who prefer the old behavior can use
-f markdown+four_space_rule
. -
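The difference between the two list rules can be sketched in a few lines of Python. This is only an illustration of the behavior described in the three examples, not pandoc's parser: the function names are hypothetical, only a `-` bullet followed by one to four spaces is handled, and the exception for five or more spaces after the marker is left out.

```python
def content_offset(first_line):
    """Column of the first non-space character after a "-" list marker."""
    marker_end = first_line.index("-") + 1
    rest = first_line[marker_end:]
    return marker_end + (len(rest) - len(rest.lstrip(" ")))


def classify(first_line, later_line, four_space_rule=False):
    """Classify an indented line that follows the first line of a list item.

    Returns ("code", text) for an indented code block inside the item,
    ("content", text) for other content nested in the item, and
    ("outside", text) for a line that does not belong to the item.
    """
    indent = len(later_line) - len(later_line.lstrip(" "))
    # Old rule: a fixed four-column step. New rule: align with the item's content.
    base = 4 if four_space_rule else content_offset(first_line)
    if indent >= base + 4:
        return ("code", later_line[base + 4:])
    if indent >= base:
        return ("content", later_line.strip())
    return ("outside", later_line.strip())
```

For instance, `classify("- a", "      code")` yields a code block under the new rules but ordinary item content under `four_space_rule=True`, matching the second example.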
Added
four_space_rule
extension. This triggers the old pandoc parsing rule for content nested under list items (the “four space rule”). -
Added
spaced_reference_links
extension (#2602). It allows whitespace between the two parts of a reference link: e.g.

    [a] [b]

    [b]: url

This was previously enabled by default; it is now forbidden by default.
-
Add
space_in_atx_header
extension (#3512). This is enabled by default in pandoc and GitHub markdown but not the other flavors. This requires a space between the opening #’s and the header text in ATX headers (as CommonMark does but many other implementations do not). This is desirable to avoid falsely capturing things like #hashtag
or
#5
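As a rough illustration of what the extension changes, the following Python sketch recognizes ATX headers with and without the space requirement. The regular expressions and the `atx_header` helper are simplified stand-ins chosen for this example, not pandoc's actual parser.

```python
import re

# With the extension: at least one space or tab must follow the opening #'s.
ATX_WITH_SPACE = re.compile(r"^(#{1,6})[ \t]+(\S.*)$")
# Without it: the header text may follow the #'s directly.
ATX_ANY = re.compile(r"^(#{1,6})[ \t]*(\S.*)$")


def atx_header(line, space_in_atx_header=True):
    """Return (level, text) if the line is an ATX header, else None."""
    pattern = ATX_WITH_SPACE if space_in_atx_header else ATX_ANY
    match = pattern.match(line)
    if match is None:
        return None
    return (len(match.group(1)), match.group(2).strip())
```

With the extension enabled, `#hashtag` and `#5` are left alone as ordinary text, while `## Title` is still a level-2 header.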
-
Add
sourcefile
and outputfile
template variables (Roland Hieber, #3431). -
Allow ibooks-specific metadata in epubs (#2693). You can now have the following fields in your YAML metadata, and it will be treated appropriately in the generated EPUB:
    ibooks:
      version: 1.3.4
      specified-fonts: false
      ipad-orientation-lock: portrait-only
      iphone-orientation-lock: landscape-only
      binding: true
      scroll-axis: vertical
Behavior changes
-
Reader functions no longer presuppose that CRs have been stripped from the input. (They strip CRs themselves, before parsing, to simplify the parsers.)
-
Added support for translations (localization) (#3559). Currently this only affects the LaTeX reader, for things like
\figurename
. Translation data files for 46 languages can be found in data/translations
. -
Make
--ascii
work with DocBook output too. -
Rename
--latex-engine
to --pdf-engine
, and --latex-engine-opt
to --pdf-engine-opt
. -
Removed
--parse-raw
and readerParseRaw
. These were confusing. Now we rely on the +raw_tex
or +raw_html
extension with latex or html input. Thus, instead of --parse-raw -f latex
we use -f latex+raw_tex
, and instead of --parse-raw -f html
we use -f html+raw_html
. -
With
--filter
R filters are now recognized, even if they are not executable (#3940, #3941, Andrie de Vries). -
Support SVG in PDF output, converting with
rsvg2pdf
(#1793). -
Make epub an alias for epub3, not epub2.
-
Removed
--epub-stylesheet
; use --css
instead (#3472, #847). Multiple stylesheets may be used. Stylesheets will be taken both from --css
and from the stylesheet
metadata field (which can contain either a file path or a list of them). -
--mathml
and MathML in HTMLMathMethod no longer take an argument. The argument was for a bridge JavaScript that used to be necessary in 2004. We have removed the script already. -
--katex
improvements. The latest version is used, and the autoload script is loaded by default. -
Change MathJax CDN default since old one is shutting down (#3544). Note: The new URL requires a version number, which we’ll have to update manually in subsequent pandoc releases in order to take advantage of mathjax improvements.
-
--self-contained
: don’t incorporate elements with data-external="1"
(#2656). You can leave an external link as it is by adding the attribute data-external="1" to the element. Pandoc will then not try to incorporate its content when --self-contained
is used. This is similar to a feature already supported by the EPUB writer. -
Allow
--extract-media
to work with non-binary input formats (#1583, #2289). If --extract-media
is supplied with a non-binary input format, pandoc will attempt to extract the contents of all linked images, whether in local files, data: uris, or external uris. They will be named based on the sha1 hash of the contents. -
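Naming extracted media by content hash can be sketched as follows. The helper name is hypothetical, and keeping the original file extension is an assumption made for this illustration; the note above only states that names are based on the sha1 hash of the contents.

```python
import hashlib
import os


def extracted_name(contents: bytes, original_path: str) -> str:
    """Build a file name from the SHA-1 of the contents (extension kept by assumption)."""
    digest = hashlib.sha1(contents).hexdigest()
    extension = os.path.splitext(original_path)[1]
    return digest + extension
```

One consequence of this scheme is that two links to byte-identical images map to the same extracted file.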
Make
papersize: a4
work regardless of the case of a4
. It is converted to a4
in LaTeX and A4
in ConTeXt. -
Make
east_asian_line_breaks
affect all readers/writers (#3703). -
Underlined elements are now treated consistently by readers (#2270, hftf); they are always put in a Span with class
underline
. This allows the user to treat them differently from other emphasis, using a filter. Docx, Org, Textile, Txt2Tags, and HTML readers have been changed. -
Improved behavior of
auto_identifiers
when there are explicit ids (#1745). Previously only autogenerated ids were added to the list of header identifiers in state, so explicit ids weren’t taken into account when generating unique identifiers. Duplicated identifiers could result. This simple fix ensures that explicitly given identifiers are also taken into account. -
Use
table-of-contents
for contents of toc, make toc
a boolean (#2872). Changed markdown, rtf, and HTML-based templates accordingly. This allows you to set toc: true
in the metadata; this previously produced strange results in some output formats. For backwards compatibility, toc
is still set to the toc contents. But it is recommended that you update templates to use table-of-contents
for the toc contents and toc
for a boolean flag. -
Change behavior with binary format output to stdout. Previously, for binary formats, output to stdout was disabled unless we could detect that the output was being piped (and not sent to the terminal). Unfortunately, such detection is not possible on Windows, leaving windows users no way to pipe binary output. So we have changed the behavior in the following way:
- Output to stdout is allowed when it can be determined that the output is being piped (on non-Windows platforms).
- If the
-o
option is not used, binary output is never sent to stdout by default; instead, an error is raised. - If
-o -
is used, binary output is sent to stdout, regardless of whether it is being piped. This works on Windows too.
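Read together, the three rules suggest roughly the following decision procedure. This Python sketch is one plausible reading, not pandoc's code: the function and parameter names are hypothetical, and it assumes that a detected pipe takes precedence over the "error by default" rule when `-o` is absent.

```python
def binary_output_target(output_option, stdout_is_piped):
    """Decide where binary output goes.

    output_option: None if -o was not given, "-" for stdout, else a file name.
    stdout_is_piped: True or False, or None when it cannot be determined
    (as on Windows).
    """
    if output_option == "-":
        return "stdout"  # explicit request, works on every platform
    if output_option is not None:
        return output_option
    if stdout_is_piped is True:
        return "stdout"  # piping was positively detected
    raise ValueError("refusing to write binary output to the terminal; use -o")
```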
-
Better error behavior: uses of
error
have been replaced by raising of PandocError
, which can be trapped and handled by the calling program. -
Removed
hard_line_breaks
extension from markdown_github
(#3594). GitHub has two Markdown modes, one for long-form documents like READMEs and one for short things like issue comments. In issue comments, a line break is treated as a hard line break. In README, wikis, etc., it is treated as a space as in regular Markdown. Since pandoc is more likely to be used to convert long-form documents from GitHub Markdown, -hard_line_breaks
is a better default. -
Include
backtick_code_blocks
extension in markdown_mmd
(#3637). -
Escape
MetaString
values (as added with -M/--metadata
flag) (#3792). Previously they would be transmitted to the template without any escaping. Note that -M title='*foo*'
yields a different result from

    ---
    title: *foo*
    ---

In the latter case, we have emphasis; in the former case, just a string with literal asterisks (which will be escaped in formats, like Markdown, that require it).
-
Allow
em
, cm
, in
for image height/width in HTML, LaTeX (#3450). -
HTML writer: Insert
data-
in front of unsupported attributes. Thus, a span with attribute foo
gets written to HTML5 with data-foo
, so it is valid HTML5. HTML4 is not affected. This will allow us to use custom attributes in pandoc without producing invalid HTML. (With help from Wandmalfarbe, #3817.) -
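The attribute rewriting can be pictured with a small Python sketch. The attribute set below is a tiny illustrative subset rather than the list pandoc uses, the helper names are hypothetical, and leaving already-prefixed `data-` attributes untouched is an assumption.

```python
# Illustrative subset only; the real list of valid HTML5 attributes is much longer.
HTML5_ATTRIBUTES = {"id", "class", "title", "lang", "dir", "style",
                    "href", "src", "alt", "width", "height"}


def html5_attribute_name(name):
    """Prefix attributes that HTML5 does not define with "data-"."""
    if name in HTML5_ATTRIBUTES or name.startswith("data-"):
        return name
    return "data-" + name


def html5_attributes(attributes):
    """Rewrite a list of (name, value) pairs for HTML5 output."""
    return [(html5_attribute_name(name), value) for name, value in attributes]
```

For example, `html5_attributes([("foo", "bar"), ("class", "x")])` gives `[("data-foo", "bar"), ("class", "x")]`.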
Plain writer: improved super/subscript rendering. We now handle more non-digit characters for which there are sub/superscripted unicode characters. When unicode sub/superscripted characters are not available, we use
_(..)
or ^(..)
(#3518). -
Docbook, JATS, TEI writers: print INFO message when omitting interior header (#3750). This only applies to section headers inside list items, e.g., which were otherwise silently omitted.
-
Change to
--reference-links
in Markdown writer (#3701). With --reference-location
of section
or block
, pandoc will now repeat references that have been used in earlier sections. The Markdown reader has also been modified, so that exactly repeated references do not generate a warning, only references with the same label but different targets. The idea is that, with references after every block, one might want to repeat references sometimes. -
ODT/OpenDocument writer:
-
Docx writer:
-
Change
FigureWithCaption
to CaptionedFigure
(iandol, #3658). -
Use
Table
rather than Table Normal
for table style (#3275). Table Normal
is the default table style and can’t be modified. -
Pass through comments (#2994). We assume that comments are defined as parsed by the docx reader:
I want <span class="comment-start" id="0">I left a comment.</span>some text to have a comment <span class="comment-end" id="0"></span>on it.
We assume also that the id attributes are unique and properly matched between comment-start and comment-end.
-
Bookmark improvements. Bookmark start/end now surrounds content rather than preceding it. Bookmarks generated for Div with id (jgm/pandoc-citeproc#205).
-
Add
keywords
metadata to docx document properties (Ian).
-
RST writer: support unknown interpreted text roles by parsing them as
Span
with role
attributes (#3407). This way they can be manipulated in the AST. -
HTML writer:
- Line block: Use class instead of style attribute (#1623). We now issue
<div class="line-block">
and include a default definition for line-block
in the default templates, instead of hard-coding a style
on the div. - Add class
footnoteBack
to footnote back references (Timm Albers). This allows for easier CSS styling. - Render SmallCaps as span with smallcaps class (#1592), rather than using a style attribute directly. This gives the user more flexibility in styling small caps in CSS.
- With reveal.js we use
data-src
instead of src
for images for lazy loading. - Special-case
.stretch
class for images in reveal.js (#1291). Now in reveal.js, an image with class stretch
in a paragraph by itself will stretch to fill the whole screen, with no caption or figure environment.
-
Added warnings for non-rendered blocks to writers.
-
Writers now raise an error on template failure.
-
When creating a PDF via LaTeX, warn if the font is missing some characters (#3742).
-
Remove initial check for PDF-creating program (#3819). Instead, just try running it and raise the exception if it isn’t found at that point. This improves things for users of Cygwin on Windows, where the executable won’t be found by
findExecutable
unless .exe
is added. The same exception is raised as before, but at a later point. -
Readers issue warning for duplicate header identifiers (#1745). Autogenerated header identifiers are given suffixes so as not to clash with previously used header identifiers. But they may still coincide with an explicit identifier that is given for a header later in the document, or with an identifier on a div, span, link, or image. We now issue a warning in this case, so users can supply an explicit identifier.
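A minimal Python sketch of the identifier bookkeeping described here: explicit identifiers are recorded alongside generated ones, generated ones get a suffix when they would clash, and a later explicit identifier that collides produces a warning. The function names and the `-1`, `-2` suffix format are illustrative assumptions, not a transcription of pandoc's code.

```python
def unique_identifier(base, used):
    """Return base, or base with a numeric suffix, so that it is not in `used`."""
    candidate = base
    counter = 0
    while candidate in used:
        counter += 1
        candidate = f"{base}-{counter}"
    used.add(candidate)
    return candidate


def assign_identifiers(headers):
    """headers: list of (explicit_id_or_None, autogenerated_base) pairs.

    Returns (identifiers, warnings); warnings lists duplicated identifiers.
    """
    used = set()
    identifiers = []
    warnings = []
    for explicit, base in headers:
        if explicit is not None:
            if explicit in used:
                warnings.append(explicit)  # cannot rename an explicit id
            used.add(explicit)
            identifiers.append(explicit)
        else:
            identifiers.append(unique_identifier(base, used))
    return identifiers, warnings
```

An explicit `intro` followed by a generated `intro` yields `intro` and `intro-1` with no warning; the reverse order leaves two `intro` identifiers and one warning.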
-
CommonMark reader now supports
emoji
, hard_line_breaks
, smart
, and raw_html
extensions. -
Markdown reader:
-
Don’t allow backslash + newline to affect block structure (#3730). Note that as a result of this change, the following, which formerly produced a header with two lines separated by a line break, will now produce a header followed by a paragraph:
    # Hi\
    there

This may affect some existing documents that relied on this undocumented and unintended behavior. This change makes pandoc more consistent with other Markdown implementations, and with itself (since the two-space version of a line break doesn’t work inside ATX headers, and neither version works inside Setext headers).
-
-
Org reader (Albert Krewinkel, unless noted):
- Support
table.el
tables (#3314). - Support macros (#3401).
- Support the
#+INCLUDE:
file inclusion mechanism (#3510). Recognized include types are example
, export
, src
, and normal org file inclusion. Advanced features like line numbers and level selection are not implemented yet. - Interpret more meta values as inlines. The values of the following meta variables are now interpreted using org-markup instead of treating them as pure strings:
keywords
(comma-separated list of inlines), subtitle
(inline values), nocite
(inline values, can be repeated). - Support
\n
export option (#3940). This turns all newlines in the text into hard linebreaks.
-
RST reader:
-
Improved admonition support (#223). We no longer add an
admonition
class, we just use the class for the type of admonition, note
for example. We put the word corresponding to the label in a paragraph inside a Div
at the beginning of the admonition with class admonition-title
. This is about as close as we can get to RST’s own output. -
Initial support of
.. table
directive. This allows adding captions to tables. -
Support
.. line-block
directive. This is deprecated but may still be in older documents. -
Support scale and align attributes of images (#2662).
-
Implemented implicit internal header links (#3475).
-
Support RST-style citations (#853). The citations appear at the end of the document as a definition list in a special div with id
citations
. Citations link to the definitions. -
Recurse into bodies of unknown directives (#3432). In most cases it’s better to preserve the content than to omit it. This isn’t guaranteed to have good results; it will fail spectacularly for unknown raw or verbatim directives.
-
Handle chained link definitions (#262). For example,
    .. _hello:
    .. _goodbye: example.com
Here both
hello
and goodbye
should link to example.com
. -
Support anchors (#262). E.g.
    `hello`

    .. _hello:

    paragraph
This is supported by putting “paragraph” in a
Div
with id hello
. -
Support
:widths:
attribute for table directive. -
Implement csv-table directive (#3533). Most attributes are supported, including
:file:
and :url:
. -
Support unknown interpreted text roles by parsing them as Span with “role” attributes (#3407). This way they can be manipulated in the AST.
-
-
HTML reader: parse a span with class
smallcaps
as SmallCaps
. -
LaTeX reader:
- Implemented
\graphicspath
(#736). - Properly handle column prefixes/suffixes. For example, in
\begin{tabular}{>{$}l<{$}>{$}l<{$} >{$}l<{$}}
each cell will be interpreted as if it has a $
before its content and a $
after (math mode). - Handle komascript
\dedication
(#1845). It now adds a dedication
field to metadata. It is up to the user to supply a template that uses this variable. - Support all
\textXX
commands, where XX = rm
, tt
, up
, md
, sf
, bf
(#3488). Spans with a class are used when there is nothing better. - Expand
\newenvironment
macros (#987). - Add support for LaTeX subfiles package (Marc Schreiber, #3530).
- Better support for subfigure package (#3577). A figure with two subfigures turns into two pandoc figures; the subcaptions are used and the main caption ignored, unless there are no subcaptions.
- Add support for \vdots (Marc Schreiber, #3607).
- Add basic support for hyphenat package (Marc Schreiber, #3603).
- Add basic
\textcolor
support (Marc Schreiber). - Add support for
tabularx
environment (Marc Schreiber, #3632). - Better handling of comments inside math environments (#3113). This solves a problem with commented out
\end{eqnarray}
inside an eqnarray (among other things). - Parse tikzpicture as raw verbatim environment if
raw_tex
extension is selected (#3692). Otherwise skip with a warning. This is better than trying to parse it as text! - Add
\colorbox
support (Marc Schreiber). - Set identifiers on Spans used for
\label
. - Have
\setmainlanguage
set lang
in metadata. - Support etoolbox’s
\ifstrequal
. - Support
plainbreak
, fancybreak
et al from the memoir class (bucklereed, #3833). - Support
\let
. Also, fix regular macros so they’re expanded at the point of use, and NOT also the point of definition. \let
macros, by contrast, are expanded at the point of definition. Added an ExpansionPoint
field to Macro
to track this difference. - Support simple
\def
macros. Note that we still don’t support macros with fancy parameter delimiters, like \def\foo#1..#2{...}
. - Support \chaptername, \partname, \abstractname, etc. (#3559, obsoletes #3560).
- Put content of
\ref
, \label
, \eqref
commands into Span
with attributes, so they can be handled in filters (Marc Schreiber, #3639). - Add support for
glossaries
and acronym
package (Marc Schreiber, #3589). Acronyms are not resolved by the reader, but acronym and glossary information is put into attributes on Spans so that they can be processed in filters. - Use
Link
instead of Span
for \ref
. This makes more sense semantically and avoids unnecessary Span [Link]
nestings when references are resolved. - Rudimentary support for
\hyperlink
. - Support
\textquoteleft|right
, \textquotedblleft|right
(#3849). - Support
\lq
, \rq
. - Implement
\newtoggle
, \iftoggle
, \toggletrue|false
from etoolbox (#3853). - Support
\RN
and \Rn
, from biblatex (bucklereed, #3854). - Improved support for
\hyperlink
, \hypertarget
(#2549). - Support
\k
ogonek accent. - Improve handling of accents. Handle ogonek, and fall back correctly with forms like
\"{}
. - Better support for ogonek accents.
- Support for
\faCheck
and \faClose
(Marc Schreiber, #3727). - Support for
xspace
(Marc Schreiber, #3797). - Support
\setmainlanguage
or \setdefaultlanguage
(polyglossia) and \figurename
. - Better handling of
\part
in LaTeX (#1905). Now we parse chapters as level 0 headers, and parts as level -1 headers. After parsing, we check for the lowest header level, and if it’s less than 1 we bump everything up so that 1 is the lowest header level. So \part
will always produce a header; no command-line options are needed. - Add block version of
\textcolor
(Marc Schreiber). -
\textcolor
works as inline and block command (Marc Schreiber). -
\textcolor
will be parsed as a span at the beginning of a paragraph (Marc Schreiber). - Read polyglossia/babel
\text(LANG){...}
(bucklereed) - Improved handling of include files in LaTeX reader (#3971). Previously
\include
wouldn’t work if the included file contained, e.g., a begin without a matching end. - Support
\expandafter
(#3983). - Handle
\DeclareRobustCommand
(#3983). Currently it’s just treated as a synonym for \newcommand
. - Handle
\lettrine
(Mauro Bieg).
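The header-level adjustment described for `\part` and chapters in the list above can be sketched in Python as follows. The function name is hypothetical and the sketch works on a bare list of levels rather than on a parsed document.

```python
def normalize_header_levels(levels):
    """Shift header levels so that the lowest one becomes 1.

    Chapters are parsed as level 0 and parts as level -1; if nothing is
    below level 1 the levels are returned unchanged.
    """
    if not levels:
        return []
    lowest = min(levels)
    if lowest >= 1:
        return list(levels)
    shift = 1 - lowest
    return [level + shift for level in levels]
```

A document with a part, a chapter and a section (`[-1, 0, 1]`) becomes `[1, 2, 3]`, while a document with only sections and subsections is left alone.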
-
Math improvements due to updates in texmath:
- Improved handling of accents and upper/lower delimiters.
- Support for output in GNU eqn format (used with *roff).
- Allow
\boldsymbol
+ a token without braces, and similarly with other styling commands. - Improve parsing of
\mathop
to allow multi-character operator names. - Add thin space after math operators when “faking it with unicode.”
-
walk
is now used instead of bottomUp
in the ToJSONFilter
instance for a -> [a]
(pandoc-types). Note that behavior will be slightly different, since bottomUp
’s treatment of a function [a] -> [a]
is to apply it to each sublist of a list, while walk applies it only to maximal sublists. Usually the latter behavior is what is wanted, and the former can be simulated when needed. But there may be existing filters that need to be rewritten in light of the new behavior. Performance should be improved. -
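The difference between the two traversals can be imitated in Python on a flat list. This is an analogy under stated assumptions, not the Haskell implementation: Python lists stand in for Haskell cons lists, where every tail of a list is itself a sublist, and both function names are hypothetical.

```python
def apply_to_every_sublist(f, items):
    """bottomUp-like: apply f to each tail of the list, innermost tail first."""
    if not items:
        return f([])
    return f([items[0]] + apply_to_every_sublist(f, items[1:]))


def apply_to_maximal_list(f, items):
    """walk-like: apply f once, to the whole list."""
    return f(list(items))


def reverse(xs):
    return xs[::-1]
```

Reversing `[1, 2, 3]` shows the effect: the walk-like version returns `[3, 2, 1]`, while the bottomUp-like version reverses at every tail and ends up with `[2, 3, 1]`. A filter that relied on the old behavior may need to recurse explicitly.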
There are some changes to syntax highlighting due to revisions in the
skylighting
library: - Support for
powershell
has been added, and many syntax definitions have been updated. - Background colors have been added to the
kate
style. - The way highlighted code blocks are formatted in HTML has been changed (David Baynard), in ways that may require changes in hard-coded CSS affecting highlighting. (If you haven’t included hard-coded highlighting CSS in your template, you needn’t change anything.)
API changes
-
New module
Text.Pandoc.Class
(Jesse Rosenthal, John MacFarlane). This contains definitions of the PandocMonad
typeclass, the PandocIO
and PandocPure
monads, and associated functions. -
Changed types of all writers and readers.
- We now use
Text
instead of String
in the interface (#3731). (We have not yet changed the internals of most readers to work with Text
, but making this change in the API now opens up a path to doing that.) - The result is now of form
m a
with constraint PandocMonad m
. Readers and writers can be combined to form monadic values which can be run using either runIO
or runPure
. If runIO
is used, then both readers and writers will be able to do IO when needed (for include files, for example); if runPure
is used, then the functions are pure and will not touch IO. - Where previously you used
writeRST def (readMarkdown def "[foo](url)")
, now you would use runPure $ readMarkdown def (pack "[foo](url)") >>= writeRST def
.
-
New module
Text.Pandoc.Readers
(Albert Krewinkel). This contains reader helper functions formerly defined in the top-level Text.Pandoc
module. - Changed
StringReader
-> TextReader
. -
getReader
now returns a pair of a reader and Extensions
, instead of building the extensions into the reader (#3659). The calling code must explicitly set readerExtensions
using the Extensions
returned. The point of the change is to make it possible for the calling code to determine what extensions are being used.
-
New module
Text.Pandoc.Writers
(Albert Krewinkel). This contains writer helper functions formerly defined in the top-level Text.Pandoc
module. - Changed
StringWriter
-> TextWriter
. -
getWriter
now returns a pair of a writer and Extensions
, instead of building the extensions into the writer (#3659). The calling code must explicitly set writerExtensions
using the Extensions
returned. The point of the change is to make it possible for the calling code to determine what extensions are being used.
-
New module
Text.Pandoc.Lua
, exporting runLuaFilter
(Albert Krewinkel, #3514). -
New module
Text.Pandoc.App
. This abstracts out the functionality of the command line program (convertWithOpts
), so it can be reproduced e.g. in a desktop or web application. Instead of exiting, we throw errors (#3548), which are caught (leading to exit) in pandoc.hs, but allow other users of Text.Pandoc.App
to recover. pandoc.hs
is now a 2-liner. The module also exports some utility functions for parsing options and running filters. -
New module
Text.Pandoc.Logging
(exported module) (#3392). This now contains the Verbosity
definition previously in Text.Pandoc.Options
, as well as a new LogMessage
datatype that will eventually be used instead of raw strings for warnings. This will enable us, among other things, to provide machine-readable warnings if desired. Include ToJSON instance and showLogMessage. This gives us the possibility of both machine-readable and human-readable output for log messages. -
New module
Text.Pandoc.BCP47
, with getLang
, Lang(..)
, parseBCP47
. -
New module
Text.Pandoc.Translations
, exporting Term
, Translations
, readTranslations
. -
New module
Text.Pandoc.Readers.LaTeX.Types, exporting Macro, Tok, TokType, Line, Column. -
Text.Pandoc.Error
: added many new constructors for PandocError
. -
Expose some previously private modules (#3260). These are often helpful to people writing their own reader or writer modules:
Text.Pandoc.Writers.Shared
Text.Pandoc.Parsing
Text.Pandoc.Asciify
Text.Pandoc.Emoji
Text.Pandoc.ImageSize
Text.Pandoc.Highlighting
-
New module
Text.Pandoc.Extensions
(Albert Krewinkel): Extension parsing and processing functions were defined in the top-level Text.Pandoc
module. These functions are moved to the Extensions submodule so as to enable reuse in other submodules. -
Add
Ext_raw_attribute
constructor for Extension
. -
Add
Ext_fenced_divs
constructor for Extension. -
Add
Ext_four_space_rule
constructor in Extension
. -
Add
Ext_gfm_auto_identifiers
constructor for Extension
. -
Add
Monoid
instance for Extensions
. -
Add
Text.Pandoc.Writers.Ms
, exporting writeMs
. -
Add
Text.Pandoc.Writers.JATS
, exporting writeJATS
. -
Add
Text.Pandoc.Writers.Muse
, exporting writeMuse
. -
Add
Text.Pandoc.Readers.Muse
, exporting readMuse
. -
Add
Text.Pandoc.Readers.TikiWiki
, exporting readTikiWiki
. -
Add
Text.Pandoc.Readers.Vimwiki
, exporting readVimwiki
. -
Add
Text.Pandoc.Readers.Creole
, exporting readCreole
. -
Export
setVerbosity
from Text.Pandoc
. -
Text.Pandoc.Pretty
: Add Eq
instance for Doc
. -
Text.Pandoc.XML
: toEntities
: changed type to Text -> Text
. -
Text.Pandoc.UTF8
: - Export
fromText
, fromTextLazy
, toText
, toTextLazy
. Define toString
, toStringLazy
in terms of them. - Add new functions parameterized on
Newline
: writeFileWith
, putStrWith
, putStrLnWith
, hPutStrWith
, hPutStrLnWith
.
-
Text.Pandoc.MediaBag
: removed extractMediaBag
. -
Text.Pandoc.Highlighting
: highlighting
now returns an Either rather than Maybe. This allows us to display error information returned by the skylighting library. Display a warning if the highlighting library throws an error. - Add parameter for
SyntaxMap
to highlight
.
-
Text.Pandoc.Writers.Math
: - Export
defaultMathJaxURL
, defaultKaTeXURL
. This will ensure that we only need to update these in one place.
-
Text.Pandoc.SelfContained
: - Removed
WriterOptions
parameter from makeSelfContained
. - Put
makeSelfContained
in PandocMonad instead of IO. This removes the need to pass MediaBag around and improves exceptions. It also opens up the possibility of using makeSelfContained purely. - Export
makeDataURI
.
-
Text.Pandoc.ImageSize
: - Export
lengthToDim
, new function scaleDimension
. - Export
inEm
from ImageSize (#3450). - Change
showFl
and show
instance for Dimension
so extra decimal places are omitted. - Added
Em
as a constructor of Dimension
. - Add
WriterOptions
parameter to imageSize
signature (Mauro Bieg).
-
Text.Pandoc.Templates
: - Change type of
renderTemplate'
. Now it runs in PandocMonad
and raises a proper PandocTemplateError
if there are problems, rather than failing with uncatchable error
. - Change signature of
getDefaultTemplate
. Now it runs in any instance of PandocMonad
, and returns a String
rather than an Either
value. And it no longer takes a datadir
parameter, since this can be retrieved from CommonState
.
-
Text.Pandoc.Options:
- Added writerEpubSubdirectory to WriterOptions (#3720). The EPUB writer now takes its EPUB subdirectory from this option.
- In WriterOptions, rename writerLaTeXEngine to writerPdfEngine and writerLaTeXArgs to writerPdfArgs (Mauro Bieg, #3909).
- Add writerSyntaxMap to WriterOptions.
- Removed writerEpubStylesheet from WriterOptions.
- Remove writerUserDataDir from WriterOptions. It is now carried in CommonState in PandocMonad instances. (And thus it can be used by readers too.)
- Changed writerEpubMetadata to a Maybe String.
- Removed readerApplyMacros from ReaderOptions. Now we just check the latex_macros reader extension.
- FromJSON/ToJSON instances for ReaderOptions.
- In HTMLMathMethod, the KaTeX constructor now takes only one string (for the KaTeX base URL), rather than two.
- Removed writerSourceURL from WriterOptions. We now use stSourceURL in CommonState, which is set by setInputFiles.
-
Text.Pandoc.Shared:
- tabFilter now takes a Text, not String.
- openURL: Changed type from an Either. Now it will just raise an exception to be trapped later.
- Remove normalizeSpaces (#1530).
- Remove warn. (Use report from Text.Pandoc.Class instead.)
- Export a new function crFilter.
- Add eastAsianLineBreakFilter (previously in Markdown reader).
- Provide custom isURI that rejects unknown schemes (Albert Krewinkel, #2713). We also export the set of known schemes. The new function replaces the function of the same name from Network.URI, as the latter did not check whether a scheme is well-known. All official IANA schemes (as of 2017-05-22) are included in the set of known schemes. The four non-official schemes doi, isbn, javascript, and pmid are kept.
- Remove err.
- Remove readDataFile, readDefaultDataFile, getReferenceDocx, getReferenceODT. These now live in Text.Pandoc.Class, where they are defined in terms of PandocMonad primitives and have different signatures.
- Remove openURL. Use openURL from Text.Pandoc.Class instead.
- Add underlineSpan.
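The scheme check behind the custom isURI can be illustrated with a short sketch. This is Python written for illustration, not pandoc's Haskell code: the names KNOWN_SCHEMES and is_uri are hypothetical, the scheme set is a small sample rather than the full IANA list, and only the scheme is examined (the rest of the URI syntax is not validated here).

```python
import re

# Hypothetical sample; the real set holds every IANA-registered scheme
# plus the four non-official ones (doi, isbn, javascript, pmid).
KNOWN_SCHEMES = {
    "http", "https", "ftp", "mailto", "file",
    "doi", "isbn", "javascript", "pmid",
}

# A scheme is a letter followed by letters, digits, "+", "." or "-", then ":".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def is_uri(text):
    """Return True only if text starts with a scheme from KNOWN_SCHEMES."""
    match = SCHEME_RE.match(text)
    if match is None:
        return False
    return match.group(1).lower() in KNOWN_SCHEMES
```

With a check of this kind, a string such as foo:bar is no longer taken for a URI, even though it is syntactically shaped like one.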
-
Text.Pandoc.Readers.HTML: export new NamedTag class.
-
Text.Pandoc.Readers.Markdown: remove readDocxWithWarnings. With the new API one can simply use getLog after running the reader.
-
Text.Pandoc.Readers.LaTeX: Changed types for rawLaTeXInline and rawLaTeXBlock. (Both now return a String, and they are polymorphic in state.)
Bug fixes and under-the-hood improvements
-
TEI writer: Added identifiers on
<div>
elements. -
DokuWiki reader: Better handling for code block in list item (#3824).
-
Custom writer: Remove old preprocessor conditionals (Albert Krewinkel).
-
ZimWiki writer: Removed internal formatting from note and table cells, because ZimWiki does not support it (Alex Ivkin, #3446).
-
MediaWiki writer:
-
Org writer:
-
CommonMark writer:
- Avoid excess blank lines at end of output.
- Prefer pipe tables to HTML tables even if it means losing relative column width information (#3734).
- Support table, strikethrough extensions, when enabled (as with gfm). Note that we bypass the commonmark writer from cmark and construct our own pipe tables, with better results.
- Properly support
--wrap=none
. - Use smallcaps class for
SmallCaps
(#1592). - Omit “fig:” prefix in image titles. This is used internally to indicate internal figures.
-
RST writer:
- Properly handle table captions.
- Don’t wrap lines in definition list terms. Wrapping is not allowed.
- Implemented
+/-smart
and improved escaping with+smart
. - Add empty comments when needed to avoid including a blockquote in the indented content of a preceding block (#3675).
- Improve grid table output, fix bug with empty rows (#3516). Uses the new
gridTable
in Writers.Shared, which is here improved to better handle 0-width cells. - Remove space at beginning/end of RST code span (#3496). Otherwise we get invalid RST. There seems to be no way to escape the space.
- Add header anchors when header has non-standard id (#3937).
- Correctly handle inline code containing backticks, using a
:literal:
role (#3974). - Don’t backslash-escape word-internal punctuation (#3978).
-
Markdown writer:
-
Don’t include variables in metadata blocks. Previously variables set on the command line were included in e.g. YAML metadata, contrary to documentation and intentions.
-
Improved escaping with
+smart
. -
Fixed grid tables embedded in grid tables (#2834).
-
Use span with class ‘smallcaps’ for SmallCaps, instead of a style attribute as before (#1592).
-
Escape initial
%
in a paragraph if thepandoc_title_blocks
extension is enabled (#3454). Otherwise in a document starting with a literal%
the first line is wrongly interpreted as a title. -
Fixed false ordered lists in YAML metadata (#3492, #1685). Now we properly escape things that would otherwise start ordered lists, such as
--- title: 1. inline ...
-
Better handling of tables with empty columns (#3337). We now calculate the number of columns based on the longest row (or the length of aligns or widths).
-
Escape unordered list markers at beginning of paragraph (#3497), to avoid false interpretation as a list.
-
Escape
|
appropriately. -
Ensure space before list at top level (#3487).
-
Avoid spurious blank lines at the end of the document, for example after tables and lists.
-
Fixed bugs in simple/multiline list output (#3384). Previously we got overlong lists with
--wrap=none
. This is fixed. Previously a multiline list could become a simple list (and would always become one with--wrap=none
). -
Don’t emit a simple table if
simple_tables
disabled (#3529). -
Case-insensitive reference links (David A Roberts, #3616). Ensure that we do not generate reference links whose labels differ only by case. Also allow implicit reference links when the link text and label are identical up to case.
-
Put space before reference link definitions (Mauro Bieg, #3630).
-
Better escaping for links (David A. Roberts, #3619). Previously the Markdown writer would sometimes create links where there were none in the source. This is now avoided by selectively escaping bracket characters when they occur in a place where a link might be created.
-
Added missing
\n
(David A. Roberts, #3647). -
Fixed duplicated reference links with
--reference-links
and--reference-location=section
(#3674). Also ensure that there are no empty link references[]
. -
Avoid inline surround-marking with empty content (#3715). E.g. we don’t want
<strong></strong>
to become****
. Similarly for emphasis, super/subscript, strikeout. -
Don’t allow soft break in header (#3736).
-
Make sure
plain
,markdown_github
, etc. work for raw. Previously onlymarkdown
worked. Note: currently a raw block labeledmarkdown_github
will be printed for anymarkdown
format. -
Ensure that
+
and-
are escaped properly so they don’t cause spurious lists (#3773). Previously they were escaped only if followed by a space, not if they were at the end of a line.
Use pipe tables if
raw_html
disabled andpipe_tables
enabled, even if the table has relative width information (#3734). -
Markdown writer: don’t crash on
Str ""
. -
Make
Span
with null attribute transparent. That is, we don’t use brackets or<span>
tags to mark spans when there are no attributes; we simply output the contents. -
Escape pipe characters when
pipe_tables
enabled (#3887). -
Better escaping of
<
and>
. Ifall_symbols_escapable
is set, we backslash escape these. Otherwise we use entities as before. -
When writing plain, don’t use
to separate list and indented code. There’s no need for it in this context, since this isn’t to be interpreted using Markdown rules. -
Preserve classes in JS obfuscated links (Timm Albers, #2989). HTML links containing classes originally now preserve them when using javascript email obfuscation.
-
Render
SmallCaps
as a native span whennative_spans
are enabled. -
Always write attributes with
bracketed_spans
(d-dorazio).
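The list-marker escaping fix (#3773) comes down to treating end of line the same way as a following space. A minimal Python sketch of that rule, using the hypothetical name escape_list_marker and handling only the first line of a paragraph, might look like this (it is an illustration, not the writer's actual Haskell code):

```python
def escape_list_marker(line):
    """Backslash-escape a leading + or - that Markdown could read as a bullet.

    Assumption for this sketch: `line` is the first line of a paragraph.
    """
    if line[:1] in ("+", "-"):
        rest = line[1:]
        # A bullet marker is followed by a space OR by nothing at all;
        # the second case is the one that used to be missed.
        if rest == "" or rest.startswith(" "):
            return "\\" + line
    return line
```

A marker glued to other text, as in -foo, cannot start a list and is left untouched.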
-
-
Man writer:
-
DocBook writer:
- Fix internal links with
writerIdentifierPrefix opt
(#3397, Mauro Bieg).
-
Docx writer:
- Don’t include bookmarks on headers unless non-null id (#3476).
- Support 9 levels of headers (#1642).
- Allow 9 list levels (#3519).
- Don’t take
distArchive
from datadir (#3322). The docx writer takes components from the distribution’s version ofreference.docx
when it can’t find them in a user’s customreference.docx
. Previously, we allowed areference.docx
in the data directory (e.g.~/.pandoc
) to be used as the distribution’s reference.docx. This led to a bizarre situation where pandoc would produce a good docx using--template ~/.pandoc/ref.docx
, but ifref.docx
were moved to~/.pandoc/reference.docx
, it would then produce a corrupted docx. - Fixed handling of soft hyphen (0173) (#3691).
- Better handling of keywords (#3719).
- Cleaner code for handling dir and style attributes for
Div
. - Use
Set
for dynamic styles to avoid duplicates. - Removed redundant element from data/docx/word/numbering.xml. The elements we need are generated when the document is compiled; this didn’t do anything.
- Activate
evenAndOddHeaders
from reference docx (#3901, Augustín Martín Barbero).
-
ODT/OpenDocument writer:
- Calculate aspect ratio for percentage-sized images (Mauro Bieg, #3239).
- Use more widely available bullet characters (#1400). The old characters weren’t available in some font sets. These seem to work well on Windows and Linux versions of LibreOffice.
- Wider labels for lists (#2421). This avoids overly narrow labels for ordered lists with
()
delimiters. However, arguably it creates overly wide labels for bullets. Also, lists now start flush with the margin, rather than indented. - Fixed dropped elements in some ordered lists (#2434).
-
FB2 writer:
- Don’t render
RawBlock
as code. - Don’t fail with an error on interior headers (e.g. in list) (#3750). Instead, omit them with an INFO message.
- Add support for “lang” metadata (Alexander Krotov, #3625).
- Format
LineBlock
as poem (Alexander Krotov). Previously writer produced one paragraph with<empty-line/>
elements, which are not allowed inside<p>
according to FB2 schema. - Replace
concatMap
withcMap
(Alexander Krotov). - Write FB2 lists without nesting blocks inside
<p>
(Alexander Krotov, #4004)
-
HTML writer:
- Make sure
html4
,html5
formats work for raw blocks/inlines. - Render raw inline environments when
--mathjax
used (#3816). We previously did this only with raw blocks, on the assumption that math environments would always be raw blocks. This has changed since we now parse them as inline environments. - Ensure we don’t get two style attributes for width and height.
- Report when not rendering raw inline/block.
- Issue warning if no title specified and template used (#3473).
- Info message if
lang
is unspecified (#3486). - Removed unused parameter in
dimensionsToAttributeList
. - Avoid two class attributes when adding
uri
class (#3716). - Fix internal links with
writerIdentifierPrefix opt
(#3397, Mauro Bieg). - Use revealjs’s math plugin for mathjax (#3743). This is a thin wrapper around mathjax that makes math look better on revealjs.
- Slidy: use h1 for all slides, even if they were originally level 2 headers (#3566). Otherwise the built-in table of contents in Slidy breaks.
-
LaTeX writer:
- Don’t render LaTeX images with data: URIs (#3636). Note that
--extract-media
can be used when the input contains data: URIs. - Make highlighted code blocks work in footnotes (Timm Albers).
- Don’t use figure inside table cell (#3836).
- Use proper code for list enumerators (#3891). This should fix problems with lists that don’t use arabic numerals.
- Always add hypertarget when there’s a non-empty identifier (#2719). Previously the hypertargets were only added when there was actually a link to that identifier.
- Use
%
after hypertarget before code block. - Add
\leavevmode
before hypertarget at start of paragraph (#2704, fixes formatting problems in beamer citations). - Don’t use
lstinline
in
\item[..]
(#645). If you do, the contents of item disappear or are misplaced. Use\texttt
instead. - Fix problem with escaping in
lstinline
(#1629). Previously the LaTeX writer created invalid LaTeX when--listings
was specified and a code span occurred inside emphasis or another construction. - Fix error with line breaks after empty content (#2874). LaTeX requires something before a line break, so we insert a
~
if no printable content has yet been emitted. - Use BCP47 parser.
- Fixed detection of otherlangs (#3770). We weren’t recursing into inline contexts.
- Handle language in inline code with
--listings
(#3422). - Write euro symbol directly in LaTeX (Andrew Dunning, #3801). The textcomp package allows pdfLaTeX to parse
€
directly, making the\euro
command unneeded. - Fixed footnotes in table captions (#2378). Note that if the table has a first page header and a continuation page header, the notes will appear only on the first occurrence of the header.
- In
writeBeamer
output, allow hyperlinks to frames (#3220). Previously you could link to a header above or below slide level but not to slide level. This commit changes that. Hypertargets are inserted inside frame titles; technically the reference is to just after the title, but in normal use (where slides are viewed full screen in a slide show), this does not matter. - Remove
\strut
at beginning of table cells (#3436). This fixes a problem with alignment of lists in table cells. The\strut
at the end seems to be enough to avoid the too-close spacing that motivated addition of the strut in #1573. - Add partial siunitx Support (Marc Schreiber, #3588).
-
ConTeXt writer:
-
EPUB writer:
title_page.xhtml
is now put intext/
.- Don’t strip formatting in TOC (#1611).
-
Textile reader:
-
HTML reader:
- Added warnings for ignored material (#3392).
- Better sanity checks to avoid parsing unintended things as raw HTML in the Markdown reader (#3257).
- Revise treatment of
li
withid
attribute (#3596). Previously we always added an empty div before the list item, but this created problems with spacing in tight lists. Now we do this: If the list item contents begin with aPlain
block, we modify thePlain
block by adding aSpan
around its contents. Otherwise, we add aDiv
around the contents of the list item (instead of adding an emptyDiv
to the beginning, as before). - Add
details
tag to list of block tags (#3694). - Removed
button
from block tag list (#3717). It is already in theeitherBlockOrInlineTag
list, and should not be in both places. - Use
Set
s instead of lists for block tag lookup. - Rewrote to use
Text
throughout. Effect on memory usage is modest (< 10%). - Use the lang value of
<html>
to set the lang meta value (bucklereed, #3765). - Ensure that paragraphs are closed properly when the parent block element closes, even without
</p>
(#3794). - Parse
<figure>
and<figcaption>
(Mauro Bieg, #3813). - Parse
<main>
like<div role=main>
(bucklereed, #3791).<main>
closes<p>
and behaves like a block element generally. - Support column alignments (#1881). These can be set either with an
align
attribute or with text-align
in a style
attribute. - Modified state type to be an instance of
HasLogMessages
, soregisterHeader
can issue warnings. </td>
or</th>
should close any open block tag (#3991).<td>
should close an open<th>
or<td>
.htmlTag
improvements (#3989). We previously failed on cases where an attribute contained a>
character. This patch fixes the bug, which especially affects raw HTML in Markdown.
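The revised treatment of li elements with an id attribute (#3596) can be sketched on a toy document tree. The Python below is only an illustration: blocks and inlines are modelled as plain tuples, and wrap_list_item is a hypothetical name, not a function from the HTML reader.

```python
def wrap_list_item(item_id, blocks):
    """Attach item_id to the contents of a list item.

    Toy AST (an assumption of this sketch):
      ("Plain", inlines), ("Para", inlines),
      ("Span", ident, inlines), ("Div", ident, blocks)
    """
    if blocks and blocks[0][0] == "Plain":
        # Leading Plain block: wrap its inlines in a Span carrying the id,
        # so a tight list stays tight.
        _, inlines = blocks[0]
        return [("Plain", [("Span", item_id, inlines)])] + list(blocks[1:])
    # Otherwise wrap the whole item in a Div (not an empty Div in front).
    return [("Div", item_id, list(blocks))]
```

Because no empty Div is prepended any more, the first block of the item stays first, which is what avoids the spacing problems in tight lists.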
-
Txt2Tags reader:
- Newline is not indentation (Alexander Krotov).
-
MediaWiki reader:
- Allow extra hyphens after
|-
in tables (#2649). - Allow blank line after table start (#2649).
- Fixed more table issues (#2649).
- Ensure that list starts begin at left margin (#2606). Including when they’re in tables or other list items.
- Make smart double quotes depend on
smart
extension (#3585). - Don’t do curly quotes inside
<tt>
contexts (#3585). Even if+smart
. - Modified state type to be an instance of
HasLogMessages
, soregisterHeader
can issue warnings.
-
TWiki reader (Alexander Krotov):
-
EPUB reader:
- Minor refactoring, avoiding explicit MediaBag handling. This all works behind the scenes in CommonState plumbing.
-
Docx reader:
- Don’t drop smartTag contents (#2242).
- Handle local namespace declarations (#3365). Previously we didn’t recognize math, for example, when the xmlns declaration occurred on the element and not the root.
- More efficient trimSps (#1530). Replacing
trimLineBreaks
. This does the work ofnormalizeSpaces
as well, so we avoid the need for that function here. - Avoid 0-level headers (Jesse Rosenthal, #3830). We used to parse paragraphs styled with “HeadingN” as “nth-level header.” But if a document has a custom style named “Heading0”, this will produce a 0-level header, which shouldn’t exist. We only parse this style if N>0. Otherwise we treat it as a normal style name, and follow its dependencies, if any.
- Add tests for avoiding zero-level header (Jesse Rosenthal).
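The zero-level header rule (#3830) amounts to a guard on the number taken from the style name. As a rough Python sketch (heading_level is a hypothetical helper, and matching the style name with a regular expression is an assumption made for illustration):

```python
import re


def heading_level(style_name):
    """Return N for a "HeadingN" style with N > 0, otherwise None.

    None means: treat the style as an ordinary style name and follow
    its dependencies instead of producing a header.
    """
    match = re.fullmatch(r"Heading(\d+)", style_name)
    if match is None:
        return None
    level = int(match.group(1))
    return level if level > 0 else None
```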
-
ODT reader:
- Replaced
collectRights
with Rights fromData.Either
. - Remove dead code (Albert Krewinkel).
-
Org reader (Albert Krewinkel, unless noted).
- Don’t allow tables inside list items (John MacFarlane, #3499).
- Disallow tables on list marker lines (#3499).
- Convert markup at beginning of footnotes (John MacFarlane, #3576).
- Allow emphasized text to be followed by
[
(#3577). - Handle line numbering switch for src blocks. The line-numbering switch that can be given to source blocks (
-n
with a start number as an optional parameter) is parsed and translated to a class/key-value combination used by highlighting and other readers and writers. - Stop adding rundoc prefix to src params. Source block parameter names are no longer prefixed with
rundoc
. This was intended to simplify working with the rundoc project, a babel runner. However, the rundoc project is unmaintained, and adding those markers is not the reader’s job anyway. The original language that is specified for a source element is now retained as thedata-org-language
attribute and only added if it differs from the translated language. - Allow multi-word arguments to src block params (#3477). The reader now correctly parses src block parameter list even if parameter arguments contain multiple words.
- Avoid creating
nullMeta
by applyingsetMeta
directly (Alexander Krotov). - Replace
sequence . map
withmapM
. - Fix smart parsing behavior. Parsing of smart quotes and special characters can either be enabled via the
smart
language extension or the'
and-
export options. Smart parsing is active if either the extension or export option is enabled. Only smart parsing of special characters (like ellipses and en and em dashes) is enabled by default, while smart quotes are disabled. Previously, all smart parsing was disabled unless the language extension was enabled. - Subject full doc tree to headline transformations (Albert Krewinkel, #3695). Emacs parses org documents into a tree structure, which is then post-processed during exporting. The reader is changed to do the same, turning the document into a single tree of headlines starting at level 0.
- Fix cite parsing behaviour (Herwig Stuetz). Until now,
org-ref
cite keys included special characters also at the end. This caused problems when citations occur right before colons or at the end of a sentence. With this change, all non alphanumeric characters at the end of a cite key are ignored. This also adds,
to the list of special characters that are legal in cite keys to better mirror the behaviour of org-export. - Fix module names in haddock comments. Copy-pasting had led to haddock module descriptions containing the wrong module names.
- Recognize babel result blocks with attributes (#3706). Babel result blocks can have block attributes like captions and names. Result blocks with attributes were not recognized and were parsed as normal blocks without attributes.
- Include tags in headlines. The Emacs default is to include tags in the headline when exporting. Instead of just empty spans, which contain the tag name as attribute, tags are rendered as small caps and wrapped in those spans. Non-breaking spaces serve as separators for multiple tags.
- Respect export option for tags (#3713). Tags are appended to headlines by default, but will be omitted when the
tags
export option is set to nil. - Use
tag-name
attribute instead ofdata-tag-name
. - Use
org-language
attribute rather thandata-org-language
. - Modified state type to be an instance of
HasLogMessages
, soregisterHeader
can issue warnings. - End footnotes after two blank lines. Footnotes can not only be terminated by the start of a new footnote or a header, but also by two consecutive blank lines.
- Update emphasis border chars (#3933). The org reader was updated to match current org-mode behavior: the set of characters that may occur as the first or last character in an org emphasis has been changed and now allows all non-whitespace chars at the inner border of emphasized text (see
org-emphasis-regexp-components
).
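The cite-key change in the Org reader (ignoring non-alphanumeric characters at the end of a key) can be pictured with a small Python sketch. The name trim_cite_key is hypothetical, and the sketch covers only the trailing-character rule, not the full set of characters allowed inside a key:

```python
def trim_cite_key(raw):
    """Drop trailing non-alphanumeric characters from a scanned cite key."""
    end = len(raw)
    while end > 0 and not raw[end - 1].isalnum():
        end -= 1
    return raw[:end]
```

Special characters inside the key survive, so a citation right before a colon or a sentence-final period no longer drags the punctuation into the key.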
-
RST reader:
-
Fixed small bug in list parsing (#3432). Previously the parser didn’t handle properly this case:
* - a - b * - c - d
-
Handle multiline cells in simple tables (#1166).
-
Parse list table directive (Keiichiro Shikano, #3432).
-
Make use of
anyLineNewline
(Alexander Krotov, #3686). -
Use
anyLineNewline
inrawListItem
(Alexander Krotov, #3702). -
Reorganize block parsers for ~20% faster parsing.
-
Fixed
..include::
directive (#3880). -
Handle blank lines correctly in line blocks (Alexander Krotov, #3881). Previously pandoc would sometimes combine two line blocks separated by blanks, and ignore trailing blank lines within the line block.
-
Fix indirect hyperlink targets (#512).
-
-
Markdown reader:
- Allow attributes in reference links to start on next line (#3674).
- Parse YAML metadata in a context that sees footnotes defined in the body of the document (#1279).
- When splitting pipe table cells, skip tex math (#3481). You might have a
|
character inside math. (Or for that matter something that the parser might mistake for raw HTML.) - Treat span with class
smallcaps
as SmallCaps. This allows users to specify small caps in Markdown this way:[my text]{.smallcaps}
(#1592). - Fixed internal header links (#2397). This patch also adds
shortcut_reference_links
to the list of mmd extensions. - Treat certain environments as inline when they occur without space surrounding them (#3309, #2171). E.g. equation, math. This avoids incorrect vertical space around equations.
- Optimized
nonindentSpaces
. Makes the benchmark go from 40 to 36 ms. - Allow latex macro definitions indented 1-3 spaces. Previously they only worked if nonindented.
- Improved parsing of indented raw HTML blocks (#1841). Previously we inadvertently interpreted indented HTML as code blocks. This was a regression. We now seek to determine the indentation level of the contents of an HTML block, and (optionally) skip that much indentation. As a side effect, indentation may be stripped off of raw HTML blocks, if
markdown_in_html_blocks
is used. This is better than having things interpreted as indented code blocks. - Fixed smart quotes after emphasis (#2228). E.g. in
*foo*'s 'foo'
. - Warn for notes defined but not used (#1718).
- Use
anyLineNewline
(Alexander Krotov). - Interpret YAML metadata as Inlines when possible (#3755). If the metadata field is all on one line, we try to interpret it as Inlines, and only try parsing as Blocks if that fails. If it extends over one line (including possibly the
|
or>
character signaling an indented block), then we parse as Blocks. This was motivated by some German users finding thatdate: '22. Juin 2017'
got parsed as an ordered list. - Fixed spurious parsing as citation as reference def (#3840). We now disallow reference keys starting with
@
if thecitations
extension is enabled. - Parse
-@roe
as suppress-author citation (pandoc-citeproc#237). Previously only[-@roe]
(with brackets) was recognized as suppress-author, and-@roe
was treated the same as@roe
. - Fixed parsing of fenced code after list when there is no intervening blank line (#3733).
- Allow raw latex commands starting with
\start
(#3558). Previously these weren’t allowed because they were interpreted as starting ConTeXt environments, even without a corresponding\stop
… - Added
inlines
,inlines1
. - Require nonempty alt text for
implicit_figures
(#2844). A figure with an empty caption doesn’t make sense. - Removed texmath macro material; now all this is handled in the LaTeX reader functions.
- Fixed bug with indented code following raw LaTeX (#3947).
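The bare suppress-author form -@roe described above can be illustrated with a short Python sketch. It is not the reader's parser: parse_bare_citation is a hypothetical name, and the key syntax is simplified to word characters, which is narrower than what pandoc accepts.

```python
import re

# Simplified assumption: a key is one or more word characters.
BARE_CITE_RE = re.compile(r"(-?)@(\w+)")


def parse_bare_citation(text):
    """Parse "@key" or "-@key"; return None if text is neither."""
    match = BARE_CITE_RE.fullmatch(text)
    if match is None:
        return None
    return {
        "key": match.group(2),
        # A leading hyphen marks the citation as suppress-author.
        "suppress_author": match.group(1) == "-",
    }
```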
-
LaTeX reader:
- Rewrote LaTeX reader with proper tokenization (#1390, #2118, #3236, #3779, #934, #982). This rewrite is primarily motivated by the need to get macros working properly. A side benefit is that the reader is significantly faster. We now tokenize the input text, then parse the token stream. Macros modify the token stream, so they should now be effective in any context, including math. Thus, we no longer need the clunky macro processing capacities of texmath.
- Parse
\,
to\8198
(six-per-em space) (Henri Werth). - Allow
\newcommand\foo{blah}
without braces. - Support
\lstinputlisting
(#2116). - Issue warnings when skipping unknown latex commands (#3392).
- Include contents of
\parbox
. - Allow
\hspace
and\vspace
to count as raw block or inline. Previously we would refuse to parse anything as raw inline if it was in theblockCommands
list. Now we allow exceptions if they’re listed under ignoreInlines in inlineCommands. This should make it easier e.g. to include an\hspace
between two side-by-side raw LaTeX tables. - Don’t drop contents of
\hypertarget
. - Handle spaces before
\cite
arguments. - Allow newpage, clearpage, pagebreak in inline contexts as well as block contexts (#3494).
- Treat
{{xxx}}
the same as{xxx}
(#2115). - Use
pMacroDefinition
in macro (for more direct parsing). Note that this means thatmacro
will now parse one macro at a time, rather than parsing a whole group together. - Fixed failures on \ref{}, \label{} with
+raw_tex
. Now these commands are parsed as raw if+raw_tex
; otherwise, their argument is parsed as a bracketed string. - Don’t crash on empty
enumerate
environment (#3707). - Handle escaped
&
inside table cell (#3708). - Handle block structure inside table cells (#3709).
minipage
is no longer required. - Handle some width specifiers on table columns (#3709). Currently we only handle the form
0.9\linewidth
. Anything else would have to be converted to a percentage, using some kind of arbitrary assumptions about line widths. - Make sure
\write18
is parsed as raw LaTeX. The change is in the LaTeX reader’s treatment of raw commands, but it also affects the Markdown reader. - Fixed regression with starred environment names (#3803).
- Handle optional args in raw
\titleformat
(#3804). - Improved heuristic for raw block/inline. An unknown command at the beginning of the line that could be either block or inline is treated as block if we have a sequence of block commands followed by a newline or a
\startXXX
command (which might start a raw ConTeXt environment). - Don’t remove macro definitions from the output, even if
Ext_latex_macros
is set, so that macros will be applied. Since they’re only applied to math in Markdown, removing the macros can have bad effects. Even for math macros, keeping them should be harmless. - Removed
macro
. It is no longer necessary, since therawLaTeXBlock
parser will parse macro definitions. This also avoids the need for a separatelatexMacro
parser in the Markdown reader. - Use
label
instead ofdata-label
for label in caption (#3639). - Fixed space after \figurename etc.
- Resolve references to section numbers.
- Fix
\let\a=0
case, with single character token. - Allow
@
as a letter in control sequences.@
is commonly used in macros using\makeatletter
. Ideally we’d make the tokenizer sensitive to\makeatletter
and\makeatother
, but until then this seems a good change. - Track header numbers and correlate with labels.
- Allow
]
inside group in option brackets (#3857). - lstinline with braces can be used (verb cannot be used with braces) (Marc Schreiber, #3535).
- Fix keyval function: pandoc did not parse options in braces correctly (Marc Schreiber, #3642).
- When parsing raw LaTeX commands, include trailing space (#1773). Otherwise things like
\noindent foo
break and turn into\noindentfoo
. Affects-f latex+raw_tex
and-f markdown
(and other formats that allowraw_tex
). - Don’t treat “…” as Quoted (#3958). This caused quotes to be omitted in
\texttt
contexts. - Add tests for existing
\includegraphics
behaviour (Ben Firshman). - Allow space before
=
in bracketed options (Ben Firshman). - Be more forgiving in parsing command options. This was needed, for example, to make some minted options work.
- Strip off quotes in
\include
filenames.
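The idea behind the rewritten LaTeX reader, tokenizing first and letting macros rewrite the token stream, can be shown in miniature. The Python below is a deliberately tiny sketch under stated assumptions: tokenize and expand are hypothetical names, macros take no arguments, and recursive macro definitions are not guarded against. It also treats @ as a letter in control sequences, as one of the entries above describes.

```python
import re

# A token is a control word (backslash + letters, "@" counted as a letter),
# a control symbol (backslash + any one character), or a single character.
TOKEN_RE = re.compile(r"\\[A-Za-z@]+|\\.|.", re.DOTALL)


def tokenize(source):
    """Split LaTeX source into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def expand(tokens, macros):
    """Replace macro tokens by their bodies, re-scanning the result.

    macros maps a control sequence to its replacement token list.
    Sketch only: no arguments, and a self-referencing macro would loop.
    """
    output = []
    pending = list(reversed(tokens))  # used as a stack, next token on top
    while pending:
        token = pending.pop()
        if token in macros:
            # Push the body back so that macros inside it expand too.
            pending.extend(reversed(macros[token]))
        else:
            output.append(token)
    return output
```

Since expansion works on tokens rather than on text, it behaves the same in any context, math included: with \foo defined as b\bar and \bar as x, the input $\foo$ expands to $bx$.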
-
Added
Text.Pandoc.CSV
, simple (unexported) CSV parser. -
Text.Pandoc.PDF
:- Got
--resource-path
working with PDF output (#852). - Fetch images when generating PDF via context (#3380). To do this, we create the temp directory as a subdirectory of the working directory. Since context mk IV by default looks for images in the parent directory, this works.
- Use
report
instead ofwarn
, make it sensitive to verbosity settings. - Use
fillMediaBag
andextractMedia
to extract media to temp dir. This reduces code duplication. html2pdf
: use stdin instead of intermediate HTML file- Removed useless
TEXINPUTS
stuff forcontext2pdf
. mkiv context doesn’t useTEXINPUTS
.
-
Text.Pandoc.Pretty
:- Simplified definition of
realLength
. - Don’t error for blocks of size < 1. Instead, resize to 1 (see #1785).
-
Text.Pandoc.MIME
:- Use
application/javascript
(notapplication/x-javascript
). - Added
emf
to mimeTypes with typeapplication/x-msmetafile
(#1713).
-
Text.Pandoc.ImageSize
: -
Use
Control.Monad.State.Strict
throughout. This gives 20-30% speedup and reduction of memory usage in most of the writers. -
Use
foldrWithKey
instead of deprecatedfoldWithKey
. -
Text.Pandoc.SelfContained
:- Fixed problem with embedded fonts (#3629).
- Refactored getData from
getDataURI
inSelfContained
. - Don’t use data URIs for script or style (#3423). Instead, just use script or style tags with the content inside. The old method with data URIs prevents certain optimizations outside pandoc. Exception: data URIs are still used when a script contains
</script>
or a style contains</
. - SelfContained: Handle URL inside material retrieved from a URL (#3629). This can happen e.g. with an @import of a google web font. (What is imported is some CSS which contains an url reference to the font itself.) Also, allow unescaped pipe (|) in URL.
- Load resources from
data-src
(needed for lazy loading in reveal.js slide shows). - Handle
data-background-image
attribute on section (#3979).
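The rule for scripts and styles (#3423), inline content by default with a data URI only when the content would break out of its tag, can be sketched as follows. This is illustrative Python, not the SelfContained module: embed_script, embed_style, and _data_uri are hypothetical names, and the base64 encoding and the link element used for styles are assumptions of the sketch.

```python
import base64


def _data_uri(mime, text):
    """Build a base64 data URI (the encoding is an assumption of this sketch)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "data:" + mime + ";base64," + payload


def embed_script(source):
    """Inline a script, unless it contains a closing script tag."""
    if "</script>" in source:
        uri = _data_uri("application/javascript", source)
        return '<script src="' + uri + '"></script>'
    return "<script>" + source + "</script>"


def embed_style(css):
    """Inline a stylesheet, unless it contains "</"."""
    if "</" in css:
        uri = _data_uri("text/css", css)
        return '<link rel="stylesheet" href="' + uri + '">'
    return "<style>" + css + "</style>"
```

Keeping the content inline in the common case leaves it readable to tools that post-process the HTML, which is the optimization concern mentioned above.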
-
Text.Pandoc.Parsing
:- Added
indentWith
(Alexander Krotov, #3687). - Added
stateCitations
toParserState
. - Removed
stateChapters
fromParserState
. - In
ParserState
, makestateNotes'
a Map, addstateNoteRefs
. - Added
gobbleSpaces
andgobbleAtMostSpaces
. - Adjusted type of
insertIncludedFile
so it can be used with token parser. - Replace old texmath macro stuff from Parsing. Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
- Export
insertIncludedFile
. - Added
HasLogMessages
,logMessage
,reportLogMessages
(#3447). - Replace partial with total function (Albert Krewinkel).
- Introduce
HasIncludeFiles
type class (Albert Krewinkel). TheinsertIncludeFile
function is generalized to work with all parser states which are instances of that class. - Add
insertIncludedFilesF
which returns F blocks (Albert Krewinkel). The insertIncludeFiles
function was generalized and renamed to insertIncludedFiles'
; the specialized versions are based on that. - many1Till
: Check for the end condition before parsing (Herwig Stuetz). By not checking for the end condition before the first parse, the parser was applied too often, consuming too much of the input. This only affects many1Till p end
where p
matches on a prefix of end
. - Provide
parseFromString
(#3690). This is a version of parseFromString
specialized to ParserState, which resets stateLastStrPos
at the end. This is almost always what we want. This fixes a bug where _hi_
wasn’t treated as emphasis in the following, because pandoc got confused about the position of the last word: - [o] _hi_
. - Added
takeP
, takeWhileP
for efficient parsing of [Char]
. - Fix
blanklines
documentation (Alexander Krotov, #3843). - Give less misleading line information with
parseWithString
. Previously positions would be reported past the end of the chunk. We now reset the source position within the chunk and report positions “in chunk.” - Add
anyLineNewline
(Alexander Krotov). - Provide shared F monad functions for Markdown and Org readers (Albert Krewinkel). The
F
monads used for delayed evaluation of certain values in the Markdown and Org readers are based on a shared data type capturing the common pattern of both F
types. - Add
returnF
(Alexander Krotov). - Avoid parsing
Notes:**
as a bare URI (#3570). This avoids parsing bare URIs that start with a scheme + colon + *
, _
, or ]
. - Added
readerAbbreviations
to ParserState
. Markdown reader now consults this to determine what is an abbreviation. - Combine grid table parsers (Albert Krewinkel, #3638). The grid table parsers for markdown and rst were combined into a single parser
gridTable
, slightly changing parsing behavior of both parsers: (1) The markdown parser now compactifies block content cell-wise: pure text blocks in cells are now treated as paragraphs only if the cell contains multiple paragraphs, and as plain blocks otherwise. Before, this was true only for single-column tables. (2) The rst parser now accepts newlines and multiple blocks in header cells. - Generalize tableWith, gridTableWith (Albert Krewinkel). The parsing functions
tableWith
and gridTableWith
are generalized to work with more parsers. The parser state only has to be an instance of the HasOptions
class instead of requiring a concrete type. Block parsers are required to return blocks wrapped into a monad, as this makes it possible to use parsers returning results wrapped in Future
s.
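The many1Till change listed above can be illustrated with a small Parsec sketch. It assumes the `parsec` package is available. The name `many1TillSketch` is hypothetical, and the definition only shows the idea of checking the end condition before the first parse, not necessarily pandoc's exact code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Reject input that already starts with the end condition, then parse
-- one or more items up to (and including) the end marker.
many1TillSketch :: Show end => Parser a -> Parser end -> Parser [a]
many1TillSketch p end = do
  notFollowedBy end
  first <- p
  rest <- manyTill p end
  return (first : rest)

main :: IO ()
main = do
  -- Three items precede the end marker '.'.
  print (parse (many1TillSketch anyChar (char '.')) "" "abc.")
  -- The input starts with the end marker. Without the initial check,
  -- anyChar would consume the first '.' as an item.
  putStrLn (either (const "no parse") id
             (parse (many1TillSketch anyChar (char '.')) "" ".x."))
```

Here `anyChar` matches the `.` that also serves as the end marker, which is the situation the entry describes: `p` matching on a prefix of `end`.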
-
Text.Pandoc.Shared
: -
Text.Pandoc.Writers.Shared
:- Export
metaToJSON'
, addVariablesToJSON
(#3439). This allows us to add the variables AFTER using the metadata to generate a YAML header (in the Markdown writer). - Added
unsmartify
(previously in RST writer). Undo literal double curly quotes. Previously we left these. - Generalize type of
metaToJSON
so it can take a Text. Previously a String was needed as argument; now any ToJSON instance will do. - Added
gridTable
(previously in Markdown writer). - gridTable
: Refactored to use widths in chars. - gridTable
: remove unnecessary extra space in cells. - Fixed
addVariablesToJSON
. It was previously not allowing multiple values to become lists. - Pipe tables: impose minimum cell size (see #3526).
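The addVariablesToJSON fix above concerns variables that are given more than once. As a rough sketch of the intended behaviour, not of pandoc's implementation (which works on JSON values), repeated keys can be collected into lists with Data.Map from the `containers` package. The function name `collectVariables` and the sample variables are made up for illustration.

```haskell
import qualified Data.Map as M

-- Hypothetical helper: a key given several times maps to all of its
-- values, in the order in which they were given.
collectVariables :: [(String, String)] -> M.Map String [String]
collectVariables vars =
  -- fromListWith calls the combining function as (f new old), so
  -- flipping (++) keeps earlier values first.
  M.fromListWith (flip (++)) [(k, [v]) | (k, v) <- vars]

main :: IO ()
main = print (collectVariables
  [ ("author", "A. Writer")
  , ("title", "Notes")
  , ("author", "B. Editor")
  ])
```

With this shape, a template loop over `author` would see both values, whereas a plain key-to-value map would keep only one of them.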
Default template changes
-
HTML templates (including EPUB and HTML slide show templates):
- Make default.html5 polyglot markup conformant (John Luke Bentley, #3473). Polyglot markup is HTML5 that is also valid XHTML. See https://www.w3.org/TR/html-polyglot. With this change, pandoc’s html5 writer creates HTML that is both valid HTML5 and valid XHTML.
- Regularized CSS in html/epub/html slide templates (#3485). All templates now include
code{white-space: pre-wrap}
and CSS forq
if--html-q-tags
is used. Previously some templates had pre
and others pre-wrap
; the q
styles were only sometimes included. - CSS for
.smallcaps
(Mauro Bieg, #1592). - default.revealjs
: make history
default to true. - default.revealjs
: use lazy loading (#2283). - default.revealjs
: add mathjax
variable and some conditional code to use the MathJax plugin. - default.slidy
uses https
instead of http
(ickc, #3848). - default.dzslides
: Load Google Font using HTTPS by default (Yoan Blanc).
-
DocBook5 template: Use
lang
and subtitle
variables (Jens Getreu, #3855). -
LaTeX/Beamer template:
- Combine LaTeX/Beamer templates (Andrew Dunning, #3878).
default.beamer
has been removed; beamer now uses the default.latex
template. Beamer-specific parts are conditional on the beamer
variable set by the writer. Note that pandoc -D beamer
will return this (combined) template. - Use
xcolor
for colorlinks
option (Andrew Dunning, #3877). Beamer loads xcolor
rather than color
, and thus the dvipsnames
option doesn’t take effect. This also provides a wider range of colour selections with the svgnames
option. - Use starred versions of
xcolor
names (Andrew Dunning). Prevents changes to documents defined using the dvipsnames
list (e.g. Blue
gives a different result with svgnames enabled). - Load
polyglossia
after header-includes (#3898). It needs to be loaded as late as possible. - Use
unicode-math
(Vaclav Haisman). Use mathspec
with only XeLaTeX on request. - Don’t load
fontspec
before unicode-math
(over there). The unicode-math
package loads fontspec
so explicit loading of fontspec
before unicode-math
is not necessary. - Use
unicode-math
by default in default.latex template. mathspec will be used in xelatex if the mathspec
variable is set; otherwise unicode-math will be used (Václav Haisman). - Use
dvipsnames
option when colorlinks
is specified (otherwise we get an error for maroon
) (Thomas Hodgson). - Added beamer
titlegraphic
and logo
variables (Thomas Hodgson). - Fix typo in fix for notes in tables (#2378, zeeMonkeez).
- Fix
hyperref
options clash (Andrew Dunning, #3847). Avoids an options clash when loading a package (e.g. tufte-latex
) that uses hyperref
settings different from those in the template. - Add
natbiboptions
variable (#3768). - Fix links inside captions in LaTeX output with links-as-notes (Václav Haisman, #3651). Declare our redefined
\href
robust. - Load
parskip
before hyperref
(Václav Haisman, #3654). - Allow setting Japanese fonts when using LuaLaTeX (Václav Haisman, #3873), by using the
luatexja-fontspec
and luatexja-preset
packages. Use existing CJKmainfont
and CJKoptions
template variables. Add luatexjafontspecoptions
for luatexja-fontspec
and luatexjapresetoptions
for luatexja-preset
. - Added
aspectratio
variable to beamer template (Václav Haisman, #3723). - Modified template.latex to fix XeLaTeX being used with tables (lwolfsonkin, #3661). Reordered
lang
variable handling to immediately before bidi
.
-
ConTeXt template: Improved font handling:
simplefonts
is now obsolete in ConTeXt (Pablo Rodríguez).
Documentation improvements
-
MANUAL.txt:
- Add URL for Prince HTML > PDF engine (Ian, #3919).
- Document that content above slide-level will be omitted in slide shows. See #3460, #2265.
- Explain
--webtex
SVG URL (Mauro Bieg, #3471). - Small clarification in YAML metadata section.
- Document that html4 is technically XHTML 1.0 transitional.
- Remove refs to highlighting-kate (#3672).
- Document ibooks specific epub metadata.
- Clarify that mathml is used for ODT math.
- Mention limitations of Literate Haskell Support (#3410, Joachim Breitner).
- Add documentation of limitations of grid tables (Stephen McDowell, #3864).
- Clarify that meta-json contains transformed values (Jakob Voß, #3491). Make clear that template variable
meta-json
does not contain plain text values or JSON output format but field values transformed to the selected output format.
-
COPYRIGHT:
- Clarify that templates are dual-licensed.
- Clarify that pandoc-types is BSD3 licensed.
- List new files not written by jgm (Albert Krewinkel).
- Update dates in copyright notices (Albert Krewinkel). This follows the suggestions given by the FSF for GPL licensed software. https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html
-
INSTALL.md:
- Improved instructions for tests with patterns.
- Put RPM-based distros on separate point (Mauro Bieg, #3449)
-
CONTRIBUTING.md:
- Fixed typos (Wandmalfarbe, #3479).
- Add “ask on pandoc-discuss” (Mauro Bieg).
-
Add lua filter documentation in
doc/lua-filters.md
. Note that the end of this document is autogenerated from data/pandoc.lua
using make doc/lua-filters.md
, which uses tools/ldoc.ltp
(Albert Krewinkel). -
Add
doc/filters.md
. This is the old scripting tutorial from the website. -
Add
doc/using-the-pandoc-api.md
(#3289). This gives an introduction to using pandoc as a Haskell library.
Build infrastructure improvements
-
Removed
data/templates
submodule. Templates are now a subtree in data/templates
. This removes the need to do git submodule update
. -
Renamed
tests
-> test
. -
Remove
https
flag. Always build with HTTPS support. -
Use
file-embed
instead of hsb2hs
to embed data files when embed_data_files
flag is set. file-embed
gives us better dependency tracking: if a data file changes, ghc/stack/cabal know to recompile the Data module. This also removes hsb2hs
as a build dependency. -
Add
custom-setup
stanza to pandoc, lowercase field names. -
Add
static
Cabal flag. -
Name change OSX -> MacOS. Add a -MacOS suffix to mac package rather than -OSX. Changed local names from osx to macos.
-
make_macos_package.sh - Use strip to reduce executable size.
-
Revised binary linux package. Now a completely static executable is created, using Docker and alpine. We create both a deb and a tarball. The old
deb
directory has been replaced with a linux
directory. Running make
in the linux
directory should perform the build, putting the binary packages in artifacts/
. -
linux/control.in
: add Replaces:
, so existing pandoc-citeproc and pandoc-data packages will be uninstalled; this package provides both (#3822). Add latex packages as ‘suggested’, update description. -
Remove cpphs build requirement – it is no longer needed.
-
Replaced
{deb,macos,windows}/stack.yaml
with stack.pkg.yaml
. -
Name change OSX -> macOS (ickc, #3869).
-
Fix casing of Linux, UNIX, and Windows (ickc).
-
.travis.yml
: create a source dist and do cabal build and test there. That way we catch errors due to files missing from the data section of pandoc.cabal. -
Makefile:
- Split
make haddock
from make full
. - Add BRANCH variable for winpkg.
- Add
lint
target. - Improve
make full
. Disable optimizations. Build everything, inc. trypandoc and benchmarks. Use parallel build. - Allow
make test
to take TESTARGS
.
-
Added new command tests (
Tests.Command
), using small text files in test/command/
. Any files added in this directory will be treated as shell tests (see smart.md for an example). This makes it very easy to add regression tests etc. -
Test fixes so we can find data files. In old tests & command tests, we now set the environment variable
pandoc_datadir
. In lua tests, we set the datadir explicitly. -
Refactored
compareOutput
in docx writer test. -
Consolidated some common functions in
Tests.Helper
. -
Small change to unbalanced bracket test to speed up test suite.
-
Speed up Native writer quickcheck tests.
-
Use tasty for tests rather than test-framework.
-
Add a simple Emacs mode to help with editing pandoc templates (Václav Haisman, #3889):
tools/pandoc-template-mode.el