Support complex figures.
Thanks and credit go to Aner Lucero, who laid the groundwork for this
feature in the 2021 GSoC project. He contributed many changes, including
modifications to the readers for HTML, JATS, and LaTeX, and to the HTML
and JATS writers.

Readers (Aner Lucero):

- HTML reader: `<figure>` elements are parsed as figures, with the
  caption taken from the respective `<figcaption>` elements.

- JATS reader: The `<fig>` and `<caption>` elements are parsed into
  figure elements, even if the contents are more complex.

- LaTeX reader: support for figures with non-image contents and for
  subfigures.

Writers (Aner Lucero, Albert Krewinkel):

- DokuWiki, Haddock, Jira, Man, MediaWiki, Ms, Muse, PPTX, RTF, TEI,
  ZimWiki writers: Figures are rendered like Div elements.

- AsciiDoc writer: The figure contents are unwrapped; each image in
  the figure becomes a separate figure.

- Classic custom writers: Figures are passed to the global function
  `Figure(caption, contents, attr)`, where `caption` and `contents` are
  strings and `attr` is a table of key-value pairs.

- ConTeXt writer: Figures are wrapped in a "placefigure" environment
  with `\startplacefigure`/`\stopplacefigure`, adding the figure's
  caption and listing title as properties. Subfigures are placed in a
  single row with the `\startfloatcombination` environment.

- DocBook writer: Uses `mediaobject` elements, unless the figure contains
  subfigures or tables, in which case the figure content is unwrapped.

- Docx writer: figures with multiple content blocks are rendered as
  tables with style `FigureTable`; like before, single-image figures are
  still output as paragraphs with style `Figure` or `Captioned Figure`,
  depending on whether a caption is attached.

- DokuWiki writer: Caption and "alt-text" are no longer combined. The
  alt text of a figure will now be lost in the conversion.

- FB2 writer: The figure caption is added as alt text to the images in
  the figure; pre-existing alt texts are kept.

- ICML writer: Only single-image figures are supported. The contents of
  figures with additional elements are unwrapped.

- HTML writer: the alt text is no longer constructed from the caption,
  as was the case with implicit figures. This reduces duplication, but
  comes at the risk of images that are missing alt texts. Authors should
  take care to provide alt texts for all images.

- JATS writer: The `<fig>` and `<caption>` elements are used to write
  figures.

- LaTeX writer: complex figures, e.g. with non-image contents and
  subfigures, are supported. The `subfigure` template variable is set if
  the document contains subfigures, triggering the conditional loading
  of the *subcaption* package. Contents of figures that contain tables
  are unwrapped, as longtable environments are not allowed within
  figures.

- Markdown writer: figures are output as implicit figures if possible,
  and via HTML otherwise.

- OpenDocument writer: A separate paragraph is generated for each block
  element in a figure, each with style `FigureWithCaption`. Behavior for
  single-image figures therefore remains unchanged.

- Org writer: Only the first element in a figure is given a caption;
  additional block elements in the figure are appended without any
  caption being added.

- RST writer: Single-image figures are supported as before; the contents
  of more complex figures become nested in a container of type `float`.

- Texinfo writer: Figures are rendered as floats with type `figure`.

- Textile writer: Figures are rendered with the help of HTML elements.

- XWiki: Figures are placed in a group.
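
Several of the writers listed above (Docx, ICML, RST, Markdown) treat single-image figures differently from more complex ones. The sketch below illustrates that split. The `Inline` and `Block` types are simplified stand-ins invented for this example, not the pandoc-types definitions, and `classifyFigure` is a hypothetical helper, not a function from this commit.

```haskell
-- Simplified stand-ins for the pandoc AST (assumption: the real types are richer).
data Inline
  = Str String
  | Image String            -- only the image source is kept here
  deriving (Eq, Show)

data Block
  = Plain [Inline]
  | Para [Inline]
  | Figure String [Block] [Block]   -- identifier, caption blocks, body blocks
  deriving (Eq, Show)

data FigureKind
  = SingleImage String      -- body is exactly one image; carries its source
  | Complex                 -- anything else: several blocks, tables, subfigures
  deriving (Eq, Show)

-- Hypothetical helper: Nothing for non-figures, otherwise the kind of figure.
classifyFigure :: Block -> Maybe FigureKind
classifyFigure (Figure _ _ [Plain [Image src]]) = Just (SingleImage src)
classifyFigure (Figure _ _ [Para [Image src]])  = Just (SingleImage src)
classifyFigure (Figure _ _ _)                   = Just Complex
classifyFigure _                                = Nothing
```

A writer built along these lines would keep its previous output for `SingleImage` and fall back to a table, container, or unwrapped content for `Complex`.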

Signed-off-by: Aner Lucero <4rgento@gmail.com>
tarleb committed Dec 9, 2022
1 parent 5819e36 commit a63b636
Showing 104 changed files with 1,493 additions and 615 deletions.
10 changes: 10 additions & 0 deletions cabal.project
@@ -10,3 +10,13 @@ source-repository-package
type: git
location: https://github.com/jgm/citeproc
tag: cb54223919ecd327250f1b167e4e0c61473f402e

source-repository-package
type: git
location: https://github.com/tarleb/pandoc-types
tag: 24303ff98d5572fd56f470437f62848e6900729a

source-repository-package
type: git
location: https://github.com/pandoc/pandoc-lua-marshal
tag: a2a97e2af78326ea7841101d4ef56e74426b66c4
3 changes: 3 additions & 0 deletions data/templates/default.latex
@@ -293,6 +293,9 @@ $if(numbersections)$
$else$
\setcounter{secnumdepth}{-\maxdimen} % remove section numbering
$endif$
$if(subfigure)$
\usepackage{subcaption}
$endif$
$if(beamer)$
$else$
$if(block-headings)$
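
The `$if(subfigure)$` guard above only loads *subcaption* when the writer sets the `subfigure` variable. How the LaTeX writer detects subfigures is not shown in this excerpt; the following is a minimal sketch of one way to do it, assuming a subfigure is any figure nested inside another figure's body. The `Block` type and both functions are hypothetical stand-ins.

```haskell
-- Simplified stand-in for the pandoc AST: captions and attributes are omitted.
data Block
  = Para String
  | Div [Block]
  | Figure [Block]
  deriving (Eq, Show)

-- True if any block, at any nesting depth, is a figure.
containsFigure :: [Block] -> Bool
containsFigure = any go
  where
    go (Figure _) = True
    go (Div bs)   = containsFigure bs
    go (Para _)   = False

-- True if some figure has another figure somewhere inside its body.
hasSubfigure :: [Block] -> Bool
hasSubfigure = any go
  where
    go (Figure body) = containsFigure body
    go (Div bs)      = hasSubfigure bs
    go (Para _)      = False
```

Under this assumption, two sibling figures do not count as subfigures; only nesting does.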
6 changes: 6 additions & 0 deletions pandoc-lua-engine/src/Text/Pandoc/Lua/Writer/Classic.hs
@@ -157,6 +157,12 @@ blockToCustom (CodeBlock attr str) =
blockToCustom (BlockQuote blocks) =
invoke "BlockQuote" (Stringify blocks)

blockToCustom (Figure attr (Caption _ cbody) content) =
invoke "Figure"
(Stringify cbody)
(Stringify content)
(attrToMap attr)

blockToCustom (Table _ blkCapt specs thead tbody tfoot) =
let (capt, aligns, widths, headers, rows) = toLegacyTable blkCapt specs thead tbody tfoot
aligns' = map show aligns
6 changes: 6 additions & 0 deletions pandoc-lua-engine/test/sample.lua
@@ -295,6 +295,12 @@ function CaptionedImage(src, tit, caption, attr)
end
end

function Figure(caption, contents, attr)
return '<figure' .. attributes(attr) .. '>\n' .. contents ..
'\n<figcaption>' .. caption .. '</figcaption>\n' ..
'</figure>'
end

-- Caption is a string, aligns is an array of strings,
-- widths is an array of floats, headers is an array of
-- strings, rows is an array of arrays of strings.
3 changes: 2 additions & 1 deletion pandoc-lua-engine/test/writer.custom
@@ -737,7 +737,8 @@ So is &lsquo;pine.&rsquo;</p>
<p>From &ldquo;Voyage dans la Lune&rdquo; by Georges Melies (1902):</p>

<figure>
<img src="lalune.jpg" id="" alt="lalune"/><figcaption>lalune</figcaption>
<img src="lalune.jpg" title="Voyage dans la Lune"/>
<figcaption>lalune</figcaption>
</figure>

<p>Here is a movie <img src="movie.jpg" title=""/> icon.</p>
34 changes: 13 additions & 21 deletions src/Text/Pandoc/Readers/HTML.hs
@@ -25,7 +25,7 @@ module Text.Pandoc.Readers.HTML ( readHtml
) where

import Control.Applicative ((<|>))
import Control.Monad (guard, msum, mzero, unless, void)
import Control.Monad (guard, mzero, unless, void)
import Control.Monad.Except (throwError, catchError)
import Control.Monad.Reader (ask, asks, lift, local, runReaderT)
import Data.Text.Encoding.Base64 (encodeBase64)
@@ -36,6 +36,7 @@ import Data.List.Split (splitWhen)
import Data.List (foldl')
import qualified Data.Map as M
import Data.Maybe (fromMaybe, isJust, isNothing)
import Data.Either (partitionEithers)
import Data.Monoid (First (..))
import qualified Data.Set as Set
import Data.Text (Text)
@@ -63,8 +64,8 @@ import Text.Pandoc.Options (
extensionEnabled)
import Text.Pandoc.Parsing hiding ((<|>))
import Text.Pandoc.Shared (
addMetaField, blocksToInlines', extractSpaces,
htmlSpanLikeElements, renderTags', safeRead, tshow, formatCode)
addMetaField, extractSpaces, htmlSpanLikeElements, renderTags',
safeRead, tshow, formatCode)
import Text.Pandoc.URI (escapeURI)
import Text.Pandoc.Walk
import Text.TeXMath (readMathML, writeTeX)
@@ -581,24 +582,15 @@ pPara = do
<|> return (B.para contents)

pFigure :: PandocMonad m => TagParser m Blocks
pFigure = try $ do
TagOpen _ _ <- pSatisfy (matchTagOpen "figure" [])
skipMany pBlank
let pImg = (\x -> (Just x, Nothing)) <$>
(pInTag TagsOmittable "p" pImage <* skipMany pBlank)
pCapt = (\x -> (Nothing, Just x)) <$> do
bs <- pInTags "figcaption" block
return $ blocksToInlines' $ B.toList bs
pSkip = (Nothing, Nothing) <$ pSatisfy (not . matchTagClose "figure")
res <- many (pImg <|> pCapt <|> pSkip)
let mbimg = msum $ map fst res
let mbcap = msum $ map snd res
TagClose _ <- pSatisfy (matchTagClose "figure")
let caption = fromMaybe mempty mbcap
case B.toList <$> mbimg of
Just [Image attr _ (url, tit)] ->
return $ B.simpleFigureWith attr caption url tit
_ -> mzero
pFigure = do
TagOpen tag attrList <- pSatisfy $ matchTagOpen "figure" []
let parser = Left <$> pInTags "figcaption" block <|>
(Right <$> block)
(captions, rest) <- partitionEithers <$> manyTill parser (pCloses tag <|> eof)
-- Concatenate all captions together
return $ B.figureWith (toAttr attrList)
(B.simpleCaption (mconcat captions))
(mconcat rest)

pCodeBlock :: PandocMonad m => TagParser m Blocks
pCodeBlock = try $ do
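
The new `pFigure` tags each parsed child as `Left` (a `<figcaption>`) or `Right` (any other block) and then separates the two groups with `partitionEithers`. The sketch below isolates that step, with plain strings standing in for pandoc's `Blocks`; `FigureChild` and `splitFigureChildren` are names made up for this illustration.

```haskell
import Data.Either (partitionEithers)

-- Each child of a figure is tagged: Left for caption text, Right for body content.
-- Strings stand in for pandoc's Blocks in this sketch.
type FigureChild = Either String String

-- Merge all captions into one and keep the remaining content in document order.
splitFigureChildren :: [FigureChild] -> (String, [String])
splitFigureChildren children =
  let (captions, rest) = partitionEithers children
  in (mconcat captions, rest)
```

Because `partitionEithers` preserves the relative order within each group, a caption may appear before, between, or after the body blocks without changing the order of the body.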
35 changes: 11 additions & 24 deletions src/Text/Pandoc/Readers/JATS.hs
@@ -38,7 +38,6 @@ import Text.TeXMath (readMathML, writeTeX)
import qualified Data.Set as S (fromList, member)
import Data.Set ((\\))
import Text.Pandoc.Sources (ToSources(..), sourcesToText)
import qualified Data.Foldable as DF

type JATS m = StateT JATSState m

@@ -232,29 +231,17 @@ parseBlock (Elem e) =
terms' <- mapM getInlines terms
items' <- mapM getBlocks items
return (mconcat $ intersperse (str "; ") terms', items')
parseFigure =
-- if a simple caption and single graphic, we emit a standard
-- implicit figure. otherwise, we emit a div with the contents
case filterChildren (named "graphic") e of
[g] -> do
capt <- case filterChild (named "caption") e of
Just t -> mconcat .
intersperse linebreak <$>
mapM getInlines
(filterChildren (const True) t)
Nothing -> return mempty

let figAttributes = DF.toList $
("alt", ) . strContent <$>
filterChild (named "alt-text") e

return $ simpleFigureWith
(attrValue "id" e, [], figAttributes)
capt
(attrValue "href" g)
(attrValue "title" g)

_ -> divWith (attrValue "id" e, ["fig"], []) <$> getBlocks e
parseFigure = do
capt <- case filterChild (named "caption") e of
Just t -> mconcat . intersperse linebreak <$>
mapM getInlines (filterChildren (const True) t)
Nothing -> return mempty
contents <- getBlocks e

return $ figureWith
(attrValue "id" e, [], [])
(simpleCaption $ plain capt)
contents
parseFootnoteGroup = do
forM_ (filterChildren (named "fn") e) $ \fn -> do
let id' = attrValue "id" fn
61 changes: 29 additions & 32 deletions src/Text/Pandoc/Readers/LaTeX.hs
@@ -33,6 +33,7 @@ import Data.Maybe (fromMaybe, maybeToList)
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T
import Data.Either (partitionEithers)
import Skylighting (defaultSyntaxMap)
import System.FilePath (addExtension, replaceExtension, takeExtension)
import Text.Collate.Lang (renderLang)
@@ -1011,8 +1012,8 @@ environments = M.union (tableEnvironments blocks inline) $
, ("letter", env "letter" letterContents)
, ("minipage", env "minipage" $
skipopts *> spaces *> optional braced *> spaces *> blocks)
, ("figure", env "figure" $ skipopts *> figure)
, ("subfigure", env "subfigure" $ skipopts *> tok *> figure)
, ("figure", env "figure" $ skipopts *> figure')
, ("subfigure", env "subfigure" $ skipopts *> tok *> figure')
, ("center", divWith ("", ["center"], []) <$> env "center" blocks)
, ("quote", blockQuote <$> env "quote" blocks)
, ("quotation", blockQuote <$> env "quotation" blocks)
@@ -1164,37 +1165,33 @@ letterContents = do
_ -> mempty
return $ addr <> bs -- sig added by \closing

figure :: PandocMonad m => LP m Blocks
figure = try $ do
figure' :: PandocMonad m => LP m Blocks
figure' = try $ do
resetCaption
blocks >>= addImageCaption

addImageCaption :: PandocMonad m => Blocks -> LP m Blocks
addImageCaption = walkM go
where go p@(Para [Image attr@(_, cls, kvs) _ (src, tit)])
| not ("fig:" `T.isPrefixOf` tit) = do
st <- getState
case sCaption st of
Nothing -> return p
Just (Caption _mbshort bs) -> do
let mblabel = sLastLabel st
let attr' = case mblabel of
Just lab -> (lab, cls, kvs)
Nothing -> attr
case attr' of
("", _, _) -> return ()
(ident, _, _) -> do
num <- getNextNumber sLastFigureNum
setState
st{ sLastFigureNum = num
, sLabels = M.insert ident
[Str (renderDottedNum num)] (sLabels st) }

return $ SimpleFigure attr'
(maybe id removeLabel mblabel
(blocksToInlines bs))
(src, tit)
go x = return x
innerContent <- many $ try (Left <$> label) <|> (Right <$> block)
let content = walk go $ mconcat $ snd $ partitionEithers innerContent
st <- getState
let caption' = case sCaption st of
Nothing -> B.emptyCaption
Just capt -> capt
let mblabel = sLastLabel st
let attr = case mblabel of
Just lab -> (lab, [], [])
Nothing -> nullAttr
case mblabel of
Nothing -> pure ()
Just lab -> do
num <- getNextNumber sLastFigureNum
setState
st { sLastFigureNum = num
, sLabels = M.insert lab [Str (renderDottedNum num)] (sLabels st)
}
return $ B.figureWith attr caption' content

where
-- Remove the `Image` caption b.c. it's on the `Figure`
go (Para [Image attr _ target]) = Plain [Image attr [] target]
go x = x

coloredBlock :: PandocMonad m => Text -> LP m Blocks
coloredBlock stylename = try $ do
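
In `figure'`, the local `go` function drops the inline caption of a lone image, since the caption now lives on the enclosing `Figure`, and turns the `Para` into a `Plain`. A minimal sketch of that rewrite follows, using cut-down `Inline` and `Block` types that are assumptions for this example (the real `Image` also carries attributes and a title).

```haskell
-- Simplified stand-ins for the pandoc AST.
data Inline
  = Str String
  | Image [Inline] String   -- caption/alt inlines and the image target
  deriving (Eq, Show)

data Block
  = Para [Inline]
  | Plain [Inline]
  deriving (Eq, Show)

-- A paragraph holding exactly one image loses the image's own caption and
-- becomes a Plain block; every other block is returned unchanged.
stripImageCaption :: Block -> Block
stripImageCaption (Para [Image _ target]) = Plain [Image [] target]
stripImageCaption blk                     = blk
```

This is also why `italicize` in `Readers/LaTeX/Math.hs` gains a `Plain [Image{}]` case: lone images inside figures now arrive as `Plain` rather than `Para`.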
3 changes: 2 additions & 1 deletion src/Text/Pandoc/Readers/LaTeX/Math.hs
@@ -214,7 +214,8 @@ addQed bs =
qedSign = B.str "\xa0\x25FB"

italicize :: Block -> Block
italicize x@(Para [Image{}]) = x -- see #6925
italicize x@(Para [Image{}]) = x -- see #6925
italicize x@(Plain [Image{}]) = x -- ditto
italicize (Para ils) = Para [Emph ils]
italicize (Plain ils) = Plain [Emph ils]
italicize x = x
11 changes: 3 additions & 8 deletions src/Text/Pandoc/Readers/Org/Blocks.hs
@@ -489,15 +489,10 @@ figure = try $ do
figKeyVals = blockAttrKeyValues figAttrs
attr = (figLabel, mempty, figKeyVals)
in if isFigure
then (\c ->
B.simpleFigureWith
attr c imgSrc (unstackFig figName)) <$> figCaption
then (\c -> B.figureWith attr (B.simpleCaption (B.plain c))
(B.plain $ B.image imgSrc figName mempty))
<$> figCaption
else B.para . B.imageWith attr imgSrc figName <$> figCaption
unstackFig :: Text -> Text
unstackFig figName =
if "fig:" `T.isPrefixOf` figName
then T.drop 4 figName
else figName

-- | Succeeds if looking at the end of the current paragraph
endOfParagraph :: Monad m => OrgParser m ()
22 changes: 21 additions & 1 deletion src/Text/Pandoc/Shared.hs
@@ -47,6 +47,7 @@ module Text.Pandoc.Shared (
compactify,
compactifyDL,
linesToPara,
figureDiv,
makeSections,
uniqueIdent,
inlineListToIdentifier,
@@ -90,7 +91,8 @@ import Data.Containers.ListUtils (nubOrd)
import Data.Char (isAlpha, isLower, isSpace, isUpper, toLower, isAlphaNum,
generalCategory, GeneralCategory(NonSpacingMark,
SpacingCombiningMark, EnclosingMark, ConnectorPunctuation))
import Data.List (find, intercalate, intersperse, sortOn, foldl', groupBy)
import Data.List (find, foldl', groupBy, intercalate, intersperse,
union, sortOn)
import qualified Data.Map as M
import Data.Maybe (mapMaybe, fromMaybe)
import Data.Monoid (Any (..))
@@ -427,6 +429,23 @@ combineLines = intercalate [LineBreak]
linesToPara :: [[Inline]] -> Block
linesToPara = Para . combineLines

-- | Creates a Div block from figure components. The intended use is in
-- writers of formats that do not have markup support for figures.
--
-- The resulting div is given the class @figure@ and contains the figure
-- body and the figure caption. The latter is wrapped in a 'Div' of
-- class @caption@, with the stringified @short-caption@ as attribute.
figureDiv :: Attr -> Caption -> [Block] -> Block
figureDiv (ident, classes, kv) (Caption shortcapt longcapt) body =
let divattr = ( ident
, ["figure"] `union` classes
, kv
)
captkv = maybe mempty (\s -> [("short-caption", stringify s)]) shortcapt
capt = [Div ("", ["caption"], captkv) longcapt | not (null longcapt)]
in Div divattr (body ++ capt)

-- | Returns 'True' iff the given element is a 'Para'.
isPara :: Block -> Bool
isPara (Para _) = True
isPara _ = False
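
To see what `figureDiv` produces without pulling in pandoc-types, here is a self-contained sketch of the same logic over simplified types. `Attr`, `Block`, and `Caption` below are stand-ins (the short caption is already a plain string, so no `stringify` call is needed), and `figureDivSketch` is a name chosen for this example.

```haskell
import Data.List (union)

-- Simplified stand-ins for pandoc-types; the real Attr, Caption, and Block are richer.
type Attr = (String, [String], [(String, String)])

data Block
  = Para String
  | Div Attr [Block]
  deriving (Eq, Show)

-- Optional short caption (already stringified here) plus the long caption blocks.
data Caption = Caption (Maybe String) [Block]
  deriving (Eq, Show)

figureDivSketch :: Attr -> Caption -> [Block] -> Block
figureDivSketch (ident, classes, kv) (Caption short long) body =
  let -- the short caption, if any, becomes an attribute on the caption Div
      captKv = maybe [] (\s -> [("short-caption", s)]) short
      -- an empty long caption produces no caption Div at all
      capt   = [Div ("", ["caption"], captKv) long | not (null long)]
  in Div (ident, ["figure"] `union` classes, kv) (body ++ capt)
```

Using `union` rather than consing means a figure that already carries the class `figure` does not end up with it twice.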
@@ -830,6 +849,7 @@ blockToInlines (Table _ _ _ (TableHead _ hbd) bodies (TableFoot _ fbd)) =
unTableBodies = concatMap unTableBody
blockToInlines (Div _ blks) = blocksToInlines' blks
blockToInlines Null = mempty
blockToInlines (Figure _ _ body) = blocksToInlines' body

blocksToInlinesWithSep :: Inlines -> [Block] -> Inlines
blocksToInlinesWithSep sep =
26 changes: 20 additions & 6 deletions src/Text/Pandoc/Writers/AsciiDoc.hs
@@ -1,3 +1,4 @@
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
{- |
Module : Text.Pandoc.Writers.AsciiDoc
@@ -29,6 +30,7 @@ import Data.Maybe (fromMaybe, isJust)
import qualified Data.Set as Set
import qualified Data.Text as T
import Data.Text (Text)
import System.FilePath (dropExtension)
import Text.Pandoc.Class.PandocMonad (PandocMonad, report)
import Text.Pandoc.Definition
import Text.Pandoc.ImageSize
@@ -152,10 +154,6 @@ blockToAsciiDoc opts (Div (id',"section":_,_)
blockToAsciiDoc opts (Plain inlines) = do
contents <- inlineListToAsciiDoc opts inlines
return $ contents <> blankline
blockToAsciiDoc opts (SimpleFigure attr alternate (src, tit))
-- image::images/logo.png[Company logo, title="blah"]
= (\args -> "image::" <> args <> blankline) <$>
imageArguments opts attr alternate src tit
blockToAsciiDoc opts (Para inlines) = do
contents <- inlineListToAsciiDoc opts inlines
-- escape if para starts with ordered list marker
@@ -189,7 +187,23 @@ blockToAsciiDoc opts (Header level (ident,_,_) inlines) = do
return $ identifier $$
nowrap (text (replicate (level + 1) '=') <> space <> contents) <>
blankline

blockToAsciiDoc opts (Figure attr (Caption _ longcapt) body) = do
-- Images in figures all get rendered as individual block-level images
-- with the given caption. Non-image elements are rendered unchanged.
capt <- inlineListToAsciiDoc opts (blocksToInlines longcapt)
let renderFigElement = \case
Plain [Image imgAttr alternate (src, tit)] -> do
args <- imageArguments opts imgAttr alternate src tit
let figAttributes = case attr of
("", _, _) -> empty
(ident, _, _) -> literal $ "[#" <> ident <> "]"
-- .Figure caption
-- image::images/logo.png[Company logo, title="blah"]
return $ "." <> nowrap capt $$
figAttributes $$
"image::" <> args <> blankline
blk -> blockToAsciiDoc opts blk
vcat <$> mapM renderFigElement body
blockToAsciiDoc _ (CodeBlock (_,classes,_) str) = return $ flush (
if null classes
then "...." $$ literal str $$ "...."
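
The AsciiDoc `Figure` case above emits, for each image in the figure, a `.caption` title line, an optional `[#id]` line, and an `image::` block macro. The string-based sketch below shows the shape of that output. It is an approximation: the real writer builds a `Doc Text` and formats the bracketed arguments through `imageArguments`, while `renderFigureImage` is a hypothetical helper that takes the alt text as given.

```haskell
import Data.List (intercalate)

-- Hypothetical, string-based rendering of one image inside a figure.
-- Arguments: figure identifier, caption, image source, alt text.
renderFigureImage :: String -> String -> String -> String -> String
renderFigureImage ident caption src alt =
  intercalate "\n" $
       ["." ++ caption]                               -- block title carries the caption
    ++ ["[#" ++ ident ++ "]" | not (null ident)]     -- id line only when an id exists
    ++ ["image::" ++ src ++ "[" ++ alt ++ "]"]       -- block-level image macro
```

Calling `renderFigureImage "fig-logo" "Company logo" "images/logo.png" "logo"` yields three lines: `.Company logo`, `[#fig-logo]`, and `image::images/logo.png[logo]`.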
@@ -615,7 +629,7 @@ imageArguments :: PandocMonad m => WriterOptions ->
ADW m (Doc Text)
imageArguments opts attr altText src title = do
let txt = if null altText || (altText == [Str ""])
then [Str "image"]
then [Str . T.pack . dropExtension $ T.unpack src]
else altText
linktext <- inlineListToAsciiDoc opts txt
let linktitle = if T.null title