Skip to content

Commit

Permalink
Support complex figures. [API change]
Browse files Browse the repository at this point in the history
Thanks and credit go to Aner Lucero, who laid the groundwork for this
feature in the 2021 GSoC project. He contributed many changes, including
modifications to the readers for HTML, JATS, and LaTeX, and to the HTML
and JATS writers.

Shared (Albert Krewinkel):

- The new function `figureDiv`, exported from `Text.Pandoc.Shared`,
  offers a standardized way to convert a figure into a Div element.

Readers (Aner Lucero):

- HTML reader: `<figure>` elements are parsed as figures, with the
  caption taken from the respective `<figcaption>` elements.

- JATS reader: The `<fig>` and `<caption>` elements are parsed into
  figure elements, even if the contents is more complex.

- LaTeX reader: support for figures with non-image contents and for
  subfigures.

- Markdown reader: paragraphs containing just an image are treated as
  figures if the `implicit_figures` extension is enabled. The identifier
  is used as the figure's identifier and the image description is also
  used as figure caption; all other attributes are treated as belonging
  to the image.

Writers (Aner Lucero, Albert Krewinkel):

- DokuWiki, Haddock, Jira, Man, MediaWiki, Ms, Muse, PPTX, RTF, TEI,
  ZimWiki writers: Figures are rendered like Div elements.

- Asciidoc writer: The figure contents is unwrapped; each image in the
  the figure becomes a separate figure.

- Classic custom writers: Figures are passed to the global function
  `Figure(caption, contents, attr)`, where `caption` and `contents` are
  strings and `attr` is a table of key-value pairs.

- ConTeXt writer: Figures are wrapped in a "placefigure" environment
  with `\startplacefigure`/`\endplacefigure`, adding the features
  caption and listing title as properties. Subfigures are place in a
  single row with the `\startfloatcombination` environment.

- DocBook writer: Uses `mediaobject` elements, unless the figure contains
  subfigures or tables, in which case the figure content is unwrapped.

- Docx writer: figures with multiple content blocks are rendered as
  tables with style `FigureTable`; like before, single-image figures are
  still output as paragraphs with style `Figure` or `Captioned Figure`,
  depending on whether a caption is attached.

- DokuWiki writer: Caption and "alt-text" are no longer combined. The
  alt text of a figure will now be lost in the conversion.

- FB2 writer: The figure caption is added as alt text to the images in
  the figure; pre-existing alt texts are kept.

- ICML writer: Only single-image figures are supported. The contents of
  figures with additional elements gets unwrapped.

- HTML writer: the alt text is no longer constructed from the caption,
  as was the case with implicit figures. This reduces duplication, but
  comes at the risk of images that are missing alt texts. Authors should
  take care to provide alt texts for all images.

  Some readers, most notably the Markdown reader with the
  `implicit_figures` extension, add a caption that's identical to the
  image description. The writer checks for this and adds an
  `aria-hidden` attribute to the `<figcaption>` element in that case.

- JATS writer: The `<fig>` and `<caption>` elements are used write
  figures.

- LaTeX writer: complex figures, e.g. with non-image contents and
  subfigures, are supported. The `subfigure` template variable is set if
  the document contains subfigures, triggering the conditional loading
  of the *subcaption* package. Contants of figures that contain tables
  are become unwrapped, as longtable environments are not allowed within
  figures.

- Markdown writer: figures are output as implicit figures if possible,
  via HTML if the `raw_html` extension is enabled, and as Div elements
  otherwise.

- OpenDocument writer: A separate paragraph is generated for each block
  element in a figure, each with style `FigureWithCaption`. Behavior for
  single-image figures therefore remains unchanged.

- Org writer: Only the first element in a figure is given a caption;
  additional block elements in the figure are appended without any
  caption being added.

- RST writer: Single-image figures are supported as before; the contents
  of more complex images become nested in a container of type `float`.

- Texinfo writer: Figures are rendered as float with type `figure`.

- Textile writer: Figures are rendered with the help of HTML elements.

- XWiki: Figures are placed in a group.

Co-authored-by: Aner Lucero <4rgento@gmail.com>
  • Loading branch information
tarleb and argent0 committed Jan 12, 2023
1 parent c99cc53 commit 09e70f2
Show file tree
Hide file tree
Showing 99 changed files with 1,491 additions and 596 deletions.
10 changes: 10 additions & 0 deletions cabal.project
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,13 @@ source-repository-package
type: git
location: https://github.com/jgm/citeproc
tag: a8b7748537401d6c9749406a49df7008cea8c7f2

source-repository-package
type: git
location: https://github.com/tarleb/pandoc-types
tag: f84b7359765a2798f22efe4e9457538cda7a8d4a

source-repository-package
type: git
location: https://github.com/pandoc/pandoc-lua-marshal
tag: a2a97e2af78326ea7841101d4ef56e74426b66c4
3 changes: 3 additions & 0 deletions data/templates/default.latex
Original file line number Diff line number Diff line change
Expand Up @@ -293,6 +293,9 @@ $if(numbersections)$
$else$
\setcounter{secnumdepth}{-\maxdimen} % remove section numbering
$endif$
$if(subfigure)$
\usepackage{subcaption}
$endif$
$if(beamer)$
$else$
$if(block-headings)$
Expand Down
6 changes: 6 additions & 0 deletions pandoc-lua-engine/src/Text/Pandoc/Lua/Writer/Classic.hs
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,12 @@ blockToCustom (CodeBlock attr str) =
blockToCustom (BlockQuote blocks) =
invoke "BlockQuote" (Stringify blocks)

blockToCustom (Figure attr (Caption _ cbody) content) =
invoke "Figure"
(Stringify cbody)
(Stringify content)
(attrToMap attr)

blockToCustom (Table _ blkCapt specs thead tbody tfoot) =
let (capt, aligns, widths, headers, rows) = toLegacyTable blkCapt specs thead tbody tfoot
aligns' = map show aligns
Expand Down
6 changes: 6 additions & 0 deletions pandoc-lua-engine/test/sample.lua
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,12 @@ function CaptionedImage(src, tit, caption, attr)
end
end

function Figure(caption, contents, attr)
return '<figure' .. attributes(attr) .. '>\n' .. contents ..
'\n<figcaption>' .. caption .. '</figcaption>\n' ..
'</figure>'
end

-- Caption is a string, aligns is an array of strings,
-- widths is an array of floats, headers is an array of
-- strings, rows is an array of arrays of strings.
Expand Down
3 changes: 2 additions & 1 deletion pandoc-lua-engine/test/writer.custom
Original file line number Diff line number Diff line change
Expand Up @@ -737,7 +737,8 @@ So is &lsquo;pine.&rsquo;</p>
<p>From &ldquo;Voyage dans la Lune&rdquo; by Georges Melies (1902):</p>

<figure>
<img src="lalune.jpg" id="" alt="lalune"/><figcaption>lalune</figcaption>
<img src="lalune.jpg" title="Voyage dans la Lune"/>
<figcaption>lalune</figcaption>
</figure>

<p>Here is a movie <img src="movie.jpg" title=""/> icon.</p>
Expand Down
34 changes: 13 additions & 21 deletions src/Text/Pandoc/Readers/HTML.hs
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ module Text.Pandoc.Readers.HTML ( readHtml
) where

import Control.Applicative ((<|>))
import Control.Monad (guard, msum, mzero, unless, void)
import Control.Monad (guard, mzero, unless, void)
import Control.Monad.Except (throwError, catchError)
import Control.Monad.Reader (ask, asks, lift, local, runReaderT)
import Data.Text.Encoding.Base64 (encodeBase64)
Expand All @@ -36,6 +36,7 @@ import Data.List.Split (splitWhen)
import Data.List (foldl')
import qualified Data.Map as M
import Data.Maybe (fromMaybe, isJust, isNothing)
import Data.Either (partitionEithers)
import Data.Monoid (First (..))
import qualified Data.Set as Set
import Data.Text (Text)
Expand Down Expand Up @@ -63,8 +64,8 @@ import Text.Pandoc.Options (
extensionEnabled)
import Text.Pandoc.Parsing hiding ((<|>))
import Text.Pandoc.Shared (
addMetaField, blocksToInlines', extractSpaces,
htmlSpanLikeElements, renderTags', safeRead, tshow, formatCode)
addMetaField, extractSpaces, htmlSpanLikeElements, renderTags',
safeRead, tshow, formatCode)
import Text.Pandoc.URI (escapeURI)
import Text.Pandoc.Walk
import Text.TeXMath (readMathML, writeTeX)
Expand Down Expand Up @@ -581,24 +582,15 @@ pPara = do
<|> return (B.para contents)

pFigure :: PandocMonad m => TagParser m Blocks
pFigure = try $ do
TagOpen _ _ <- pSatisfy (matchTagOpen "figure" [])
skipMany pBlank
let pImg = (\x -> (Just x, Nothing)) <$>
(pInTag TagsOmittable "p" pImage <* skipMany pBlank)
pCapt = (\x -> (Nothing, Just x)) <$> do
bs <- pInTags "figcaption" block
return $ blocksToInlines' $ B.toList bs
pSkip = (Nothing, Nothing) <$ pSatisfy (not . matchTagClose "figure")
res <- many (pImg <|> pCapt <|> pSkip)
let mbimg = msum $ map fst res
let mbcap = msum $ map snd res
TagClose _ <- pSatisfy (matchTagClose "figure")
let caption = fromMaybe mempty mbcap
case B.toList <$> mbimg of
Just [Image attr _ (url, tit)] ->
return $ B.simpleFigureWith attr caption url tit
_ -> mzero
pFigure = do
TagOpen tag attrList <- pSatisfy $ matchTagOpen "figure" []
let parser = Left <$> pInTags "figcaption" block <|>
(Right <$> block)
(captions, rest) <- partitionEithers <$> manyTill parser (pCloses tag <|> eof)
-- Concatenate all captions together
return $ B.figureWith (toAttr attrList)
(B.simpleCaption (mconcat captions))
(mconcat rest)

pCodeBlock :: PandocMonad m => TagParser m Blocks
pCodeBlock = try $ do
Expand Down
35 changes: 11 additions & 24 deletions src/Text/Pandoc/Readers/JATS.hs
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ import Text.TeXMath (readMathML, writeTeX)
import qualified Data.Set as S (fromList, member)
import Data.Set ((\\))
import Text.Pandoc.Sources (ToSources(..), sourcesToText)
import qualified Data.Foldable as DF

type JATS m = StateT JATSState m

Expand Down Expand Up @@ -232,29 +231,17 @@ parseBlock (Elem e) =
terms' <- mapM getInlines terms
items' <- mapM getBlocks items
return (mconcat $ intersperse (str "; ") terms', items')
parseFigure =
-- if a simple caption and single graphic, we emit a standard
-- implicit figure. otherwise, we emit a div with the contents
case filterChildren (named "graphic") e of
[g] -> do
capt <- case filterChild (named "caption") e of
Just t -> mconcat .
intersperse linebreak <$>
mapM getInlines
(filterChildren (const True) t)
Nothing -> return mempty

let figAttributes = DF.toList $
("alt", ) . strContent <$>
filterChild (named "alt-text") e

return $ simpleFigureWith
(attrValue "id" e, [], figAttributes)
capt
(attrValue "href" g)
(attrValue "title" g)

_ -> divWith (attrValue "id" e, ["fig"], []) <$> getBlocks e
parseFigure = do
capt <- case filterChild (named "caption") e of
Just t -> mconcat . intersperse linebreak <$>
mapM getInlines (filterChildren (const True) t)
Nothing -> return mempty
contents <- getBlocks e

return $ figureWith
(attrValue "id" e, [], [])
(simpleCaption $ plain capt)
contents
parseFootnoteGroup = do
forM_ (filterChildren (named "fn") e) $ \fn -> do
let id' = attrValue "id" fn
Expand Down
61 changes: 29 additions & 32 deletions src/Text/Pandoc/Readers/LaTeX.hs
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ import Data.Maybe (fromMaybe, maybeToList)
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T
import Data.Either (partitionEithers)
import Skylighting (defaultSyntaxMap)
import System.FilePath (addExtension, replaceExtension, takeExtension)
import Text.Collate.Lang (renderLang)
Expand Down Expand Up @@ -1011,8 +1012,8 @@ environments = M.union (tableEnvironments blocks inline) $
, ("letter", env "letter" letterContents)
, ("minipage", env "minipage" $
skipopts *> spaces *> optional braced *> spaces *> blocks)
, ("figure", env "figure" $ skipopts *> figure)
, ("subfigure", env "subfigure" $ skipopts *> tok *> figure)
, ("figure", env "figure" $ skipopts *> figure')
, ("subfigure", env "subfigure" $ skipopts *> tok *> figure')
, ("center", divWith ("", ["center"], []) <$> env "center" blocks)
, ("quote", blockQuote <$> env "quote" blocks)
, ("quotation", blockQuote <$> env "quotation" blocks)
Expand Down Expand Up @@ -1164,37 +1165,33 @@ letterContents = do
_ -> mempty
return $ addr <> bs -- sig added by \closing

figure :: PandocMonad m => LP m Blocks
figure = try $ do
figure' :: PandocMonad m => LP m Blocks
figure' = try $ do
resetCaption
blocks >>= addImageCaption

addImageCaption :: PandocMonad m => Blocks -> LP m Blocks
addImageCaption = walkM go
where go p@(Para [Image attr@(_, cls, kvs) _ (src, tit)])
| not ("fig:" `T.isPrefixOf` tit) = do
st <- getState
case sCaption st of
Nothing -> return p
Just (Caption _mbshort bs) -> do
let mblabel = sLastLabel st
let attr' = case mblabel of
Just lab -> (lab, cls, kvs)
Nothing -> attr
case attr' of
("", _, _) -> return ()
(ident, _, _) -> do
num <- getNextNumber sLastFigureNum
setState
st{ sLastFigureNum = num
, sLabels = M.insert ident
[Str (renderDottedNum num)] (sLabels st) }

return $ SimpleFigure attr'
(maybe id removeLabel mblabel
(blocksToInlines bs))
(src, tit)
go x = return x
innerContent <- many $ try (Left <$> label) <|> (Right <$> block)
let content = walk go $ mconcat $ snd $ partitionEithers innerContent
st <- getState
let caption' = case sCaption st of
Nothing -> B.emptyCaption
Just capt -> capt
let mblabel = sLastLabel st
let attr = case mblabel of
Just lab -> (lab, [], [])
Nothing -> nullAttr
case mblabel of
Nothing -> pure ()
Just lab -> do
num <- getNextNumber sLastFigureNum
setState
st { sLastFigureNum = num
, sLabels = M.insert lab [Str (renderDottedNum num)] (sLabels st)
}
return $ B.figureWith attr caption' content

where
-- Remove the `Image` caption b.c. it's on the `Figure`
go (Para [Image attr _ target]) = Plain [Image attr [] target]
go x = x

coloredBlock :: PandocMonad m => Text -> LP m Blocks
coloredBlock stylename = try $ do
Expand Down
3 changes: 2 additions & 1 deletion src/Text/Pandoc/Readers/LaTeX/Math.hs
Original file line number Diff line number Diff line change
Expand Up @@ -214,7 +214,8 @@ addQed bs =
qedSign = B.str "\xa0\x25FB"

italicize :: Block -> Block
italicize x@(Para [Image{}]) = x -- see #6925
italicize x@(Para [Image{}]) = x -- see #6925
italicize x@(Plain [Image{}]) = x -- ditto
italicize (Para ils) = Para [Emph ils]
italicize (Plain ils) = Plain [Emph ils]
italicize x = x
13 changes: 12 additions & 1 deletion src/Text/Pandoc/Readers/Markdown.hs
Original file line number Diff line number Diff line change
Expand Up @@ -1046,7 +1046,7 @@ para = try $ do
[Image attr figCaption (src, tit)]
| extensionEnabled Ext_implicit_figures exts
, not (null figCaption) -> do
B.simpleFigureWith attr (B.fromList figCaption) src tit
implicitFigure attr (B.fromList figCaption) src tit

_ -> constr inlns

Expand Down Expand Up @@ -1077,6 +1077,17 @@ para = try $ do
plain :: PandocMonad m => MarkdownParser m (F Blocks)
plain = fmap B.plain . trimInlinesF <$> inlines1

implicitFigure :: Attr -> Inlines -> Text -> Text -> Blocks
implicitFigure (ident, classes, attribs) capt url title =
let alt = case "alt" `lookup` attribs of
Just alt' -> B.text alt'
_ -> capt
attribs' = filter ((/= "alt") . fst) attribs
figattr = (ident, mempty, mempty)
caption = B.simpleCaption $ B.plain capt
figbody = B.plain $ B.imageWith ("", classes, attribs') url title alt
in B.figureWith figattr caption figbody

--
-- raw html
--
Expand Down
11 changes: 3 additions & 8 deletions src/Text/Pandoc/Readers/Org/Blocks.hs
Original file line number Diff line number Diff line change
Expand Up @@ -489,15 +489,10 @@ figure = try $ do
figKeyVals = blockAttrKeyValues figAttrs
attr = (figLabel, mempty, figKeyVals)
in if isFigure
then (\c ->
B.simpleFigureWith
attr c imgSrc (unstackFig figName)) <$> figCaption
then (\c -> B.figureWith attr (B.simpleCaption (B.plain c))
(B.plain $ B.image imgSrc figName mempty))
<$> figCaption
else B.para . B.imageWith attr imgSrc figName <$> figCaption
unstackFig :: Text -> Text
unstackFig figName =
if "fig:" `T.isPrefixOf` figName
then T.drop 4 figName
else figName

-- | Succeeds if looking at the end of the current paragraph
endOfParagraph :: Monad m => OrgParser m ()
Expand Down
10 changes: 7 additions & 3 deletions src/Text/Pandoc/Readers/RST.hs
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ import Control.Monad (forM_, guard, liftM, mplus, mzero, when)
import Control.Monad.Except (throwError)
import Control.Monad.Identity (Identity (..))
import Data.Char (isHexDigit, isSpace, toUpper, isAlphaNum)
import Data.List (deleteFirstsBy, elemIndex, nub, sort, transpose)
import Data.List (deleteFirstsBy, elemIndex, nub, partition, sort, transpose)
import qualified Data.Map as M
import Data.Maybe (fromMaybe, maybeToList, isJust)
import Data.Sequence (ViewR (..), viewr)
Expand Down Expand Up @@ -730,8 +730,12 @@ directive' = do
"figure" -> do
(caption, legend) <- parseFromString' extractCaption body'
let src = escapeURI $ trim top
return $ B.simpleFigureWith
(imgAttr "figclass") caption src "" <> legend
let (ident, cls, kvs) = imgAttr "class"
let (figclasskv, kvs') = partition ((== "figclass") . fst) kvs
let figattr = ("", concatMap (T.words . snd) figclasskv, [])
let capt = B.caption Nothing (B.plain caption <> legend)
return $ B.figureWith figattr capt $
B.plain (B.imageWith (ident, cls, kvs') src "" (B.text src))
"image" -> do
let src = escapeURI $ trim top
let alt = B.str $ maybe "image" trim $ lookup "alt" fields
Expand Down
22 changes: 21 additions & 1 deletion src/Text/Pandoc/Shared.hs
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ module Text.Pandoc.Shared (
compactify,
compactifyDL,
linesToPara,
figureDiv,
makeSections,
uniqueIdent,
inlineListToIdentifier,
Expand Down Expand Up @@ -90,7 +91,8 @@ import Data.Containers.ListUtils (nubOrd)
import Data.Char (isAlpha, isLower, isSpace, isUpper, toLower, isAlphaNum,
generalCategory, GeneralCategory(NonSpacingMark,
SpacingCombiningMark, EnclosingMark, ConnectorPunctuation))
import Data.List (find, intercalate, intersperse, sortOn, foldl', groupBy)
import Data.List (find, foldl', groupBy, intercalate, intersperse,
union, sortOn)
import qualified Data.Map as M
import Data.Maybe (mapMaybe, fromMaybe)
import Data.Monoid (Any (..))
Expand Down Expand Up @@ -427,6 +429,23 @@ combineLines = intercalate [LineBreak]
linesToPara :: [[Inline]] -> Block
linesToPara = Para . combineLines

-- | Creates a Div block from figure components. The intended use is in
-- writers of formats that do not have markup support for figures.
--
-- The resulting div is given the class @figure@ and contains the figure
-- body and the figure caption. The latter is wrapped in a 'Div' of
-- class @caption@, with the stringified @short-caption@ as attribute.
figureDiv :: Attr -> Caption -> [Block] -> Block
figureDiv (ident, classes, kv) (Caption shortcapt longcapt) body =
let divattr = ( ident
, ["figure"] `union` classes
, kv
)
captkv = maybe mempty (\s -> [("short-caption", stringify s)]) shortcapt
capt = [Div ("", ["caption"], captkv) longcapt | not (null longcapt)]
in Div divattr (body ++ capt)

-- | Returns 'True' iff the given element is a 'Para'.
isPara :: Block -> Bool
isPara (Para _) = True
isPara _ = False
Expand Down Expand Up @@ -830,6 +849,7 @@ blockToInlines (Table _ _ _ (TableHead _ hbd) bodies (TableFoot _ fbd)) =
unTableBodies = concatMap unTableBody
blockToInlines (Div _ blks) = blocksToInlines' blks
blockToInlines Null = mempty
blockToInlines (Figure _ _ body) = blocksToInlines' body

blocksToInlinesWithSep :: Inlines -> [Block] -> Inlines
blocksToInlinesWithSep sep =
Expand Down
Loading

0 comments on commit 09e70f2

Please sign in to comment.