Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simpleFigure builder [GSoC 2021] #90

Closed
wants to merge 8 commits into from
Closed

Conversation

argent0
Copy link
Contributor

@argent0 argent0 commented Jun 4, 2021

Introducing new constructor to handle figures. This is in the context of pandoc's issue #3177. Use of these on Readers and Writers was submitted with the pandoc pull request.

The SimpleFigure construction

To address some of the issues with the previous handling of figures, we introduced, along with some helper functions, the SimpleFigure pattern synonym

-- | Constructor for a figure with a single image.
--
-- It can be used to construct a figure:
--
-- >>> SimpleFigure nullAttr [] (T.pack "", T.pack "title")
-- Para [Image ("",[],[]) [] ("","fig:title")]
--
--
-- It can be used to pattern match:
--
-- >>> let img = Para [Image undefined undefined (undefined, T.pack "title")]
-- >>> case img of { SimpleFigure _ _ _ -> True; _ -> False }
-- False
-- >>> let fig = Para [Image undefined undefined (undefined, T.pack "fig:title")]
-- >>> case fig of { SimpleFigure _ _ tit -> snd tit; _ -> T.pack "" }
-- "title"
pattern SimpleFigure :: Attr -> [Inline] -> Target -> Block
pattern SimpleFigure attributes figureCaption tgt <-
    Para [Image attributes figureCaption
        (isFigureTarget -> Just tgt)]  where
  SimpleFigure attributes figureCaption tgt =
    Para [Image attributes figureCaption (second ("fig:" <>) tgt)]

This is very much like adding a new constructor to the Block type but with some differences.

Some benefits of this approach are:

  • It is backward compatible with the previous construction.

  • It formalizes, without enforcing, the roles for the construction's elements.

  • It makes it easier to find where are figures handled in the code for both Readers and Writers.

    Here is an example diff from the RST Reader in the pandoc pull request:

    - return $ B.para (B.imageWith (imgAttr "figclass") src "fig:"
    -             caption) <> legend
    + return $ B.simpleFigureWith
    +     (imgAttr "figclass") caption src "" <> legend

Of course, there are also some limitations:

  • It doesn't enforce the construction on new code, as an actual constructor would do. For example, code with non-exhaustive patterns matches won't rise a warning if SimpleFigure is not handled.

  • It lacks an explicit alt-text field. This can be included using attributes, but I would have liked to give it more importance.

I believe this is a modest improvement of the previous code. It provides an explicit representation for figures. It could go a little further with an explicit constructor; but keeping the behavior backward compatible prevents breaking some workflows down the line, for example someone using a lua-filter.

The Figure constructor

The SimpleFigure constructor only addressed one type of figures, the ones with only one image in them. To address the general case of the concept of a document within a document detailed in the section on figures, we introduced the following Block constructor:

data Block =
	-- ...
	-- | Figure, with attributes, caption and caption position, width
	-- (optional), and content (list of blocks)
	| Figure Attr Caption [Block]
	-- ...

This includes other helper functions and tests and is part of the pandoc-types pull request. It's based on the previous work by @despresc. Modifications include: the removal of the CaptionPos argument that can be specified as an attribute; and the code involving tables.

Tests for this constructor are also included with the pull request.

This constructor allows pandoc to capture the semantics of figures from different formats into its internal representation. It also helps when writing output in formats that support figures.

The new Figure block represents a figure with attributes, caption,
caption position, and arbitrary block content.

Since the figure block represents captioned material, the Caption in
Table is no longer necessary, and has been removed.
@jgm
Copy link
Owner

jgm commented Jun 5, 2021

Some comments:

  1. Adding a simpleFigure function to builder seems like a good idea independently of what we decide about the AST. We could replace the use of B.para [B.image ...] to render figures in current writes with this builder. Then, if we later add a figure element to the AST, we can simply change simpleFigure to produce this instead.
  2. I don't really understand why the builder function is thought of as an alternative to a new AST element. We could of course do both. AST elements generally have builder functions to construct them, don't they?
  3. I wasn't clear about why the alternative you're considering is a Graphic constructor intended for figures that don't appear in the table of figures? I think I'm missing some background here.

@argent0
Copy link
Contributor Author

argent0 commented Jun 6, 2021

2. I don't really understand why the builder function is thought of as an _alternative_ to a new AST element.  We could of course do both.  AST elements generally have builder functions to construct them, don't they?

Right, doing the AST element would incluce the builder function (I'll clarify this in the top message)

3. I wasn't clear about why the alternative you're considering is a `Graphic` constructor intended for figures that don't appear in the table of figures?  I think I'm missing some background here.

Because, as I see it now, I was drawing too much inspiration from JATS's <graphic>' and '<fig> tags and was also considering:

data Block
  = ...
    | Graphic Attr Caption Target
    | Figure Attr Caption [Block]
    ...

But now I believe that using the current Div constructor

data Block
   =...
   | Div Attr [Block]
   ...

Could cover the JATS's <fig> If it had a Caption.

@tarleb
Copy link
Contributor

tarleb commented Jun 7, 2021

We should probably move the discussion about the new constructor back to the fork.

I'm wondering whether it would make sense to let the constructor also add the fig: title prefix that pandoc uses to mark images as figures? So if we were to change the convention behind that, we'd have to update only a single location.

test/test-pandoc-types.hs Outdated Show resolved Hide resolved
@argent0 argent0 changed the title simpleFigure builder [GSoC 2021] [API change] simpleFigure builder [GSoC 2021] Jun 7, 2021
@argent0 argent0 changed the title [API change] simpleFigure builder [GSoC 2021] simpleFigure builder [GSoC 2021] Jun 7, 2021
src/Text/Pandoc/Builder.hs Outdated Show resolved Hide resolved
src/Text/Pandoc/Builder.hs Outdated Show resolved Hide resolved
@argent0
Copy link
Contributor Author

argent0 commented Jun 17, 2021

Is Builder.hs the right place for the pattern thou?

@tarleb
Copy link
Contributor

tarleb commented Jun 17, 2021

Ah, right. I guess Definition would be a better place for it.

@argent0 argent0 force-pushed the figures-gsoc branch 5 times, most recently from db51d6c to be85d6f Compare June 23, 2021 15:07
@argent0 argent0 marked this pull request as draft July 5, 2021 12:09
@argent0 argent0 marked this pull request as ready for review August 19, 2021 20:32
@jgm
Copy link
Owner

jgm commented Sep 11, 2021

If I'm understanding correctly, this PR does two distinct things:

  • First, it introduces a simpleFigure builder and a SimpleFigure pattern synonym. These can be used to simplify existing pandoc code without really changing what is going on in the AST. As I understand it, Use the new 'simpleFigure' builder function in the readers. pandoc#7364 modifies the pandoc code base to use these. This all seems like a good idea.

  • Second, it introduces a new Block constructor Figure, to be used in the future for more complex figures.

Can these two distinct changes be split into two separate PRs? Because merging this PR and jgm/pandoc#7364 would, if I'm not mistaken, leave the code unable to compile without warnings, since the writers would also need to be modified to handle the new Figure constructor...

@argent0
Copy link
Contributor Author

argent0 commented Sep 11, 2021

You are correct that this PR does two things and could be staged into:

  1. Introducing the pattern synonym.
  2. Introducing the Block constructor.

I added 2 to 1 b.c. time constrains and I needed one version of pandoc-types with both to write the "second" part of the pandc PR. That is, the pandoc PR can be staged in two parts.

I'll close this PR and make one new with 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants