DocBook figure elements should be converted to Figure in the AST #8668

Closed
tombolano opened this issue Mar 4, 2023 · 2 comments

@tombolano

Explain the problem.
Consider the following DocBook document (example.xml):

<section xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="my-document">
  <title>My document</title>
  <figure xml:id="my_figure_id">
    <title>This is the caption</title>
    <mediaobject>
      <imageobject>
        <imagedata fileref="my_image.png" />
      </imageobject>
      <textobject><phrase>This is the caption</phrase></textobject>
    </mediaobject>
  </figure>
</section>

When converting this DocBook document with pandoc, the AST obtained with pandoc -f docbook -t native example.xml is:

[ Header
    1
    ( "my-document" , [] , [] )
    [ Str "My" , Space , Str "document" ]
, Para
    [ Image
        ( "my_figure_id" , [] , [] )
        [ Str "This"
        , Space
        , Str "is"
        , Space
        , Str "the"
        , Space
        , Str "caption"
        ]
        ( "my_image.png" , "fig:" )
    ]
]

As a consequence, when I convert the DocBook document to LaTeX, for example, the figure is treated as an inline element and no figure environment is created. This is the output of pandoc -f docbook -t latex example.xml:

\hypertarget{my-document}{%
\section{My document}\label{my-document}}

\includegraphics{my_image.png}

The figure has no caption and no label, so it cannot be cross-referenced.

Note that the previous examples were produced with the development version of pandoc. I also tried pandoc 2.14.0.3 (the version available in the Fedora 37 repositories, the OS I am currently using), and with that version the result was as expected: the command pandoc -f docbook -t latex example.xml produced the following:

\hypertarget{my-document}{%
\section{My document}\label{my-document}}

\begin{figure}
\centering
\includegraphics{my_image.png}
\caption{This is the caption}
\end{figure}

This seems to be related to the Figure block element discussed in #3177. I think the core problem is that the DocBook figure element should be converted to a Figure block in the pandoc AST, instead of to an Image inside a Para as is done now.
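The rewrite described above (Para with a lone Image becomes a Figure) can be sketched as a small JSON filter that could serve as a stopgap on affected versions. This is a minimal sketch, not pandoc's actual fix: it assumes pandoc 3.0 or later (whose AST has a Figure block), only inspects top-level blocks, and treats a Para holding a single Image whose title starts with "fig:" as a figure, matching the AST shown above. The file name fig_filter.py is hypothetical.

```python
import json
import sys


def para_image_to_figure(block: dict) -> dict:
    """Rewrite Para [Image ... (url, "fig:")] into a Figure block.

    Any other block is returned unchanged. Assumes pandoc's JSON AST layout
    for pandoc >= 3.0 (Figure = [Attr, Caption, [Block]]).
    """
    if block.get("t") != "Para":
        return block
    inlines = block["c"]
    if len(inlines) != 1 or inlines[0].get("t") != "Image":
        return block
    attr, caption_inlines, (url, title) = inlines[0]["c"]
    if not title.startswith("fig:"):
        return block
    ident, classes, kvs = attr
    # Move the identifier to the Figure so the LaTeX \label ends up on it,
    # and drop the legacy "fig:" marker from the image title.
    image = {
        "t": "Image",
        "c": [["", classes, kvs], caption_inlines, [url, title[len("fig:"):]]],
    }
    return {
        "t": "Figure",
        "c": [
            [ident, [], []],                                  # Figure attributes
            [None, [{"t": "Plain", "c": caption_inlines}]],  # Caption: no short caption
            [{"t": "Plain", "c": [image]}],                  # Figure body
        ],
    }


def main() -> None:
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing was piped in
    doc = json.loads(raw)
    # Only top-level blocks are handled; figures nested in Divs etc. are skipped.
    doc["blocks"] = [para_image_to_figure(b) for b in doc["blocks"]]
    json.dump(doc, sys.stdout)


if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

A possible invocation is pandoc -f docbook -t json example.xml | python3 fig_filter.py | pandoc -f json -t latex, which should give the LaTeX writer a Figure block to render as a figure environment with a caption. The proper fix, of course, belongs in the DocBook reader itself.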

Note that DocBook also has an informalfigure element (https://tdg.docbook.org/tdg/5.1/informalfigure.html), which is a figure without a title. It is currently processed in the same way as the figure element, but since informalfigure is also a block element, I guess it should likewise be converted to a Figure element in the pandoc AST.

Pandoc version?
Pandoc development version

tombolano added the bug label Mar 4, 2023
@jgm
Owner

jgm commented Mar 4, 2023

Looks like the JATS reader has code to parse fig elements as figures.
I wonder if that code could easily be ported over to handle DocBook figure? @tarleb would know.

@tarleb
Collaborator

tarleb commented Mar 5, 2023

@argent0 wrote most of that code. Porting the code over to DocBook should be fairly straightforward, I think.

jgm closed this as completed in 2dd645e on Mar 5, 2023