Graphic elements contain text that isn't wrapped in label or caption #115

axfelix · 2018-01-16T20:42:12Z

Getting invalid JATS: plain text that should be wrapped in a caption element ends up as loose text after each graphic, as below:

<fig position="float" orientation="portrait"><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image1.jpeg" position="float" orientation="portrait" xlink:type="simple"/>Fig. 3. The structure of a multidimensional control system for ceramsite burning: EM – an electromechanical part; <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image2.wmf" position="float" orientation="portrait" xlink:type="simple"/>– a vector specifying exposure; D – a temperature sensor</fig>

From this doc:
1339-5501-1-LE.docx <https://github.com/MartinPaulEve/meTypeset/files/1636721/1339-5501-1-LE.docx>

axfelix · 2018-01-17T05:22:32Z

Am guessing it's an edge case around

https://github.com/MartinPaulEve/meTypeset/blob/master/bin/captionclassifier.py#L193

but not too sure what's happening here...

MartinPaulEve · 2018-01-17T08:33:12Z

Thanks for this, Alex -- and for the minimal test case. I'll take a look at the weekend! M

MartinPaulEve · 2018-01-20T12:18:39Z

Hi Alex,

OK, so I've done some investigation of the problem here and have got this far:

<fig position="float" orientation="portrait"><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image1.jpeg" id="IDd73b995a-a3f3-4940-9d03-e8db274d85f9" position="float" orientation="portrait" xlink:type="simple"><label>Fig</label><caption><p>3 The structure of a multidimensional control system for ceramsite burning: EM – an electromechanical part;</p></caption></graphic><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image2.png" position="float" orientation="portrait" xlink:type="simple"/>– a vector specifying exposure; D – a temperature sensor</fig>

The problem here is that the caption contains an image. So, unfortunately, the caption is split into two tail blocks across two different elements.
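To make the "two tail blocks" point concrete, here is a minimal sketch using the standard library's `xml.etree.ElementTree` (an assumption made for self-containment; it is not necessarily the parser meTypeset uses). The `FIG` string is a trimmed version of the reported markup with most attributes dropped and ASCII hyphens in place of the dashes, and `tail_fragments` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the <fig> from the report (attributes simplified).
FIG = (
    '<fig><graphic href="media/image1.jpeg"/>'
    'Fig. 3. The structure of a multidimensional control system for '
    'ceramsite burning: EM - an electromechanical part; '
    '<graphic href="media/image2.wmf"/>'
    '- a vector specifying exposure; D - a temperature sensor</fig>'
)

def tail_fragments(fig_xml):
    """Return (href, tail text) for each <graphic> directly inside a <fig>."""
    fig = ET.fromstring(fig_xml)
    # Text that follows an element's end tag is stored on that element's
    # .tail, so one visual caption ends up attached to two graphics.
    return [(g.get("href"), (g.tail or "").strip()) for g in fig.findall("graphic")]

for href, tail in tail_fragments(FIG):
    print(href, "->", repr(tail))
```

Each graphic carries only its own fragment of the caption, so a classifier that inspects one element at a time never sees the whole sentence.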

I'm not really sure that we can fix this; are images even allowed in image captions?

Any thoughts welcome.

axfelix · 2018-01-22T16:37:32Z

Oh boy. It looks like there are technically valid ways to include rich media in captions (either through inline-graphic or alternatives), but ... it's not clear that's the intended behaviour in this or in any other case we'll see.
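For reference, the inline-graphic route might look roughly like the sketch below. It is only an illustration under assumptions: `merge_into_caption` is a hypothetical helper, not anything in meTypeset, the markup is a trimmed version of the reported fig, and text appearing before the first graphic is ignored. It keeps the first graphic as the figure image, moves its tail text into caption/p, and re-tags every later graphic as an inline-graphic inside that paragraph.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the <fig> from the report (attributes simplified).
FIG = (
    '<fig><graphic href="media/image1.jpeg"/>'
    'Fig. 3. The structure of a multidimensional control system for '
    'ceramsite burning: EM - an electromechanical part; '
    '<graphic href="media/image2.wmf"/>'
    '- a vector specifying exposure; D - a temperature sensor</fig>'
)

def merge_into_caption(fig):
    """Fold loose caption text and later graphics into one <caption><p>."""
    graphics = fig.findall("graphic")
    if not graphics:
        return fig
    first, rest = graphics[0], graphics[1:]
    caption = ET.Element("caption")
    p = ET.SubElement(caption, "p")
    # The caption starts with the text that followed the first graphic.
    p.text = first.tail
    first.tail = None
    for g in rest:
        # An element's tail travels with it, so the text after each later
        # graphic stays in place once the element is moved into <p>.
        fig.remove(g)
        g.tag = "inline-graphic"
        p.append(g)
    # Place the caption ahead of the remaining graphic.
    fig.insert(0, caption)
    return fig

fig = merge_into_caption(ET.fromstring(FIG))
print(ET.tostring(fig, encoding="unicode"))
```

Whether an equation-like image in a caption should really become an inline-graphic is exactly the open question here, so this is one possible reading rather than a recommendation.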

I'd be tempted to just insert </fig><fig> in the middle any time we see </graphic><graphic>, to be honest...
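A rough sketch of that splitting idea, again with the standard library's `xml.etree.ElementTree` and a trimmed version of the reported markup. `split_fig` is a hypothetical helper, and wrapping each tail in caption/p goes one step beyond literally inserting </fig><fig>; that extra step is an assumption, added because bare text inside fig would still be invalid.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the <fig> from the report (attributes simplified).
FIG = (
    '<fig position="float"><graphic href="media/image1.jpeg"/>'
    'Fig. 3. The structure of a multidimensional control system for '
    'ceramsite burning: EM - an electromechanical part; '
    '<graphic href="media/image2.wmf"/>'
    '- a vector specifying exposure; D - a temperature sensor</fig>'
)

def split_fig(fig):
    """Return one new <fig> per <graphic> child, each with its own caption."""
    figs = []
    for g in fig.findall("graphic"):
        # Copy the original fig attributes onto every new fig.
        new_fig = ET.Element("fig", dict(fig.attrib))
        ET.SubElement(new_fig, "graphic", dict(g.attrib))
        text = (g.tail or "").strip()
        if text:
            # Wrap the loose tail text and put the caption before the graphic.
            caption = ET.Element("caption")
            ET.SubElement(caption, "p").text = text
            new_fig.insert(0, caption)
        figs.append(new_fig)
    return figs

for new_fig in split_fig(ET.fromstring(FIG)):
    print(ET.tostring(new_fig, encoding="unicode"))
```

On this test case the second fig ends up captioned with a fragment ("- a vector specifying exposure; ...") that really belongs to the first caption, so the split produces well-wrapped text but not a faithful figure.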

axfelix · 2018-04-28T00:43:19Z

Jaiden Dembo.docx

another example, should be slightly less problematic to fix

(not sure why we're seeing more of these lately)
