Graphic elements contain text that isn't wrapped in label or caption #115

axfelix opened this issue Jan 16, 2018 · 5 comments

@axfelix
Contributor

axfelix commented Jan 16, 2018

Getting invalid JATS: plain text that should be wrapped in a caption element is instead emitted as bare text trailing each graphic inside fig, as below:

<fig position="float" orientation="portrait"><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image1.jpeg" position="float" orientation="portrait" xlink:type="simple"/>Fig. 3. The structure of a multidimensional control system for ceramsite burning: EM &#8211; an electromechanical part; <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image2.wmf" position="float" orientation="portrait" xlink:type="simple"/>&#8211; a vector specifying exposure; D &#8211; a temperature sensor</fig>

From this doc:
1339-5501-1-LE.docx

@axfelix
Contributor Author

axfelix commented Jan 17, 2018

Am guessing it's an edge case around

https://github.com/MartinPaulEve/meTypeset/blob/master/bin/captionclassifier.py#L193

but not too sure what's happening here...

@MartinPaulEve
Owner

MartinPaulEve commented Jan 17, 2018 via email

@MartinPaulEve
Owner

Hi Alex,

OK, so I've done some investigation of the problem here and have got this far:

<fig position="float" orientation="portrait"><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image1.jpeg" id="IDd73b995a-a3f3-4940-9d03-e8db274d85f9" position="float" orientation="portrait" xlink:type="simple"><label>Fig</label><caption><p>3 The structure of a multidimensional control system for ceramsite burning: EM &#8211; an electromechanical part;</p></caption></graphic><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image2.png" position="float" orientation="portrait" xlink:type="simple"/>&#8211; a vector specifying exposure; D &#8211; a temperature sensor</fig>

The problem here is that the caption contains an image. So, unfortunately, the caption is split into two tail blocks across two different elements.
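
For anyone following along, here is a minimal sketch of what "two tail blocks" means in practice. It is not meTypeset's own code: it uses the standard library's `xml.etree.ElementTree` on a trimmed copy of the fragment from the original report, and `graphic_tails` is a hypothetical helper name chosen for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed copy of the fig from the report (most attributes dropped for brevity).
FIG = (
    '<fig position="float" orientation="portrait">'
    '<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image1.jpeg"/>'
    'Fig. 3. The structure of a multidimensional control system for ceramsite burning: '
    'EM &#8211; an electromechanical part; '
    '<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="media/image2.wmf"/>'
    '&#8211; a vector specifying exposure; D &#8211; a temperature sensor'
    '</fig>'
)

def graphic_tails(fig_xml):
    """Return (href, tail text) for every graphic child of a fig (hypothetical helper)."""
    fig = ET.fromstring(fig_xml)
    return [
        (g.get(XLINK_HREF), (g.tail or "").strip())
        for g in fig.findall("graphic")
    ]

# Each graphic carries one half of the caption as its tail.
for href, tail in graphic_tails(FIG):
    print(href, "->", repr(tail))
```

The first half of the caption hangs off image1 and the second half off image2, so a classifier that looks at one graphic's tail at a time only ever sees part of the sentence.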

I'm not really sure that we can fix this; are images even allowed in image captions?

Any thoughts welcome.

@axfelix
Contributor Author

axfelix commented Jan 22, 2018

Oh boy. It looks like there are technically valid ways to include rich media in captions (either through inline-graphic or alternatives), but ... it's not clear that's the intended behaviour in this or in any other case we'll see.
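
A rough sketch of the inline-graphic route, assuming the first graphic is the real figure and every later graphic belongs inside the caption text. That assumption may well be wrong for other documents. It uses the standard library's `xml.etree.ElementTree` rather than anything from meTypeset, and `fold_into_caption` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def fold_into_caption(fig):
    """Build a new fig where later graphics become inline-graphic inside the caption.

    Assumes the first graphic is the figure itself (an assumption, not a rule).
    """
    graphics = fig.findall("graphic")
    if not graphics:
        return fig
    first, rest = graphics[0], graphics[1:]
    new_fig = ET.Element("fig", dict(fig.attrib))
    # Caption is added before the graphic, the order JATS uses inside fig.
    caption = ET.SubElement(new_fig, "caption")
    p = ET.SubElement(caption, "p")
    p.text = first.tail  # text between the first and second graphic
    for g in rest:
        # Only the href is carried over; position/orientation are left behind.
        inline = ET.SubElement(p, "inline-graphic", {XLINK_HREF: g.get(XLINK_HREF, "")})
        inline.tail = g.tail  # text that followed this graphic
    ET.SubElement(new_fig, "graphic", dict(first.attrib))
    return new_fig
```

This keeps the caption in one piece, but it does not split off the "Fig. 3." label, and it silently drops any text that appears before the first graphic.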

I'd be tempted to just insert </fig><fig> in the middle any time we see </graphic><graphic>, to be honest...
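
Something like the following is what I have in mind, as a sketch only. It works on a parsed tree instead of doing string replacement, uses the standard library's `xml.etree.ElementTree` rather than meTypeset's own code, and `split_fig` is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET

def split_fig(fig):
    """Return one new fig per graphic child, wrapping each graphic's trailing text in a caption."""
    figs = []
    for graphic in fig.findall("graphic"):
        new_fig = ET.Element("fig", dict(fig.attrib))
        g = copy.deepcopy(graphic)
        text = (g.tail or "").strip()
        g.tail = None  # the text moves into the caption instead
        if text:
            # Caption goes before the graphic, the order JATS uses inside fig.
            caption = ET.SubElement(new_fig, "caption")
            ET.SubElement(caption, "p").text = text
        new_fig.append(g)
        figs.append(new_fig)
    return figs

# Usage: replace the original fig in its parent with the pieces.
parent = ET.fromstring(
    '<sec><fig><graphic href="a.jpeg"/>Fig. 3. First part; '
    '<graphic href="b.wmf"/>second part</fig></sec>'
)
old = parent.find("fig")
index = list(parent).index(old)
parent.remove(old)
for offset, piece in enumerate(split_fig(old)):
    parent.insert(index + offset, piece)
```

The obvious downside is that the second fig ends up with a caption that is only a sentence fragment, so this gets us valid output without really being correct for the example above.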

@axfelix
Contributor Author

axfelix commented Apr 28, 2018

Jaiden Dembo.docx

another example, should be slightly less problematic to fix

(not sure why we're seeing more of these lately)
