JATS reader losing citation reference identities #8007

Closed
castedo opened this issue Apr 6, 2022 · 8 comments
@castedo
Contributor

castedo commented Apr 6, 2022

Something funky is going on with the ref ids of citations when either writing or reading JATS. I suspect it's an issue with reading JATS.

REPRO STEPS
With start.md:

pandoc start.md -s -t jats+element_citations --citeproc --metadata link-citations=true \
  | tee jats.xml \
  | pandoc -f jats -s -t markdown -o roundtrip.md

pandoc jats.xml -f jats -s -t html --citeproc \
  > thru-jats.html

pandoc start.md -s -t html --citeproc \
  > skip-jats.html

sed 's/rid="ref-/rid="/g' jats.xml \
  | tee hack.xml \
  | pandoc -f jats -s -t html --citeproc \
  > thru-hack.html

GOT

$ diff start.md roundtrip.md 
31c31
< -   [@degroot_probability_2002]
---
> -   ([@ref-degroot_probability_2002])
33c33
< -   [@steele_cauchy-schwarz_2004]
---
> -   ([@ref-steele_cauchy-schwarz_2004])

EXPECTED
start.md and roundtrip.md to be the same.

That round trip isn't really the issue that affects me. The bigger issue is that the command generating thru-jats.html fails to handle citations properly:

[WARNING] Citeproc: citation ref-degroot_probability_2002 not found
[WARNING] Citeproc: citation ref-steele_cauchy-schwarz_2004 not found

It should generate roughly the same content as skip-jats.html. But to get that, I have to apply an ugly sed hack to jats.xml (producing thru-hack.html).
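
To make the workaround concrete, here is a minimal sketch of what the sed substitution does to a single cross-reference. The `<xref>` fragment is a hypothetical stand-in for a line of jats.xml (it is not copied from the real output), and `strip_rid_prefix` is a name invented for this illustration.

```shell
# Hypothetical one-line fragment standing in for jats.xml.
sample='<xref ref-type="bibr" rid="ref-degroot_probability_2002">DeGroot and Schervish 2002</xref>'

# Same substitution as the hack above: drop "ref-" at the start of every rid value.
strip_rid_prefix() {
  sed 's/rid="ref-/rid="/g'
}

printf '%s\n' "$sample" | strip_rid_prefix
# prints: <xref ref-type="bibr" rid="degroot_probability_2002">DeGroot and Schervish 2002</xref>
```

The substitution touches every rid value that starts with ref-, whether or not it points at a bibliography entry, which is part of why it is a hack rather than a fix.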

Also, technically I'm starting with separate .bib files, but I've simplified this down to start.md. I'm pretty sure if this issue is fixed then it will fix the actual issue I'm hitting.

Pandoc version?
2.18

Happy bug squashing 🕷️ 🐛 🐞 😄!

@castedo castedo added the bug label Apr 6, 2022
@jgm
Owner

jgm commented Apr 6, 2022

Are you sure you're using 2.18?
With the released version I'm seeing this:

% /usr/local/bin/pandoc ~/Downloads/start.md -s -t jats+element_citations --citeproc --metadata link_citatations=true | tee intermediate.jats | /usr/local/bin/pandoc -f jats -s -t markdown  -o roundtrip.md
% diff ~/Downloads/start.md roundtrip.md
31c31
< -   [@degroot_probability_2002]
---
> -   (DeGroot and Schervish 2002)
33c33
< -   [@steele_cauchy-schwarz_2004]
---
> -   (Steele 2004)

Perhaps it's still not what you want, but it's a different issue if so.

@jgm
Owner

jgm commented Apr 6, 2022

Note that 2.18 contains c18bb2a which was explicitly designed to round-trip the ids, stripping the ref- prefix citeproc puts on the containing ref elements.

@castedo
Contributor Author

castedo commented Apr 6, 2022

> Are you sure you're using 2.18?

Yep.

$ pandoc --version
pandoc 2.18
Compiled with pandoc-types 1.22.2, texmath 0.12.5, skylighting 0.12.3,
citeproc 0.7, ipynb 0.2, hslua 2.2.0
Scripting engine: Lua 5.4

You've got a typo in what you tested with:

--metadata link_citatations=true

but you want:

--metadata link-citations=true

@jgm
Owner

jgm commented Apr 6, 2022

OK, now I see the same as you:

% /usr/local/bin/pandoc start.md -s -t jats+element_citations --citeproc --metadata link-citations=true | /usr/local/bin/pandoc -f jats -s -t markdown  -o roundtrip.md && diff start.md roundtrip.md 
2d1
< link-citations: true
32c31
< -   [@degroot_probability_2002]
---
> -   ([@ref-degroot_probability_2002])
34c33
< -   [@steele_cauchy-schwarz_2004]
---
> -   ([@ref-steele_cauchy-schwarz_2004])

@jgm
Owner

jgm commented Apr 6, 2022

I see the issue now; it's in the treatment of xref elements by the JATS reader.
We need to strip off the ref- prefix there, too.
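
The reader-side change described here amounts to dropping a leading ref- from an xref target before it is used as a citation key. A rough shell sketch of that idea follows; it is not pandoc's actual Haskell code, and `strip_ref_prefix` is a hypothetical helper.

```shell
# Remove one leading "ref-" from an id; ids without the prefix pass through unchanged.
strip_ref_prefix() {
  printf '%s\n' "${1#ref-}"
}

strip_ref_prefix "ref-steele_cauchy-schwarz_2004"   # prints: steele_cauchy-schwarz_2004
strip_ref_prefix "degroot_probability_2002"         # prints: degroot_probability_2002
```

Applying the same stripping to both the ref elements and the xref targets keeps the two sides pointing at the same key.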

@castedo
Contributor Author

castedo commented Apr 6, 2022

BTW, I have not yet dug into why I see extra ( ) get added around citations, since it's a minor issue. It's something I notice in the generated HTML too. But if you happen to notice how this can be fixed, either inside pandoc or in how I'm using it, much appreciated.

@jgm
Owner

jgm commented Apr 6, 2022

The parentheses are added by citeproc; it's part of the CSL style. The style could be adjusted, but you probably want author-in-text citations: @key instead of [@key].
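
As an illustration of the @key versus [@key] distinction, the sketch below rewrites single-key bracketed citations into the author-in-text form. The helper name `to_author_in_text` and the regular expression are assumptions made for this example; it deliberately leaves multi-key citations such as [@a; @b] alone.

```shell
# Turn "[@key]" into "@key" when the brackets hold exactly one key and nothing else.
to_author_in_text() {
  sed -E 's/\[@([^];, ]+)\]/@\1/g'
}

printf '%s\n' '-   [@degroot_probability_2002]' | to_author_in_text
# prints: -   @degroot_probability_2002
```

With an author-date style, the bracketed form puts the whole citation in parentheses, while the bare form puts the author names in the running text.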

@castedo
Contributor Author

castedo commented Apr 6, 2022

> The style could be adjusted

Thanks for that tip, that is working well now.

Just as an FYI of an end-user corner case:
The corner I've gotten myself into is that I have to use --citeproc twice: once when writing the JATS XML and once when reading it with pandoc. In order to read JATS XML with pandoc I need to read element-citation, which means formatting needs to be done when reading JATS. But in order to generate references in the JATS file, I also need to use --citeproc when writing. However, I don't want any formatting done outside the <xref> element, otherwise I get these weird "double" formatting effects. It's a bit hacky, but I'm now using this super minimal CSL file to essentially avoid formatting when I write the JATS.
