This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

Suppression of name parsing using " does not work #151

Closed
njbart opened this issue Jul 26, 2015 · 11 comments

@njbart
Contributor

njbart commented Jul 26, 2015

In citeproc-js, parsing of particles can be suppressed to treat particles as part of the family name field by enclosing the family name field content in double quotes (http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#particles-as-part-of-the-last-name). pandoc-citeproc should be fixed to allow this, and to remove the extra (smart) double quotes that currently appear in the output.

Example:

#!/bin/sh
cat > test.json  << EOT
[
    {
        "id": "item1",
        "type": "article-journal",
        "title": "Test",
        "author": [
            {
                "family": "\"de Man\"",
                "given": "Al"
            }
        ],
        "issued": {
            "date-parts": [
                [
                    "2019"
                ]
            ]
        }
    }
]
EOT

pandoc-citeproc -y test.json

Expected:


---
references:
- id: item1
  type: article-journal
  author:
  - given: Al
    family: \"de Man\"
  issued:
  - year: 2019
  title: Test
...

Actual:


---
references:
- id: item1
  type: article-journal
  author:
  - given: Al
    non-dropping-particle: “de Man”
  issued:
  - year: 2019
  title: Test
...

Note that the output also shows the effect of #130.

@njbart
Contributor Author

njbart commented Aug 18, 2015

Actually, the “protecting” double quotes should be kept when converting between biblio formats. I edited the “Expected:” bit above accordingly, assuming that \" is appropriate for yaml, too.

When formatting a bibliography, however, they should of course be removed:

echo @item1 | pandoc -F pandoc-citeproc --biblio test.json -t plain

Actual:

“de Man” (2019)

“de Man”, Al. 2019. “Test.”

Expected:

de Man (2019)

de Man, Al. 2019. “Test.”
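
The protection convention described in this comment can be sketched as follows. This is a minimal illustration in Python, not the code of pandoc-citeproc or citeproc-js. The function name `split_family` and the simplified particle rule (any leading all-lowercase words count as a non-dropping particle) are assumptions made for the example.

```python
# Hypothetical helper for illustration only; not pandoc-citeproc's actual logic.
def split_family(family: str, parse_names: bool = True) -> dict:
    """Split leading lowercase particles off a family name unless it is quoted."""
    # A family name wrapped in straight double quotes is "protected":
    # drop the quotes and keep the content as one family name, unparsed.
    if len(family) >= 2 and family.startswith('"') and family.endswith('"'):
        return {"family": family[1:-1]}
    if not parse_names:
        return {"family": family}
    words = family.split()
    # Simplified rule (an assumption): leading all-lowercase words form the
    # particle; at least one word is always left for the family name.
    i = 0
    while i < len(words) - 1 and words[i].islower():
        i += 1
    if i == 0:
        return {"family": family}
    return {
        "non-dropping-particle": " ".join(words[:i]),
        "family": " ".join(words[i:]),
    }
```

With this sketch, the protected input `"\"de Man\""` from test.json keeps "de Man" as the family name and loses the quotes, while an unprotected "de Man" is split into a particle and a family name.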

@njbart
Contributor Author

njbart commented Aug 18, 2015

Instead of protecting names in yaml databases from being parsed, we could of course also adopt a convention that expects all names in yaml databases to be fully parsed already (unlike json databases).

pandoc-citeproc then would not try to parse any names from yaml databases (unless the parse-names: true flag is explicitly set, that is).

If we adopted this solution – which I’d favour – we’d also want to protect family names containing spaces upon export from yaml to json, or simply set the parse-names: false flag for these.

@jgm
Owner

jgm commented Aug 18, 2015

I'd prefer to allow unparsed names in YAML databases; it's easier for users not to have to worry about the various sorts of particles, etc.

@njbart
Contributor Author

njbart commented Aug 19, 2015

OK, I can see that parsing names by default might seem more convenient, but the CSL specs expect fully parsed names throughout. I also doubt that very many users actually write CSL JSON or CSL YAML databases by hand. Still, those who wish to have names parsed by pandoc currently can include a per-entry parse-names: true flag (and for the future, I’d suggest adding a global flag, too; see below).

It was citeproc-js that introduced name parsing (listed in its specs under “Dirty Tricks”, http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#input-data-rescue), with these specs saying names are only parsed if the parse-names flag is set to true. citeproc-js later switched to parsing names by default to facilitate processing data from Zotero and the like which export unparsed names, though this was never documented in the citeproc-js specs.

It seems that Zotero is currently being reworked to do the parsing itself and to use fully parsed names for data interchange with citeproc-js, thus strictly following the CSL specs again (https://forums.zotero.org/discussion/30974/any-idea-why-an-a-author-comes-last-in-the-bibliography/#Item_46).

I feel the best solution for pandoc would be to introduce a metadata variable (parse-names?) and/or a command-line switch for name parsing, the default being no parsing, i.e., following the CSL specs. (BTW, citeproc-js has a processor option for globally enabling/disabling name parsing, too, see here.)

As long as Zotero still exports unparsed names, we could temporarily leave the default set to parse-names: true for CSL JSON databases only, and switch that at the same time Zotero and citeproc-js make the change.

After that, anyone who wishes to have names from CSL JSON or CSL YAML databases parsed by pandoc would have to state this explicitly, either per-entry or globally.

Note that this in no way affects bibtex/biblatex name parsing which seems perfectly ok (and outputs fully parsed names, which again is perfectly ok).
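
The proposed precedence (a per-entry flag, then a global setting, then a default of no parsing) might look like the sketch below. The function `should_parse_names` and its `global_flag` parameter are hypothetical. Only the per-entry `parse-names` key comes from the discussion, and the sketch assumes flag values are real booleans rather than strings.

```python
from typing import Optional

# Hypothetical resolver for illustration; not an existing pandoc-citeproc API.
def should_parse_names(entry: dict, global_flag: Optional[bool] = None) -> bool:
    """Decide whether the names of one bibliography entry should be parsed."""
    # 1. A per-entry "parse-names" flag wins when it is present.
    if "parse-names" in entry:
        return entry["parse-names"] is True
    # 2. Otherwise use the global setting (metadata variable or CLI switch).
    if global_flag is not None:
        return global_flag is True
    # 3. Default: no parsing, i.e. names are expected to be fully parsed already.
    return False
```

Under this scheme an entry without any flag is left untouched, and an entry can still opt out with `parse-names: false` when parsing is enabled globally.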

@njbart
Contributor Author

njbart commented Aug 19, 2015

Update: Zotero, starting with at least 4.0.28.1, is now exporting fully parsed names, see zotero/zotero@1cbd7f7

Thus, pandoc-citeproc should stop parsing names by default straight away, to avoid parsing names once too often.

@jgm
Owner

jgm commented Sep 10, 2015

We have now stopped parsing names by default. Do we need special handling of double quotes?

@njbart
Contributor Author

njbart commented Sep 11, 2015

Well, if you do want to keep the option to parse two-field names (and I guess this makes sense), I feel this missing bit should be added to make it work correctly overall and to follow the unofficial standard introduced by citeproc-js. The parsing rules for two-field names do yield some false positives, i.e., elements that look like particles but in fact aren’t, so an option for protecting names is needed.

I’d say it’s not one of the top priorities, though – #47 and #94 would be much higher on my list … :-)

@jgm
Owner

jgm commented Dec 17, 2017

We now handle these double quotes -- effectively just ignoring the quotes.
Commit ec643af

jgm closed this as completed Dec 17, 2017

@njbart
Contributor Author

njbart commented Dec 22, 2017

Well, the output from my original example now is

---
references:
- id: item1
  type: article-journal
  author:
  - family: “de Man”
    given: Al
  issued:
  - year: '2019'
  title: Test
...

Note the smart quotes – but with a default of parse-names: false there shouldn’t be any quotes now.

@jgm
Owner

jgm commented Dec 24, 2017 via email

@njbart
Contributor Author

njbart commented Dec 24, 2017

I thought I was using the latest development version, but cannot reproduce this now. Sorry for the false alarm.
