Strong identifiers should be imported for authors when available #9927

Open
tfmorris opened this issue Oct 2, 2024 · 8 comments
Labels
Lead: @cdrini Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed] Priority: 3 Issues that we can consider at our leisure. [managed] Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed]

Comments

@tfmorris
Contributor

tfmorris commented Oct 2, 2024

MARC records, Wikisource metadata, and perhaps other sources of metadata often include various strong identifiers for authors (LCCN, VIAF, Wikidata, ISNI, etc.) which should a) be imported and b) be used for matching, to disambiguate similar author records.
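As a rough illustration of point b), matching on strong identifiers could look something like the sketch below. This is only a sketch under assumptions: the `remote_ids` field name, the identifier keys, and the `find_author_by_strong_id` helper are hypothetical and are not taken from the actual import code.

```python
from typing import Optional

# Hypothetical identifier keys, checked in this order. The real keys would
# come from the project's author identifier configuration.
STRONG_ID_KEYS = ("viaf", "lccn", "isni", "wikidata")


def find_author_by_strong_id(
    incoming: dict, existing_authors: list[dict]
) -> Optional[dict]:
    """Return the first existing author sharing a strong identifier, else None."""
    incoming_ids = incoming.get("remote_ids", {})
    for key in STRONG_ID_KEYS:
        value = incoming_ids.get(key)
        if not value:
            continue  # the incoming record does not carry this identifier
        for author in existing_authors:
            if author.get("remote_ids", {}).get(key) == value:
                return author
    return None


# Two authors with the same name are told apart by their identifiers.
existing = [
    {"key": "/authors/OL1A", "name": "John Smith", "remote_ids": {"viaf": "111"}},
    {"key": "/authors/OL2A", "name": "John Smith", "remote_ids": {"viaf": "222"}},
]
match = find_author_by_strong_id(
    {"name": "John Smith", "remote_ids": {"viaf": "222"}}, existing
)
```

Name-based matching would still be needed as a fallback when the incoming record carries no identifiers at all.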

Per @cdrini:

> Unfortunately, I believe this isn't supported by our import endpoint at this time :( So even if we specify them, they would not only not be used to match the author, they would not even be saved onto the author record.

Originally posted by @cdrini in #9674 (comment)

@hornc @mekarpeles

@pidgezero-one
Contributor

Please assign this one to me, it'll be a follow-up to #9674!

@github-actions github-actions bot added the Needs: Response Issues which require feedback from lead label Oct 13, 2024
@Freso
Contributor

Freso commented Oct 14, 2024

How does this relate to #9448?

@pidgezero-one
Contributor

I think they might be the same? Pinging @cdrini @scottbarnes for opinions!

@tfmorris
Contributor Author

@Freso I think they overlap significantly (now that the issue number has been corrected from what I received in the email notification).

The main difference is one of emphasis. This issue is focused on the major source of reliable bibliographic data, MARC records, and on strong identifiers which are backed by national libraries and other large, reliable institutions. The identifiers mentioned in #9448 are Amazon (which is certainly NOT reliable) and LibriVox, which is a niche site with a handful of authors (~15K listed, but many without any works). Adding a WikiSource import as proposed in #9674 will definitely NOT satisfy the requirements specified here.

MARC+VIAF would be an 80-90% solution. WikiSource + LibriVox would be a <5% solution.

@pidgezero-one
Contributor

pidgezero-one commented Oct 16, 2024

@tfmorris My implementation accounts for every identifier in author/identifiers.yml and not just Wikisource or just the identifiers outlined in #9448, but it depends on future imports adhering to the schema changes I'm proposing. There's no MARC identifier for authors defined in authors/identifiers.yml, though.

@tfmorris
Contributor Author

This is also only a slight superset of #7724

@tfmorris
Contributor Author

> There's no MARC identifier for authors defined in authors/identifiers.yml, though.

MARC is the standard for bibliographic metadata used by libraries around the world, not an identifier type. A MARC record can contain VIAF, LCCN, BNF, GND, ISNI, and other identifiers for its authors.
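To make the distinction concrete: in MARC 21, name fields such as 100 can carry an authority record control number in subfield $0 and a real world object URI in subfield $1, and those values often point at VIAF, the Library of Congress, ISNI, and so on. The sketch below pulls identifiers out of such values. It is only an illustration under assumptions: the `extract_author_identifiers` helper, the key names, and the URI patterns are hypothetical, and it only handles URI-style values, not prefixed forms like `(DE-588)...`.

```python
import re

# Hypothetical mapping from identifier key to a pattern for its URI form.
URI_PATTERNS = {
    "viaf": re.compile(r"viaf\.org/viaf/(\d+)"),
    "lccn": re.compile(r"id\.loc\.gov/authorities/names/([a-z]+\d+)"),
    "isni": re.compile(r"isni\.org/isni/(\d{15}[\dX])"),
    "wikidata": re.compile(r"wikidata\.org/entity/(Q\d+)"),
    "gnd": re.compile(r"d-nb\.info/gnd/([\dX-]+)"),
}


def extract_author_identifiers(subfield_values: list[str]) -> dict[str, str]:
    """Collect identifiers from the $0/$1 values of one MARC name field."""
    found: dict[str, str] = {}
    for value in subfield_values:
        for name, pattern in URI_PATTERNS.items():
            m = pattern.search(value)
            if m:
                # Keep the first value seen for each identifier type.
                found.setdefault(name, m.group(1))
    return found


# Example $0/$1 values as they might appear on a single 100 field.
ids = extract_author_identifiers(
    [
        "http://id.loc.gov/authorities/names/n79021164",
        "http://viaf.org/viaf/96994048",
    ]
)
```

A real importer would read the subfields through the project's MARC parsing layer rather than taking a plain list of strings.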

@mekarpeles mekarpeles added Priority: 3 Issues that we can consider at our leisure. [managed] Lead: @cdrini Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed] Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] and removed Needs: Response Issues which require feedback from lead labels Oct 29, 2024
@hornc
Collaborator

hornc commented Nov 28, 2024

I think #9448 is a prerequisite for this issue. This issue involves extracting identifiers from MARC records, but currently there is nowhere in the import format to put them.
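For discussion, one possible shape for carrying identifiers through an import record is sketched below. Everything here is an assumption: the `identifiers` key on the author entry, the `remote_ids` field on the stored author, and the `merge_remote_ids` helper are hypothetical and do not describe the current import schema.

```python
# Hypothetical import record: each author entry carries an "identifiers" map.
import_record = {
    "title": "Example title",
    "authors": [
        {"name": "Example Author", "identifiers": {"viaf": "222", "isni": "0000000121032683"}},
    ],
}


def merge_remote_ids(author: dict, incoming_ids: dict) -> list[str]:
    """Copy new identifiers onto the author; return keys whose values conflict."""
    stored = author.setdefault("remote_ids", {})
    conflicts = []
    for key, value in incoming_ids.items():
        if key not in stored:
            stored[key] = value  # new identifier: save it
        elif stored[key] != value:
            conflicts.append(key)  # never overwrite; report for review
    return conflicts


existing_author = {"name": "Example Author", "remote_ids": {"viaf": "111"}}
conflicts = merge_remote_ids(existing_author, import_record["authors"][0]["identifiers"])
```

Reporting conflicts instead of overwriting is a deliberate choice in this sketch: a mismatched strong identifier is a hint that the record may have been matched to the wrong author.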


5 participants