LOBID seems to miss some embedded GND mappings #66

Open
nichtich opened this issue Jul 12, 2024 · 9 comments

Comments

@nichtich
Member

See https://lobid.org/gnd/4026894-9 and the same GND record in Cocoda: only one embedded mapping is detected, but there are more closeMatch, exactMatch and DDC mappings.

@stefandesu
Member

Good call. Currently, only the sameAs property of the LOBID JSON record is parsed. There are also exactMatch, closeMatch, and I assume more. However, these currently only include the target concept URI, without any information about the target vocabulary (in sameAs, this is included via collection). We could simply include these without toScheme, but most applications would have trouble actually using this, I think.

@nichtich
Member Author

The list of target vocabularies is small and each vocabulary has a known URI namespace, these could be hardcoded.
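
A minimal sketch of what such a hardcoded lookup could look like, written in Python for illustration only. The namespaces are derived from target URIs quoted elsewhere in this thread; the labels and the `guess_vocabulary` helper are hypothetical and not taken from any existing implementation.

```python
# Hypothetical lookup table: URI namespace -> short vocabulary label.
# Namespaces come from URIs quoted in this thread; the labels are
# placeholders, not identifiers used by a real implementation.
NAMESPACES = {
    "http://id.loc.gov/authorities/subjects/": "LCSH",
    "https://data.bnf.fr/ark:/12148/": "RAMEAU",
    "http://purl.org/bncf/tid/": "BNCF",
    "https://datos.bne.es/resource/": "EMBNE",
}


def guess_vocabulary(uri):
    """Return the label of the first namespace that prefixes `uri`, else None."""
    for namespace, label in NAMESPACES.items():
        if uri.startswith(namespace):
            return label
    return None


print(guess_vocabulary("http://purl.org/bncf/tid/1576"))
```

A plain prefix check like this only works if every target URI uses exactly the listed namespace, so it is a starting point rather than a complete solution.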

@stefandesu
Member

> The list of target vocabularies is small and each vocabulary has a known URI namespace, these could be hardcoded.

Sounds good. Is there a list of target vocabularies for embedded mappings in GND?

@nichtich
Member Author

So far I've seen

This should be enough to start with.

@acka47

acka47 commented Aug 5, 2024

FYI, a full list of enrichments and the linking properties used in GND/EntityFacts can be found at https://wiki.dnb.de/x/TZa5C

@stefandesu
Member

I'm confused, as most of those listed by @nichtich are not on that list. 🤔 @acka47

@acka47

acka47 commented Aug 7, 2024

Sorry for the confusion. The list at https://wiki.dnb.de/x/TZa5C wasn't the best pointer for this context, as it is about the sameAs statements. For the linking sources @nichtich listed, other RDF properties are used; see his example https://lobid.org/gnd/4026894-9.

DDC:

  "relatedDdcWithDegreeOfDeterminacy2" : [ {
    "id" : "http://dewey.info/class/1--0285/",
    "label" : "http://dewey.info/class/1--0285/"
  } ],
  "relatedDdcWithDegreeOfDeterminacy3" : [ {
    "id" : "http://dewey.info/class/004/",
    "label" : "http://dewey.info/class/004/"
  } ],

LCSH, RAMEAU, BNCF, EMBNE (and STW should also work like this):

  "closeMatch" : [ {
    "id" : "http://id.loc.gov/authorities/subjects/sh89003285",
    "label" : "http://id.loc.gov/authorities/subjects/sh89003285"
  }, {
    "id" : "https://data.bnf.fr/ark:/12148/cb11932109b",
    "label" : "https://data.bnf.fr/ark:/12148/cb11932109b"
  }, {
    "id" : "http://purl.org/bncf/tid/1576",
    "label" : "http://purl.org/bncf/tid/1576"
  }, {
    "id" : "https://datos.bne.es/resource/XX525961",
    "label" : "https://datos.bne.es/resource/XX525961"
  } ],

However, at least the RAMEAU concepts are also linked via sameAs.
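
To make the excerpts above concrete, here is a rough Python sketch of how the link properties could be collected from a record. The property names are the ones shown in the JSON above plus exactMatch; whether that list is complete is an assumption, and `embedded_links` is a hypothetical helper, not part of lobid or of any library.

```python
# Property names as they appear in this thread; the list may be incomplete.
LINK_PROPERTIES = (
    "exactMatch",
    "closeMatch",
    "relatedDdcWithDegreeOfDeterminacy2",
    "relatedDdcWithDegreeOfDeterminacy3",
)


def embedded_links(record):
    """Return (property, target URI) pairs found in a lobid-gnd style record."""
    links = []
    for prop in LINK_PROPERTIES:
        for entry in record.get(prop, []):
            target = entry.get("id")
            if target:
                links.append((prop, target))
    return links


# Reduced record built from the excerpts above.
record = {
    "closeMatch": [
        {"id": "http://purl.org/bncf/tid/1576",
         "label": "http://purl.org/bncf/tid/1576"},
    ],
    "relatedDdcWithDegreeOfDeterminacy3": [
        {"id": "http://dewey.info/class/004/",
         "label": "http://dewey.info/class/004/"},
    ],
}

for prop, target in embedded_links(record):
    print(prop, target)
```

The sketch deliberately keeps the source property next to each target instead of guessing a mapping type, since the thread does not state which type each property should get.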

Searching for SKOS links in lobid, I get back some more sources. For example,

  $ curl "https://lobid.org/gnd/search?q=_exists_:exactMatch&size=500" | jq '.member[].exactMatch[].label' | sort

yields concepts in these namespaces:

  • http://lod.gesis.org/thesoz
  • https://aims.fao.org/aos/agrovoc/ & http://aims.fao.org/aos/agrovoc/
  • https://data.bnf.fr/ark:/
  • https://id.loc.gov/authorities/names
  • https://id.loc.gov/authorities/subjects & http://id.loc.gov/authorities/subjects
  • https://id.nlm.nih.gov/mesh/ & http://id.nlm.nih.gov/mesh/
  • https://provenienz.gbv.de/T-PRO_Thesaurus_der_Provenienzbegriffe
  • https://zbw.eu/stw/descriptor/ & http://zbw.eu/stw/descriptor/

Searching for closeMatch ($ curl "https://lobid.org/gnd/search?q=_exists_:closeMatch&size=500" | jq '.member[].closeMatch[].label' | sort) adds https://purl.org/bncf/tid/, http://purl.org/bncf/tid/, and https://datos.bne.es/resource/.

Interestingly, for some sources (id.loc.gov, agrovoc, mesh, stw, BNCF) both http and https URI schemes can be found, which is probably not intended. (Ping @thoffma.)
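
For anyone who wants to repeat the http/https check on a list of labels such as the jq output, a small Python sketch follows. `mixed_scheme_namespaces` is a hypothetical helper, and the descriptor identifiers in the sample are placeholders, not real STW concepts.

```python
from collections import defaultdict


def mixed_scheme_namespaces(uris, namespaces):
    """Return scheme-less namespaces that occur with both http and https.

    `namespaces` holds prefixes without the scheme, e.g. "zbw.eu/stw/descriptor/".
    """
    seen = defaultdict(set)
    for uri in uris:
        scheme, sep, rest = uri.partition("://")
        if not sep:
            continue  # not an absolute URI, ignore it
        for namespace in namespaces:
            if rest.startswith(namespace):
                seen[namespace].add(scheme)
    return sorted(ns for ns, schemes in seen.items()
                  if {"http", "https"} <= schemes)


# Placeholder identifiers (A, B) stand in for real descriptor IDs.
sample = [
    "http://zbw.eu/stw/descriptor/A",
    "https://zbw.eu/stw/descriptor/B",
    "https://datos.bne.es/resource/XX525961",
]
print(mixed_scheme_namespaces(
    sample, ["zbw.eu/stw/descriptor/", "datos.bne.es/resource/"]))
```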

@thoffma

thoffma commented Aug 7, 2024

Thank you for pointing this out. We will check the URIs in this regard before the next dump creation in October/November.

@stefandesu
Member

Implemented and released in v3.4.11. I also updated the Cocoda dev instance, so you can see the results for the example above here: https://coli-conc.gbv.de/cocoda/dev/?fromScheme=http%3A%2F%2Fbartoc.org%2Fen%2Fnode%2F430&from=https%3A%2F%2Fd-nb.info%2Fgnd%2F4026894-9

Possible improvements:

  • Check if hardcoded mapping types for different properties are okay: a290b7c#diff-125ad04a29911164a913cbe0bdc8ae76677482a4dcabc3bcd7a58192c50eebf3R103-R110
  • http/https URI fixes in data (currently worked around by using uriPattern instead of namespace)
  • Question: Is it enough to specify BARTOC URIs for hardcoded vocabularies?
  • Find a better solution for the current DDC workaround (the issue is that DDC URIs in GND do not include the DDC version)
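
To illustrate the uriPattern idea from the list above: a regular expression can accept both http and https variants of a namespace and capture the local identifier. The Python sketch below uses hypothetical patterns and labels written for this example; they are not the patterns from the release. The DDC entry only extracts the notation and does nothing about the missing DDC version.

```python
import re

# Hypothetical patterns, one per vocabulary; the capture group is the
# local identifier. Illustrations only, not the released configuration.
URI_PATTERNS = {
    "LCSH": re.compile(r"^https?://id\.loc\.gov/authorities/subjects/(.+)$"),
    "BNCF": re.compile(r"^https?://purl\.org/bncf/tid/(.+)$"),
    "DDC": re.compile(r"^https?://dewey\.info/class/([^/]+)/$"),
}


def match_vocabulary(uri):
    """Return (label, local identifier) for the first matching pattern, else None."""
    for label, pattern in URI_PATTERNS.items():
        match = pattern.match(uri)
        if match:
            return label, match.group(1)
    return None


for uri in (
    "http://id.loc.gov/authorities/subjects/sh89003285",
    "https://id.loc.gov/authorities/subjects/sh89003285",
    "http://dewey.info/class/1--0285/",
):
    print(uri, match_vocabulary(uri))
```

Compared with a fixed namespace prefix, the pattern tolerates the mixed URI schemes in the data until they are fixed at the source.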
