Take 064 into account for transformation to RDF #266

Closed
acka47 opened this issue Nov 5, 2015 · 18 comments

Comments

@acka47
Contributor

acka47 commented Nov 5, 2015

Sub-issue of #161. In RDA records, the "Formschlagwörter" (form subject headings) are in field 064 instead of being listed with the other subject headings.

Examples

http://lobid.org/resource/HT017458093, which has the Formschlagwort "Zeitung" (newspaper) but isn't typed as such yet:

          <datafield ind2="1" ind1="a" tag="064">
            <subfield code="a">Zeitung</subfield>
            <subfield code="9">(DE-588)4067510-5</subfield>
          </datafield>

http://lobid.org/resource/HT018781721 (snippet) which has Formschlagwort "Zeitschrift" and is already typed as bibo:Journal:

          <datafield ind2="1" ind1="a" tag="064">
            <subfield code="a">Zeitschrift</subfield>
            <subfield code="9">(DE-588)4067488-5</subfield>
          </datafield>

http://lobid.org/resource/HT018772904 (Formschlagwort "Monografische Reihe", already typed as bibo:Series):

          <datafield ind2="1" ind1="a" tag="064">
            <subfield code="a">Monografische Reihe</subfield>
            <subfield code="9">(DE-588)4179998-7</subfield>
          </datafield>
@acka47 acka47 self-assigned this Nov 5, 2015
@acka47 acka47 added the ready label Aug 23, 2016
@acka47
Contributor Author

acka47 commented Aug 31, 2016

We might consider aligning the RDF for pre-RDA and RDA records by removing "Formschlagwörter" from the subject array for pre-RDA records. See also hbz/lobid-rdf-to-json#23 (comment).

@acka47
Contributor Author

acka47 commented Sep 2, 2016

As nobody asked for this, I'd say it is sufficient to do this in API 2.0. Thus, adding the label.

@acka47
Contributor Author

acka47 commented Oct 7, 2016

Here is the core list of GND Formschlagwörter: http://access.rdatoolkit.org/document.php?id=nlgpschp7&target=nlgps07-27

Here is the extended list with all GND Formschlagwörter (PDF): https://wiki.dnb.de/download/attachments/106042227/AH-007.pdf

@acka47
Contributor Author

acka47 commented Oct 7, 2016

There is redundant information in MAB/Aleph fields 051/052. I wonder whether the information in 051 is generated automatically from the 064 information or not (if not, the two might even contradict each other). From the MAB documentation:

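051     VEROEFFENTLICHUNGSSPEZIFISCHE ANGABEN ZU BEGRENZTEN
        WERKEN
        (publication-specific data for finite works)

          Indikator (indicator):
          blank = nicht definiert (not defined)

          Datenelemente (data elements):
            0  Erscheinungsform (form of publication)
               a = unselbstaendig erschienenes Werk
                   (component part, not published independently)
               f = Fortsetzung (continuation)
               m = einbaendiges Werk - nicht Teil eines
                   Gesamtwerks (single-volume work, not part of
                   a multipart work)
               n = mehrbaendiges begrenztes Werk - nicht Teil
                   eines Gesamtwerks (finite multi-volume work,
                   not part of a multipart work)
               s = einbaendiges Werk  u n d  Teil (mit
                   Stuecktitel) eines Gesamtwerks (single-volume
                   work and part, with its own title, of a
                   multipart work)
               t = mehrbaendiges begrenztes Werk  u n d
                   Teil (mit Stuecktitel) eines Gesamtwerks
                   (finite multi-volume work and part, with its
                   own title, of a multipart work)

          1-3  Veroeffentlichungsart und Inhalt (publication type
               and content)
               a = Abstract (Referat)
               b = Bibliographie (bibliography)
               c = Katalog (catalogue)
               d = Woerterbuch (dictionary)
               e = Enzyklopaedie (encyclopedia)
               f = Festschrift
               g = Datenbank (database)
               h = Biographie (biography)
               i = Registerwerk (index)
               j = Fortschrittsbericht (progress report)
               k = Konferenzschrift (conference publication)
               l = Gesetz (law)
               m = Musikalia (music)
               n = Normschrift (standard)
               o = Loseblattausgabe (loose-leaf edition)
               p = Patentdokument (patent document)
               q = Lieferungswerk (work issued in instalments)
               r = Report
               s = Statistik (statistics)
               t = Aufsatz (article)
               u = Universitaetsschrift (university publication,
                   e.g. thesis)
               v = Sonderdruck (offprint)
               x = Schulbuch (school textbook)
               z = sonstige Veroeffentlichungsart/-inhalt (other
                   publication type/content)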

Some examples to take a closer look at: http://lobid.org/hbz01/HT019025947, http://lobid.org/hbz01/HT019025943, http://lobid.org/hbz01/HT018814546, http://lobid.org/hbz01/HT018913029, http://lobid.org/hbz01/HT018909174
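To compare 051 with 064 record by record, the fixed positions of 051 have to be decoded first. A minimal Python sketch, assuming the field value is a plain string as in the hbz01 `controlfield` and that `|` is the MAB fill character for unused positions; the code tables are transcribed from the MAB excerpt above, and `decode_051` is a hypothetical helper, not part of the lobid transformation.

```python
from typing import List, Optional, Tuple

# Position 0 of MAB 051: Erscheinungsform (form of publication),
# transcribed from the MAB documentation quoted above.
FORM_051 = {
    "a": "unselbstaendig erschienenes Werk",
    "f": "Fortsetzung",
    "m": "einbaendiges Werk - nicht Teil eines Gesamtwerks",
    "n": "mehrbaendiges begrenztes Werk - nicht Teil eines Gesamtwerks",
    "s": "einbaendiges Werk und Teil (mit Stuecktitel) eines Gesamtwerks",
    "t": "mehrbaendiges begrenztes Werk und Teil (mit Stuecktitel) eines Gesamtwerks",
}

# Positions 1-3 of MAB 051: Veroeffentlichungsart und Inhalt (type and content).
CONTENT_051 = {
    "a": "Abstract (Referat)", "b": "Bibliographie", "c": "Katalog",
    "d": "Woerterbuch", "e": "Enzyklopaedie", "f": "Festschrift",
    "g": "Datenbank", "h": "Biographie", "i": "Registerwerk",
    "j": "Fortschrittsbericht", "k": "Konferenzschrift", "l": "Gesetz",
    "m": "Musikalia", "n": "Normschrift", "o": "Loseblattausgabe",
    "p": "Patentdokument", "q": "Lieferungswerk", "r": "Report",
    "s": "Statistik", "t": "Aufsatz", "u": "Universitaetsschrift",
    "v": "Sonderdruck", "x": "Schulbuch",
    "z": "sonstige Veroeffentlichungsart/-inhalt",
}

FILL_CHAR = "|"  # assumption: "|" marks an unused position


def decode_051(value: str) -> Tuple[Optional[str], List[str]]:
    """Return the form of publication (pos. 0) and content types (pos. 1-3)."""
    form = FORM_051.get(value[:1])
    content = [
        CONTENT_051[code]
        for code in value[1:4]
        if code != FILL_CHAR and code in CONTENT_051
    ]
    return form, content


print(decode_051("at||||||"))
# ('unselbstaendig erschienenes Werk', ['Aufsatz'])
```

The sample value `at||||||` is the 051 of HT018976920, which comes up later in this thread.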

@ChristophEwertowski ChristophEwertowski self-assigned this Jan 11, 2017
@ChristophEwertowski
Contributor

By testing, I found that fields 051/052 aren't generated automatically from 064. For each of the core Formschlagwörter, I took the first five hits in lobid.org/resource and looked them up in lobid.org/hbz01. For Autobiografie, Bibliografie, Biografie, Comic, Festschrift, Hochschulschrift, Hörbuch, Schulbuch, Website and Zeitschrift, I found no record with a field 064.

@acka47
Copy link
Contributor Author

acka47 commented Jan 12, 2017

@ChristophEwertowski We have to check the RDA titles to see whether 051/052 are generated automatically from 064. RDA titles are those with a creation date after 2015-10-01. You can limit a query to those using the Elasticsearch query string syntax in the URL, see https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-uri-request.html

@acka47
Contributor Author

acka47 commented Jan 12, 2017

@dr0i showed me how to limit the queries to those created after a specific point in time using the URL. E.g. http://lobid.org/resources?q=describedby.dateCreated:%3E20151001
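The same URL can be built programmatically. A minimal Python sketch, assuming the `describedby.dateCreated` field and the `yyyymmdd` date format shown in the example URL; `created_after_url` is a hypothetical helper. The `>` range operator belongs to the Elasticsearch query string syntax and has to be percent-encoded (`%3E`) in the URL.

```python
from urllib.parse import quote

BASE = "http://lobid.org/resources"  # endpoint taken from the example URL above


def created_after_url(date: str) -> str:
    """Build a lobid query URL restricted to records created after `date` (yyyymmdd)."""
    query = f"describedby.dateCreated:>{date}"
    # Keep ":" readable; percent-encode ">" and any other unsafe character.
    return f"{BASE}?q={quote(query, safe=':')}"


print(created_after_url("20151001"))
# http://lobid.org/resources?q=describedby.dateCreated:%3E20151001
```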

@ChristophEwertowski
Contributor

ChristophEwertowski commented Jan 12, 2017

I confined my search to October 2015 and onwards and looked at it again. There are still cases where 064 doesn't exist but 051 does, so for these cases 051 isn't automatically generated from 064. Example: http://lobid.org/hbz01/HT018979011

In other cases both fields exist but contain different information. Example: http://lobid.org/hbz01/HT018976920
<controlfield tag="051">at||||||</controlfield>

This means "unselbstaendig erschienenes Werk, Aufsatz" (a component part, i.e. an article). Field 064, however, contains:

<datafield tag="064" ind1="a" ind2="1">
    <subfield code="a">Biografie</subfield>
    <subfield code="9">(DE-588)4006804-3</subfield>
    <subfield code="y">1921-1978</subfield>
</datafield>

Since a biography can also take other forms (e.g. a book), 051 could not have been generated from 064 in this case.
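The check described above (does a record have 051, 064 or both, and what do they contain) can be scripted per record. A minimal Python sketch, assuming the hbz01 record is available as namespace-free XML with `controlfield`/`datafield` children as in the snippets above; the `<record>` wrapper and `fields_051_064` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def fields_051_064(record_xml: str) -> Tuple[Optional[str], List[str]]:
    """Return the raw 051 value (or None) and all 064 $a labels of a record."""
    root = ET.fromstring(record_xml)
    cf_051 = root.find("controlfield[@tag='051']")
    labels_064 = [
        sf.text
        for df in root.findall("datafield[@tag='064']")
        for sf in df.findall("subfield[@code='a']")
        if sf.text
    ]
    return (cf_051.text if cf_051 is not None else None), labels_064


# Snippet from http://lobid.org/hbz01/HT018976920, wrapped in a hypothetical <record>.
RECORD = """<record>
  <controlfield tag="051">at||||||</controlfield>
  <datafield tag="064" ind1="a" ind2="1">
    <subfield code="a">Biografie</subfield>
    <subfield code="9">(DE-588)4006804-3</subfield>
    <subfield code="y">1921-1978</subfield>
  </datafield>
</record>"""

value_051, labels_064 = fields_051_064(RECORD)
if value_051 and not labels_064:
    # 051 present without 064: 051 cannot have been derived from 064.
    print("051 without 064")
else:
    print(value_051, labels_064)
# at|||||| ['Biografie']
```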

@ChristophEwertowski
Contributor

ChristophEwertowski commented Jan 12, 2017

The Formschlagwörter are still kept apart from the other keywords: they are in field 064, not in 952 (see the first post from Nov. 2015; example: http://lobid.org/hbz01/HT019016389). As a result, they are not in subjectLabels either (http://lobid.org/resource/HT019016389/about).

And if you look closer at the first example you can see that in the hbz01 file it's described as a newspaper (http://lobid.org/hbz01/HT017458093, field 064) and in the lobid-resource as a journal (http://lobid.org/resource/HT017458093, type:bibo/Journal) which are two different publication types.

@acka47
Contributor Author

acka47 commented Jan 13, 2017

And if you look closer at the first example you can see that in the hbz01 file it's described as a newspaper (http://lobid.org/hbz01/HT017458093, field 064) and in the lobid-resource as a journal (http://lobid.org/resource/HT017458093, type:bibo/Journal) which are two different publication types.

The example you point to has "p" at position 0 of field 052, which is correctly transformed to type "Journal". So this looks like a cataloging error to me.

Source data:

<controlfield tag="052">pag||||aw||||||</controlfield>

From the MAB documentation:

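052       VEROEFFENTLICHUNGSSPEZIFISCHE ANGABEN ZU FORTLAUFENDEN
          SAMMELWERKEN
          (publication-specific data for continuing resources)

          Indikator (indicator):
          blank = nicht definiert (not defined)

          Datenelemente (data elements):
            0  Erscheinungsform (form of publication)
               a = unselbstaendig erschienenes Werk (component part)
               f = Fortsetzung (continuation)
               j = zeitschriftenartige Reihe (journal-like series)
               p = Zeitschrift (journal)
               r = Schriftenreihe (Serie) (monographic series)
               z = Zeitung (newspaper)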
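For the type, only position 0 of 052 matters. A minimal Python sketch of that lookup applied to the source value above; the form table is transcribed from the MAB excerpt, `decode_052` is a hypothetical helper, and of the type mappings only "p" to "Journal" is confirmed in this thread ("z" to "Newspaper" is an illustrative assumption).

```python
from typing import Optional, Tuple

# Position 0 of MAB 052: Erscheinungsform, transcribed from the MAB excerpt above.
FORM_052 = {
    "a": "unselbstaendig erschienenes Werk",
    "f": "Fortsetzung",
    "j": "zeitschriftenartige Reihe",
    "p": "Zeitschrift",
    "r": "Schriftenreihe (Serie)",
    "z": "Zeitung",
}

# Only "p" -> "Journal" is confirmed here; "z" -> "Newspaper" is hypothetical.
TYPE_052 = {
    "p": "Journal",
    "z": "Newspaper",
}


def decode_052(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the MAB label and a resource type for position 0 of 052."""
    code = value[:1]
    return FORM_052.get(code), TYPE_052.get(code)


print(decode_052("pag||||aw||||||"))
# ('Zeitschrift', 'Journal')
```

Because the type comes from 052 alone, HT017458093 ends up as `Journal` even though its 064 says "Zeitung", which is why the mismatch points at the cataloging rather than the transformation.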

@acka47 acka47 added ready and removed working labels Jan 16, 2017
@ChristophEwertowski
Contributor

ChristophEwertowski commented Jan 20, 2017

To recap, these points are still open:
Do we really need Formschlagwörter?

  • If yes, add them to the other subject headings.

Are fields 051/052 derived from 064 for RDA records? (Probably not.) @acka47, who would be the right person to contact?

  • Contacted

I'm going to tackle the first question by checking which Formschlagwörter, and how many, are already represented by the mapping of 050-052.

@ChristophEwertowski
Contributor

  1. As acka47 said, they contain different content and should be kept in our data.

@acka47 acka47 added ready and removed working labels Feb 20, 2017
@fsteeg fsteeg removed the launch label Mar 7, 2017
@acka47
Contributor Author

acka47 commented Mar 24, 2017

R.D. (Edoweb) just asked for the 064 in an email:

We have only just noticed that MARC field 064 is not delivered via the
lobid interface and therefore does not reach Edoweb either.
Example:
[screenshot]
It contains important information for subject indexing. Can you
tell us whether this is an oversight and whether it can be remedied?

Here is a link to the example from the screenshot: http://lobid.org/resources/HT019149667

@acka47
Contributor Author

acka47 commented Mar 24, 2017

I think it will be hard to align 064 ("Nature of Content"/"Art des Inhalts", see ) with the information we already have about a resource from other fields (including Formschlagwörter). Thus, the easiest way might be to just add 064 to the RDF independently. The fitting property from the RDA registry is http://rdaregistry.info/Elements/u/P60584 "has nature of content". I couldn't find a controlled vocabulary for the values. The controlled value list seems to be DACH-specific, so this is not surprising.
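Adding 064 independently could mean one statement per 064 field using the RDA property. A minimal Python sketch that serializes such statements as N-Triples, assuming the label from subfield $a is used as a plain literal (no controlled vocabulary has been identified at this point); the resource URI follows the lobid pattern used in this issue, and `nature_of_content_triple` is a hypothetical helper.

```python
HAS_NATURE_OF_CONTENT = "http://rdaregistry.info/Elements/u/P60584"


def escape_literal(text: str) -> str:
    """Escape characters that may not appear unescaped in an N-Triples string literal."""
    return (
        text.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def nature_of_content_triple(resource_uri: str, label: str) -> str:
    """Serialize one 'has nature of content' statement as an N-Triples line."""
    return f'<{resource_uri}> <{HAS_NATURE_OF_CONTENT}> "{escape_literal(label)}" .'


for label in ["Ratgeber", "Amtliche Publikation"]:
    print(nature_of_content_triple("http://lobid.org/resources/HT019149667#!", label))
```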

@acka47
Contributor Author

acka47 commented Mar 24, 2017

I couldn't find controlled vocabulary for the values.

As GND URIs are given (the PDF I linked above also lists them), we will just use these along with the label from subfield a. For the example, this gives:

{
   "@context":"http://lobid.org/resources/context.jsonld",
   "id":"http://lobid.org/resources/HT019149667#!",
   "natureOfContent":[
      {
         "id":"http://d-nb.info/gnd/4048476-2",
         "label":"Ratgeber"
      },
      {
         "id":"http://d-nb.info/gnd/4142300-8",
         "label":"Amtliche Publikation"
      }
   ]
}
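A minimal Python sketch of deriving this structure from the 064 datafields, assuming namespace-free XML shaped like the snippets at the top of this issue and assuming the GND URI is formed by prefixing `http://d-nb.info/gnd/` to the ID that follows `(DE-588)` in subfield 9, as in the example above. The `<record>` wrapper and `nature_of_content` are hypothetical; this is not the lobid transformation itself.

```python
import json
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

GND_PREFIX = "http://d-nb.info/gnd/"
GND_REF = re.compile(r"^\(DE-588\)(\S+)$")  # e.g. "(DE-588)4067510-5"


def nature_of_content(record_xml: str) -> List[Dict[str, str]]:
    """Map every 064 field with a GND reference in $9 to {"id", "label"}."""
    root = ET.fromstring(record_xml)
    entries = []
    for df in root.iter("datafield"):
        if df.get("tag") != "064":
            continue
        label = (df.findtext("subfield[@code='a']") or "").strip()
        match = GND_REF.match((df.findtext("subfield[@code='9']") or "").strip())
        if label and match:  # fields without a GND ID are skipped in this sketch
            entries.append({"id": GND_PREFIX + match.group(1), "label": label})
    return entries


# The "Zeitung" example from the first post, wrapped in a hypothetical <record>.
RECORD = """<record>
  <datafield ind2="1" ind1="a" tag="064">
    <subfield code="a">Zeitung</subfield>
    <subfield code="9">(DE-588)4067510-5</subfield>
  </datafield>
</record>"""

print(json.dumps({"natureOfContent": nature_of_content(RECORD)}, ensure_ascii=False, indent=2))
```

Skipping 064 fields without a GND ID is a design choice of this sketch; keeping them with only a `label` would be an alternative.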

@acka47 acka47 removed their assignment Mar 24, 2017
@ChristophEwertowski
Contributor

ChristophEwertowski commented Mar 28, 2017

natureOfContent has been added. Example (production) / example (test).

@acka47
Contributor Author

acka47 commented Mar 29, 2017

Looks good. +1

@acka47 acka47 removed their assignment Mar 29, 2017
@dr0i dr0i added deploy and removed review labels Apr 6, 2017
@dr0i
Member

dr0i commented Apr 6, 2017

Deployed to production, closing.
