Enrich with RVK based on Culturegraph #1058

Closed
5 tasks done
dr0i opened this issue Feb 27, 2020 · 17 comments
@dr0i
Member

dr0i commented Feb 27, 2020

From our appointment on 13th February 2020. @hagbeck's workflow (which we shall implement at lobid):

See https://katalog.ub.tu-dortmund.de/taxonomy/tree for Dortmund's enrichment.

Example in culturegraph, see field 084.

Download the Culturegraph (CG) data at https://data.dnb.de/culturegraph/ ; at the moment the current file is aggregate_20240507.marcxml.gz

@dr0i dr0i self-assigned this Feb 27, 2020
dr0i added a commit that referenced this issue Apr 21, 2020
Filters out all resources belonging to hbz, gets the RVK and builds an
Elasticsearch bulk JSON file from this.

- use master-snapshot of metafacture to omit id key for elasticsearch index
- add morph converting rules from marcxml to json
- add tests
- add runner

This is a prerequisite for #1058.
dr0i added a commit that referenced this issue Apr 22, 2020
Not all input records are of interest. They are passed empty. With this filter
empty records are ignored, not passed.

See #1058.
dr0i added a commit that referenced this issue Apr 23, 2020
- shrink unnecessary test data
- update test

See #1058.
dr0i added a commit that referenced this issue Apr 23, 2020
hbz-Ids will be concatenated into one field delimited by a space.

- shrink unnecessary test data
- update test

See #1058.
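
For readers who want a picture of what the commits above describe, here is a minimal Python sketch. It is not the Metafacture/morph code from the commits. The field names `hbzId` and `rvk` and the index name `cg-rvk` are hypothetical, chosen only for illustration. It skips records that carry nothing of interest, joins the hbz IDs into one space-delimited field, and writes Elasticsearch bulk lines whose action line has no id key.

```python
import json


def to_bulk_lines(records, index="cg-rvk"):
    """Yield Elasticsearch bulk lines: one action line, then one document line.

    `records` is an iterable of dicts with the hypothetical keys
    "hbzId" (list of hbz IDs) and "rvk" (list of RVK notations).
    """
    for rec in records:
        hbz_ids = rec.get("hbzId", [])
        rvk = rec.get("rvk", [])
        # Records without hbz IDs or without RVK are ignored, not passed on empty.
        if not hbz_ids or not rvk:
            continue
        # All hbz IDs go into a single field, delimited by a space.
        doc = {"hbzId": " ".join(hbz_ids), "rvk": rvk}
        # No "_id" in the action line, so Elasticsearch assigns one itself.
        yield json.dumps({"index": {"_index": index}})
        yield json.dumps(doc)
```

Writing the result to a file is then just `"\n".join(to_bulk_lines(records)) + "\n"`, since the bulk format expects a trailing newline.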
dr0i added a commit that referenced this issue Apr 30, 2020
dr0i added a commit to metafacture/metafacture-examples that referenced this issue Apr 30, 2020
Originates from hbz/lobid-resources#1058.
Created for a lightning talk at Dini-Kim-Workshop 2020.
@TobiasNx
Contributor

TobiasNx commented Oct 4, 2022

@dr0i should we implement this for ALMA too?

@dr0i
Member Author

dr0i commented Oct 6, 2022

Yeah, I definitely have this on my mind! :) (And we do it only with ALMA and Fix.)

TobiasNx added a commit that referenced this issue Jun 4, 2024
TobiasNx added a commit that referenced this issue Jun 5, 2024
TobiasNx added a commit that referenced this issue Jun 6, 2024
@dr0i dr0i moved this from Working to Review in lobid-resources Jun 11, 2024
@dr0i
Member Author

dr0i commented Jun 17, 2024

Check next Monday.
Also, at some point we need to update the data automatically. It should be sufficient to fetch the data once a month from https://data.dnb.de/culturegraph/.
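
One piece of such a monthly update could look like the sketch below: given a list of file names from the download page, pick the newest dump and report it only if it is newer than what we already have. This assumes the dumps keep the `aggregate_YYYYMMDD.marcxml.gz` naming seen above. The function names are made up for illustration, and how the file list is obtained is deliberately left out.

```python
import re
from datetime import date

# Assumes the naming scheme of the current dump, e.g. aggregate_20240507.marcxml.gz
PATTERN = re.compile(r"aggregate_(\d{4})(\d{2})(\d{2})\.marcxml\.gz$")


def dump_date(name):
    """Return the date encoded in a dump file name, or None if it does not match."""
    match = PATTERN.search(name)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def newest_dump(names, current=None):
    """Return the newest dump name, or None if nothing is newer than `current`."""
    dated = [(dump_date(name), name) for name in names]
    dated = [(d, name) for d, name in dated if d is not None]
    if not dated:
        return None
    newest_date, newest_name = max(dated)
    if current is not None and newest_date <= current:
        return None
    return newest_name
```

A monthly job would call `newest_dump(names, current=date(2024, 5, 7))` and rebuild the lookup table only when it returns a name.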

@dr0i dr0i moved this from Review to Deploy in lobid-resources Jun 17, 2024
@dr0i
Member Author

dr0i commented Jun 24, 2024

@dr0i dr0i moved this from Deploy to Ready in lobid-resources Jul 9, 2024
@acka47
Contributor

acka47 commented Aug 12, 2024

update lookup table once a month based on new data from https://data.dnb.de/culturegraph/

This to-do is still open, @dr0i.

@TobiasNx
Contributor

TobiasNx commented Aug 22, 2024

I thought about this today. Does it make sense to mark the enriched rvk elements with a version property, even if it is not really correct?

e.g.

"subject":[
   {
      "notation":"SK 110",
      "type":[
         "Concept"
      ],
      "version":"enrichment",
      "source":{
         "label":"RVK (Regensburger Verbundklassifikation)",
         "id":"https://d-nb.info/gnd/4449787-8"
      }
   },

@acka47
Contributor

acka47 commented Aug 22, 2024

I thought about this today. Does it make sense to mark the enriched rvk elements with a version property, even if it is not really correct?

We have already discussed this and addressed the question in the blog post: https://blog.lobid.org/2024/07/04/rvk-enrichment.html#verzicht-auf-provenienzangaben So far nobody has asked for a marker, so I'd say we don't need it. We can re-evaluate if things change.

dr0i added a commit that referenced this issue Aug 26, 2024
dr0i added a commit that referenced this issue Aug 27, 2024
@dr0i dr0i moved this from Ready to Review in lobid-resources Aug 27, 2024
@dr0i
Member Author

dr0i commented Aug 27, 2024

To be checked after 11th September (the second Wednesday of the month).

@dr0i
Member Author

dr0i commented Sep 16, 2024

The monthly update looks good:

~/git/lobid-resources-alma$ ls -hal lookup-tables/data/rvk*
-rw-rw-r-- 1 sol sol 258M Sep 11 13:34 lookup-tables/data/rvk.tsv
-rw-rw-r-- 1 sol sol 253M Jul 1 12:46 lookup-tables/data/rvk.tsv.20240507

Note that rvk.tsv.20240507 is not an automatically generated backup; no backup is generated at all. It is just convenient to have something to compare against (and possibly a quick fallback to a known state if really needed).
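
Comparing the new table against the old snapshot could go beyond file sizes. The sketch below counts added, removed and changed keys. It assumes each line of rvk.tsv is `key<TAB>value`, which is not confirmed in this thread, and both function names are hypothetical. Both tables are held in memory, which should still be workable at roughly 250M each but is not tuned for it.

```python
def load_table(lines):
    """Parse 'key<TAB>value' lines into a dict; lines without a tab are skipped."""
    table = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def compare_tables(old, new):
    """Count keys that were added, removed, or whose value changed."""
    shared = old.keys() & new.keys()
    return {
        "added": len(new.keys() - old.keys()),
        "removed": len(old.keys() - new.keys()),
        "changed": sum(1 for key in shared if old[key] != new[key]),
    }
```

To use it, open lookup-tables/data/rvk.tsv.20240507 and lookup-tables/data/rvk.tsv as text files, pass each file object to `load_table`, and hand the two dicts to `compare_tables`.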

Closing.

@dr0i dr0i closed this as completed Sep 16, 2024
@github-project-automation github-project-automation bot moved this from Review to Done in lobid board Sep 16, 2024
@github-project-automation github-project-automation bot moved this from Review to Done in lobid-resources Sep 16, 2024
@blackwinter
Member

Do I understand correctly that this is in a local directory named lookup-tables, not in the lookup-tables repository?

@dr0i
Member Author

dr0i commented Sep 16, 2024

Yep, it's local.
