Adding new Vocabularies/Taxonomies as per Metadata Interest Group mappings #35

Merged
merged 14 commits into from
Jan 13, 2020

Conversation

Natkeeran
Contributor

@Natkeeran Natkeeran commented Aug 2, 2019

GitHub Issue: (link)

What does this Pull Request do?

Currently, this PR adds the following vocabularies. It would be easy to test a set of fields in a batch. I plan to add all the straightforward ones first.

  • language
  • genre

How should this be tested?

Interested parties

Tag (@ mention) interested parties or, if unsure, @Islandora-CLAW/committers

@Natkeeran Natkeeran changed the title from "Mods vocabs" to "Adding new Vocabularies/Taxonomies as per Metadata Interest Group mappings" Aug 2, 2019
@Natkeeran
Contributor Author

I am not sure why or where Travis is failing!

@seth-shaw-unlv
Contributor

Looks like a network issue. I'll restart it.

@MarcusBarnes

I've tested this PR and it appears to work as expected. Moving on to testing the related PR Islandora/islandora_defaults#5.

@whikloj
Member

whikloj commented Aug 14, 2019

@Natkeeran this needs the new services thing to make Travis start MySQL. See Islandora/documentation#155 (comment)

@Natkeeran
Contributor Author

@whikloj
Travis is happy. Islandora_defaults travis should be ok when we merge this one!

Contributor

@seth-shaw-unlv seth-shaw-unlv left a comment

Please remove the UUIDs.
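
A minimal sketch of what removing the UUIDs could look like, assuming the exported config files carry an unindented `uuid:` line near the top (as Drupal config exports usually do). The sample YAML and the helper name `strip_top_level_uuid` are hypothetical and not taken from this PR.

```python
import re

# Matches an unindented `uuid:` line. Indented `uuid:` keys nested inside
# other structures are deliberately left alone.
TOP_LEVEL_UUID = re.compile(r"^uuid:.*\n?", flags=re.MULTILINE)


def strip_top_level_uuid(yaml_text: str) -> str:
    """Return the YAML text with any unindented `uuid:` line removed."""
    return TOP_LEVEL_UUID.sub("", yaml_text)


# Hypothetical sample config, for illustration only.
SAMPLE = (
    "uuid: 11111111-2222-3333-4444-555555555555\n"
    "langcode: en\n"
    "status: true\n"
    "name: Genre\n"
    "vid: genre\n"
)

cleaned = strip_top_level_uuid(SAMPLE)
print(cleaned)
```

Working on the raw text rather than parsing and re-dumping the YAML keeps the rest of each file byte-for-byte unchanged, which keeps the diff small.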

@seth-shaw-unlv
Contributor

Considering the Islandora 8 Tech Call earlier today, I would like @rosiel to weigh in on the PR before we merge it. For the genre and language vocabularies it probably makes sense to use the external_uri field instead of the authority_link one. Granted, that field isn't available without the islandora_core_feature module, but we can put the field definition inside of modules/controlled_access_terms_defaults/config/optional/ so that it only gets added when islandora_core_feature is enabled.

Also, the MIG mapping for "Place Published" simply says it is a reference, but doesn't specify which one. This PR makes a country vocabulary; but should we use the existing geo_location vocabulary instead?

Finally, could genre and form be combined into a single vocabulary?

@rosiel
Member

rosiel commented Sep 29, 2019

For the genre and language vocabularies it probably makes sense to use the external_uri field instead of the authority_link one.

I assume this has something to do with the magical property of the specific field external_uri to swap out values. However, I have not been able to document this behaviour. Please advise.

that field isn't available without the islandora_core_feature module, but we can put the field definition inside of modules/controlled_access_terms_defaults/config/optional/ so that it only gets added when the islandora_core_feature is enabled.

So the entire vocabulary is added only if the islandora_core_feature is installed? or the field that contains the URI is only added if the islandora_core_feature is installed? Either way, I wonder about this module's re-use value outside of Islandora... I know it's been part of the "selling point" that the modules we make are "not islandora specific", but they kind of are. I guess part of the problem is that I don't understand what this module does as a module vs the content that its default feature installs. In short, my opinion is that this is a confusing setup.

"Place Published"

If you scroll one cell to the right, it says "Import MARC Country vocabulary". In MARC metadata, which a lot of us use, there are two places where the place of publication is entered.

  • One is a 24x field, where you transcribe the location as it appears on the (probably title page) e.g. "London". This is a free-text field, because you write what you see, so if it's an old book you may put a name of a place that no longer exists by that name. Or, depending, you may record a string such as "Tokyo, Toronto". Two places, one of them in the country I'm in. In general, there are rules for putting stuff in this field that make it so that you can't assume it reconciles nicely to a controlled vocabulary. This is the "Place Published" text field on Row 36.
  • The other is one of the "control fields" in the header of the record, where you record a code referring to the country or province where the item was published. These must come from the controlled list called the MARC Country code list. Here is the old location - note that the hyphens mark discontinued codes, which should not be available to pick. This vocabulary also exists on id.loc.gov, and each of these values has a URI. This is the "Place Published (MARC Country)" field on Row 37 that you linked to. In this case, the local URI, such as taxonomy/term/129, is probably (in 99% of cases) irrelevant.
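
To make the controlled-list point concrete, here is a rough sketch of turning a code list into pickable terms while skipping discontinued (hyphen-prefixed) codes. The tab-separated input layout, the function name `active_country_terms`, and the sample rows are assumptions for illustration and should be checked against the official list. The URI prefix is assumed to follow the id.loc.gov countries vocabulary pattern.

```python
# Assumed URI pattern for the MARC countries vocabulary on id.loc.gov.
ID_LOC_COUNTRIES = "http://id.loc.gov/vocabulary/countries/"

# Illustrative rows only: "<code><TAB><name>", with a leading hyphen
# marking a discontinued code.
SAMPLE_LIST = """\
xxc\tCanada
-cn\tCanada
enk\tEngland
xxu\tUnited States
"""


def active_country_terms(code_list: str) -> list[dict]:
    """Parse the list and keep only codes that are still valid to pick."""
    terms = []
    for line in code_list.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        code, name = line.split("\t", 1)
        if code.startswith("-"):
            continue  # discontinued code: do not offer it as a term
        terms.append(
            {"code": code, "name": name.strip(), "uri": ID_LOC_COUNTRIES + code}
        )
    return terms


terms = active_country_terms(SAMPLE_LIST)
for term in terms:
    print(term["code"], term["name"], term["uri"])
```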

When we refer to places as subjects, we use a different vocabulary, which is (sometimes) reconcilable against something with coordinates. That makes more sense to keep as the Geo_location one.

Is this module the one that populates the default values? I can't figure out the mechanism for that.

Finally, could genre and form be combined into a single vocabulary?

I'll have to get back to you on that, sorry.

@seth-shaw-unlv
Contributor

  • authority link field:
    1. The external_uri field doesn't work the way I thought. I thought it would remap the term's URI, but it simply does a schema:sameAs, the same way that field_authority_link does; so please disregard.
    2. Only that field wouldn't be available; but again, we aren't getting the extra JSON-LD magic I thought, so moot point.
  • MARC locations: Ok. I hesitate to have two separate vocabularies for geographic locations; but this is a very specific case.
    • I still prefer using the geo_location vocabulary and adding the MARC Country URI as one of the values in the field_linked_authority. In that case I'm also inclined to add the country on migration (rather than pre-loading values by default) and allow metadata creators to add values as needed.
    • The islandora_core_feature module provides a default migration for arbitrary tags defined in that module. It is simple to replicate if we choose to enable loading the entire vocabulary.
    • The local URI being irrelevant is why I kinda wish we had the external_uri magic I thought we had... 😞
  • genre/form: 👍 just let us know.

Side note on module reusability: I do reuse the module for our ArchivesSpace/Drupal integration project. True, we are also using it with Islandora 8, but a repository wouldn't need to.

@seth-shaw-unlv
Contributor

@Natkeeran & @rosiel; I'd like to get this settled in advance of a coming Islandora 8.x-1.1 release.

@Natkeeran, are you available to pull out those UUIDs and potentially do a few small updates (pending below), or would you rather I fork your branch and issue a new PR?

@rosiel, I'm thinking that Genre and Form should be kept separate, as both MODS and Dublin Core (as indicated by the selected predicates) indicate they are two very different things; although I would prefer we change 'Form' to 'Physical Form'. Also, I think that Genre overlaps the existing Resource Types vocabulary (and we are mapping them with the same DC predicate) so it makes sense to me to keep them as a single vocabulary.

As for the country codes v. geographic locations; I still prefer them to be combined as we can associate a term with any controlled vocabulary we want, including multiple vocabularies or none at all. I feel that the fewer terms with duplication across multiple vocabularies the more useful our linked data will be.

All that stated, I'll approve/merge this if the MIG wants to keep them separate. (Tagging @mbolam & @rtilla1.)

@rosiel
Member

rosiel commented Nov 7, 2019

Genre/Form: Yes, separate, please.
Form -> Physical Form: Yes, please. I have a similar comment in the islandora_defaults PR.
The vocabulary that we're using for Resource Type is a very different vocabulary than anything we'd use for Genre. In keeping with the concept of "Vocabularies" (i.e. that there are existing controlled vocabularies out there, and sometimes there are multiple we'll need to use), I would like to keep, at least in Defaults, the idea that a Drupal Vocabulary corresponds to an external vocabulary (or an internal vocabulary, as the case may be). I can see one institution having separate vocabularies (so that they could be managed separately) of:

  • marcgt (MARC genre terms)
  • AAT genre terms
  • local genre terms

This is the list of the known Vocabularies that might apply to genre/form: https://www.loc.gov/standards/sourcelist/genre-form.html

The suggestion to arbitrarily, or based on string matching, merge terms from existing vocabularies is NOT advised. As @rtilla1 and I kind of touched on in our talk, authority work is the maintenance of these vocabularies, and it is careful, painstaking work that declares something to be authoritatively true in a concrete knowledge system. That's what controlled vocabs (i.e. authority files) are - someone making a decision that "this is the way the world is" - these were made in the days when not everybody could say anything about any topic, and these authority files/vocabularies still hold weight as "official" manifestations of institutions.

So that's the philosophical side. Here's an example of why this matters: https://www.pbs.org/newshour/politics/gop-reinstates-usage-of-illegal-alien-in-library-of-congress-records

So no, none of us get to say "London, as defined as a place by Geonames, is exactly the same as London, defined as a place by Wikidata".

Using fewer controlled vocabularies in our predicates is probably a good idea for ease of harvesting, but I don't think using fewer controlled vocabularies in our values gives us any benefit at all.

@seth-shaw-unlv
Contributor

Perhaps to clarify, I'm not saying I would arbitrarily or string-match similar terms. Any association between related terms ought to be made by the metadata creators (or, more specifically, someone tasked with approving these relations suggested by metadata creators) after careful review.

That stated, any term we create is essentially a local authority record, and until we have a magical "swap my URI for an external one" feature, every new taxonomy term we add creates a new local authority record. While it is true that "London, as defined as a place by Geonames" is not exactly the same as "London, defined as a place by Wikidata", they are similar and represent to our casual users the same conceptual place. Therefore they could both be associated with our local record.

If, however, I create a taxonomy term for both Londons, then I will have two local authority records for London linking back to their respective sources. They will both also come up in our site searches unless you suppress one or both of them, which can lead to a confusing end-user experience. I don't think they want to see multiple terms across various vocabularies that express generally the same concept. I would much rather create a single London local authority record in my Geographic Locations vocabulary that includes links to Geonames and Wikidata. Creating new Drupal vocabularies for Geonames and Wikidata to link to entries in a Geographic Locations vocabulary because they aren't exactly the same, as well as a separate countries vocabulary, strikes me as introducing unnecessarily complicated bloat.

Getting back to the Countries list; if I already have a country in my Geographic Locations vocabulary from Geonames and I add a second Countries list for the MARC authority it will result in two entities on my site, representing two local records that, to most users, also represent the same concept. OR I can simply add the MARC URI to my existing Geographic Locations terms resulting in a single local record related to both of those other authority record sources.

I can still split these all apart if/when I start exporting records. Need the MARC code for a country I linked to from my record? Fine, do a look-up on the term for the URI that matches the MARC URL pattern. Want to know what the LOC says about it while navigating the linked data? Follow the LOC URL.
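
The look-up described above could be sketched roughly as follows, assuming a term's authority links are available as a plain list of URI strings and that MARC country URIs follow the id.loc.gov countries pattern. The function name `marc_country_code` and the example.org link are placeholders, not part of any Islandora module.

```python
from typing import Optional

# Assumed URL pattern for MARC country URIs.
MARC_COUNTRY_PREFIX = "http://id.loc.gov/vocabulary/countries/"


def marc_country_code(authority_uris: list[str]) -> Optional[str]:
    """Return the MARC country code from the first matching URI, if any."""
    for uri in authority_uris:
        if uri.startswith(MARC_COUNTRY_PREFIX):
            code = uri[len(MARC_COUNTRY_PREFIX):].strip("/")
            if code:
                return code
    return None


# One local term for Canada carrying links to several authorities.
# The first link is a placeholder for some other authority source.
canada_links = [
    "https://example.org/other-authority/canada",
    "http://id.loc.gov/vocabulary/countries/xxc",
]

print(marc_country_code(canada_links))
```

A term with no MARC link simply yields `None`, so an export step can decide whether to skip the code or flag the record for review.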

Now, if we get the 'magically swap my local URL for the URL in one of my fields' feature setup, then creating whole vocabularies and terms might make more sense. Until then, I want to limit the number of local records I create.

@seth-shaw-unlv
Contributor

seth-shaw-unlv commented Nov 8, 2019

Oh, speaking to the LOC illegal-alien issue, that is exactly why we want our local records. We have materials related to many indigenous groups where we don't want the LOC authorized heading to be used when displaying records; although we do want to indicate that we are conceptually talking about the same group. So, our local records use the name the indigenous group prefers but we also link to the LOC record as well as (potentially) Wikidata or other sources.

That is also why we use schema:sameAs. The schema:sameAs predicate allows a broader representation of 'same identity' than other ontologies (the example includes an official website in addition to a Wikipedia page, which can be very different representations of the same 'identity').
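
As an illustration of the single-local-record approach, the sketch below builds a generic JSON-LD document for one local term that points at several external authorities through schema:sameAs. It is not the output of Islandora's JSON-LD serializer; the helper name `local_term_jsonld`, the label, and every example.org URI are hypothetical.

```python
import json


def local_term_jsonld(local_uri: str, label: str, same_as: list[str]) -> dict:
    """Describe one local term and its links to external authority records."""
    return {
        "@context": {"schema": "http://schema.org/"},
        "@id": local_uri,
        "schema:name": label,  # the locally preferred display name
        "schema:sameAs": [{"@id": uri} for uri in same_as],
    }


doc = local_term_jsonld(
    "https://example.org/taxonomy/term/129",
    "Locally preferred name",
    [
        "https://example.org/loc-authority-record",
        "https://example.org/wikidata-entity",
    ],
)
print(json.dumps(doc, indent=2))
```

The local record keeps its own label while each sameAs entry remains individually addressable, so the links can be split apart again on export.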

@seth-shaw-unlv
Contributor

Yesterday in the MIG meeting it was determined that the MARC Country Codes should probably be added as a separate vocabulary.

I would like to propose, as a compromise, that we include it as a sub-module.

As was noted in the meeting, this PR is impacting a module that is 'required' by default installations of Islandora 8, and I would rather this vocabulary not be. Yes, it is useful for those migrating from Islandora 7 and we should make it easy to include; but not all of us are coming from MARC or MODS and we 'greenfield' users may not need it.

We can make a sub-module called MARC Countries (working title; we can make it more generic too) that includes the vocabulary configs AND a CSV migration of the terms that can be triggered automatically on module installation. Any fields that we want to add to islandora_defaults that use this vocabulary can be put in the config/optional/ directory with the countries module listed as a dependency. (When a module is enabled, Drupal will automatically look for configurations in the optional directories that reference it and enable those it can.) Then, adding support for this vocabulary is as simple as adding it to your islandora playbook or simply enabling it in the UI after installation.
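
The optional-config behaviour can be pictured with a small simulation. This is not Drupal's code: it models only module dependencies (real config can also depend on other config), and the config names and the `marc_countries` module name are hypothetical.

```python
def installable_optional_config(
    optional_config: dict[str, list[str]], enabled_modules: list[str]
) -> list[str]:
    """Return optional config items whose module dependencies are all enabled."""
    enabled = set(enabled_modules)
    return sorted(
        name
        for name, module_deps in optional_config.items()
        if set(module_deps) <= enabled
    )


# Hypothetical items in a config/optional/ directory, mapped to the
# modules each one lists as dependencies.
OPTIONAL_CONFIG = {
    "field.storage.node.field_place_published_country": [
        "node",
        "taxonomy",
        "marc_countries",
    ],
    "field.storage.node.field_genre": ["node", "taxonomy"],
}

before = installable_optional_config(OPTIONAL_CONFIG, ["node", "taxonomy"])
after = installable_optional_config(
    OPTIONAL_CONFIG, ["node", "taxonomy", "marc_countries"]
)
print(before)
print(after)
```

In this model the country field appears only once the countries sub-module is enabled, which is the effect the proposal relies on.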

@rosiel
Member

rosiel commented Nov 20, 2019

I kinda like that. However, it even sounds like maybe MARC Country codes should be in Islandora Defaults? It's part of "the default metadata profile", and I'm unclear why we have to add all the taxonomies to support the Repository Item content type here in a different module. Is it because Defaults is a Feature, and Features can't include the Migration steps that allow us to populate content entities (as vocabularies are config but terms are content)?

@seth-shaw-unlv
Contributor

This division stems back to when I initially wrote Controlled Access Terms to create common vocabularies shared by Islandora and my ArchivesSpace/Drupal integration. (The initial set was a reflection of what I was importing from ArchivesSpace.) I wanted both projects to be able to use them independently but also coexist without duplicating these common vocabularies; ergo, a separate module both could use.

I'm fine with the submodule living in either location. I don't have any plans for supporting MARC country codes in the ArchivesSpace integration project.

@seth-shaw-unlv
Contributor

Is it because Defaults is a Feature, and Features can't include the Migration steps that allow us to populate content entities (as vocabularies are config but terms are content)?

To respond directly to this: no, we don't have any restrictions on what we include in a module that is also a Feature. Indeed, I have a local module dedicated to our migrations that is also a Feature. I rely on Features heavily while developing those migrations. (migrate_islandora_csv is also a module/Feature dedicated to migrating content entities.)

Also, technically, terms and nodes are both content entities. The Migrate API can theoretically migrate any entity (content OR config) although I haven't seen any examples of config migrations.

@seth-shaw-unlv
Contributor

@Natkeeran, in summary, please:

  • remove the UUIDs,
  • rename 'Form' to 'Physical Form' (I know changing machine names in a PR is a bit of a pain, but changing it from form to physical_form would be useful and help avoid confusion),
  • remove 'Country' (to be replaced with a MARC Countries vocab in a submodule either in this module or in islandora_defaults).

Does that look right, @rosiel?

@Natkeeran
Contributor Author

@seth-shaw-unlv @rosiel

I've removed the uuids and renamed the form to physical form.

Should we move the Country into a sub module in islandora_defaults called islandora_marc_fields or islandora_marc_extended ?

@seth-shaw-unlv
Contributor

If you think there will be several other MARC-specific items.

I figured we would call the sub module simply 'marc_countries', or 'islandora_marc_countries' if we want to be insistent on keeping the islandora prefix for all submodules.

Also, if you want to remove the countries vocabulary from this PR we can go ahead and merge it since an islandora_defaults submodule is a separate PR anyway.

@seth-shaw-unlv
Contributor

👍 I want to spin it up again before approval/merge, but that might be tomorrow due to meetings today.

@Natkeeran
Contributor Author

@seth-shaw-unlv

I've removed the country vocab from here and put into a sub module in islandora_defaults. This PR should be good to go.

Contributor

@seth-shaw-unlv seth-shaw-unlv left a comment

I found a few minutes today! Looks good. No errors on load.

@seth-shaw-unlv seth-shaw-unlv merged commit 0680767 into Islandora:8.x-1.x Jan 13, 2020
5 participants