Provide scholarly content types #275

bryjbrown · 2016-05-30T22:25:14Z

Title (Goal)	Provide scholarly content types
Primary Actor	Repository admin
Scope	institutional repositories
Level	Low
Story	As a repository admin, I want to be able to store many different types of scholarly content in the institutional repository.

Scholarly content types:

Journal article
Book chapter
Book
Presentation
Poster
Research data (1 set)
Data Sets (2 or more sets)
Electronic Theses & Dissertations (ETDs)
Undergraduate Honors Theses

Examples:

As a repository admin, I want to ingest a faculty researcher's book chapter.
As a repository admin, I want to ingest a PhD student's dissertation along with some Excel spreadsheets (research data)
As a repository admin, I want to ingest a faculty researcher's published journal article with research data (mandated by grant)
As a repository admin, I want to ingest a faculty researcher's presentation at a conference (slides & optional audio/video)
As a repository admin, I want to collect many research data objects together into an associated data set

Remarks:

This use case's primary purpose is to invite a discussion about what types of "scholarly" content CLAW will support and what metadata profile & display should be attached to those chosen.
- Helpful suggestions would include merging two or more proposed content types together, renaming them, or adding new content types to the list
- Naming specific metadata or display concerns surrounding each content type is also incredibly helpful.
  - Some may have a use for the SPAR ontology
  - May have to do some original ontology-ing
- Another primary concern is the way that some objects relate to or contain/are contained by others (think Islandora 1.x compound objects). Examples include:
  - Data sets to individual pieces of research data
  - ETDs that come with supplementary data
  - Journal articles that are published with one piece of research data or an entire set
- How to model these relations? PCDM objects containing child objects? LDP containers of some sort?
Scholar in 1.x only provides citations/theses
What content types do the Hydra folks use? Any special associated metadata? What are the display differences?
- Looks like the current Hydra IR project is Sufia
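
To make the containment question above a bit more concrete, here is a minimal sketch of the "PCDM objects containing child objects" option, using plain Python tuples as RDF triples. The `pcdm:Object` class and `pcdm:hasMember` property come from the PCDM ontology; the `http://example.org/repo/` base URI, the resource names, and the helper functions are hypothetical and only for illustration. This is one possible way to model it, not a decided design.

```python
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/repo/"  # hypothetical base URI, not a real repository

# A journal article published with a data set that groups two pieces of research data.
triples = {
    (EX + "article1", RDF_TYPE, PCDM + "Object"),
    (EX + "dataset1", RDF_TYPE, PCDM + "Object"),
    (EX + "data1", RDF_TYPE, PCDM + "Object"),
    (EX + "data2", RDF_TYPE, PCDM + "Object"),
    (EX + "article1", PCDM + "hasMember", EX + "dataset1"),
    (EX + "dataset1", PCDM + "hasMember", EX + "data1"),
    (EX + "dataset1", PCDM + "hasMember", EX + "data2"),
}


def members(graph, subject):
    """Return the direct pcdm:hasMember children of a resource, sorted by URI."""
    return sorted(o for s, p, o in graph if s == subject and p == PCDM + "hasMember")


def descendants(graph, subject):
    """Return all children reachable through pcdm:hasMember, depth first."""
    found = []
    for child in members(graph, subject):
        found.append(child)
        found.extend(descendants(graph, child))
    return found
```

With this shape, an ETD with supplementary data and an article with a data set are the same construct: a parent object whose members are other objects.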

bryjbrown · 2016-05-30T22:48:43Z

Extra info to inform the discussion: the benefit of having a separate content type for every possible kind of scholarly work is that each one gets its own set of metadata fields and its own display. The downside is that it adds complexity for users.

DiegoPino · 2016-05-31T00:09:01Z

@bryjbrown be prepared for a long and fine discussion at this week's CLAW call 😄

I will summon here @ruebot, @whikloj, @acoburn, @manez, @br2490 and any other @Islandora-CLAW/7-x-2-x-sprinters

I have been working on defining a data model approach for CLAW these last months. Here are some of my notes: simple ideas, thoughts, and conclusions (and questions! lots of them).

In RDF/Fedora 4 a resource can have multiple rdf:types. By default Fedora 4 already provides some base ones for the Fedora 4/LDP world. The ability to choose several types and, based on those, select the valid properties should prevail, to give our system flexibility and expressivity. (Sometimes the types are not even needed: SPAR, for example, provides object and data properties that can be added to any rdf:type.)
Which leads to the question: what is a "data model" then? Should we fix it at the level of meaning/definition, i.e. what a book, thesis, or video is (the higher ontologies), or at a lower level? In CLAW this should no longer match our previous 7.x-1.x static, predefined idea of a composite ds, since a valid data model is any RDF graph construct that is valid in a given ontology domain (or a set, intersection, union, etc. of ontologies). Luckily, if we respect some basics of LDP containment, Fedora 4 allows us to follow that approach.
So a data model should be built on, and live in, one lower semantic layer (structure), allowing other, higher layers to extend the idea with more complex semantics. For example: PCDM. If our data model is built on an LDP-constrained PCDM ontology, then basically there will be little to no difference between a thesis, a paper, or a manuscript, since the same construct should be able to meet the need to store/define/retrieve the different parts that describe them. On the other hand, a book, a magazine, or a newspaper issue could need a more complex data structure, since we are dealing with sequences. But here comes the difficulty. Nothing should be wrong if you want/need to use a complex paged structure for a thesis, right? Maybe your thesis has a video? We can relate it to a different PCDM object, or maybe just attach the video file as part of this one? So maybe data models should be functional and close to functional needs (like ordering, retrieving, displaying) and say little about the human nature/interpretation of the digitally described resource?
Sadly, this simple base idea of what RDF stands for adds a tremendous level of complexity to CLAW if we want to offer the flexibility we gain. So what we know/ask ourselves is:
- PCDM allows us to build multiple valid graphs (too many!). If a construct does not conflict with the ontology(ies), then it is right. Should we allow our users to build them interactively, to suit the constructs to their own needs? And if so, how do we ensure the reusability of constructs that have already been built? Let's say an admin comes up with a pretty good PCDM-based graph. We just want other users to fill it with resources and properties, not reinvent the wheel, right? Maybe we could create named graphs in some RDF store (triple store) to make this reusable, like a template (I hate templates), or just go with fully restricted ontologies that describe and limit each type of resource to one single option? (I need to read more about this: https://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts ) I think this is one of our first big open questions.

After you have solved that problem:

To assert, based on such a construct, that your base PCDM Object, for example, describes a thesis, you can add an rdf:type from some ontology that serves this purpose (e.g. SKOS something), either to the base object or to multiple members of this graph, and then use the object properties and data properties defined by those types to add further info about the resource (look at this for example: http://www.loc.gov/bibframe/docs/bibframe2-model.html). But you can do both! Thesis in different domains. And should we fix this or allow that freedom?
- To describe embargoes (info), you now add something like SPAR to one or more resources of this new PCDM-based graph.
  - To impose access restrictions then you could add WebAC to this also.
    - And so on and so on, adding more and more layers (ontologies).
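
As a rough illustration of this layering, the sketch below treats each rdf:type as a layer that contributes its own properties, so the set of valid properties for a resource is the union over all of its types. The `pcdm:` entries refer to real PCDM terms; the `http://example.org/types#` classes, the `ex:` property names, and the `PROPERTIES_BY_TYPE` registry are all hypothetical placeholders standing in for whatever ontologies (SPAR, WebAC, etc.) end up being used.

```python
PCDM_OBJECT = "http://pcdm.org/models#Object"
THESIS = "http://example.org/types#Thesis"        # hypothetical "thesis" class
EMBARGOED = "http://example.org/types#Embargoed"  # hypothetical embargo layer

# Hypothetical registry: which properties each type makes available.
PROPERTIES_BY_TYPE = {
    PCDM_OBJECT: {"pcdm:hasMember", "pcdm:hasFile"},
    THESIS: {"ex:degree", "ex:advisor"},
    EMBARGOED: {"ex:embargoEndDate"},
}


def valid_properties(rdf_types):
    """Union of the properties contributed by every rdf:type on a resource."""
    props = set()
    for rdf_type in rdf_types:
        # Unknown types contribute nothing rather than raising an error.
        props |= PROPERTIES_BY_TYPE.get(rdf_type, set())
    return props


# The structural layer alone, then the same resource with two more layers added.
base = valid_properties({PCDM_OBJECT})
layered = valid_properties({PCDM_OBJECT, THESIS, EMBARGOED})
```

Here the structural layer never changes; "thesis" and "embargoed" only widen what can be said about the same resource.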

The problem is I always finish with the same questions: how much do we want to fix in code, and how much flexibility do we want to give metadata professionals on CLAW?
Display is a different issue and luckily easier to solve. Just traverse any graph, harvesting the resources, and display them. You don't have to care during the traversal what they are, just what they "contain/describe". You can then "theme" based on multiple options (rdf:types, MIME types, etc.) and aggregate them into a unified display.
The trouble is during definition -> creation -> re-use of complex definitions.
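
The display idea above could look something like the following sketch: walk the graph from a root, collect every linked resource, and pick a "theme" from each resource's rdf:types. The prefixed names are shorthand strings, the `ex:` resources and the `THEMES` mapping are made up for the example, and the breadth-first walk is just one reasonable traversal, not something CLAW prescribes.

```python
from collections import deque

RDF_TYPE = "rdf:type"
LINKS = {"pcdm:hasMember", "pcdm:hasFile"}  # predicates followed during traversal

# Hypothetical thesis with a PDF file and a related video object.
GRAPH = [
    ("ex:thesis", RDF_TYPE, "pcdm:Object"),
    ("ex:thesis", "pcdm:hasFile", "ex:thesis.pdf"),
    ("ex:thesis", "pcdm:hasMember", "ex:video"),
    ("ex:thesis.pdf", RDF_TYPE, "pcdm:File"),
    ("ex:video", RDF_TYPE, "pcdm:Object"),
    ("ex:video", RDF_TYPE, "ex:Video"),
]

# Hypothetical mapping from rdf:type to a display theme.
THEMES = {"ex:Video": "video-player", "pcdm:File": "download-link"}


def harvest(graph, root):
    """Breadth-first walk that returns every resource reachable from root."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for s, p, o in graph:
            if s == node and p in LINKS and o not in seen:
                seen.add(o)
                queue.append(o)
    return order


def theme_for(graph, node):
    """First theme matching any of the node's rdf:types, else a generic one."""
    for s, p, o in graph:
        if s == node and p == RDF_TYPE and o in THEMES:
            return THEMES[o]
    return "generic"


def render(graph, root):
    """Pair each harvested resource with the theme chosen for it."""
    return [(node, theme_for(graph, node)) for node in harvest(graph, root)]
```

The traversal never asks whether the root is a thesis or a book; only the per-resource types influence how each piece is themed.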

My conclusions:
A data model is the RDF graph structure used to bind the needed resources (functional). No higher human meanings here. I see this almost as a simple semantic "storage" hierarchy.
We can provide an in-code-fixed set of data models that respond to certain needs (like Hydra does), or we can, given one or a few ontologies, build a human-friendly interface/reasoning algorithms so admins can build their own, fulfilling simple functional needs. Here I need more knowledge.
Thesis, etc. are no longer data models; they are definitions, additional semantic layers you can add to any of the underlying data models you can imagine/create without failing axiomatic assertions for LDP/PCDM or whatever other structural ontology we choose.

Damn is this hard!

bryjbrown · 2016-05-31T12:45:14Z

@DiegoPino Brandon Weigel responded to the thread about this on the IRIG list:

Regarding the scholarly content type question, there's also an issue open in Jira for 7.x: https://jira.duraspace.org/browse/ISLANDORA-1638

My proposal there is to allow a selection of CModels to be included via a config screen. There was interest in making this happen at the committers' call, but some comments and shows of support/need could help bump up the priority. (And of course such functionality would be appreciated in CLAW as well...)

I think in general, trying to decide which content models should be included isn't the ideal approach to this question. Hard-coding the content models that scholar profiles can display (i.e. the way the module's current iteration handles things) will inevitably leave a good segment of our community dissatisfied; you'll never be able to find one set of content models that suit every user's needs (I speak from the experience of managing a multisite for 12 different post-secondary institutions of various types and sizes). My favourite option would be to allow for all CModels to be included, and let administrators enable and disable them via the configuration UI.

It sounds like he's hitting on the same idea that @DiegoPino is, that we don't need to explicitly define every type of scholarly work ahead of time and that a flexible approach would be much more pragmatic.

+1 for this, you've both convinced me. I still think there's room for discussion about best practices for modelling complex scholarly objects that may have children (especially things related to publishing research data and datasets!), but I'll put that on pause until my data modelling worldview is less 1.x-centric.
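
For what it's worth, the "enable and disable via configuration" idea can be sketched in a few lines. Everything here is illustrative: the `DEFAULT_CONFIG` dictionary stands in for whatever the config screen would store, the CModel identifiers are only examples, and the function names are hypothetical.

```python
# Stand-in for settings saved from an admin configuration screen (hypothetical).
DEFAULT_CONFIG = {
    "ir:citationCModel": True,
    "ir:thesisCModel": True,
    "islandora:sp_pdf": False,
}


def enabled_models(config):
    """Content models the admin has switched on."""
    return {model for model, enabled in config.items() if enabled}


def displayable(objects, config):
    """Keep only objects whose content model is enabled in the config."""
    allowed = enabled_models(config)
    return [obj for obj in objects if obj["cmodel"] in allowed]


objects = [
    {"id": "obj:1", "cmodel": "ir:citationCModel"},
    {"id": "obj:2", "cmodel": "islandora:sp_pdf"},
    {"id": "obj:3", "cmodel": "ir:thesisCModel"},
]

# An admin at another institution flips one switch instead of changing code.
custom_config = dict(DEFAULT_CONFIG, **{"islandora:sp_pdf": True})
```

Nothing about which models exist is hard-coded in the filtering logic; each site's config decides what shows up.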

DonRichards · 2020-05-12T23:57:41Z

Just pointing out the meta tags for Datasets Drupal contrib module https://www.drupal.org/project/schema_dataset

DiegoPino added use case labels May 31, 2016

mjordan mentioned this issue Jul 4, 2018

Model complex objects like books, newspapers, and generic "compound" objects #868

Open

kstapelfeldt added Subject: Content/Object Model and Type: use case labels and removed architecture label Sep 25, 2021

DonRichards added the Subject: Institutional Repository IRIG label Oct 21, 2021

kstapelfeldt added this to Islandora Issues Queue Feb 1, 2022

kstapelfeldt moved this to Todo in Islandora Issues Queue Feb 1, 2022
