Updates existing user docs and adds one more #616

Merged 4 commits on May 8, 2017
Changes from 2 commits
11 changes: 7 additions & 4 deletions docs/user-documentation/CLAWfor1x.md
@@ -27,13 +27,16 @@ Fedora 3 objects are FOXML (Fedora Object eXtensible Markup Language) documents,
* `System Properties`: A set of system-defined descriptive properties that is necessary to manage and track the object in the repository.
* `Datastream(s)`: The element in a Fedora digital object that represents a content item.

In Fedora 4 , what we would have called `objects` are now referred to as `resources` and are not composed of XML; instead, they are stored in ModeShape as nodes with RDF properties. They can contain the following elements:
In Fedora 4, what we would have called `objects` are now referred to as [`Resources`](https://www.w3.org/TR/ld-glossary/#resource) (and *everything* in Fedora 4 is a `Resource`). Instead of being composed of XML as they were in Fedora 3, they are stored in [ModeShape](http://modeshape.jboss.org/) as nodes with RDF properties. A `Resource` in Islandora CLAW may [contain](https://www.w3.org/TR/ldp/#dfn-containment) RDF data or binary files, similar to the way Islandora 7.x-1.x objects stored descriptive metadata and binary files in datastreams.
Contributor

A Resource in Islandora CLAW may contain RDF data or binary files, similar to the way Islandora 7.x-1.x objects stored descriptive metadata and binary files in datastreams.

NonRdfSources are Resources and do not contain anything. Containment is limited to containers.


* `Resource`: Roughly equivalent to a Fedora 3 object - a conceptual representation of a thing that can contain files or other containers.
* `Non-RDF Source`: Roughly equivalent to a datastream. A Non-RDF Source (or binary) is simply a bitstream (e.g. JPG, PDF, XML, MP3, etc.).
Unlike Islandora 7.x-1.x objects that store metadata and binary files in a predefined way depending on the content model, Islandora CLAW uses [Linked Data Platform Containers](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-container), or LDPCs, to allow resources to contain each other in a flexible way. LDPCs allow one `Resource` to act as a collection of other `Resources` similar to the way an Islandora 7.x-1.x collection contains objects, or objects contain datastreams. When part of a `Resource`, binary files (such as JPG, PDF, MP3, etc) are referred to as [`Non-RDF Sources`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-non-rdf-source) because their content is not RDF data. `Resources` that contain only RDF data are called [`RDF Sources`](https://www.w3.org/TR/ldp/#ldpr-resource).
Contributor

Maybe move the clarification between RdfSources and NonRdfSources up to the paragraph above? Then you could cut out some of the preceding paragraph.


CLAW makes use of the [Portland Common Data Model (PCDM)](https://github.com/duraspace/pcdm/wiki) as a layer of abstraction over LDPCs to make containment simpler to understand for users; a `pcdm:Collection` may contain other `pcdm:Collections` or `pcdm:Objects` (similar to an Islandora 7.x-1.x collection content model), and a `pcdm:Object` may contain other `pcdm:Objects` (similar to the way an Islandora 7.x-1.x compound object has child objects) or `pcdm:Files` (similar to the way Islandora 7.x-1.x objects have datastreams).
Contributor

👍 for bringing up that basically all our turn-key datatypes can be compound objects.
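
The PCDM containment rules in the paragraph under review can be read as a small lookup table. The following is only a sketch of those stated rules: the dictionary layout and the `can_contain` name are illustrative and are not part of PCDM or CLAW, and the empty entry for `pcdm:File` is an assumption made so the table is complete.

```python
# Containment rules as stated in the paragraph above. The data structure and
# function name are illustrative only, not an API from PCDM or CLAW.
ALLOWED_CHILDREN = {
    "pcdm:Collection": {"pcdm:Collection", "pcdm:Object"},
    "pcdm:Object": {"pcdm:Object", "pcdm:File"},
    "pcdm:File": set(),  # assumption: a file holds a binary, not other resources
}


def can_contain(parent_type: str, child_type: str) -> bool:
    """Return True if a resource of parent_type may contain child_type."""
    return child_type in ALLOWED_CHILDREN.get(parent_type, set())


print(can_contain("pcdm:Collection", "pcdm:Object"))  # collection holds objects
print(can_contain("pcdm:Object", "pcdm:File"))        # object holds files
print(can_contain("pcdm:Collection", "pcdm:File"))    # not allowed directly
```

Under these rules a file can only reach a collection through a parent `pcdm:Object`, which mirrors how a 7.x-1.x datastream always belongs to an object.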


### Datastreams
In Islandora CLAW, RDF datastreams (RELS-EXT and RELS-INT) are stored as RDF in Fedora. Binary datastreams are files or `nonRdfResources` (see [PCDM](https://github.com/duraspace/pcdm/wiki)). Descriptive metadata datastreams (MODS, DC, DwC, PBCore, etc) are stored as RDF; [`RDFSource`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-rdf-source).
In Islandora 7.x-1.x, every object had a specific content model which defined what datastreams it could have and which were absolutely required. Some of these Islandora 7.x-1.x datastreams contained metadata about the object (RELS-EXT, RELS-INT, DC, MODS, PREMIS, etc) while others contained binary files (JPG, PDF, MP3, PNG, TIFF, etc). In Islandora CLAW, all metadata about a resource is stored as RDF attributes directly on the resource itself, whether that resource is a `pcdm:Collection`, `pcdm:Object` or a `pcdm:File`, so we no longer need to separate metadata by type (MODS, DC, PREMIS, etc) and store it in binary files as we did in Islandora 7.x-1.x.
Contributor

You should add RELS-EXT and RELS-INT to the list of no longer needed metadata datastreams. And perhaps add that putting RDF on the NonRdfSources serves the same purpose as RELS-INT in the paragraph below.


Binary files, such as JPGs, PNGs, MP3s, and PDFs, are handled via `pcdm:Files` which are contained by a parent `pcdm:Object`, similar to how an Islandora 7.x-1.x cmodel may hold a PDF or JPG as a datastream. Unlike Islandora 7.x-1.x, these binary files can actually have their own technical metadata attached to them. This is because `pcdm:Collections`, `pcdm:Objects` and even `pcdm:Files` are all `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of the `Non-RDF Source` (binary file) they represent as part of their RDF data in addition to whatever other metadata you may want about the file. Using this system, a `pcdm:Object` can contain as many `pcdm:Files` as necessary, and each `pcdm:File` can have separate metadata about itself and its relationship to other `pcdm:Files` attached to the parent `pcdm:Object`.
Member

@whikloj whikloj May 2, 2017

I like this, it's totally correct... I'm wondering if a small diagram might help. Once you start with the whole

...RDF Sources containing only RDF data, with pcdm:Files having links to the URL of...

I feel it might start to get hazy to some, but I'm not sure. Consider this an idea and not a requirement.

Member Author

Great point. We'll probably need a few more diagrams in general. I'll make a separate PR for that when I get some time to draw it out.
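
Until a diagram exists, the relationship discussed in this thread can be sketched as a handful of triples. This is only an illustration of the paragraph's description: the `pcdm:` terms come from the PCDM models ontology, while the repository URLs and every `ex:` predicate (`binaryAt`, `mimeType`, `derivedFrom`) are hypothetical placeholders, not terms that Fedora or CLAW are known to use.

```python
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/terms#"     # hypothetical vocabulary, for illustration
BASE = "http://example.org/fedora/"  # hypothetical repository root

obj = BASE + "object1"
tiff = BASE + "object1/files/master"
jpg = BASE + "object1/files/thumbnail"

# Each entry is one (subject, predicate, object) statement.
graph = [
    (obj, RDF_TYPE, PCDM + "Object"),
    (obj, PCDM + "hasFile", tiff),
    (obj, PCDM + "hasFile", jpg),
    (tiff, RDF_TYPE, PCDM + "File"),
    (tiff, EX + "binaryAt", BASE + "binaries/master.tif"),   # link to the binary
    (tiff, EX + "mimeType", "image/tiff"),                    # per-file metadata
    (jpg, RDF_TYPE, PCDM + "File"),
    (jpg, EX + "binaryAt", BASE + "binaries/thumbnail.jpg"),
    (jpg, EX + "mimeType", "image/jpeg"),
    (jpg, EX + "derivedFrom", tiff),                          # file-to-file link
]


def files_of(triples, parent):
    """List the files attached to a parent object, in insertion order."""
    return [o for s, p, o in triples if s == parent and p == PCDM + "hasFile"]


def describe(triples, subject):
    """Collect every property stated about one subject."""
    props = {}
    for s, p, o in triples:
        if s == subject:
            props.setdefault(p, []).append(o)
    return props


for file_uri in files_of(graph, obj):
    print(file_uri, describe(graph, file_uri))
```

Each file carries its own statements, including its relationship to a sibling file, which is the role the paragraph contrasts with 7.x-1.x datastreams.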


#### PIDs
Every object in a Fedora 3 repository had a Persistent Identifier following the pattern `namespace:pid`. Fedora 4 resources do not have PIDs. Instead, since Fedora 4 is an [LDP server](https://www.w3.org/ns/ldp), their identifiers are fundamentally their URIs. The PIDs of objects migrated from a Fedora 3 repository can still be stored in Fedora 4, as additional properties on the new Fedora 4 resource.
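
As a rough sketch of the last point, a migrated PID can be split into its parts and recorded as one extra property on the new resource. The predicate URI, the resource URI, and the function names below are all hypothetical; the source does not say which property a migration would actually use.

```python
from typing import NamedTuple


class Pid(NamedTuple):
    namespace: str
    identifier: str


def parse_pid(pid: str) -> Pid:
    """Split a Fedora 3 style `namespace:pid` string into its two parts."""
    namespace, sep, identifier = pid.partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"not a namespace:pid identifier: {pid!r}")
    return Pid(namespace, identifier)


def legacy_pid_triple(resource_uri: str, pid: str,
                      predicate: str = "http://example.org/terms#fedora3Pid"):
    """Build one (subject, predicate, object) statement keeping the old PID.

    The default predicate is a placeholder chosen for this example.
    """
    parse_pid(pid)  # validate the shape before recording it
    return (resource_uri, predicate, pid)


print(parse_pid("islandora:42"))
print(legacy_pid_triple("http://example.org/fedora/object1", "islandora:42"))
```

The resource's identity is its URI; the PID survives only as descriptive data attached to it.
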
11 changes: 4 additions & 7 deletions docs/user-documentation/intro-to-claw.md
@@ -2,21 +2,18 @@

Islandora CLAW is the project name for development of Islandora to work with Fedora 4. To fully understand Islandora CLAW, it is best to start by looking at its contrasts to the previous version of Islandora, known as 7.x-1.x.

Islandora 7.x-1.x works as a bridge between Drupal 7.x and Fedora 3. Put simply, Islandora 7.x-1.x is middleware between Fedora 3 and Drupal 7.x, sometimes expressed as a hamburger:
## Islandora 7.x-1.x (with Fedora 3)
Islandora 7.x-1.x is "middleware" for Drupal 7.x and Fedora 3, meaning that it fits as a layer in between these two systems and acts as a bridge allowing them to talk to each other. This is sometimes expressed as a hamburger:
Contributor

I'm more inclined to describe 7.x as Drupal modules that talk to a Fedora server instead of Drupal's database (although that is used in places as well).


![image](../assets/hamburger.png)

Islandora CLAW does more than simply replace that base layer with Fedora 4. It is a total re-architecting of the interaction between the various pieces. Rather than a hamburger, Islandora CLAW is a chimera:
## Islandora CLAW (with Fedora 4)

![image](../assets/claw-chimera.png)

Or, for a diagram that doesn't involve food or animals:
Islandora CLAW does more than simply replace that base layer with Fedora 4. It is a total re-architecting of the interaction between the various pieces. Rather than a hamburger, Islandora CLAW is a [chimera](https://en.wikipedia.org/wiki/Chimera_(mythology)):
Contributor

CLAW is where it really becomes middleware. I'd maybe add that here.


![image](../assets/claw-diagram.png)
![image](../assets/claw-chimera.png)

This new structure has several advantages:
Like Islandora 7.x-1.x, Islandora CLAW uses Drupal modules to extend Drupal's native functionality to handle new types of content (Fedora Resources), but unlike Islandora 7.x-1.x, Islandora CLAW contains a completely new layer of "plumbing" between Drupal, Fedora, Blazegraph (CLAW's default triplestore), Solr and any other [microservices](https://en.wikipedia.org/wiki/Microservices) to allow all of these systems to pass messages to each other and stay in sync. This new structure has several advantages:
Contributor

Replace 'other microservices' with just 'microservices'. Otherwise this would imply Solr and Blazegraph are microservices.


* Parcelling out the various services and dependencies allows for more horizontal scalability
* Changing the relationship between Drupal and Fedora allows for a more flexible approach to front-end management (i.e., it need not be Drupal) while also taking much greater advantage of features available from Drupal (i.e., Fedora objects are treated more like nodes for the purposes of using Drupal contrib modules. Many Islandora 7.x-1.x modules are redundant in Islandora CLAW because they reproduce existing Drupal contrib modules that can be used out of the box in Islandora CLAW).
70 changes: 70 additions & 0 deletions docs/user-documentation/intro-to-ld-for-claw.md
@@ -0,0 +1,70 @@
# Introduction to Linked Data for CLAW
The purpose of this page is to provide a guided reading list to anyone who wants to get up to speed on the basics of linked data within the Islandora community. Those who make their way through the readings will be able to talk competently about linked data and better understand the design decisions made in Islandora CLAW. The list starts with the fundamentals of linked data (RDF, SPARQL, serializations and ontologies) and moves toward more advanced topics specific to the use cases of a Fedora 4 based digital repository system.

# Basics of Linked Data
This section seeks to give the reader a foundational understanding of what linked data is, why it is useful, and a very superficial understanding of how it works.
- [Tim Berners-Lee’s description of Linked Data](https://www.w3.org/DesignIssues/LinkedData.html)
- [Manu Sporny's "What is Linked Data?" YouTube Video](https://www.youtube.com/watch?v=4x_xzT5eF5Q)
- [Wikipedia article on Linked Data](https://en.wikipedia.org/wiki/Linked_data)
- [Wikipedia article on Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web)
- [Wikipedia article on URIs](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier)
- [Wikipedia article on the W3C](https://en.wikipedia.org/wiki/World_Wide_Web_Consortium)
- [W3C’s description of Linked Data](https://www.w3.org/standards/semanticweb/data)
- [W3C’s Linked Data Glossary](https://www.w3.org/TR/ld-glossary/)
- [W3C’s Architecture of the World Wide Web](https://www.w3.org/TR/webarch/)

# Understanding RDF
This section is all about RDF, the Resource Description Framework, which defines the way linked data is structured.
- [Wikipedia article on RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework)
- [D-Lib’s Intro to RDF](http://www.dlib.org/dlib/may98/miller/05miller.html)
- [W3C’s RDF 1.1 Primer](https://www.w3.org/TR/rdf11-primer/)
- [W3C’s RDF 1.1 Concepts](https://www.w3.org/TR/rdf11-concepts/)
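
For readers who prefer code to prose, an RDF statement is a subject, a predicate, and an object. The sketch below models one statement and writes it out in N-Triples syntax; the `Triple` class and `to_ntriples` helper are illustrative names, the subject URI is made up, and the escaping covers only backslashes and double quotes rather than the full N-Triples grammar.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the thing being described
    predicate: str                # URI of the property
    obj: str                      # URI or literal value
    obj_is_literal: bool = False  # True when obj is a plain string literal


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line (simplified escaping)."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"' + escaped + '"'
    else:
        obj = "<" + t.obj + ">"
    return "<" + t.subject + "> <" + t.predicate + "> " + obj + " ."


title = Triple(
    "http://example.org/book1",        # hypothetical resource
    "http://purl.org/dc/terms/title",  # Dublin Core Terms "title"
    "Moby Dick",
    obj_is_literal=True,
)
print(to_ntriples(title))
```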

# Querying Linked Data with SPARQL
This section takes a look at SPARQL, the query language that allows you to ask linked data very specific questions. The queryable nature of linked data is one of the things that makes it so special. Try some SPARQL queries on DBpedia's endpoint to get some hands-on experience.
- [Wikipedia article on SPARQL](https://en.wikipedia.org/wiki/SPARQL)
- [W3C’s SPARQL 1.1 Overview](https://www.w3.org/TR/sparql11-overview/)
- [W3C’s SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/)
- [DBpedia's SPARQL Endpoint](https://dbpedia.org/sparql)
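
To make the hands-on suggestion concrete, here is a minimal sketch that builds a GET request URL for the DBpedia endpoint listed above, using the `query` parameter defined by the SPARQL 1.1 Protocol. The query itself and the helper names are assumptions chosen for illustration, and the sketch deliberately stops short of sending the request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://dbpedia.org/sparql"  # endpoint from the reading list

# Example query (an assumption): fetch one English label for a DBpedia resource.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Linked_data> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 1"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query into a GET URL via the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Decode the SPARQL query back out of a URL built above."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Pasting the printed URL into a browser, or the query text into the endpoint's web form, is one way to try it out.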

# RDF Serialization Formats
RDF data can be translated into many different formats. RDF/XML is the original way that RDF data was shared, but there are far more human-friendly serialization formats, such as Turtle, which is great for beginners. JSON-LD is the easiest format for applications to use, and is the serialization format that CLAW uses internally. Make sure to check out the [JSON-LD Playground](http://json-ld.org/playground/) for an interactive learning experience.
- [Wikipedia article on Serialization](https://en.wikipedia.org/wiki/Serialization)
- [W3C’s RDF/XML Syntax Specification](https://www.w3.org/TR/REC-rdf-syntax/)
- [W3C’s RDF 1.1 Turtle](https://www.w3.org/TR/turtle/)
- [W3C’s JSON-LD 1.0](https://www.w3.org/TR/json-ld/)
Member

http://json-ld.org
and
http://json-ld.org/playground/
are great JSON-LD resources

Member Author

Awesome, totally forgot about those! Adding those now.

- [JSON-LD Website](http://json-ld.org/)
- [JSON-LD Playground](http://json-ld.org/playground/)
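
As a small illustration of the point that one statement can be written in several formats, the sketch below holds the same made-up statement as a Turtle string and as a JSON-LD document. The resource URI is hypothetical, and `expand_term` is a deliberately simplified stand-in for JSON-LD's compact IRI handling, not the full expansion algorithm from the specification.

```python
import json

# The same statement, first in Turtle...
TURTLE = """@prefix dcterms: <http://purl.org/dc/terms/> .

<http://example.org/book1> dcterms:title "Moby Dick" .
"""

# ...and then as a JSON-LD document (a plain dict until it is serialized).
doc = {
    "@context": {"dcterms": "http://purl.org/dc/terms/"},
    "@id": "http://example.org/book1",
    "dcterms:title": "Moby Dick",
}


def expand_term(term: str, context: dict) -> str:
    """Expand a `prefix:local` key using the context (simplified)."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term


print(json.dumps(doc, indent=2))
print(expand_term("dcterms:title", doc["@context"]))
```

Pasting the printed JSON into the JSON-LD Playground linked above is one way to check how a real processor expands it.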

# Ontology & Vocabulary Basics
Ontologies & vocabularies are created by communities of people to describe things, and once created, anyone can use an ontology or vocabulary to describe their resources. This section goes over some of the more popular ontologies & vocabularies in use.
- [Wikipedia article on Ontologies](https://en.wikipedia.org/wiki/Ontology_(information_science))
- [W3C’s description of Ontologies/Vocabularies (sameish thing)](https://www.w3.org/standards/semanticweb/ontology)
- [Wikipedia article on Friend of a Friend (FOAF) ontology](https://en.wikipedia.org/wiki/FOAF_(ontology))
- [FOAF 0.99 Vocabulary Specification](http://xmlns.com/foaf/spec/)
- [Socially Interconnected Online Communities Ontology (SIOC)](http://sioc-project.org/)
- [Dublin Core in RDF](http://dublincore.org/documents/dc-rdf/)

# Building Ontologies
One isn't limited to the ontologies & vocabularies that already exist in the world; anyone is free to create their own. This section goes over ontologies that exist to help those trying to create their own ontologies.
- [Wikipedia article on RDF Schema (RDFS)](https://en.wikipedia.org/wiki/RDF_Schema)
- [W3C’s RDF Schema (RDFS) 1.1](https://www.w3.org/TR/rdf-schema/)
- [Wikipedia article on Simple Knowledge Organization System (SKOS)](https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System)
- [ALA’s SKOS: A Guide for Information Professionals](http://www.ala.org/alcts/resources/z687/skos)
- [Wikipedia article on Web Ontology Language (OWL)](https://en.wikipedia.org/wiki/Web_Ontology_Language)
- [W3C’s OWL 2 Primer](https://www.w3.org/TR/owl2-primer/)
- [W3C’s OWL 2 Quick Reference](https://www.w3.org/TR/owl2-quick-reference/)

# Repository-Specific Ontologies
Most ontologies are very specific to certain use cases, and digital repository systems are no different. This section covers ontologies that are of specific interest to users of CLAW, or any Fedora 4 based digital repository system.
- [MODS RDF Namespace Document](http://www.loc.gov/standards/mods/modsrdf/v1/)
- [MODS RDF Ontology Primer](https://www.loc.gov/standards/mods/modsrdf/primer.html)
- [MODS RDF Ontology Primer 2: MODS XML to RDF Conversion](https://www.loc.gov/standards/mods/modsrdf/primer-2.html)
- [PREMIS RDF Namespace Document](http://id.loc.gov/ontologies/premis.html)
- [Linked Data Platform (LDP) 1.0 Primer](https://www.w3.org/TR/ldp-primer/)
- [LDP 1.0 Specification](https://www.w3.org/TR/ldp/)
- [Portland Common Data Model (PCDM) wiki](https://github.com/duraspace/pcdm/wiki)
- [PCDM ontologies list](http://pcdm.org/)
- [PCDM Models ontology (defines Collections, Objects & Files)](http://pcdm.org/2016/04/18/models)
- [Fedora ontologies](http://fedora.info/)
- [CLAWntology](https://github.com/Islandora-CLAW/CLAWntology)