diff --git a/docs/user-documentation/CLAWfor1x.md b/docs/user-documentation/CLAWfor1x.md index cfcac2fac..5e78301ee 100644 --- a/docs/user-documentation/CLAWfor1x.md +++ b/docs/user-documentation/CLAWfor1x.md @@ -27,13 +27,16 @@ Fedora 3 objects are FOXML (Fedora Object eXtensible Markup Language) documents, * `System Properties`: A set of system-defined descriptive properties that is necessary to manage and track the object in the repository. * `Datastream(s)`: The element in a Fedora digital object that represents a content item. -In Fedora 4 , what we would have called `objects` are now referred to as `resources` and are not composed of XML; instead, they are stored in ModeShape as nodes with RDF properties. They can contain the following elements: +In Fedora 4, what we would have called `objects` are now referred to as [`Resources`](https://www.w3.org/TR/ld-glossary/#resource) (and *everything* in Fedora 4 is a `Resource`). Instead of being composed of XML as they were in Fedora 3, they are stored in [ModeShape](http://modeshape.jboss.org/) as nodes with RDF properties. `Resources` come in two flavors: [`RDF Sources`](https://www.w3.org/TR/ldp/#ldpr-resource), which are `Resources` having only RDF data, and [`Non-RDF Sources`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-non-rdf-source), which are `Resources` that are binary files (HTML, PDFs, images, audio, video, etc.). The terms `RDF Source` and `Non-RDF Source` both come from the [W3C's](https://www.w3.org/) [Linked Data Platform](https://www.w3.org/TR/ldp/) specification, which also defines the idea of [Linked Data Platform Containers](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-container), or `LDPCs`. An `LDPC` is an `RDF Source` that functions as a collection of `Resources`, similar to the way Islandora 7.x-1.x compound objects exist only as a way to tie together their children. 
An `LDPC` may contain `Non-RDF Sources`, as well as other `RDF Sources` acting as `LDPCs`; you can have a container of containers, just as Islandora 7.x-1.x can have a collection of collections. -* `Resource`: Roughly equivalent to a Fedora 3 object - a conceptual representation of a thing that can contain files or other containers. -* `Non-RDF Source`: Roughly equivalent to a datastream. A Non-RDF Source (or binary) is simply a bitstream (e.g. JPG, PDF, XML, MP3, etc.). +CLAW makes use of the [Portland Common Data Model (PCDM)](https://github.com/duraspace/pcdm/wiki) as a layer of abstraction over `LDPCs` to make containment simpler to understand for users; a `pcdm:Collection` may contain other `pcdm:Collections` or `pcdm:Objects` (similar to an Islandora 7.x-1.x collection content model), and a `pcdm:Object` may contain other `pcdm:Objects` (similar to the way an Islandora 7.x-1.x compound object has child objects) or `pcdm:Files` (similar to the way Islandora 7.x-1.x objects have datastreams). ### Datastreams -In Islandora CLAW, RDF datastreams (RELS-EXT and RELS-INT) are stored as RDF in Fedora. Binary datastreams are files or `nonRdfResources` (see [PCDM](https://github.com/duraspace/pcdm/wiki)). Descriptive metadata datastreams (MODS, DC, DwC, PBCore, etc) are stored as RDF; [`RDFSource`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-rdf-source). +In Islandora 7.x-1.x, every object has a specific content model that defines which datastreams it can have and which are required. Some of these Islandora 7.x-1.x datastreams contain metadata about the object, while others contain binary files (JPG, PDF, MP3, PNG, TIFF, etc.). 
In Islandora CLAW, all metadata about a resource is stored as RDF attributes directly on the resource itself, whether that resource is a `pcdm:Collection`, `pcdm:Object` or a `pcdm:File`, so we no longer need to separate metadata by type (RELS-EXT, RELS-INT, MODS, DC, PREMIS, etc.) and store it in binary files as we did in Islandora 7.x-1.x. + +Binary files, such as JPGs, PNGs, MP3s, and PDFs, are handled via `pcdm:Files`, which are contained by a parent `pcdm:Object`, similar to how an Islandora 7.x-1.x object may hold a PDF or JPG as a datastream. Unlike Islandora 7.x-1.x, these binary files can actually have their own technical metadata attached to them. This is because `pcdm:Collections`, `pcdm:Objects` and even `pcdm:Files` are all `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of the `Non-RDF Source` (binary file) they represent as part of their RDF data, in addition to whatever other metadata you may want about the file. Using this system, a `pcdm:Object` can contain as many `pcdm:Files` as necessary, and each `pcdm:File` can have separate metadata about itself and its relationship to other `pcdm:Files` attached to the parent `pcdm:Object`, serving the same purpose RELS-INT datastreams served in Islandora 7.x-1.x. + +Note that you *can* use a `pcdm:File` to represent a file of metadata, such as MODS, DC, or PBCore, in case you would like to preserve a copy of an object's legacy metadata when migrating into Fedora 4. These metadata files will be treated like any other binary file in Islandora CLAW, and will not be indexed or editable through the GUI. #### PIDs Every object in a Fedora 3 repository had a Persistent Identifier following the pattern `namespace:pid`. Fedora 4 resources do not have PIDs. Instead, since Fedora 4 is an [LDP server](https://www.w3.org/ns/ldp), their identifiers are fundamentally their URIs. 
The PIDs of objects migrated from a Fedora 3 repository can still be stored in Fedora 4, as additional properties on the new Fedora 4 resource. diff --git a/docs/user-documentation/intro-to-claw.md b/docs/user-documentation/intro-to-claw.md index 0c92dc8e4..73f4e5ad0 100644 --- a/docs/user-documentation/intro-to-claw.md +++ b/docs/user-documentation/intro-to-claw.md @@ -2,19 +2,16 @@ Islandora CLAW is the project name for development of Islandora to work with Fedora 4. To fully understand Islandora CLAW, it is best to start by looking at its contrasts to the previous version of Islandora, known as 7.x-1.x. -Islandora 7.x-1.x works as a bridge between Drupal 7.x and Fedora 3. Put simply, Islandora 7.x-1.x is middleware between Fedora 3 and Drupal 7.x, sometimes expressed as a hamburger: ## Islandora 7.x-1.x (with Fedora 3) +Islandora 7.x-1.x is "middleware": it allows Drupal 7.x to talk to a Fedora 3 server instead of Drupal's database. This is sometimes expressed as a hamburger: ![image](../assets/hamburger.png) -Islandora CLAW does more than simply replace that base layer with Fedora 4. It is a total re-architecting of the interaction between the various pieces. Rather than a hamburger, Islandora CLAW is a chimera: ## Islandora CLAW (with Fedora 4) -![image](../assets/claw-chimera.png) - -Or, for a diagram that doesn't involve food or animals: +Islandora CLAW does more than simply replace that base layer with Fedora 4. It is a total re-architecting of the interaction between the various pieces, acting as middleware for not only Drupal 8.x and Fedora 4, but also Solr, Blazegraph, and any [microservices](https://en.wikipedia.org/wiki/Microservices) added to the stack. Islandora CLAW achieves this by implementing a system of "plumbing" that uses Apache Camel to pass messages between the different parts of the stack, keeping them in sync with each other. 
Rather than a hamburger, Islandora CLAW is a [chimera](https://en.wikipedia.org/wiki/Chimera_(mythology)): -![image](../assets/claw-diagram.png) +![image](../assets/claw-chimera.png) This new structure has several advantages: diff --git a/docs/user-documentation/intro-to-ld-for-claw.md b/docs/user-documentation/intro-to-ld-for-claw.md new file mode 100644 index 000000000..9ce7c9d08 --- /dev/null +++ b/docs/user-documentation/intro-to-ld-for-claw.md @@ -0,0 +1,70 @@ +# Introduction to Linked Data for CLAW +The purpose of this page is to provide a guided reading list to anyone who wants to get up to speed on the basics of linked data within the Islandora community. Those who make their way through the readings will be able to talk competently about linked data and better understand the design decisions made in Islandora CLAW. The list starts with the fundamentals of linked data (RDF, SPARQL, serializations and ontologies) and moves toward more advanced topics specific to the use cases of a Fedora 4 based digital repository system. + +# Basics of Linked Data +This section seeks to give the reader a foundational understanding of what linked data is, why it is useful, and a very superficial understanding of how it works. +- [Tim Berners-Lee’s description of Linked Data](https://www.w3.org/DesignIssues/LinkedData.html) +- [Manu Sporny's "What is Linked Data?" 
YouTube Video](https://www.youtube.com/watch?v=4x_xzT5eF5Q) +- [Wikipedia article on Linked Data](https://en.wikipedia.org/wiki/Linked_data) +- [Wikipedia article on Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) +- [Wikipedia article on URIs](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) +- [Wikipedia article on the W3C](https://en.wikipedia.org/wiki/World_Wide_Web_Consortium) +- [W3C’s description of Linked Data](https://www.w3.org/standards/semanticweb/data) +- [W3C’s Linked Data Glossary](https://www.w3.org/TR/ld-glossary/) +- [W3C’s Architecture of the World Wide Web](https://www.w3.org/TR/webarch/) + +# Understanding RDF +This section is all about RDF, the Resource Description Framework, which defines the way linked data is structured. +- [Wikipedia article on RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework) +- [D-Lib’s Intro to RDF](http://www.dlib.org/dlib/may98/miller/05miller.html) +- [W3C’s RDF 1.1 Primer](https://www.w3.org/TR/rdf11-primer/) +- [W3C’s RDF 1.1 Concepts](https://www.w3.org/TR/rdf11-concepts/) + +# Querying Linked Data with SPARQL +This section takes a look at SPARQL, the query language that allows you to ask linked data very specific questions. The queryable nature of linked data is one of the things that makes it so special. Try some SPARQL queries on DBpedia's endpoint to get some hands-on experience. +- [Wikipedia article on SPARQL](https://en.wikipedia.org/wiki/SPARQL) +- [W3C’s SPARQL 1.1 Overview](https://www.w3.org/TR/sparql11-overview/) +- [W3C’s SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) +- [DBpedia's SPARQL Endpoint](https://dbpedia.org/sparql) + +# RDF Serialization Formats +RDF data can be translated into many different formats. RDF/XML is the original way that RDF data was shared, but there are far more human-friendly serialization formats, such as Turtle, which is great for beginners. 
JSON-LD is the easiest format for applications to use, and is the serialization format that CLAW uses internally. Make sure to check out the [JSON-LD Playground](http://json-ld.org/playground/) for an interactive learning experience. +- [Wikipedia article on Serialization](https://en.wikipedia.org/wiki/Serialization) +- [W3C’s RDF/XML Syntax Specification](https://www.w3.org/TR/REC-rdf-syntax/) +- [W3C’s RDF 1.1 Turtle](https://www.w3.org/TR/turtle/) +- [W3C’s JSON-LD 1.0](https://www.w3.org/TR/json-ld/) +- [JSON-LD Website](http://json-ld.org/) +- [JSON-LD Playground](http://json-ld.org/playground/) + +# Ontology & Vocabulary Basics +Ontologies & vocabularies are created by communities of people to describe things, and once created, anyone can use an ontology or vocabulary to describe their resources. This section goes over some of the more popular ontologies & vocabularies in use. +- [Wikipedia article on Ontologies](https://en.wikipedia.org/wiki/Ontology_(information_science)) +- [W3C’s description of Ontologies/Vocabularies (sameish thing)](https://www.w3.org/standards/semanticweb/ontology) +- [Wikipedia article on Friend of a Friend (FOAF) ontology](https://en.wikipedia.org/wiki/FOAF_(ontology)) +- [FOAF 0.99 Vocabulary Specification](http://xmlns.com/foaf/spec/) +- [Socially Interconnected Online Communities Ontology (SIOC)](http://sioc-project.org/) +- [Dublin Core in RDF](http://dublincore.org/documents/dc-rdf/) + +# Building Ontologies +One isn't limited to the ontologies & vocabularies that already exist in the world; anyone is free to create their own. This section goes over ontologies that exist to help those trying to create their own ontologies. 
+- [Wikipedia article on RDF Schema (RDFS)](https://en.wikipedia.org/wiki/RDF_Schema) +- [W3C’s RDF Schema (RDFS) 1.1](https://www.w3.org/TR/rdf-schema/) +- [Wikipedia article on Simple Knowledge Organization System (SKOS)](https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System) +- [ALA’s SKOS: A Guide for Information Professionals](http://www.ala.org/alcts/resources/z687/skos) +- [Wikipedia article on Web Ontology Language (OWL)](https://en.wikipedia.org/wiki/Web_Ontology_Language) +- [W3C’s OWL 2 Primer](https://www.w3.org/TR/owl2-primer/) +- [W3C’s OWL 2 Quick Reference](https://www.w3.org/TR/owl2-quick-reference/) + +# Repository-Specific Ontologies +Most ontologies are very specific to certain use cases, and digital repository systems are no different. This section covers ontologies that are of specific interest to users of CLAW, or any Fedora 4 based digital repository system. +- [MODS RDF Namespace Document](http://www.loc.gov/standards/mods/modsrdf/v1/) +- [MODS RDF Ontology Primer](https://www.loc.gov/standards/mods/modsrdf/primer.html) +- [MODS RDF Ontology Primer 2: MODS XML to RDF Conversion](https://www.loc.gov/standards/mods/modsrdf/primer-2.html) +- [PREMIS RDF Namespace Document](http://id.loc.gov/ontologies/premis.html) +- [Linked Data Platform (LDP) 1.0 Primer](https://www.w3.org/TR/ldp-primer/) +- [LDP 1.0 Specification](https://www.w3.org/TR/ldp/) +- [Portland Common Data Model (PCDM) wiki](https://github.com/duraspace/pcdm/wiki) +- [PCDM ontologies list](http://pcdm.org/) +- [PCDM Models ontology (defines Collections, Objects & Files)](http://pcdm.org/2016/04/18/models) +- [Fedora ontologies](http://fedora.info/) +- [CLAWntology](https://github.com/Islandora-CLAW/CLAWntology)
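
As a rough illustration of the PCDM containment model described in these docs (a `pcdm:Collection` holding a `pcdm:Object`, which in turn holds a `pcdm:File`), the following Python sketch represents the relationships as plain subject/predicate/object triples. The base URL and resource paths are hypothetical, chosen only for the example; the namespace and the `hasMember` / `hasFile` predicates are taken from the PCDM Models ontology. It only models containment and is not a description of how CLAW itself stores or queries data.

```python
# Minimal sketch of PCDM containment as RDF-style triples (standard library only).
# ASSUMPTION: BASE and every resource path below are made up for illustration.
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://localhost:8080/fcrepo/rest/"  # hypothetical repository URL

COLLECTION = BASE + "collection1"
OBJECT = BASE + "object1"
FILE = BASE + "object1/files/page1"

# Each triple is (subject, predicate, object), all expressed as URIs.
triples = [
    (COLLECTION, RDF_TYPE, PCDM + "Collection"),
    (COLLECTION, PCDM + "hasMember", OBJECT),  # collection contains an object
    (OBJECT, RDF_TYPE, PCDM + "Object"),
    (OBJECT, PCDM + "hasFile", FILE),          # object contains a file
    (FILE, RDF_TYPE, PCDM + "File"),
]


def objects_of(subject, predicate, graph):
    """Return every object linked from `subject` via `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def to_ntriples(graph):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in graph)


if __name__ == "__main__":
    print(objects_of(OBJECT, PCDM + "hasFile", triples))
    print(to_ntriples(triples))
```

Keeping the data as simple tuples makes the subject/predicate/object shape of RDF visible without requiring an RDF library; a real application would more likely use a dedicated RDF toolkit and a serialization such as Turtle or JSON-LD.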