Mark Jordan edited this page Feb 23, 2016 · 2 revisions

This cookbook contains recipes for using the Islandora Feeds (and Islandora Feeds Derivs) modules. If you have a recipe, please open an issue and we can add it here.

Adding objects with only an OBJ datastream

Start with a CSV file like this one:

Title, Date published, Description, Rights
"Picture of a dog","2014-09-12", "A black dog.", "This image is in the public domain."
"Picture of a cat","2013-04-20", "An organge cat.", "This image is in the public domain."
"Picture of a snake","2014-12-16", "A scary snake.", "This image is in the public domain."
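Before importing, it can help to confirm that every row has as many values as the header has columns. The following is a minimal Python sketch, not part of Islandora Feeds; the function name `check_csv` is hypothetical, and `skipinitialspace=True` is used only so that Python's reader accepts the space between a comma and an opening quote seen in the sample above.

```python
import csv
import io


def check_csv(text):
    """Return (header, problems) for CSV text whose first row is the header.

    problems is a list of (line_number, column_count) tuples for rows whose
    column count differs from the header's.
    """
    # skipinitialspace tolerates `", "` between quoted values.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    problems = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        if len(row) != len(header):
            problems.append((reader.line_num, len(row)))
    return header, problems


sample = '''Title, Date published, Description, Rights
"Picture of a dog","2014-09-12", "A black dog.", "This image is in the public domain."
"Picture of a snake","2014-12-16", "A scary snake.", "This image is in the public domain."
'''

header, problems = check_csv(sample)
print(header)
print(problems)
```

An empty problem list only means the column counts agree; it says nothing about how the Feeds CSV parser itself will treat the file.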

Create a Drupal content type that contains fields corresponding to the fields in your CSV file:

Basic Drupal content type

Then, configure a Feeds Importer to map the fields in the CSV file to the fields in the Drupal content type, making sure the Importer is attached to the content type you just created:

Feeds field mappings

When you have saved your Feeds Importer settings, you are ready to import your CSV data using the standard methods provided by the Feeds contrib module. A sample Islandora object ingested from this data will look like this:

Sample Object

Adding objects with OBJ and MODS datastreams

Prepare and import CSV data as in the previous recipe. In addition, enable the Islandora Feeds Derivs module. You will need to write an XSL stylesheet, like the one below, that copies the values in your Islandora objects into the corresponding MODS elements (see the comments within the XSL), and upload it to the Islandora Feeds Derivs module's /xml directory:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Sample XSL stylesheet for creating MODS derivatives from OBJ datastreams created
       by the Islandora Feeds module. -->

  <xsl:param name="DSID">MODS</xsl:param>
  <xsl:param name="DSLABEL">MODS record</xsl:param>

  <xsl:template match="/">
    <mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
      <titleInfo>
        <!-- Add the Islandora object's title element to the MODS title element. -->
        <title><xsl:value-of select="fielddata/title"/></title>
        <subTitle/>
      </titleInfo>
      <name type="personal">
        <namePart/>
        <role>
          <roleTerm authority="marcrelator" type="text"/>
        </role>
      </name>
      <typeOfResource></typeOfResource>
      <genre></genre>
      <originInfo>
        <!-- Add the Islandora object's date_published element to the MODS dateIssued element. -->
        <dateIssued><xsl:value-of select="fielddata/date_published"/></dateIssued>
        <publisher/>
        <place>
          <placeTerm authority="marccountry" type="code"/>
        </place>
        <place>
          <placeTerm type="text"/>
        </place>
      </originInfo>
      <language>
        <languageTerm authority="iso639-2b" type="code"/>
      </language>
      <!-- Add the Islandora object's description element to the MODS abstract element. -->
      <abstract><xsl:value-of select="fielddata/description"/></abstract>
      <identifier type="local"></identifier>
      <physicalDescription>
        <form></form>
        <extent/>
      </physicalDescription>
      <note/>
      <subject>
        <topic/>
        <geographic/>
        <temporal/>
        <hierarchicalGeographic>
          <continent/>
          <country/>
          <province/>
          <region/>
          <county/>
          <city/>
          <citySection/>
        </hierarchicalGeographic>
        <cartographics>
          <coordinates/>
        </cartographics>
      </subject>
      <!-- All of our images are public domain so we can use boilerplate. -->
      <accessCondition type="use and reproduction">Use of this public-domain resource is unrestricted.</accessCondition>
    </mods>
  </xsl:template>
</xsl:stylesheet>

After you have uploaded your stylesheet, configure Islandora Feeds Derivs so that it is selected in the Stylesheet list:

Islandora Feeds Derivs configuration

Now, when you import your objects, they will have MODS datastreams that look like this:

<?xml version="1.0"?>
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
  <titleInfo>
    <title>Picture of a cat</title>
    <subTitle/>
  </titleInfo>
  <name type="personal">
    <namePart/>
    <role>
      <roleTerm authority="marcrelator" type="text"/>
    </role>
  </name>
  <typeOfResource/>
  <genre/>
  <originInfo>
    <dateIssued>2013-04-20 </dateIssued>
    <publisher/>
    <place>
      <placeTerm authority="marccountry" type="code"/>
    </place>
    <place>
      <placeTerm type="text"/>
    </place>
  </originInfo>
  <language>
    <languageTerm authority="iso639-2b" type="code"/>
  </language>
  <abstract>An organge cat. </abstract>
  <identifier type="local"/>
  <physicalDescription>
    <form/>
    <extent/>
  </physicalDescription>
  <note/>
  <subject>
    <topic/>
    <geographic/>
    <temporal/>
    <hierarchicalGeographic>
      <continent/>
      <country/>
      <province/>
      <region/>
      <county/>
      <city/>
      <citySection/>
    </hierarchicalGeographic>
    <cartographics>
      <coordinates/>
    </cartographics>
  </subject>
  <accessCondition type="use and reproduction">Use of this public-domain resource is unrestricted.</accessCondition>
</mods>
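To try out a field-to-MODS mapping outside Drupal, the same three copies the stylesheet makes (title, date_published, description) can be mimicked in a few lines of Python. This is only an illustrative sketch, not how Islandora Feeds Derivs works internally: it assumes the OBJ datastream is flat XML with a `fielddata` root whose child elements are named after the field machine names, as the stylesheet's `select` paths suggest. The function name `fielddata_to_mods` is hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Serialize MODS elements without a prefix (default namespace).
ET.register_namespace("", MODS_NS)


def q(name):
    """Qualify an element name with the MODS namespace."""
    return "{%s}%s" % (MODS_NS, name)


def fielddata_to_mods(obj_xml):
    """Build a minimal MODS record from flat <fielddata> XML (assumed shape)."""
    source = ET.fromstring(obj_xml)

    def text_of(name):
        # Missing elements become empty strings; stray whitespace is trimmed.
        return (source.findtext(name) or "").strip()

    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = text_of("title")
    origin_info = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin_info, q("dateIssued")).text = text_of("date_published")
    ET.SubElement(mods, q("abstract")).text = text_of("description")
    return mods


# Assumed OBJ content for the first row of the sample CSV.
obj = """<fielddata>
  <title label="Title">Picture of a dog</title>
  <date_published label="Date published">2014-09-12</date_published>
  <description label="Description">A black dog.</description>
</fielddata>"""

record = fielddata_to_mods(obj)
print(ET.tostring(record, encoding="unicode"))
```

Unlike the stylesheet, the sketch emits only the populated elements and trims whitespace, so its output is a subset of the MODS record shown above.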

Adding objects with only a MODS datastream

Follow the same steps described in the previous recipe, but this time, check the "Delete the OBJ datastream" option:

Delete the OBJ datastream option

Creating OBJ datastreams with complex content

Islandora Feeds only generates OBJ datastreams containing flat XML whose element names directly mirror the machine-readable fieldnames used by Drupal content types, which means you can't use camelcase in the element names, you can't use hierarchical XML structures, and you can't use attributes or namespaces.

If you want your OBJ datastreams to contain anything more complex than that (for example, XML conforming to a specific schema), you will need to create the XML you want as a derivative and then configure Islandora Feeds Derivs to replace the OBJ's XML content with the content from the derivative. Within Islandora Feeds Derivs' configuration settings, check the "Replace the OBJ datastream with the derivative" option and also check the "Delete the derivative datastream" option (unless you want to keep both datastreams):

Creating OBJ datastreams with complex content

Of course, you will need to write an XSL stylesheet to generate your derivative datastream's content from the flat OBJ XML and select that stylesheet in the configuration settings. If you select the "Replace the OBJ datastream with the derivative" option, you can only select one XSL stylesheet.

Importing thumbnails

If your Drupal nodes have an image field with the machine name 'field_tn', the image in this field (or the first image, if the field is repeatable) will be added to the corresponding Islandora object as a TN datastream. You can add the image to your nodes manually (via the node add/edit form) or in the feed import. For the latter, upload the images to a location on your Drupal server that the feed fetcher can access (usually the public files directory) and reference your thumbnail images in your CSV data using either the "public://uri" syntax or the fully qualified URL to your images:

Title, Field 1, Field 2, Field 3, TN
"First object's title","... ", "...", "...", "public://first.png"
"Second object's title","... ", "...", "...", "http://localhost/path_to_drupals_public_files/second.png"
"Third object's title","... ", "...", "...", "public://third.png"

The http:// URLs must point to your local server; they won't work if they point to remote files.
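A quick pre-import check of the TN column can catch references that do not follow these two forms. The sketch below is a hypothetical helper, not part of Islandora Feeds; the `local_hosts` default and the acceptance of https alongside http are assumptions to adjust for your own server.

```python
from urllib.parse import urlparse


def thumbnail_ref_problem(value, local_hosts=("localhost",)):
    """Return None if a TN value looks usable, otherwise a short reason."""
    value = value.strip()
    if value.startswith("public://"):
        # A public:// URI must name a file after the scheme.
        return None if value[len("public://"):] else "empty public:// path"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):  # https accepted as an assumption
        if parsed.hostname in local_hosts:
            return None
        return "URL points to a remote host: %s" % parsed.hostname
    return "not a public:// URI or an http(s) URL"


for ref in (
    "public://first.png",
    "http://localhost/path_to_drupals_public_files/second.png",
    "http://example.com/remote.png",
):
    print(ref, "->", thumbnail_ref_problem(ref))
```

The check looks only at the form of each reference; it does not confirm that the file exists or that the fetcher can read it.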

Performing quality control/content edits on source nodes before creating Islandora objects from them

Islandora Feeds provides the option to import your CSV data into Drupal nodes without creating the corresponding Islandora objects. Within your Feeds Importer's Processor settings, you will see:

Processor options for keeping nodes and ingesting objects

If you uncheck the "Ingest objects" option and check the "Keep nodes" option, you can edit the nodes created from your CSV data before creating Islandora objects from them. For example, you could add additional fields, add thumbnails, or allow someone else to perform quality control checks on the nodes (they are ordinary Drupal nodes, so any workflow you can configure in Drupal can be implemented on them).

In order to create your Islandora objects, you will need to install Views Bulk Operations and create a view that uses the custom action "Create Islandora objects from nodes" provided by the Islandora Feeds module:

Create Islandora objects from nodes action

To create your objects when you are ready, list them in a View and apply the action.

Populating datastream fields with repeated values

You can import data that contains repeated values, thanks to the Feeds Tamper contrib module. To do so, prepare your CSV as follows, separating repeated values with a character such as a semicolon:

Title, Field 1, Field 2, Field 3
"...", "...", "...","Field 3's first value; field 3's second value; and so on"
"...", "...", "...", "Second item's field 3 value"
"...","...", "...", "Field 3's first value; field 3's second value; field3's third value"

The resulting OBJ XML for the item created from data in the first row of this file will look like this:

<fielddata>
  <title label="Title">A sample Islandora object</title>
  <myfield_1 label="First field">The first field's value</myfield_1>
  <myfield_2 label="Second field">The second field's value</myfield_2>
  <myfield_3 label="Third field">Field 3's first value</myfield_3>
  <myfield_3 label="Third field">field 3's second value</myfield_3>
  <myfield_3 label="Third field">and so on</myfield_3>
</fielddata>

To get this to work, you must do the following:

  • Install Feeds Tamper.
  • Configure the field in your Drupal content type that will accept repeated values so that it accepts an unlimited number of values.
  • In your Feeds Importer, click on the Tamper tab and, in the list of field mappings, click on "+Add plugin" for the field that will accept repeated values (the same one you configured in the previous step).
  • Add the "Explode" Tamper plugin, and in the String Separator field, enter the same character you used to delimit your subvalues in the CSV file (in the example above, a semicolon).
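To preview what a delimited cell should turn into, the splitting can be imitated outside Drupal. The Python sketch below is an illustration of the expected result, not Feeds Tamper's implementation; trimming the whitespace around each sub-value is an assumption made here so the output matches the OBJ XML shown above, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def explode(value, separator=";"):
    """Split a delimited cell into trimmed, non-empty sub-values."""
    return [part.strip() for part in value.split(separator) if part.strip()]


def build_fielddata(fields):
    """Build flat <fielddata> XML from (element_name, label, raw_value) tuples.

    Each sub-value of a delimited cell becomes its own repeated element.
    """
    root = ET.Element("fielddata")
    for name, label, raw in fields:
        for part in explode(raw):
            ET.SubElement(root, name, {"label": label}).text = part
    return root


row = [
    ("title", "Title", "A sample Islandora object"),
    ("myfield_3", "Third field",
     "Field 3's first value; field 3's second value; and so on"),
]
root = build_fielddata(row)
print(ET.tostring(root, encoding="unicode"))
```

If the imported values keep their leading spaces, adding a trimming step after "Explode" in the Tamper configuration may be needed; check the values on an imported node to be sure.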

Importing objects with non-unique titles

The mappings from CSV to node properties used by Feeds require that one field contain unique values. By default, Feeds offers the option of treating the field mapped to the node title as unique. This is impractical for most data that Islandora objects will be created from, since titles often repeat.

To work around this requirement, add a field to your CSV data that will contain unique numbers (and numbers only), like the "key" field in the following example. It is important to make sure the numbers are all higher than the highest node ID that exists in your Drupal instance.

key, Title, Field 1, Field 2
"9234", "A title","First object's field 1 value", "First object's field 2 value"
"9678", "A title","Second object's field 1 value", "Second object's field 2 value"
"9012", "A title","Third object's field 1 value", "Third object's field 2 value"

In your Feeds Processor mappings, map this field to "Node ID (nid)" and flag it to be unique:

Unique field mapping

Feeds will assign the number in this field to the node ID for the temporary node created by the import. It will not get mapped to your Islandora object.
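If the source CSV has no such column yet, one can be generated. The following Python sketch prepends a "key" column whose values start just above a given node ID; `add_unique_keys` and its `highest_nid` argument are hypothetical, and you would need to look up the highest existing node ID in your own Drupal instance first.

```python
import csv
import io


def add_unique_keys(csv_text, highest_nid, column="key"):
    """Prepend a column of unique integers, all greater than highest_nid."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]  # drop blank lines
    header, records = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([column] + header)
    for offset, record in enumerate(records, start=1):
        # Sequential numbers are unique and all exceed highest_nid.
        writer.writerow([str(highest_nid + offset)] + record)
    return out.getvalue()


source = '''Title, Field 1
"A title", "First object's field 1 value"
"A title", "Second object's field 1 value"
'''
print(add_unique_keys(source, highest_nid=9000))
```

Sequential keys are used here for simplicity; any set of distinct numbers above the highest node ID would serve equally well.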

Importing XML content

Feeds XPath Parser makes this possible. Given an input file like this that contains multiple input items:

<?xml version="1.0"?>
<data>
  <entry>
    <title>First entry's title</title>
    <secondfield>First entry's title</secondfield>
    <thirdfield>First entry's second field</thirdfield>
  </entry>
  <entry>
    <title>Second entry's title</title>
    <secondfield>Second entry's title</secondfield>
    <thirdfield>Second entry's title</thirdfield>
  </entry>
</data>

or a set of XML files where each one contains one input item, like these:

<?xml version="1.0"?>
<data>
  <entry>
    <title>First entry's title</title>
    <secondfield>First entry's title</secondfield>
    <thirdfield>First entry's second field</thirdfield>
  </entry>
</data>

<?xml version="1.0"?>
<data>
  <entry>
    <title>Second entry's title</title>
    <secondfield>Second entry's title</secondfield>
    <thirdfield>Second entry's title</thirdfield>
  </entry>
</data>

it is possible to create corresponding Drupal nodes and Islandora objects. Install Feeds XPath Parser, and configure your Feeds Importer to use the "XPath XML parser". Within the Processor Mappings, use an XPath Expression for each of the target fields:

XPath mappings

Then, within the parser's Settings, configure your XPath queries as illustrated here:

XPath queries
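To preview which values a set of XPath queries would pick up from a file, the queries can be tried outside Drupal. The Python sketch below uses the standard library's ElementTree, which supports only a subset of XPath, so it is a rough check rather than a reproduction of Feeds XPath Parser. The context query (`//entry`, written `.//entry` for ElementTree) and the per-field queries are assumptions based on the sample input; the function name `preview` is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<data>
  <entry>
    <title>First entry's title</title>
    <secondfield>First entry's second field</secondfield>
  </entry>
  <entry>
    <title>Second entry's title</title>
    <secondfield>Second entry's second field</secondfield>
  </entry>
</data>"""


def preview(xml_text, context=".//entry", queries=None):
    """Return one dict per context node, mapping target name to extracted text."""
    if queries is None:
        # Assumed queries: each target reads the child element of the same name.
        queries = {"title": "title", "secondfield": "secondfield"}
    root = ET.fromstring(xml_text)
    items = []
    for node in root.findall(context):
        items.append(
            {target: node.findtext(query, default="") for target, query in queries.items()}
        )
    return items


for item in preview(SAMPLE):
    print(item)
```

Each dict corresponds to one node that the import would create, which makes it easy to spot a query that matches nothing.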

If you are importing data contained in a single XML file, you can upload it from within Feeds' user interface at the time of import or place it on the server under your Drupal's public files directory. If you are importing data contained in multiple XML files, you must place them in their own subdirectory under Drupal's public files directory. If the file(s) are on your Drupal server, you need to address them using the "public://" URI convention (e.g., "public://xmlimport" would be equivalent to "sites/default/files/xmlimport"), as illustrated here:

Specifying an import source directory

You can name your directory whatever you want, but it needs to be below your Drupal's public files directory.

Automating periodic importing of content

Currently, it's not possible to use Feeds' "Periodic import" via Drupal's cron to ingest Islandora objects, since Drupal's anonymous user does not (or at least should not) have permission to ingest objects. It is possible to create nodes using this feature, but you will need to ingest the corresponding Islandora objects while logged into Drupal as a user with sufficient permissions.

The included drush script can run Feeds importers that create nodes, for example:

drush islandora-feeds-import --feed-id=foofeed

The corresponding Islandora objects can then be ingested using a workflow similar to the one described in the section "Performing quality control/content edits on source nodes before creating Islandora objects from them" above.

While this drush script can be run as a user with sufficient privileges to ingest Islandora objects, drush cannot use the session variables generated by the Islandora Feeds and Islandora Feeds Derivs modules.
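For scheduled node creation, the drush command shown above could be wrapped in a small script that a scheduler such as cron calls. The Python sketch below only assembles and prints the command for each feed; the function name and the list of feed IDs are hypothetical, and only the `islandora-feeds-import` command and its `--feed-id` option come from the example above.

```python
import shlex


def build_import_command(feed_id, drush="drush"):
    """Return the argument list for importing one feed with drush."""
    if not feed_id or any(ch.isspace() for ch in feed_id):
        raise ValueError("feed_id must be a non-empty string without whitespace")
    return [drush, "islandora-feeds-import", "--feed-id=%s" % feed_id]


# Hypothetical list of importers to run on each scheduled pass.
feed_ids = ["foofeed"]

for feed_id in feed_ids:
    command = build_import_command(feed_id)
    # Printed for review; pass `command` to subprocess.run(command, check=True)
    # on the Drupal server to actually execute it.
    print(shlex.join(command))
```

As noted above, this creates nodes only; the corresponding Islandora objects still have to be ingested in a separate step.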

If automating the ingestion of Islandora objects using Feeds is an important use case for you, please comment on this issue so we can determine the priority of developing a workaround.