
Cookbook: Generating MADS and accompanying thumbnails for batch loading entities

Mark Jordan edited this page Apr 22, 2018 · 4 revisions

MIK currently emphasizes generation of MODS XML as the metadata format for ingesting into Islandora. Some toolchains, the OAI toolchains in particular, can be configured to generate DC XML. Generating other types of XML for loading into Islandora, such as MADS for entities, would normally mean developing a new metadata parser and writer for each new type. However, MIK provides a convenient way to generate any type of text-based output without major development and testing effort: combining the Templated metadata parser with a custom post-write hook script.

This technique has two parts: 1) providing a template for the desired XML (or other text-based) metadata format and 2) renaming MIK's output to be consistent with the datastream IDs and directory structure required by the batch loader used to ingest the content.

A good example is generating MADS XML metadata and accompanying thumbnail images for people entities. Islandora Scholar provides a way to generate people entities from a CSV file, but this tool doesn't allow ingest of thumbnails, and it requires the input data to be prepared in very specific ways that then need to be cleaned up post-load. The technique described here offers more flexibility, but requires that the entities are ingested via a Drush command (see below for more details).

The MADS template we provide to the Templated metadata parser can look like this:

<?xml version="1.0"?>
<mads xmlns="http://www.loc.gov/mads/v2" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mads="http://www.loc.gov/mads/v2">
  <authority>
    <name type="personal">
      <namePart type="given">{{ Given_name }}</namePart>
      <namePart type="family">{{ Family_name }}</namePart>
      <namePart type="date"/>
    </name>
    <titleInfo>
      <title>{{ Title }}</title>
    </titleInfo>
  </authority>
  <variant>
    <name>
      <namePart type="given"/>
      <namePart type="family"/>
    </name>
  </variant>
  <affiliation>
    <organization/>
    <position/>
    <email>{{ Email }}</email>
    <phone/>
    <dateValid point="start"/>
    <dateValid point="end"/>
  </affiliation>
  <fieldOfActivity/>
  <identifier type="u1">{{ ID }}</identifier>
  <note type="status"/>
  <note type="history"/>
  <note/>
  <note type="address"/>
  <url/>
</mads>

We will use this template with the CSV Single File toolchain, providing an input CSV file like this:

"ID","Family_name","Given_name","Title","Email","TN"
"mjordan","Jordan","Mark","Mark Jordan","mark@secretmail.ca","mjordan.png"
"sandgreen","Sandgreen","Margrethe","Margrethe Zeeb Sandgreen","marge@secretmail.ca","sandgreen.png"

The output will be one MADS file per CSV record, with each column's value replacing the template variable of the same name. MIK will also copy the file named in the TN column into the output directory. At this point we have our data, but not yet in the layout the batch loader expects.
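To make the substitution step concrete, here is a minimal Python sketch of the idea. It is not MIK's implementation: MIK is written in PHP and its Templated parser relies on a full template engine, whereas this sketch only handles simple `{{ Name }}` placeholders with a regular expression. The `render_record` and `render_all` helpers and the shortened template are hypothetical and exist only for illustration.

```python
import csv
import io
import re
from xml.sax.saxutils import escape

# Shortened, hypothetical version of the MADS template shown earlier.
TEMPLATE = """<mads xmlns="http://www.loc.gov/mads/v2">
  <authority>
    <name type="personal">
      <namePart type="given">{{ Given_name }}</namePart>
      <namePart type="family">{{ Family_name }}</namePart>
    </name>
  </authority>
  <identifier type="u1">{{ ID }}</identifier>
</mads>
"""

CSV_TEXT = '''"ID","Family_name","Given_name","Title","Email","TN"
"mjordan","Jordan","Mark","Mark Jordan","mark@secretmail.ca","mjordan.png"
"sandgreen","Sandgreen","Margrethe","Margrethe Zeeb Sandgreen","marge@secretmail.ca","sandgreen.png"
'''

# Matches {{ Name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_record(template, record):
    """Replace each {{ Column }} placeholder with the XML-escaped CSV value.

    In this sketch a placeholder with no matching column raises KeyError,
    so a misspelled variable name is noticed immediately.
    """
    return PLACEHOLDER.sub(lambda m: escape(record[m.group(1)]), template)


def render_all(template, csv_text, key="ID"):
    """Return a dict mapping each record's key value to its rendered XML."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: render_record(template, row) for row in reader}


if __name__ == "__main__":
    for record_id, xml in render_all(TEMPLATE, CSV_TEXT).items():
        print(f"--- {record_id}.xml ---")
        print(xml)
```

Raising on an unknown placeholder is a design choice of this sketch, not a statement about how MIK behaves when a template variable has no matching column.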

The post-write hook script is what renames the output data and organizes it so we can load it into Islandora as entities.

Without the post-write hook script:

/tmp/issue_469_output/
├── mjordan.png
├── mjordan.xml
├── sandgreen.png
└── sandgreen.xml

With it:

/tmp/issue_469_output/
├── mjordan
│   ├── MADS.xml
│   └── TN.png
└── sandgreen
    ├── MADS.xml
    └── TN.png
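The reorganization shown in the two listings can be sketched in a few lines of Python. This is not the `make_package_subdirectories.php` script that MIK runs, only a rough illustration of the same idea under stated assumptions: the `package_output` function is hypothetical, it treats every `<id>.xml` file as a metadata file, and it treats any other file sharing that ID as the thumbnail.

```python
import shutil
import tempfile
from pathlib import Path


def package_output(output_dir, metadata_dsid="MADS", thumbnail_dsid="TN"):
    """Move <id>.xml and <id>.<ext> into <id>/MADS.xml and <id>/TN.<ext>.

    Files without a matching <id>.xml (log files, for example) are left alone.
    """
    output_dir = Path(output_dir)
    # sorted() materializes the listing before any file is moved.
    for xml_path in sorted(output_dir.glob("*.xml")):
        record_id = xml_path.stem
        # Collect the sibling files that share this record's ID.
        siblings = [
            p for p in output_dir.glob(f"{record_id}.*")
            if p.is_file() and p != xml_path
        ]
        package_dir = output_dir / record_id
        package_dir.mkdir(exist_ok=True)
        shutil.move(str(xml_path), str(package_dir / f"{metadata_dsid}.xml"))
        for path in siblings:
            # Keep the original extension: mjordan.png becomes TN.png.
            target = package_dir / f"{thumbnail_dsid}{path.suffix}"
            shutil.move(str(path), str(target))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        names = ("mjordan.png", "mjordan.xml", "sandgreen.png", "sandgreen.xml")
        for name in names:
            (Path(tmp) / name).write_bytes(b"")
        package_output(tmp)
        for path in sorted(Path(tmp).rglob("*")):
            print(path.relative_to(tmp))
```

Naming each file after its datastream ID (MADS, TN) and grouping files by object is what lets the batch loader map files to datastreams.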

The MIK configuration file that ties this all together is:

[SYSTEM]

[CONFIG]
config_id = MADS generation example
last_updated_on = "2018-04-21"
last_update_by = "Mark Jordan"

[FETCHER]
class = Csv
input_file = issue_469.csv
temp_directory = /tmp/issue_469_temp
record_key = ID

[METADATA_PARSER]
class = templated\Templated
template = MADS_template.xml

[FILE_GETTER]
class = CsvSingleFile
input_directory = /tmp/mads_thumbnails
temp_directory = /tmp/issue_469_temp
file_name_field = TN

[WRITER]
class = CsvSingleFile
preserve_content_filenames = false
output_directory = /tmp/issue_469_output
postwritehooks[] = "php extras/scripts/postwritehooks/make_package_subdirectories.php"

[MANIPULATORS]

[LOGGING]
path_to_log = /tmp/issue_469_output/mik.log
path_to_manipulator_log = /tmp/issue_469_output/manipulator.log
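Because the template variables, `record_key`, and `file_name_field` all refer to CSV column names, a mismatch between them is easy to introduce. The following is a hedged sketch of a standalone pre-flight check; it is not part of MIK. It assumes that Python's `configparser` can read the subset of INI syntax used in this configuration, and `check_config` is a hypothetical helper.

```python
import configparser
import csv
import io
import re

# Trimmed copy of the configuration above, limited to the options checked here.
INI_TEXT = """
[FETCHER]
class = Csv
input_file = issue_469.csv
record_key = ID

[FILE_GETTER]
class = CsvSingleFile
file_name_field = TN
"""

CSV_TEXT = '"ID","Family_name","Given_name","Title","Email","TN"\n'

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def check_config(ini_text, csv_text, template_text):
    """Return a list of problems found; an empty list means the names agree."""
    # interpolation=None stops configparser from treating % specially.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    columns = next(csv.reader(io.StringIO(csv_text)))

    problems = []
    for section, option in (("FETCHER", "record_key"),
                            ("FILE_GETTER", "file_name_field")):
        value = config.get(section, option, fallback=None)
        if value is None:
            problems.append(f"[{section}] {option} is not set")
            continue
        # configparser keeps surrounding double quotes, so drop them here.
        value = value.strip('"')
        if value not in columns:
            problems.append(f"[{section}] {option} = {value} is not a CSV column")
    for name in sorted(set(PLACEHOLDER.findall(template_text))):
        if name not in columns:
            problems.append(f"template variable {name} is not a CSV column")
    return problems


if __name__ == "__main__":
    # Deliberately misspelled variable to show what the check reports.
    template_text = "<namePart>{{ Given_name }}</namePart><namePart>{{ Famly_name }}</namePart>"
    for problem in check_config(INI_TEXT, CSV_TEXT, template_text):
        print(problem)
```

Running a check like this before MIK would catch a misspelled template variable or column name before any output files are written.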

You can then batch load the entities using the Islandora Batch with Derivs module, which at this time only provides a command-line Drush interface.
