Tutorial
This tutorial guides you through using MIK to create a set of Islandora basic image objects from metadata in a CSV file. When you finish the tutorial, you will be able to import the objects into Islandora.
To complete the tutorial, you will need a computer that has MIK installed on it.
You will also need a text editor. Any decent editor will do (so, Windows Notepad is not a viable option). If you don't already have a text editor installed, check out Atom. It's free, it works on all major operating systems, and it's easy to use.
Finally, you will also need to know a little bit about Islandora. In particular, the tutorial below assumes that you know what Islandora objects are, and that you are familiar with some of the different types of Islandora objects, like Basic Image objects.
A zip file containing the sample images, metadata, and configuration files used in this tutorial can be downloaded here. It contains everything you need to create the Islandora import packages. When you unzip it, its contents should look like this:
To get ready to start the tutorial,
- unzip the file
- copy the files that aren't images (tutorial_config.ini, tutorial_mappings.csv, and tutorial_metadata.csv) into the same directory where MIK is installed, and
- edit tutorial_config.ini to define your input and output directories.
We will cover editing the tutorial_config.ini file in detail in Step 3, below.
Note that Step 1 ("Create your metadata CSV file") and Step 2 ("Create your mappings file") have already been done for you. You don't even need to edit those two files in order to proceed with the tutorial. We include the steps here to represent a typical MIK workflow, and to provide an overview of how those two files are structured. In real life, if you weren't using prepackaged content like that included in this tutorial, you would need to complete those steps. In the Step 1 and Step 2 sections below, we'll describe what you would need to do if you were preparing your own content for use with MIK.
Even though you don't need to edit tutorial_metadata.csv to complete this tutorial, it will be useful to note a few things about the CSV metadata files that MIK can take as input:
- The first row of the CSV file must contain column labels/headings. These are the "fields" of the metadata that MIK will convert to MODS.
- All column headings must be unique, and the heading row cannot contain any empty cells.
- By default, fields are separated by a comma, and enclosed in double quotation marks. However, you can specify other delimiters and enclosure characters in the .ini file if you want.
- Each record in the CSV file corresponds to one Islandora object.
- One of the fields must contain a unique identifier for each row in the file. This field must be named in the [FETCHER] section's "record_key" configuration setting.
- One of the fields contains the name of the file that is to be used in each of the created objects. This field must be named in the [FILE_GETTER] section's "file_name_field" configuration setting.
tutorial_metadata.csv illustrates these attributes of CSV metadata files:
Identifier,File,Title,Creator,Date taken,Subjects,Note
"image01","IMG_1410.JPG","Small boats in Havana Harbour","Jordan, Mark","2015-03-08","Boats; water","Taken on vacation in Cuba."
"image02","IMG_2549.JPG","Manhatten Island","Jordan, Mark","2015-09-13","Cityscapes","Taken from the ferry from downtown New York to Highlands, NJ. Weather was windy."
"image03","IMG_2940.JPG","Looking across Burrard Inlet","Jordan, Mark","2011-08-01",,"View from Deep Cove to Burnaby Mountain. Simon Fraser University is visible on the top of the mountain in the distance."
"image04","IMG_2958.JPG","Amsterdam waterfront","Jordan, Mark","2013-01-17",,"Amsterdam waterfront on an overcast day."
"image05","IMG_5083.JPG","Alcatraz Island","Jordan, Mark","2014-01-14","Alcatraz Federal Penitentiary; islands","Taken from Fisherman's Wharf, San Francisco."
You can prepare your CSV metadata files in any application that can save data in a standard CSV format.
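The rules listed above can be checked before running MIK. The following is a minimal sketch in Python, not part of MIK itself; the function name check_metadata_csv and its messages are made up for illustration, and it assumes the default comma delimiter and double-quote enclosure.

```python
import csv
import io

def check_metadata_csv(text, record_key, file_name_field):
    """Return a list of problems found in CSV metadata text (empty if none).

    Hypothetical helper, not an MIK function.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    # The heading row cannot contain empty cells.
    if any(cell.strip() == "" for cell in header):
        problems.append("heading row contains an empty cell")
    # All column headings must be unique.
    if len(set(header)) != len(header):
        problems.append("column headings are not unique")
    # The record_key and file_name_field columns must exist.
    for name in (record_key, file_name_field):
        if name not in header:
            problems.append(f"required column missing: {name}")
    # The record_key column must hold a unique value in every row.
    if record_key in header:
        idx = header.index(record_key)
        keys = [row[idx] for row in rows[1:] if len(row) > idx]
        if len(set(keys)) != len(keys):
            problems.append(f"values in {record_key} are not unique")
    return problems

sample = '''Identifier,File,Title
"image01","IMG_1410.JPG","Small boats in Havana Harbour"
"image02","IMG_2549.JPG","Manhatten Island"
'''
print(check_metadata_csv(sample, "Identifier", "File"))  # prints []
```

To check a real file, read its contents into a string and pass the column names used in the "record_key" and "file_name_field" settings.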
The mapping file contains two columns - in fact, it is also a CSV file. The column on the left identifies the field names in the "source" metadata record, and the column on the right defines the "target" MODS XML snippet that takes the value of the corresponding source field. Some important things about the snippets:
- They must be well-formed XML (that is, opening and closing tags must match, and must follow rules defining XML attribute syntax). You can check the well-formedness of your snippets by running the following command, which does not validate your snippets against a schema:
./mik --config=foo.ini --checkconfig=snippets
- They must include all XML from the first child of the root element down; that is, they are appended to the root element of the MODS XML.
- The first row of your mapping file should not contain any column headings.
- Snippets can contain the special %value% placeholder. MIK replaces this string with the value of the source metadata field. For example, if your metadata has a Title field whose value is "Amsterdam waterfront", and Title is mapped to the MODS snippet <titleInfo><title>%value%</title></titleInfo>, the resulting MODS markup will look like <titleInfo><title>Amsterdam waterfront</title></titleInfo>.
Title,"<titleInfo><title>%value%</title></titleInfo>"
Creator,"<name type=""personal""><namePart>%value%</namePart><role><roleTerm type=""text"">photographer</roleTerm></role></name>"
Date taken,"<originInfo><dateCreated encoding=""w3cdtf"" keyDate=""yes"">%value%</dateCreated></originInfo>"
Subjects,"<subject><topic>%value%</topic></subject>"
Identifier,"<identifier type=""local"" displayLabel=""Local identifier"">%value%</identifier>"
Note,"<note>%value%</note>"
null0,"<genre authority=""marcgt"">picture</genre>"
null1,"<typeOfResource>still image</typeOfResource>"
null2,"<physicalDescription><digitalOrigin>born digital</digitalOrigin></physicalDescription>"
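To make the %value% substitution concrete, here is a small Python sketch of the idea. It is an illustration only, not MIK's own code: load_mappings and fill_snippet are hypothetical names, and escaping the value for XML is an assumption made here for safety rather than a documented MIK behavior.

```python
import csv
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def load_mappings(text):
    """Read a two-column mappings CSV (no heading row) into a dict."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if len(row) >= 2}

def fill_snippet(snippet, value):
    """Replace %value% with an XML-escaped value and confirm well-formedness."""
    xml = snippet.replace("%value%", escape(value))
    # ET.fromstring raises ET.ParseError if the snippet is not well-formed.
    # Like --checkconfig=snippets, this does not validate against a schema.
    ET.fromstring(xml)
    return xml

mappings_text = '''Title,"<titleInfo><title>%value%</title></titleInfo>"
Identifier,"<identifier type=""local"" displayLabel=""Local identifier"">%value%</identifier>"
'''
mappings = load_mappings(mappings_text)
print(fill_snippet(mappings["Title"], "Amsterdam waterfront"))
# prints <titleInfo><title>Amsterdam waterfront</title></titleInfo>
```

Note how the doubled quotation marks ("") inside a quoted CSV cell become single quotation marks once the mappings file is parsed, which is why the attribute values in the snippets are written that way.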
Time for you to start editing a file.
MIK uses a "toolchain", which is a group of MIK components that are brought together to convert a specific type of input (like CSV metadata) into a specific type of output (like import packages for Islandora Basic Image objects). A toolchain is defined in an MIK configuration file, also known as an .ini file since that's the format the files take. All the .ini file contains is groups of configuration settings for your toolchain. MIK configuration files can also contain comment lines that begin with a semicolon (;). These lines are ignored by MIK and really only function as inline documentation within the .ini file. You can also comment out a line to disable a configuration setting.
The .ini file below is the one that we'll be using in this tutorial. Even though this section is titled "Create an .ini file", you will only need to edit this one to run MIK, not create a new one. Specifically, you will need to change
- the path to your input directory,
- the path to your output directory,
- the path to your log file.
Different operating systems define paths differently. The .ini file below contains Linux paths, which look like this:
temp_directory = "/tmp/miktutorial_temp"
The values for the input_directory, output_directory, and path_to_log settings will need to be compatible with your operating system. For example, on Windows, paths look like this:
temp_directory = "c:\temp\miktutorial_temp"
whereas on a Mac they look like this:
temp_directory = "/Users/mark/miktutorial_temp"
Here is the .ini file as it is provided in the tutorial sample data. Assuming that MIK is installed correctly on your computer, and that you have copied tutorial_config.ini, tutorial_mappings.csv, and tutorial_metadata.csv into the same directory where MIK is installed, you should be able to run MIK after you have updated tutorial_config.ini with your own paths.
; MIK configuration file for the MIK Tutorial.
[CONFIG]
config_id = MIK tutorial
last_updated_on = "2016-02-03"
last_update_by = "Mark Jordan"
[FETCHER]
class = Csv
input_file = "tutorial_metadata.csv"
temp_directory = "/tmp/miktutorial_temp"
record_key = Identifier
[METADATA_PARSER]
class = mods\CsvToMods
mapping_csv_path = "tutorial_mappings.csv"
[FILE_GETTER]
class = CsvSingleFile
input_directory = "/home/mark/Downloads/mik_tutorial_data"
temp_directory = "/tmp/miktutorial_temp"
file_name_field = File
[WRITER]
class = CsvSingleFile
preserve_content_filenames = true
output_directory = "/tmp/miktutorial_output"
; Note that you will need to adjust the path to your system's php executable.
postwritehooks[] = "/usr/bin/php extras/scripts/postwritehooks/validate_mods.php"
[MANIPULATORS]
metadatamanipulators[] = "FilterModsTopic|subject"
[LOGGING]
path_to_log = "/tmp/miktutorial_output/mik.log"
- Open tutorial_config.ini in your text editor.
- In the [FETCHER] section, modify the value of "temp_directory" so that....
- In the [FILE_GETTER] section, modify the value of "temp_directory" so that it has the same value as...
- In the [WRITER] section, modify the value of "output_directory" so that.....
- In the [LOGGING] section, modify the value of "path_to_log" so that ...
- Save your file in the MIK installation directory.
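After editing, it can help to confirm that the settings discussed in this section are still present and to see the values MIK will be given. The sketch below uses Python's standard configparser module to read an MIK-style .ini file. It is an independent sanity check, not a replacement for MIK's own --checkconfig option, and the REQUIRED list is an assumption based only on the settings shown in this tutorial.

```python
import configparser

# Settings this tutorial's .ini file uses (assumed list, for illustration).
REQUIRED = {
    "FETCHER": ["input_file", "temp_directory", "record_key"],
    "FILE_GETTER": ["input_directory", "temp_directory", "file_name_field"],
    "WRITER": ["output_directory"],
    "LOGGING": ["path_to_log"],
}

def read_settings(ini_text):
    """Parse .ini text into {section: {setting: value}} with quotes stripped."""
    # strict=False tolerates repeated keys such as postwritehooks[].
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep setting names exactly as written
    parser.read_string(ini_text)
    return {s: {k: v.strip('"') for k, v in parser[s].items()} for s in parser.sections()}

def missing_settings(settings):
    """List the REQUIRED settings that are absent from the parsed file."""
    return [
        f"[{section}] {key}"
        for section, keys in REQUIRED.items()
        for key in keys
        if key not in settings.get(section, {})
    ]

sample_ini = '''; MIK configuration file for the MIK Tutorial.
[FETCHER]
class = Csv
input_file = "tutorial_metadata.csv"
temp_directory = "/tmp/miktutorial_temp"
record_key = Identifier
[FILE_GETTER]
class = CsvSingleFile
input_directory = "/home/mark/Downloads/mik_tutorial_data"
temp_directory = "/tmp/miktutorial_temp"
file_name_field = File
[WRITER]
class = CsvSingleFile
output_directory = "/tmp/miktutorial_output"
postwritehooks[] = "/usr/bin/php extras/scripts/postwritehooks/validate_mods.php"
[LOGGING]
path_to_log = "/tmp/miktutorial_output/mik.log"
'''
settings = read_settings(sample_ini)
print(missing_settings(settings))  # prints []
print(settings["WRITER"]["output_directory"])  # prints /tmp/miktutorial_output
```

Lines beginning with a semicolon are skipped by configparser as comments, which matches how the tutorial describes comment lines.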
Honestly, there's a lot that can go wrong here. MIK configuration files are a little complex.
While it is not absolutely necessary, you can (and in fact should) check your MIK configuration by running MIK and passing it the --checkconfig option in addition to telling it which .ini file to use. Running MIK with these two options looks like this:
php mik --config=tutorial_config.ini --checkconfig=all
MIK provides some simple feedback indicating whether it encountered any problems with your .ini file.
If your configuration check didn't reveal any problems, you are ready to run MIK and generate your Islandora import packages:
php mik --config=tutorial_config.ini
When MIK finishes, it tells you where the packages are, where the MIK log is, and how long MIK took to run.
If you look in the output directory indicated in MIK's message, you will see an XML file corresponding to each of the images that we started with.
This set of files is what you load into Islandora to create basic image objects. There is one thing you need to do before loading the files into Islandora, however: delete (or move) the log files (mik.log and problem_records.log) that MIK created. You don't want to load those into Islandora. After you delete or move the log files, the only things that should be in your output directory are image files, each with a corresponding XML file.
The XML files are MODS documents, which MIK has created from the original CSV metadata. Each MODS document describes one image.
It is prudent to perform some quality assurance on the Islandora import packages before you import them into Islandora. At a minimum, you should:
- open the problem_records.log file in your text editor to see if anything appears in it (it is normal for there to be a single line that indicates when MIK ran, etc.)
- make sure there are no extra files or directories in the MIK output directory
- open a random sample of MODS XML files to make sure they look like they should, e.g., are all the fields that were included in your mappings file in the MODS?
- make sure your MODS documents are valid.
If your MIK configuration file included the line
postwritehooks[] = "/usr/bin/php extras/scripts/postwritehooks/validate_mods.php"
in its [WRITER] section (which it did unless you commented it out or removed it), MIK has already validated all of your MODS documents. Looking in the mik.log file will reveal whether any of the MODS files didn't validate.
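Some of the checks above can be scripted. The following Python sketch looks for unexpected subdirectories, files without an XML partner, and XML files that are not well-formed. It is a rough aid, not an MIK feature: check_output_directory is a hypothetical name, and it assumes each image and its XML file share the same base name (for example IMG_1410.JPG and IMG_1410.xml). It only tests well-formedness, not validity against the MODS schema.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Log files named in this tutorial; they are ignored by the check.
LOG_NAMES = {"mik.log", "problem_records.log"}

def check_output_directory(output_dir):
    """Return a list of problems found in an MIK output directory."""
    problems = []
    entries = [p for p in Path(output_dir).iterdir() if p.name not in LOG_NAMES]
    xml_stems = {p.stem for p in entries if p.suffix.lower() == ".xml"}
    for path in sorted(entries):
        if path.is_dir():
            problems.append(f"unexpected directory: {path.name}")
        elif path.suffix.lower() == ".xml":
            try:
                ET.parse(path)  # raises ParseError if not well-formed
            except ET.ParseError as error:
                problems.append(f"not well-formed: {path.name} ({error})")
        elif path.stem not in xml_stems:
            # Assumes image and XML share a base name.
            problems.append(f"no XML file for: {path.name}")
    return problems

# Demonstration on a throwaway directory rather than real MIK output.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "IMG_1410.JPG").write_bytes(b"not a real image")
    Path(demo_dir, "IMG_1410.xml").write_text("<mods><titleInfo/></mods>")
    Path(demo_dir, "IMG_2549.JPG").write_bytes(b"not a real image")
    Path(demo_dir, "mik.log").write_text("log line")
    print(check_output_directory(demo_dir))
    # prints ['no XML file for: IMG_2549.JPG']
```

Opening a random sample of the MODS files by hand is still worthwhile, since a script like this cannot tell whether the mapped fields contain what you expect.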
The easiest way to load your content into Islandora is to zip it up and upload the resulting zip file using Islandora's web interface. When you create your zip file, make sure that your zip file contains only the output files and no subdirectories. Also note that you should move or delete all log files that MIK puts in the output directory before creating your zip file.
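If you prefer to script the zip step, the Python sketch below is one way to do it. It is only an illustration (zip_output is a hypothetical helper, not part of MIK or Islandora): it writes every file at the top level of the archive, skips subdirectories, and leaves out the two log files named in this tutorial.

```python
import tempfile
import zipfile
from pathlib import Path

# Log files that should not be loaded into Islandora.
LOG_NAMES = {"mik.log", "problem_records.log"}

def zip_output(output_dir, zip_path):
    """Zip the files in output_dir flat, skipping logs and subdirectories."""
    zip_path = Path(zip_path).resolve()
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(output_dir).iterdir()):
            if not path.is_file() or path.name in LOG_NAMES:
                continue
            if path.resolve() == zip_path:
                continue  # never add the archive to itself
            # arcname=path.name keeps the file at the top level of the zip.
            archive.write(path, arcname=path.name)
            added.append(path.name)
    return added

# Demonstration on a throwaway directory rather than real MIK output.
with tempfile.TemporaryDirectory() as demo_dir:
    out = Path(demo_dir, "output")
    out.mkdir()
    Path(out, "IMG_1410.JPG").write_bytes(b"not a real image")
    Path(out, "IMG_1410.xml").write_text("<mods/>")
    Path(out, "mik.log").write_text("log line")
    print(zip_output(out, Path(demo_dir, "import.zip")))
    # prints ['IMG_1410.JPG', 'IMG_1410.xml']
```

Because the log files are skipped rather than deleted, they stay in the output directory for later reference.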
Once you have your zip file, you can upload it by going to your collection in Islandora, then to the 'Manage' tab, and then to the Collection subtab, where you will find the "Batch Import Objects" link. This is where you upload your zip file.
If you import the basic image objects created throughout this tutorial, they will look something like this:
Now that you can run MIK and you know how to use its output, you may want to try some of the following activities.
- Modify the values in the metadata file (but for now, keep the same column structure) and rerun MIK. Open the XML files that MIK creates in your text editor to see your values in the MODS.
- Add a new field to the mappings file that will have the same value for all objects. For example, add the following line to the end of the file:
null3,"<accessCondition type=""use and reproduction"">Images are in the public domain.</accessCondition>"
- Add a new column to the CSV metadata file and populate it with different values for each image. Then add a mapping for the new field using the special %value% token so that your MODS will use the value of the new field for each image.
- Learn about some of MIK's plugins, called "manipulators". The Random Set manipulator is really useful for testing your configuration and your source content, and the Piratize Abstract manipulator converts abstracts in your MODS XML to pirate speak. Not that you'd really want to do that, but that manipulator illustrates a potentially useful feature of MIK (automate metadata enhancement) and, well, because pirate speak.