Migration Improvements (#10)
* Mapping DSIDs to Media Use terms

* Adding in PDFA

* Adding linked agents.  Pulling from solr

* Does basic and large image, audio, video, and pdf from 7.x.  Extracts metadata and linked agents from MODS.

* Documentation

* Adding images for README

* Update README.md

* Renaming some images

* Pointing to new images in README?

* Add files via upload

* Add files via upload

* Add files via upload

* Update README.md

* Updating images

* Adding sample objects

* Pointing at basic image solution pack by default

* Update README.md

* Updating pictures

* Now working with AUDIT datastreams

* Adding PID to field_identifier

* Migrating PID field separately
dannylamb authored and whikloj committed Dec 21, 2018
1 parent a1f255b commit 052e2cc
Showing 37 changed files with 1,083 additions and 239 deletions.
166 changes: 112 additions & 54 deletions README.md
@@ -1,81 +1,139 @@
## Introduction
This module contains plugins and some example migrations to import data from a Fedora 3 Islandora instance
into an Islandora CLAW instance.

This is a base setup, it requires adjustments to the default Repository Object and configuration changes
for your setup.

## Required changes
The default Repository Object provided with Islandora CLAW requires one additional field to allow for
these migrations (or you can comment out these field migrations).

1. A large text field called `field_mods_text`, this will store the MODS datastream from the source object.

This is defined in the `config/install/migrate_plus.migration.islandora_basic_image.yml` and
can be commented out there.

## Example usage
To use this migration, clone this repo into your Drupal 8 instance `modules/contrib` directory.

DO NOT INSTALL THE MODULE YET!!!

You will need to edit the 3 `migrate_plus.migration.islandora_basic_image*` files in the `config/install` directory.

At a minimum you'll need to set:
1. `solr_base_url: http://10.0.2.2:9080/solr` to your Solr instance
1. `fedora_base_url: &fedora_base_url http://10.0.2.2:9080/fedora` to your Fedora, please leave the `&fedora_base_url`
this is a placeholder and saves re-typing this value in other locations.
1. The `username` and `password` in the block
This module contains plugins to import data from a Fedora 3 Islandora instance
into an Islandora CLAW instance. It also contains a feature as a submodule
that contains some example migrations. The example migrations are based on forms from vanilla Islandora 7.x solution
packs, and are meant to work with the fields defined in `islandora_demo`. If you customized your MODS forms, then you
will also need to customize the example migration and `islandora_demo`.

Currently, the following content models can be migrated over with full functionality:

- Collection
- Basic Image
- Large Image
- Audio
- Video
- PDF
- Binary

If you want some sample Basic Image objects with metadata made from stock forms, check out [this zip
file](docs/examples/sample_objects.zip) that you can use with `islandora_zip_batch_importer`. All the images were
obtained from [Pexels](https://www.pexels.com/) and are free to use for personal or business purposes, with the
original photographers attributed in the MODS.

## Installation

Download this module, its feature, and its dependencies with Composer:

```
composer require islandora/migrate_7x_claw
```

Install the module and the example migrations at the same time using `drush`:

```
drush en islandora_migrate_7x_claw_feature
```

## Configuration

By default, the migrations are configured to work with an `islandora_vagrant` instance running on the same host as a
`claw-playbook` instance, which is convenient for development and testing. To migrate from your own Islandora 7.x
instance, set the following configuration identically on the source plugin of each migration (except for the
"7.x Tags Migration from CSV" migration):

- `solr_base_url` should point to your Islandora 7.x Solr instance (e.g. `http://example.org:8080/solr`)
- `fedora_base_url` should point to your Fedora 3 instance (e.g. `http://example.org:8080/fedora`)
- The `username` and `password` for your Fedora 3 instance in the block
```
authentication: &fedora_auth
  plugin: basic
  username: fedoraAdmin
  password: fedoraAdmin
```
- `q` is used to define a Solr query that selects which objects get migrated. From a fresh clone, the
migrations are configured to look for `islandora:sp_basic_image_collection` and all its children with the following query:
```
RELS_EXT_isMemberOfCollection_uri_ms:"info:fedora/islandora:sp_basic_image_collection" OR PID:"islandora:sp_basic_image_collection"
```
You can easily import a collection of your own by changing the PID in the above query, or you can provide your own
query to migrate over objects in other ways (such as per content model, in order by date created, etc.). If you can write a Solr select query for it, you can migrate it into CLAW. If `q` is omitted from the configuration, the Solr query
defaults to `*:*`.
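
As a rough illustration of how a `q` value maps onto a Solr select request, the sketch below builds the default-style collection query for an arbitrary PID and URL-encodes it. The function names, the `/select` handler path, and the `rows` and `wt` parameters are assumptions made for this example; the source plugin may build its requests differently.

```python
from urllib.parse import urlencode


def collection_query(pid: str) -> str:
    """Build a query matching the collection itself plus its direct members."""
    return (
        f'RELS_EXT_isMemberOfCollection_uri_ms:"info:fedora/{pid}"'
        f' OR PID:"{pid}"'
    )


def select_url(solr_base_url: str, q: str = "*:*", rows: int = 10) -> str:
    """Assumed request shape: <solr_base_url>/select?q=...&rows=...&wt=json."""
    params = urlencode({"q": q, "rows": rows, "wt": "json"})
    return f"{solr_base_url.rstrip('/')}/select?{params}"


if __name__ == "__main__":
    q = collection_query("islandora:sp_basic_image_collection")
    print(q)
    print(select_url("http://example.org:8080/solr", q))
```

Pasting the printed URL into a browser or `curl` against your own Solr is a quick way to check which objects a query selects before running a migration.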

Once you've updated the configuration, you need to re-import the feature to load your changes. You can do this with `drush`:
```
drush -y fim islandora_migrate_7x_claw_feature
```

You can also use the UI to import the feature if you go to `admin/config/development/features` and click on the `Changed` link next to "Migrate 7x Claw Feature".

![Changed Link](docs/images/feature_click_changed.png)

You may also need (or want) to alter the content model field name in Solr.
`content_model_field: RELS_EXT_hasModel_uri_ms`
and the content model to migrate.
`content_model: islandora:sp_basic_image`
From there, you can select all changes and click "Import Changes".

These changes need to be made in all 3 migration configuration files.
![Import Changes](docs/images/feature_import_changes.png)

Now you can install the `migrate_7x_claw` module.
## Running the migrations

If you have installed the `migrate_ui` module you can review the process in the `Admin -> Structure -> Migrations`.
You can quickly run all migrations using `drush`:
```
drush -y mim --group islandora_7x
```

You can then see.
![List of Migrations](docs/images/migrations.jpg)
If you want to go through the UI, you can visit `admin/structure/migrate` to see a list of migration groups. The migrations provided by this module belong to the group with the machine name `islandora_7x`.

If you click **List Migrations** you will see 3 migrations.
![Migrations Groups](docs/images/migrate_groups.png)

![Migration](docs/images/migrate1.jpg)
You will see 8 migrations. _The "7.x Tags Migration from CSV" needs to be run first_.

The _Basic Image Objects OBJ Media_ migration requires the other two be completed first, if you try to run this one it
will run the other two first.
![Migrations](docs/images/migrations.png)

Clicking **Execute** on the _Basic Image Objects_ displays a page like.
Clicking **Execute** on the "7.x Tags Migration from CSV" migration displays a page like this:

![Migration Execute](docs/images/migrate2.jpg)
![Execute Migration](docs/images/execute_migration.png)

The operations you can run are
* **Import** - import the objects
The operations you can run for a migration are
* **Import** - import un-migrated objects (check the "Update" checkbox to re-run previously migrated objects)
* **Rollback** - delete all the objects (if any) previously imported
* **Stop** - stop a long running import.
* **Reset** - reset an import that might have failed.

With _Import_ selected press **Execute**.
If you select "Import", and then click "Execute", it will run the migration. It should process 5 items.

Then you can run the "Islandora Media" migration, which depends on the remaining migrations. Running it effectively
runs the entire group of migrations other than the "7.x Tags Migration from CSV" migration. After they're all done,
you should be able to navigate to the home page of your CLAW instance and see your content brought over from
Islandora 7.x!

When complete, you should see something like below (your number will be different).
![Content in CLAW](docs/images/content_in_claw.png)

![Migration result](docs/images/migrate_result1.jpg)
If you click on any node you should see all its metadata, which has been extracted from its MODS and Solr documents.
Here's the original object in Islandora 7.x:

Once you have completed all 3
![Free Smells in 7x](docs/images/free_smells_in_7x.png)

And here it is in Islandora CLAW:

![Free Smells in CLAW](docs/images/free_smells_in_claw.png)

Clicking on the Media tab will reveal all of the datastreams migrated over from 7.x, which you can now manage through CLAW. Here are the original datastreams in Islandora 7.x:

![Free Smells Datastreams](docs/images/free_smells_datastreams.png)

And here they are in Islandora CLAW as Media:

![Free Smells Media](docs/images/free_smells_media.png)

You can also check out the collection itself, which should have its "Members" block populated:

![Collection in CLAW](docs/images/collection_in_claw.png)

## How this migration works
To allow for the magic Danny content modelling overhaul.
You provide a query, as `q` in the source plugin configuration, that defines which objects get migrated. For each
result in the query, you can choose to use either the Solr doc for an object, the FOXML file for an object, or
a particular datastream for an object by setting the `url_type` configuration. The migrations for subjects, geographics, and agents all target the MODS file of an object. The migration for datastreams uses FOXML, and the migration for the objects themselves uses the Solr doc.
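
To make the three source choices concrete, here is a minimal sketch of the URL that might be fetched for one object under each option. The `UrlType` labels and the `source_url` helper are hypothetical and not taken from the plugin, and the Solr request shape is an assumption. The two Fedora paths follow the Fedora 3 REST API (`objectXML` for FOXML, `datastreams/{dsid}/content` for datastream content), but whether the plugin requests exactly these is also an assumption.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlencode


class UrlType(Enum):
    # Hypothetical labels; the plugin's real `url_type` values may differ.
    SOLR = "solr"
    FOXML = "foxml"
    DATASTREAM = "datastream"


def source_url(
    url_type: UrlType,
    pid: str,
    solr_base_url: str,
    fedora_base_url: str,
    dsid: Optional[str] = None,
) -> str:
    """Return the URL to fetch for one object, depending on the source type."""
    if url_type is UrlType.SOLR:
        # Assumed shape: look up the single Solr doc by PID.
        query = urlencode({"q": f'PID:"{pid}"', "wt": "xml"})
        return f"{solr_base_url}/select?{query}"
    if url_type is UrlType.FOXML:
        return f"{fedora_base_url}/objects/{pid}/objectXML"
    if url_type is UrlType.DATASTREAM:
        if dsid is None:
            raise ValueError("a datastream ID (e.g. MODS) is required")
        return f"{fedora_base_url}/objects/{pid}/datastreams/{dsid}/content"
    raise ValueError(f"unknown url type: {url_type!r}")


if __name__ == "__main__":
    solr = "http://example.org:8080/solr"
    fedora = "http://example.org:8080/fedora"
    for kind, ds in ((UrlType.SOLR, None), (UrlType.FOXML, None), (UrlType.DATASTREAM, "MODS")):
        print(kind.value, source_url(kind, "islandora:1", solr, fedora, ds))
```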

All datastreams are migrated over as-is, regardless of what data is extracted by the migrations and applied as fields.

Collection hierarchy is preserved so long as all the collections are in the `q` query results.

1. The migration searches Solr for all of the content models specified.
1. Each is migrated to a new node in Drupal.
Then it creates a file for the OBJ datastream
of each of these objects. Lastly it creates a media object that links the file to the node.
Subject, geographic, and person/corporate agents from MODS all get transformed into taxonomy terms, and content
is tagged with these terms.
Binary file added docs/examples/sample_objects.zip
Binary file not shown.
Binary file added docs/images/collection_in_claw.png
Binary file added docs/images/content_in_claw.png
Binary file added docs/images/execute_migration.png
Binary file added docs/images/feature_click_changed.png
Binary file added docs/images/feature_import_changes.png
Binary file added docs/images/free_smells_datastreams.png
Binary file added docs/images/free_smells_in_7x.png
Binary file added docs/images/free_smells_in_claw.png
Binary file added docs/images/free_smells_media.png
Binary file added docs/images/list_all_migration.png
Binary file removed docs/images/migrate1.jpg
Binary file not shown.
Binary file removed docs/images/migrate2.jpg
Binary file not shown.
Binary file added docs/images/migrate_groups.png
Binary file removed docs/images/migrate_result1.jpg
Binary file not shown.
Binary file removed docs/images/migrations.jpg
Binary file not shown.
Binary file added docs/images/migrations.png
2 changes: 2 additions & 0 deletions migrate/tags.csv
@@ -3,3 +3,5 @@ islandora_media_use,"RELS-EXT File","A RELS-EXT file from an Islandora 7.x insta
islandora_media_use,"Dublin Core File","Dublin Core Elements 1.1",http://purl.org/dc/elements/1.1
islandora_media_use,"MODS File","Metadata Object Description Schema",http://www.loc.gov/mods/v3
islandora_media_use,"FITS File","Technical metadata generated by FITS",http://hul.harvard.edu/ois/xml/ns/fits/fits_output
islandora_media_use,"Audit Trail","Audit trail generated by Fedora 3",http://islandora.ca/audit-trail
islandora_media_use,"Collection Policy","Islandora 7.x Collection Policy File",http://islandora.ca/collection-policy
@@ -4,7 +4,7 @@ dependencies:
   enforced:
     module:
       - islandora_migrate_7x_claw_feature
-id: islandora_basic_image_tags
+id: islandora_7x_tags
 class: null
 field_plugin_method: null
 cck_plugin_method: null
@@ -0,0 +1,85 @@
langcode: en
status: true
dependencies:
  enforced:
    module:
      - migrate_7x_claw
      - migrate_plus
      - islandora
id: islandora_audit_file
class: null
field_plugin_method: null
cck_plugin_method: null
migration_tags: null
migration_group: islandora_7x
label: 'AUDIT File'
source:
  plugin: islandora
  solr_base_url: 'http://97.107.189.65:8080/solr'
  q: 'RELS_EXT_isMemberOfCollection_uri_ms:"info:fedora/islandora:sp_basic_image_collection" OR PID:"islandora:sp_basic_image_collection"'
  fedora_base_url: 'http://97.107.189.65:8080/fedora'
  data_fetcher_plugin: http
  authentication:
    plugin: basic
    username: fedoraAdmin
    password: fedoraAdmin
  data_parser_plugin: authenticated_xml
  item_selector: '/foxml:digitalObject'
  constants:
    destination_directory: 'fedora://masters'
    mimetype: application/xml
    extension: xml
    dsid: AUDIT
    creator_uid: 1
  fields:
    -
      name: PID
      label: PID
      selector: '@PID'
    -
      name: audit_ds
      label: Audit Datastream
      selector: 'foxml:datastream[@ID = "AUDIT"]/foxml:datastreamVersion/foxml:xmlContent/*'
  ids:
    PID:
      type: string
process:
  digital_id:
    -
      plugin: concat
      delimiter: _
      source:
        - PID
        - constants/dsid
    -
      plugin: str_replace
      search: ':'
      replace: _
  filemime: constants/mimetype
  uid: constants/creator_uid
  filename:
    plugin: concat
    delimiter: .
    source:
      - '@digital_id'
      - constants/extension
  destination:
    plugin: concat
    delimiter: /
    source:
      - constants/destination_directory
      - '@filename'
  uri:
    -
      plugin: flatten
      source:
        - '@destination'
        - audit_ds
    -
      plugin: file_blob
destination:
  plugin: 'entity:file'
  default_bundle: file
migration_dependencies:
  required: { }
  optional: { }
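
For readers less familiar with migrate process pipelines, the following is a minimal sketch of what the `fields` selectors and the `digital_id`, `filename`, and `destination` steps above would compute for one object. The sample FOXML is a hand-written stand-in, and `build_file_values` is a hypothetical helper rather than part of the module. The final `flatten` and `file_blob` steps are not reproduced; the code only returns the values they would receive.

```python
import xml.etree.ElementTree as ET

NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

# Hand-written stand-in; real FOXML exports carry many more datastreams.
SAMPLE_FOXML = """\
<foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#" PID="islandora:1">
  <foxml:datastream ID="AUDIT">
    <foxml:datastreamVersion ID="AUDIT.0">
      <foxml:xmlContent>
        <audit:auditTrail xmlns:audit="info:fedora/fedora-system:def/audit#"/>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
"""


def build_file_values(foxml, dsid="AUDIT", extension="xml",
                      directory="fedora://masters"):
    """Mirror the selectors and the concat/str_replace steps for one object."""
    root = ET.fromstring(foxml)  # the /foxml:digitalObject item
    pid = root.get("PID")  # selector '@PID'
    audit_ds = root.find(
        f"foxml:datastream[@ID='{dsid}']"
        "/foxml:datastreamVersion/foxml:xmlContent/*",
        NS,
    )
    if audit_ds is None:
        raise ValueError(f"no {dsid} datastream found for {pid}")
    # concat PID and dsid with '_', then replace ':' with '_'
    digital_id = "_".join([pid, dsid]).replace(":", "_")
    filename = ".".join([digital_id, extension])
    destination = "/".join([directory, filename])
    return {
        "PID": pid,
        "digital_id": digital_id,
        "filename": filename,
        "destination": destination,
        "audit_tag": audit_ds.tag,
    }


if __name__ == "__main__":
    for key, value in build_file_values(SAMPLE_FOXML).items():
        print(f"{key}: {value}")
```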
@@ -0,0 +1,87 @@
langcode: en
status: true
dependencies:
  enforced:
    module:
      - migrate_7x_claw
      - migrate_plus
      - islandora
id: islandora_audit_media
class: null
field_plugin_method: null
cck_plugin_method: null
migration_tags: null
migration_group: islandora_7x
label: 'AUDIT Media'
source:
  plugin: islandora
  solr_base_url: 'http://97.107.189.65:8080/solr'
  fedora_base_url: 'http://97.107.189.65:8080/fedora'
  data_fetcher_plugin: http
  authentication:
    plugin: basic
    username: fedoraAdmin
    password: fedoraAdmin
  content_model_field: RELS_EXT_hasModel_uri_ms
  content_model: 'islandora:sp_basic_image'
  data_parser_plugin: authenticated_xml
  item_selector: '/foxml:digitalObject'
  constants:
    destination_directory: 'fedora://masters'
    mimetype: application/xml
    extension: xml
    dsid: AUDIT
    fedora_base_url: 'http://97.107.189.65:8080/fedora'
    creator_uid: 1
    audit_url: http://islandora.ca/audit-trail
  fields:
    -
      name: PID
      label: PID
      selector: '@PID'
  ids:
    PID:
      type: string
process:
  digital_id:
    -
      plugin: concat
      delimiter: _
      source:
        - PID
        - constants/dsid
    -
      plugin: str_replace
      search: ':'
      replace: _
  name:
    plugin: concat
    delimiter: .
    source:
      - '@digital_id'
      - constants/extension
  field_media_use:
    plugin: migration_lookup
    migration: islandora_7x_tags
    source: constants/audit_url
    no_stub: true
  field_media_file:
    plugin: migration_lookup
    migration: islandora_audit_file
    source: PID
    no_stub: true
  field_media_of:
    plugin: migration_lookup
    migration: islandora_objects
    source: PID
    no_stub: true
  uid: constants/creator_uid
destination:
  plugin: 'entity:media'
  default_bundle: file
migration_dependencies:
  required:
    - migrate_plus.migration.islandora_objects
    - migrate_plus.migration.islandora_audit_file
    - migrate_plus.migration.islandora_7x_tags
  optional: { }
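
The media migration above only works once the three migrations it looks up have run. Below is a small sketch of that ordering and of a lookup that, like `no_stub: true`, returns nothing instead of creating a placeholder. The dependency table lists only what is visible in this commit (with the `migrate_plus.migration.` prefix dropped for readability); the in-memory `id_maps` and both helper functions are hypothetical stand-ins for Drupal's migrate ID maps, and the destination IDs are invented for the example.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Only the requirements shown in this commit. What the other migrations
# themselves require is not visible here, so they get no entries (assumption).
REQUIRED = {
    "islandora_audit_media": {
        "islandora_objects",
        "islandora_audit_file",
        "islandora_7x_tags",
    },
    "islandora_audit_file": set(),
}


def run_order(required):
    """Return one valid execution order with every requirement listed first."""
    return list(TopologicalSorter(required).static_order())


def migration_lookup(id_maps, migration, source_id):
    """Return the destination ID recorded for source_id, or None (no stub)."""
    return id_maps.get(migration, {}).get(source_id)


if __name__ == "__main__":
    print(run_order(REQUIRED))

    # Invented destination IDs, keyed the way the process plugins look them up.
    id_maps = {
        "islandora_7x_tags": {"http://islandora.ca/audit-trail": 12},
        "islandora_audit_file": {"islandora:1": 40},
        "islandora_objects": {"islandora:1": 7},
    }
    print(migration_lookup(id_maps, "islandora_7x_tags",
                           "http://islandora.ca/audit-trail"))
    print(migration_lookup(id_maps, "islandora_objects", "islandora:2"))
```

A lookup that returns nothing, as for `islandora:2` here, is the case where a referenced object has not been migrated yet, which is why the required migrations need to finish first.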