Bulkrax imports

Dan Kerchner edited this page Sep 16, 2024 · 4 revisions

Importing ETDs from ProQuest

TODO: Adapt instructions from https://github.com/gwu-libraries/scholarspace-hyrax/pull/555

Set up ProQuest S3 bucket mount

Create Bulkrax manifest

Run the ingest_bulkrax_prep rake task, either inside the container or from outside it using docker exec. The task requires one argument: the path to the directory containing the ProQuest zips you wish to include in the ingest. For example, if the ETDs are in /opt/scholarspace/scholarspace-ingest/etd-zips, run: bundle exec rails gwss:ingest_pq_etds['/opt/scholarspace/scholarspace-ingest/etd-zips']
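When running from outside the container, the same invocation can be wrapped in docker exec. The following is a minimal sketch under stated assumptions: the container name scholarspace-app and the helper name run_pq_prep are hypothetical placeholders, and the container's working directory is assumed to be the application root. The whole task string is passed as one quoted argument so that shells such as zsh do not interpret the square brackets as a glob pattern.

```shell
# Sketch only. CONTAINER is an assumed placeholder; use the name shown by `docker ps`.
CONTAINER="${CONTAINER:-scholarspace-app}"

# Hypothetical helper: run the ETD prep task in the container for one zip directory.
# With DRY_RUN=1 it only prints the command it would run.
run_pq_prep() {
  zip_dir="$1"
  if [ -z "$zip_dir" ]; then
    echo "usage: run_pq_prep <directory-of-ProQuest-zips>" >&2
    return 1
  fi
  task="gwss:ingest_pq_etds[$zip_dir]"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker exec $CONTAINER bundle exec rails \"$task\""
  else
    docker exec "$CONTAINER" bundle exec rails "$task"
  fi
}
```

Setting DRY_RUN=1 before calling run_pq_prep /opt/scholarspace/scholarspace-ingest/etd-zips prints the command for review without touching the container.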

The Bulkrax manifest will be written to a bulkrax_zip directory inside the directory named by the TEMP_FILE_BASE environment variable (typically set in .env). The manifest contains:

  • a metadata.csv Bulkrax-compliant manifest file
  • a files directory, containing a directory for each ETD zip, which itself contains:
    • the ProQuest XML file
    • the main ETD PDF
    • optionally, a folder containing additional attachments for the ETD
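Before importing, it can help to confirm that the generated output matches the layout listed above. The function below is a rough sketch, not part of the project: check_manifest is a hypothetical name, and it assumes the XML and PDF files sit directly inside each ETD directory with lowercase .xml and .pdf extensions.

```shell
# Hypothetical helper: check that a manifest directory has the expected layout.
# Prints one line per problem found and returns non-zero if any problem exists.
check_manifest() {
  manifest_dir="$1"
  problems=0
  if [ ! -f "$manifest_dir/metadata.csv" ]; then
    echo "missing: $manifest_dir/metadata.csv"
    problems=1
  fi
  if [ ! -d "$manifest_dir/files" ]; then
    echo "missing: $manifest_dir/files"
    return 1
  fi
  for etd_dir in "$manifest_dir"/files/*/; do
    [ -d "$etd_dir" ] || continue   # glob matched nothing: no ETD directories
    # Assumption: each ETD directory holds at least one .xml and one .pdf file.
    for ext in xml pdf; do
      if ! ls "$etd_dir"*."$ext" >/dev/null 2>&1; then
        echo "no .$ext file in $etd_dir"
        problems=1
      fi
    done
  done
  return "$problems"
}
```

A call such as check_manifest "$TEMP_FILE_BASE/bulkrax_zip" prints nothing and returns zero when every ETD directory has both files.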

Import the Bulkrax manifest

Within the GW ScholarSpace web application, log in as an administrative user. On the Dashboard, click on Importers. Create a New importer with the following values:

  • Name = any name
  • Administrative Set = ETDs
  • Frequency = Once (on save)
  • Limit = leave blank
  • Parser = CSV - Comma Separated Values
  • Visibility = Public
  • Rights Statement = leave blank
  • Add CSV File to Import: Specify a Path on the Server. Import file path = {TEMP_FILE_BASE}/bulkrax_zip/metadata.csv

Before starting the import, open a tab to the Sidekiq admin page (at /sidekiq) so that you can watch the progress of the queues and monitor for any problems.

Then click Create and Import.
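The Import file path value depends on TEMP_FILE_BASE, so it can be convenient to print the resolved path rather than assemble it by hand. This is a sketch only: import_file_path is a hypothetical helper, and it assumes the env file uses plain TEMP_FILE_BASE=value lines (optionally quoted) with no variable interpolation.

```shell
# Hypothetical helper: print the importer's "Import file path" using the
# TEMP_FILE_BASE value found in an env file (default: ./.env).
import_file_path() {
  env_file="${1:-.env}"
  # Use the last TEMP_FILE_BASE=... line and strip any surrounding quotes.
  base=$(sed -n 's/^TEMP_FILE_BASE=//p' "$env_file" | tail -n 1 | tr -d "\"'")
  if [ -z "$base" ]; then
    echo "TEMP_FILE_BASE not found in $env_file" >&2
    return 1
  fi
  echo "$base/bulkrax_zip/metadata.csv"
}
```

Run import_file_path from the directory that holds .env and paste the printed path into the importer form.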

If you need to re-run the task to regenerate the Bulkrax-ready metadata and files, first clear out the results of the previous run: rm -r {TEMP_FILE_BASE}/bulkrax_zip
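Because the cleanup path is built from an environment variable, an unset TEMP_FILE_BASE would make the command target /bulkrax_zip at the filesystem root. One possible guard is sketched below; clear_bulkrax_zip is a hypothetical helper name, not something provided by the project.

```shell
# Hypothetical helper: remove the previous run's output, refusing to act
# when TEMP_FILE_BASE is unset or empty.
clear_bulkrax_zip() {
  if [ -z "${TEMP_FILE_BASE:-}" ]; then
    echo "TEMP_FILE_BASE is not set; refusing to delete anything" >&2
    return 1
  fi
  target="$TEMP_FILE_BASE/bulkrax_zip"
  if [ -d "$target" ]; then
    rm -r "$target"
    echo "removed $target"
  else
    echo "nothing to remove at $target"
  fi
}
```

Calling clear_bulkrax_zip in a shell where TEMP_FILE_BASE is set performs the same removal as the command above and reports what it did.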

Importing works in general

TODO

Troubleshooting FAQ

Q: When I create an importer, the administrative set that I wish to import to isn't showing up in the dropdown list.

A: This can occur when your user has the admin role (and can therefore access /importers) but does not have the contentadmin role. Users with the contentadmin role can import to any admin set, so try adding that role to your administrative user.