Used for blackbox testing and data-ingestion procedures.
Make sure that your email server is NOT running, because some of the endpoints used during import send emails to the input email addresses. For example, the endpoint for creating new registration data automatically sends a registration email, which we don't want when the endpoint is used to import existing data.
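If you want to double-check before starting, a quick probe of the SMTP port tells you whether something is still listening (a minimal sketch; it assumes the mail server would listen on `localhost:25`, the default SMTP port):

```python
import socket

def smtp_port_open(host: str = "localhost", port: int = 25, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused / timed out: nothing is listening there.
        return False

if __name__ == "__main__":
    if smtp_port_open():
        print("WARNING: an SMTP server seems to be running -- stop it before importing!")
    else:
        print("OK: no SMTP server detected on localhost:25")
```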
- Python 3.8+ (tested with 3.8.10 and 3.11)
- Install CLARIN-DSpace7.* (PostgreSQL, Solr, DSpace backend)
  - Clone python-api: https://github.com/ufal/dspace-python-api (branch `main`)
  - Clone submodules: `git submodule update --init libs/dspace-rest-python/`
- Install Python dependencies:

  ```
  pip install -r requirements.txt
  pip install -r libs/dspace-rest-python/requirements.txt
  ```
- Get the database dump (old CLARIN-DSpace) and unzip it into the `input/dump` directory in the `dspace-python-api` project.
- Prepare the `dspace-python-api` project for migration: copy the files used during migration into the `input/` directory:
  ```
  > ls -R ./input
  input:
  dump  icon

  input/dump:
  clarin-dspace.sql  clarin-utilities.sql

  input/icon:
  aca.png  by.png  gplv2.png  mit.png  ...
  ```

  Note: `input/icon/` contains the license icons (PNG files).
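The expected layout can be verified with a small pre-flight check (a sketch based only on the files listed above; adjust it if your dump contains more files):

```python
from pathlib import Path
from typing import List

def check_input_layout(root: Path = Path("input")) -> List[str]:
    """Return a list of problems found in the expected input/ layout."""
    problems = []
    # The two SQL dumps the migration expects in input/dump/
    for sql in ("clarin-dspace.sql", "clarin-utilities.sql"):
        if not (root / "dump" / sql).is_file():
            problems.append(f"missing {root / 'dump' / sql}")
    # input/icon/ must hold at least one license icon (PNG)
    icon_dir = root / "icon"
    if not icon_dir.is_dir() or not any(icon_dir.glob("*.png")):
        problems.append(f"no license icons (*.png) in {icon_dir}")
    return problems
```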
- Copy the `assetstore` from DSpace 5 to DSpace 7 (needed for bitstream import). The `assetstore` is in the folder where you have installed DSpace: `dspace/assetstore`.
- Create the `dspace` database with the extension `pgcrypto`.
- Go to `dspace/bin` in the DSpace 7 installation and run the command `dspace database migrate force` (force because of local types). NOTE: `dspace database migrate force` creates default database data that may not be in the database dump, so after migration some tables may contain more data than the database dump. Data from the database dump that already exists in the database is not migrated.
- Create an administrator by running the command `dspace create-administrator` in `dspace/bin`.
- Create the CLARIN-DSpace5.* databases (dspace, utilities) from the dump: run `scripts/start.local.dspace.db.bat`, or use `scripts/init.dspacedb5.sh` directly with your database.
- Update `project_settings.py`.
- Make sure that the handle prefixes are configured in the backend configuration (`dspace.cfg`):
  - Set your main handle prefix in `handle.prefix`
  - Add all other handle prefixes to `handle.additional.prefixes`
  - Note: the main prefix should NOT be included in `handle.additional.prefixes`
  - Example:

    ```
    handle.prefix = 123456789
    handle.additional.prefixes = 11858, 11234, 11372, 11346, 20.500.12801, 20.500.12800
    ```
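The two rules above can be sanity-checked with a few lines of Python (a sketch; it only understands plain `key = value` lines, not the full DSpace configuration syntax):

```python
from typing import Dict, List

def parse_handle_config(cfg_text: str) -> Dict[str, List[str]]:
    """Pull handle.prefix and handle.additional.prefixes out of dspace.cfg text."""
    found = {"handle.prefix": [], "handle.additional.prefixes": []}
    for line in cfg_text.splitlines():
        key, _, value = line.partition("=")
        key = key.strip()
        if key in found:
            found[key] = [v.strip() for v in value.split(",") if v.strip()]
    return found

def check_prefixes(cfg_text: str) -> None:
    """Raise AssertionError if the handle prefix rules are violated."""
    cfg = parse_handle_config(cfg_text)
    main = cfg["handle.prefix"]
    extra = cfg["handle.additional.prefixes"]
    assert main, "handle.prefix is not set"
    assert main[0] not in extra, "main prefix must NOT be in handle.additional.prefixes"
```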
REQUIRED: configure the version date fields in `project_settings.py` for version migration. This configuration is mandatory and must be set explicitly.

Add the following to your `project_settings.py`:

```python
"version_date_fields": ["dc.date.issued", "dc.date.accessioned", "dc.date.created"]
```

- Purpose: when migrating item versions, the system needs a date field from which to set the version date
- Fallback mechanism: the fields are tried in order until one with a value is found
- Supported formats: `"dc.element.qualifier"` (e.g., `"dc.date.issued"`) and `"dc.element"` (e.g., `"dc.date"`)
- Error handling: if no configured field contains a date value for an item, that item's version migration is skipped with a critical error
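The fallback mechanism can be sketched like this (a simplified illustration with a hypothetical `pick_version_date` helper, not the project's actual code):

```python
from typing import Dict, List, Optional

VERSION_DATE_FIELDS = ["dc.date.issued", "dc.date.accessioned", "dc.date.created"]

def pick_version_date(item_metadata: Dict[str, str],
                      fields: List[str] = VERSION_DATE_FIELDS) -> Optional[str]:
    """Return the value of the first configured date field that is non-empty.

    Returns None when no configured field has a value; in that case the
    real migration skips the item's version migration with a critical error.
    """
    for field in fields:
        value = item_metadata.get(field)
        if value:
            return value
    return None
```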
"version_date_fields": ["dc.date.issued", "dc.date.accessioned"]- Import: Run command
cd ./src && python repo_import.py
- NOTE: the database must be up to date (`dspace database migrate force` must have been run in `dspace/bin`)
- NOTE: the DSpace server must be running
- Table attributes that describe the last modification time of DSpace objects (for example, the attribute `last_modified` in the table `Item`) hold the time when the object was migrated, not the value from the migrated database dump.
- If you don't have valid and complete data, not all data will be imported.
- Check whether the license link contains `XXX`; such a placeholder is of course unsuitable for production runs!
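The placeholder check can be automated (a sketch; it assumes the license link is configured in `src/project_settings.py`, so adjust the path if yours lives elsewhere):

```python
from pathlib import Path
from typing import List, Tuple

def find_xxx_placeholders(settings_path: str = "src/project_settings.py") -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that still contain the XXX placeholder."""
    hits = []
    for no, line in enumerate(Path(settings_path).read_text().splitlines(), start=1):
        if "XXX" in line:
            hits.append((no, line.strip()))
    return hits
```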
Use the `tools/repo_diff` utility; see its README.
The migration script supports testing with empty tables, to verify the import process without actual data.

Before using the `--test` option, you need to create the test JSON file:

- Create the test JSON file: create a file named `test.json` in the `input/test/` directory with the following content: `null`
- Configure the test settings: the test configuration is set in `src/project_settings.py`:

  ```python
  "input": {
      "test": os.path.join(_this_dir, "../input/test"),
      "test_json_filename": "test.json",
  }
  ```

  You can change `test_json_filename` to use a different filename if needed.
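The test file from step 1 can also be created with a couple of lines of Python (roughly equivalent to `echo null > input/test/test.json`):

```python
from pathlib import Path

test_dir = Path("input/test")
test_dir.mkdir(parents=True, exist_ok=True)
# The file must contain the JSON literal `null`, which stands in for an empty table.
(test_dir / "test.json").write_text("null")
```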
To run the migration with empty-table testing, use the `--test` option followed by the names of the tables you want to treat as empty:

```
cd ./src && python repo_import.py --test usermetadatas
cd ./src && python repo_import.py --test usermetadatas resourcepolicies
```

When the `--test` option is specified with table names:

- Instead of loading actual data from the database exports, the system loads the configured test JSON file (default: `test.json`), which contains `null`
- This simulates empty tables during the import process
- The migration logic is tested without requiring actual data
- The test JSON filename can be customized in `project_settings.py` under `"input"["test_json_filename"]`
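Conceptually, the `--test` handling behaves like this hypothetical loader (a sketch; it assumes the table exports are JSON files named `<table>.json`, which is an assumption, not the project's documented layout):

```python
import json
import os
from typing import List, Optional

def load_table(table_name: str,
               data_dir: str,
               test_dir: str,
               test_tables: Optional[List[str]] = None,
               test_json_filename: str = "test.json"):
    """Load a table export, substituting the test file for tables passed via --test."""
    if test_tables and table_name in test_tables:
        # Tables named after --test read the shared test file, which holds `null`.
        path = os.path.join(test_dir, test_json_filename)
    else:
        path = os.path.join(data_dir, f"{table_name}.json")
    with open(path) as fh:
        return json.load(fh)
```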