bundle
yarn
./bin/dev
The following command starts a local Solr instance at localhost:8983 with a pre-loaded core named blacklight-core:
docker compose up
Data for the solr and redis services are persisted using docker named volumes. You can see what volumes are currently present with:
docker volume ls
If you want to remove a volume (e.g. to start with a fresh database or solr core), you can run:
docker volume rm stanford-arclight_solr-data # to remove the solr data
You can load fixture data locally:
rake seed
This command loops through all the directories under spec/fixtures/ead (for example, spec/fixtures/ead/ars and spec/fixtures/ead/uarc) and indexes all the .xml files present. The names of these subdirectories must correspond to top-level keys in the repositories.yml file. For example, uarc is a top-level key in repositories.yml as well as the name of a subdirectory under spec/fixtures/ead. A mismatch will cause indexing issues.
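Because a mismatched directory name only surfaces later as an indexing problem, it can help to check the names up front. The Ruby sketch below is a hypothetical helper, not part of stanford-arclight. It assumes the file lives at config/repositories.yml (the usual ArcLight location) and simply compares the fixture subdirectory names with the YAML file's top-level keys.

```ruby
require "yaml"

# Hypothetical helper (not part of stanford-arclight): lists fixture
# subdirectories whose names have no matching top-level key in the
# repositories YAML file. Any name returned would cause indexing issues.
def unmatched_fixture_dirs(fixtures_root, repositories_yml)
  # Top-level keys of repositories.yml, e.g. "ars", "uarc".
  data = YAML.safe_load(File.read(repositories_yml), aliases: true) || {}
  repo_keys = data.keys.map(&:to_s)

  # Only directories count; stray files under the fixtures root are ignored.
  dirs = Dir.children(fixtures_root).select do |name|
    File.directory?(File.join(fixtures_root, name))
  end

  dirs.sort - repo_keys
end

# Example (paths assumed; adjust to your checkout):
#   unmatched_fixture_dirs("spec/fixtures/ead", "config/repositories.yml")
#   # an empty array means every subdirectory has a matching key
```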
The easiest way to load data other than the fixtures is to use the DownloadEadJob and/or the IndexEadJob. See below for instructions on using Sidekiq to run these jobs in development. Under most circumstances, it's fine to use the default :async adapter to run these jobs without Sidekiq in development.
By default, the DownloadEadJob stores EAD files in the directory set as Settings.data_dir in ./config/settings.yml. You can choose a different location by setting the DATA_DIR environment variable, by passing a data_dir: argument to the job method, or by setting a different data_dir in ./config/settings.local.yml.
The DownloadEadJob uses the ArchivesSpace (ASpace) API to download EADs. To connect to ASpace, you need to configure the API URL with a username and password. Add the following to config/settings.local.yml, substituting the correct URL, port, and account information:
aspace:
  url: "http://USERNAME:PASSWORD@ARCHIVESPACE_URL:PORT"
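A URL with embedded credentials is easy to get subtly wrong (for example, a password containing @ or : has to be percent-encoded to be valid in a URL). The Ruby sketch below is a hypothetical helper, not part of stanford-arclight; it uses the standard library's URI.parse to split such a URL into its parts so you can sanity-check the value before running DownloadEadJob. The host, port, and credentials in the example are placeholders.

```ruby
require "uri"

# Hypothetical helper (not part of stanford-arclight): splits an ASpace URL of
# the form http://USERNAME:PASSWORD@HOST:PORT into its parts.
# Raises ArgumentError if the scheme or the embedded credentials are missing.
def aspace_connection_parts(url)
  uri = URI.parse(url)
  unless %w[http https].include?(uri.scheme)
    raise ArgumentError, "expected an http or https URL"
  end
  if uri.user.nil? || uri.password.nil?
    raise ArgumentError, "URL must embed USERNAME:PASSWORD@"
  end

  {
    user: uri.user,         # returned as written (still percent-encoded)
    password: uri.password, # returned as written (still percent-encoded)
    base_url: "#{uri.scheme}://#{uri.host}:#{uri.port}"
  }
end

# Example with placeholder values:
#   aspace_connection_parts("http://alice:s3cret@aspace.example.edu:8089")
#   # => {user: "alice", password: "s3cret", base_url: "http://aspace.example.edu:8089"}
```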
Important note: ArcLight core includes a number of rake tasks for loading data into Solr, such as rake arclight:index, rake arclight:index_dir, rake arclight:index_url, and rake arclight:index_url_batch. These rake tasks use only ArcLight core's default Traject indexing rules and WILL NOT apply any of the local Traject indexing rules. To index data that will work correctly with stanford-arclight, use either the local app's IndexEadJob or the Traject command:
REPOSITORY_ID={REPO_ID} bundle exec traject -u {SOLR_URL} -i xml -c ./lib/traject/sul_config.rb {FILE_PATH}
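When indexing many files with the Traject command above, it is convenient to build one invocation per file. The sketch below is hypothetical: traject_commands is not part of the app, and the Solr URL and data/ars glob in the usage comment are assumptions (the URL assumes the blacklight-core core from the Docker setup). Passing the arguments as an array to Kernel#system, together with an environment hash for REPOSITORY_ID, avoids shell quoting problems with unusual file names.

```ruby
# Hypothetical helper (not part of stanford-arclight): builds one traject
# invocation per EAD file, mirroring the command above. Returns [env, argv]
# pairs that can be passed to Kernel#system as system(env, *argv).
def traject_commands(repo_id:, solr_url:, files:,
                     config: "./lib/traject/sul_config.rb")
  files.map do |path|
    env = { "REPOSITORY_ID" => repo_id }
    argv = ["bundle", "exec", "traject",
            "-u", solr_url, "-i", "xml", "-c", config, path]
    [env, argv]
  end
end

# Example (assumed Solr URL and data directory):
#   traject_commands(
#     repo_id: "ars",
#     solr_url: "http://localhost:8983/solr/blacklight-core",
#     files: Dir.glob("data/ars/*.xml")
#   ).each { |env, argv| system(env, *argv, exception: true) }
```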
By default in development, Rails runs the DownloadEadJob and IndexEadJob jobs with the :async adapter, which executes them on an in-process thread pool. If you prefer to run these jobs in a separate worker process, you can use Sidekiq:
- In config/environments/development.rb, add the line:
config.active_job.queue_adapter = :sidekiq
- Make sure Redis and Solr are running. The included Docker environment starts both Redis and Solr for you.
- Start Sidekiq:
bundle exec sidekiq
- Run a job. For example, to download and index all the ars (Archive of Recorded Sound) collections updated after March 1, 2024, run:
bin/rails runner 'DownloadEadJob.enqueue_one_by(aspace_repository_code: "ars", updated_after: "2024-03-01")'
- You can monitor job progress in the Sidekiq admin UI, which is available at:
http://localhost:3000/sidekiq
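For reference, the line from the first step belongs inside the standard Rails.application.configure block. The conditional is a hypothetical variation: USE_SIDEKIQ is a made-up environment variable, not something the app defines, that lets you switch between Sidekiq and the default :async adapter without editing the file each time.

```ruby
# config/environments/development.rb
Rails.application.configure do
  # ... existing development settings ...

  # Send Active Job jobs (e.g. DownloadEadJob, IndexEadJob) to Sidekiq.
  # USE_SIDEKIQ is a hypothetical variable; when it is unset, Rails keeps
  # its default :async adapter.
  config.active_job.queue_adapter = :sidekiq if ENV["USE_SIDEKIQ"]
end
```

With this variation, the variable needs to be set for whichever process enqueues the jobs, for example USE_SIDEKIQ=1 bin/rails runner '...'; the Sidekiq worker itself only needs Redis.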
There is a rake task for deleting a single collection and all of its components from the Solr index.
- Find the Solr document id for the collection (which is a form of the EAD ID)
- Run the rake task:
# Some shells (such as zsh) require that the brackets be escaped or that the whole task name be quoted.
bundle exec rake stanford_arclight:delete_by_id\['ars0167'\]
- Enter YES at the prompt to delete the collection and its components.
Finding aid PDFs can be automatically generated from EAD XML. The following are needed:
- Saxon
- Apache FOP
- Java
Paths to those tools must be configured in ./config/settings.yml:
- Settings.pdf_generation.fop_path to specify the path to the fop executable
- Settings.pdf_generation.saxon_path to specify the path to the Saxon jar
The paths to the referenced fonts must be set in config/pdf_generation/fop-config.xml. The fonts are not bundled in this repository; they can be found in ArchivesSpace.
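Saxon and FOP are typically combined in two steps: an XSLT stylesheet run by Saxon turns the EAD into XSL-FO, and FOP renders that XSL-FO to PDF using the font configuration in fop-config.xml. The Ruby sketch below only builds those two command lines and is an illustration under assumptions, not the app's actual implementation: pdf_pipeline_commands is hypothetical, and the stylesheet path must be supplied by you because this README does not name one. The Saxon (-s:, -xsl:, -o:) and FOP (-c, -fo, -pdf) options are the tools' standard command-line flags.

```ruby
# Hypothetical sketch (not the app's implementation) of a two-step
# EAD -> XSL-FO -> PDF pipeline:
#   1. Saxon applies an XSLT stylesheet to the EAD, writing an .fo file.
#   2. Apache FOP renders the .fo file to PDF using fop-config.xml for fonts.
def pdf_pipeline_commands(ead_path:, pdf_path:, saxon_path:, fop_path:, xslt_path:,
                          fop_config: "config/pdf_generation/fop-config.xml")
  # Intermediate XSL-FO file next to the PDF, e.g. /tmp/x.pdf -> /tmp/x.fo
  fo_path = File.join(File.dirname(pdf_path), File.basename(pdf_path, ".*") + ".fo")

  saxon = ["java", "-jar", saxon_path,
           "-s:#{ead_path}", "-xsl:#{xslt_path}", "-o:#{fo_path}"]
  fop = [fop_path, "-c", fop_config, "-fo", fo_path, "-pdf", pdf_path]
  [saxon, fop]
end

# Example (all paths are placeholders):
#   pdf_pipeline_commands(
#     ead_path: "data/ars/ars0167.xml", pdf_path: "/tmp/ars0167.pdf",
#     saxon_path: "/opt/saxon/saxon-he.jar", fop_path: "/usr/bin/fop",
#     xslt_path: "path/to/ead-to-fo.xsl"
#   ).each { |argv| system(*argv, exception: true) }
```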
PDFs can be generated automatically as part of DownloadEadJob by enabling Settings.pdf_generation.create_on_ead_download.
The GeneratePdfJob can be used to generate PDFs that were not created automatically by DownloadEadJob.
For example, the following generates all missing PDFs but does not regenerate existing PDFs:
bin/rails runner 'GeneratePdfJob.enqueue_all'