Stanford Arclight Demo

Starting the development server

bundle
yarn
./bin/dev

Starting Solr for development

The following command starts a local Solr instance at localhost:8983 with a pre-loaded core named blacklight-core. It also starts the Redis service used by Sidekiq:

docker compose up
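Solr may need a few seconds after the containers start before the core can answer requests. The following is a minimal Ruby sketch that polls the core until it responds. It assumes the core is reachable at http://localhost:8983/solr/blacklight-core and exposes Solr's standard /admin/ping handler. The helper name wait_for_solr is hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: polls the core's ping handler until it reports "OK" or
# the attempts run out. The core URL is an assumption based on the default
# localhost:8983 / blacklight-core setup described above.
def wait_for_solr(core_url = "http://localhost:8983/solr/blacklight-core",
                  attempts: 30, delay: 1, fetch: nil)
  ping = URI("#{core_url}/admin/ping?wt=json")
  # The HTTP call can be swapped out (e.g. in tests); by default it is a plain GET.
  fetch ||= ->(uri) { Net::HTTP.get(uri) }

  attempts.times do
    begin
      return true if JSON.parse(fetch.call(ping))["status"] == "OK"
    rescue SystemCallError, IOError, SocketError, Net::OpenTimeout, Net::ReadTimeout, JSON::ParserError
      # Connection refused, timeout, or a non-JSON reply: Solr is probably still starting.
    end
    sleep delay
  end
  false
end
```

For example, a setup script could abort unless wait_for_solr returns true before it runs rake seed.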

Managing data

Data for the Solr and Redis services is persisted using Docker named volumes. You can list the volumes that are currently present with:

docker volume ls

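If you want to remove a volume (e.g. to start with a fresh database or Solr core), first stop and remove the containers with docker compose down, since Docker will not remove a volume that a container is still using. Then run: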

docker volume rm stanford-arclight_solr-data   # to remove the solr data

Working with data in development

Fixture data (also used by the test suite)

You can load fixture data locally:

rake seed

This command loops through all the directories under spec/fixtures/ead (for example, spec/fixtures/ead/ars and spec/fixtures/ead/uarc) and indexes all the .xml files present. The name of each subdirectory must match a top-level key in the repositories.yml file. For example, uarc is both a top-level key in repositories.yml and the name of a subdirectory under spec/fixtures/ead. A mismatch will cause indexing issues.
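A small check can catch such mismatches before running rake seed. This Ruby sketch assumes the file lives at config/repositories.yml (the text above only names repositories.yml) and that its top-level keys are the repository codes. The method name unmatched_fixture_dirs is hypothetical.

```ruby
require "yaml"

# Hypothetical check: returns fixture subdirectory names under ead_root that
# have no matching top-level key in the repositories YAML file.
def unmatched_fixture_dirs(ead_root: "spec/fixtures/ead",
                           repositories_yml: "config/repositories.yml")
  # Top-level keys of repositories.yml, e.g. "ars", "uarc".
  repository_keys = YAML.safe_load(File.read(repositories_yml), aliases: true).keys.map(&:to_s)
  # Immediate subdirectories of ead_root (plain files are ignored).
  fixture_dirs = Dir.children(ead_root).select { |name| File.directory?(File.join(ead_root, name)) }
  (fixture_dirs - repository_keys).sort
end
```

An empty result means every fixture subdirectory has a matching repository key.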

Loading more data

The easiest way to load data other than the fixtures is to use the DownloadEadJob and/or the IndexEadJob. See below for instructions about how to use Sidekiq to run these jobs in development. Under most circumstances it's fine to use the default :async adapter to run these jobs without Sidekiq in development.

By default the DownloadEadJob stores EAD files in the directory set as Settings.data_dir in ./config/settings.yml. You can choose a different location by setting the DATA_DIR environment variable, by passing a data_dir: argument to the job method, or by setting a different data_dir in ./config/settings.local.yml.

The DownloadEadJob uses the ASpace API to download EADs. To connect to ASpace, you need to configure the API URL, including a username and password. Add the following to config/settings.local.yml with the correct URL, port, and account information:

aspace:
  url: "http://USERNAME:PASSWORD@ARCHIVESPACE_URL:PORT"
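Because the credentials are embedded in the URL, characters such as @ or : in a password would be read as URL delimiters. The following Ruby sketch builds the value with percent-encoded credentials and checks that it parses with Ruby's URI library. Whether stanford-arclight decodes percent-encoded credentials is not stated here, so treat that as an assumption to verify. The helper name aspace_url is hypothetical, and the host and port are illustrative (8089 is the usual ArchivesSpace backend port).

```ruby
require "erb"
require "uri"

# Hypothetical helper: builds an aspace.url value with credentials embedded.
# ERB::Util.url_encode percent-encodes reserved characters such as "@" and ":"
# so they cannot be mistaken for URL delimiters. Whether the app decodes them
# again is an assumption to verify.
def aspace_url(username:, password:, host:, port:, scheme: "http")
  userinfo = "#{ERB::Util.url_encode(username)}:#{ERB::Util.url_encode(password)}"
  url = "#{scheme}://#{userinfo}@#{host}:#{port}"
  URI.parse(url) # raises URI::InvalidURIError if the result is malformed
  url
end
```

For example, aspace_url(username: "archivist", password: "p@ss:word", host: "aspace.example.edu", port: 8089) returns "http://archivist:p%40ss%3Aword@aspace.example.edu:8089", which can be pasted in as the url value.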

Important Note: ArcLight core includes a number of rake tasks for loading data into Solr, such as rake arclight:index, rake arclight:index_dir, rake arclight:index_url, and rake arclight:index_url_batch. These rake tasks use only the default Traject indexing rules from ArcLight core and WILL NOT apply any of the local Traject indexing rules. To index data that will work correctly with stanford-arclight, use either the local app's IndexEadJob or the Traject command (REPOSITORY_ID={REPO_ID} bundle exec traject -u {SOLR_URL} -i xml -c ./lib/traject/sul_config.rb {FILE_PATH}).
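To index a whole directory with the local rules, one option is to run the Traject command above once per file. This Ruby sketch builds the environment and argument list for that command and passes them to Kernel#system, which avoids shell-quoting problems with file paths. The default Solr URL assumes the local blacklight-core from docker compose up. The helper names traject_command and index_directory are hypothetical.

```ruby
# Hypothetical helpers wrapping the Traject command documented above.
# The default Solr URL assumes the local blacklight-core from `docker compose up`.
DEFAULT_SOLR_URL = "http://localhost:8983/solr/blacklight-core"

# Returns [env, *argv] for Kernel#system, so paths with spaces need no quoting.
def traject_command(repo_id:, file_path:, solr_url: DEFAULT_SOLR_URL)
  [
    { "REPOSITORY_ID" => repo_id },
    "bundle", "exec", "traject",
    "-u", solr_url,
    "-i", "xml",
    "-c", "./lib/traject/sul_config.rb",
    file_path
  ]
end

# Indexes every .xml file directly in dir (not recursive) and returns the
# paths whose traject run failed. Run from the application root.
def index_directory(repo_id:, dir:, solr_url: DEFAULT_SOLR_URL)
  Dir.glob(File.join(dir, "*.xml")).sort.reject do |path|
    system(*traject_command(repo_id: repo_id, file_path: path, solr_url: solr_url))
  end
end
```

As with the single-file command, repo_id should be a top-level key in repositories.yml.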

Using Sidekiq for development

By default in development Rails will run the DownloadEadJob and IndexEadJob jobs with the :async adapter. If you prefer to run these jobs in the background you can use Sidekiq.

Steps to enable Sidekiq

1. In config/environments/development.rb, inside the Rails.application.configure block, add the line: config.active_job.queue_adapter = :sidekiq
2. Make sure Redis and Solr are running. The included Docker environment starts both Redis and Solr for you.
3. Start Sidekiq:

bundle exec sidekiq

4. Run a job. For example, to download and index all the ars (Archive of Recorded Sound) collections updated after March 1, 2024, run:

bin/rails runner 'DownloadEadJob.enqueue_one_by(aspace_repository_code: "ars", updated_after: "2024-03-01")'

You can monitor job progress in the Sidekiq admin UI, which is available at: http://localhost:3000/sidekiq
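To refresh several repositories at once, the same enqueue_one_by call can be placed in a script run with bin/rails runner. This sketch assumes updated_after accepts YYYY-MM-DD strings, as in the example above, and that Sidekiq is enabled as described in these steps. The repository code uarc as an ASpace code is hypothetical. The defined? guard lets the date helper load outside Rails without enqueuing anything.

```ruby
require "date"

# Returns the YYYY-MM-DD date `days` before `today`, matching the format of
# the updated_after example above (assumed to be what the job expects).
def updated_after_cutoff(days, today: Date.today)
  (today - days).iso8601
end

# Hypothetical list of ASpace repository codes; adjust to your repositories.
REPOSITORY_CODES = %w[ars uarc].freeze

# Only enqueue when loaded inside the Rails app (e.g. via bin/rails runner).
if defined?(DownloadEadJob)
  cutoff = updated_after_cutoff(7)
  REPOSITORY_CODES.each do |code|
    DownloadEadJob.enqueue_one_by(aspace_repository_code: code, updated_after: cutoff)
  end
end
```

Saved as, say, script/enqueue_recent.rb (a hypothetical path), it would run with bin/rails runner script/enqueue_recent.rb.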

Deleting a collection

There is a rake task for deleting a single collection and all of its components from the Solr index.

1. Find the Solr document id for the collection (which is a form of the EAD ID).
2. Run the rake task:

# Some shells (such as zsh) require that the brackets are escaped.
bundle exec rake stanford_arclight:delete_by_id\['ars0167'\]

Enter YES at the prompt to delete the collection and its components.
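Before deleting, it can help to confirm that the id is actually in the index. This Ruby sketch queries Solr with the term query parser ({!term f=id}), so characters in the id need no query-syntax escaping. It assumes the local core URL and that the document id is stored in the id field (the Blacklight default). The helper names are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed local core from `docker compose up`.
SOLR_CORE_URL = "http://localhost:8983/solr/blacklight-core"

# Hypothetical helper: builds a select URI that matches exactly one id.
# {!term f=id} makes Solr treat the value literally, so ids containing
# characters such as "-" or ":" need no escaping. rows=0 returns only the count.
def solr_lookup_uri(id, core_url: SOLR_CORE_URL)
  uri = URI("#{core_url}/select")
  uri.query = URI.encode_www_form("q" => "{!term f=id}#{id}", "fl" => "id", "rows" => "0", "wt" => "json")
  uri
end

# Returns true if a document with this id is indexed (requires Solr running).
def solr_document_exists?(id, core_url: SOLR_CORE_URL)
  body = Net::HTTP.get(solr_lookup_uri(id, core_url: core_url))
  JSON.parse(body).dig("response", "numFound").to_i.positive?
end
```

For example, solr_document_exists?("ars0167") should return true before the delete task is run and false afterwards.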

PDF Generation

Requirements

Finding aid PDFs can be automatically generated from EAD XML. This requires the fop (Apache FOP) executable, the Saxon jar, and the fonts referenced by the FOP configuration.

Configuration

Paths to those tools must be configured in ./config/settings.yml.

Settings.pdf_generation.fop_path to specify the path to the fop executable
Settings.pdf_generation.saxon_path to specify the path to the saxon jar

The path to the referenced fonts must be set in config/pdf_generation/fop-config.xml. The fonts are not bundled in this repository, but copies can be found in ArchivesSpace.

PDFs can be generated automatically as part of DownloadEadJob by enabling Settings.pdf_generation.create_on_ead_download.
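A quick preflight check can catch misconfigured tool paths before a job fails. This Ruby sketch reads config/settings.yml directly with YAML and assumes fop_path and saxon_path are nested under pdf_generation, as the Settings names suggest. It does not apply settings.local.yml overrides or ERB, which the app's own settings loader would. The helper names are hypothetical.

```ruby
require "yaml"

# Hypothetical preflight: returns a list of problems with the PDF tool paths.
# `pdf_settings` is a Hash like {"fop_path" => "...", "saxon_path" => "..."}.
def pdf_tool_problems(pdf_settings)
  problems = []
  fop = pdf_settings["fop_path"].to_s
  saxon = pdf_settings["saxon_path"].to_s

  problems << "fop_path is not set" if fop.empty?
  unless fop.empty? || (File.file?(fop) && File.executable?(fop))
    problems << "fop_path #{fop} is not an executable file"
  end
  problems << "saxon_path is not set" if saxon.empty?
  problems << "saxon_path #{saxon} does not exist" unless saxon.empty? || File.file?(saxon)
  problems
end

# Loads the pdf_generation section from a settings file (no ERB, no overrides).
def pdf_settings_from(path = "config/settings.yml")
  YAML.safe_load(File.read(path), aliases: true).fetch("pdf_generation", {})
end
```

For example, pdf_tool_problems(pdf_settings_from) returns an empty list when both paths look usable.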

Running a PDF Generation Job

The GeneratePdfJob can be used to generate PDFs not created automatically via DownloadEadJob.

For example, the following generates all missing PDFs but does not regenerate existing PDFs:

bin/rails runner 'GeneratePdfJob.enqueue_all'
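The "missing only" behavior amounts to comparing EAD files against the PDFs that already exist. This Ruby sketch of that comparison can give a rough preview of which files lack a PDF. It hypothetically assumes each PDF sits next to its EAD file with the same basename and a .pdf extension. Where GeneratePdfJob actually writes PDFs is not described here, so adjust the path mapping to match. The helper name eads_missing_pdfs is hypothetical.

```ruby
# Hypothetical sketch: lists EAD files under data_dir (recursively) that do not
# yet have a PDF with the same basename in the same directory. The PDF naming
# and location are assumptions, not necessarily what GeneratePdfJob uses.
def eads_missing_pdfs(data_dir)
  Dir.glob(File.join(data_dir, "**", "*.xml")).sort.reject do |ead_path|
    File.exist?(ead_path.sub(/\.xml\z/, ".pdf"))
  end
end
```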

sul-dlss/stanford-arclight

Folders and files

Latest commit

History

Repository files navigation

Stanford Arclight Demo

Starting the development server

Starting Solr for development

Managing data

Working with data in development

Fixture data (also used by the test suite)

Loading more data

Using Sidekiq for development

Steps to enable Sidekiq

Deleting a collection

PDF Generation

Requirements

Configuration

Running a PDF Generation Job

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 102

Packages 0

Contributors 14

Languages

Packages