Imports Validation
The validation works by comparing the metadata of the work in its source (original) form against its current (destination) form.
It iterates over every metadata field and adds a diff entry for every difference it finds. We call them diff entries because they
can represent metadata additions (fields that are present in the destination metadata but not in the source), deletions
(present in the source but not in the destination) or modifications of the value. These diffs are stored in the error_backtrace
column of the latest Bulkrax::Status created for the Bulkrax::Entry, as we consider that the validation is run against the
result of the last import attempt.
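The three kinds of diff entry described above can be illustrated with a minimal sketch. This is not the HykuAddons implementation: the method name diff_metadata and the shape of each entry (path, op, source_v, dest_v) are assumptions made only for this example.

```ruby
# Hypothetical sketch: compares two metadata hashes field by field and
# collects one diff entry per difference. The entry format is an assumption.
def diff_metadata(source, destination)
  fields = source.keys | destination.keys # every field seen on either side

  fields.each_with_object([]) do |field, diffs|
    in_source = source.key?(field)
    in_destination = destination.key?(field)

    if in_source && !in_destination
      # Present in the source but not in the destination
      diffs << { path: field, op: :deletion, source_v: source[field], dest_v: nil }
    elsif !in_source && in_destination
      # Present in the destination but not in the source
      diffs << { path: field, op: :addition, source_v: nil, dest_v: destination[field] }
    elsif source[field] != destination[field]
      # Present on both sides with different values
      diffs << { path: field, op: :modification, source_v: source[field], dest_v: destination[field] }
    end
  end
end

source = { "title" => ["A"], "creator" => ["X"], "date" => ["2020"] }
destination = { "title" => ["B"], "creator" => ["X"], "language" => ["en"] }

diff_metadata(source, destination).each { |entry| puts entry.inspect }
```

An empty result means the two sides match; in the real service it is the resulting list of diffs that ends up in the error_backtrace column.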
These diffs are then presented in the "Validation issues" panel of the importer entries pages. Remember to enable "Bulkrax validations" on the tenant settings page to be able to see the validation results. To avoid alarming customers with validation issues, you may want to switch it off again after checking that the import went OK.
The base class of validations is HykuAddons::Validations::EntryValidationService
at app/services/hyku_addons/validations/entry_validation_service.rb.
It takes an account and an entry of any Bulkrax::Entry subclass as params, and we just invoke the validate
method to do the work.
This class serves as a base for more specific validators, which override the excluded_fields, the renamed_fields, the separator
and the data source of the metadata to compare.
The excluded fields never trigger validation issues. The excluded_fields_with_names do the same,
but a field is only ignored when it has the value given for it in the hash. The renamed fields are used to map "old" field names
to the ones defined in Hyku.
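As a rough illustration of how these overrides fit together, here is a self-contained sketch. It uses a stand-in base class (SketchValidationService) rather than the real EntryValidationService, and the helper comparable_source, the field names and the values are all hypothetical.

```ruby
# Stand-in base class, NOT the real HykuAddons::Validations::EntryValidationService.
class SketchValidationService
  def initialize(source)
    @source = source
  end

  # Fields that never produce validation issues
  def excluded_fields
    []
  end

  # Fields ignored only when they hold exactly the given value
  def excluded_fields_with_names
    {}
  end

  # Maps "old" field names to the names used in the destination
  def renamed_fields
    {}
  end

  # Hypothetical helper: applies exclusions and renames to the source metadata
  def comparable_source
    @source.each_with_object({}) do |(field, value), result|
      next if excluded_fields.include?(field)
      next if excluded_fields_with_names.key?(field) && excluded_fields_with_names[field] == value

      result[renamed_fields.fetch(field, field)] = value
    end
  end
end

# A more specific validator only overrides the configuration methods.
class SketchTenantValidationService < SketchValidationService
  def excluded_fields
    %w[timestamp]
  end

  def excluded_fields_with_names
    { "visibility" => "open" }
  end

  def renamed_fields
    { "description" => "abstract" }
  end
end

service = SketchTenantValidationService.new(
  "timestamp" => "2020-01-01", "visibility" => "open", "description" => "d", "title" => "T"
)
puts service.comparable_source.inspect
```

Keeping the configuration in small overridable methods means a tenant-specific validator stays a few lines long and inherits the comparison logic unchanged.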
For example, CsvEntryValidationService
is a validation subclass that customises part of this behaviour: it overrides the
destination_metadata
method to get the metadata from the export of the current work instead of from SolrService.
RedlandsEntryValidationService
subclasses the CsvEntryValidationService
mentioned before, basically to customise the excluded
and renamed fields.
SolrEntryValidationService
is a subclass (with a very bad name, it should be Hyku1MigrationValidationService)
fitted to validate migrations from Hyku 1 to 2, and hopefully Hyku 2 to 3 as well.
It allows connecting to a Blacklight endpoint as the source of metadata, using HTTP Authentication or a valid cookie passed as
params of the constructor.
Apart from its definition of fields to rename and exclude, this file shows how we can apply data transformations
to each of the metadata fields to ensure we make semantically valid comparisons. This means we need to ensure that,
for example, the values of the creator and contributor fields are compared as hashes instead of strings, because
"{'a': 1, 'b': 2}"
is a different string than "{'b': 2, 'a': 1}"
even if they represent the same hash. The same happens
with fields that convert to different values in the destination, like the resource_type,
which changes from "ArticleWork"
to just "Article" in Hyku2.
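The difference between comparing strings and comparing hashes can be seen in a short sketch. It assumes the serialised values are valid JSON (double-quoted keys), which may not match the real stored format; the method name semantically_equal? is hypothetical.

```ruby
require "json"

# Hypothetical helper: compares two serialised values as parsed structures,
# falling back to plain string comparison when they are not valid JSON.
def semantically_equal?(source_value, destination_value)
  JSON.parse(source_value) == JSON.parse(destination_value)
rescue JSON::ParserError
  source_value == destination_value
end

source_value = '{"a": 1, "b": 2}'
destination_value = '{"b": 2, "a": 1}'

# Same data, different key order
puts source_value == destination_value                     # string comparison
puts semantically_equal?(source_value, destination_value)  # hash comparison
```

Ruby's Hash equality ignores key order, so parsing both sides before comparing avoids reporting a modification where only the serialisation differs.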
All these transformations are done by defining methods named reevaluate_#{field_name},
like reevaluate_resource_type_tesim
or reevaluate_creator_tesim.
The validator checks whether a method named reevaluate_#{field_name} exists, and applies it,
before comparing the values of this field in the source and destination metadata.
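A naming convention like this is usually resolved with respond_to? and send. The sketch below shows one way it could work; the class SketchReevaluatingValidator, the method comparable_value and the mapping table are assumptions for illustration, not the actual validator code.

```ruby
# Hypothetical sketch of convention-based dispatch to reevaluate_* methods.
class SketchReevaluatingValidator
  # Assumed mapping from Hyku 1 resource types to their Hyku 2 names
  RESOURCE_TYPE_MAP = { "ArticleWork" => "Article" }.freeze

  # Returns the value to compare, transformed when a hook exists for the field
  def comparable_value(field_name, value)
    hook = "reevaluate_#{field_name}"
    # Pass true so private hook methods are detected as well
    respond_to?(hook, true) ? send(hook, value) : value
  end

  private

  def reevaluate_resource_type_tesim(value)
    Array(value).map { |v| RESOURCE_TYPE_MAP.fetch(v, v) }
  end
end

validator = SketchReevaluatingValidator.new
puts validator.comparable_value("resource_type_tesim", ["ArticleWork"]).inspect
puts validator.comparable_value("title_tesim", ["A title"]).inspect
```

With this pattern, supporting a new field transformation only requires adding one method with the right name; fields without a hook are compared unchanged.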
All the validations are run using the rake tasks included at lib/tasks/imports.rake.
They are divided into two namespaces:
hyku:validations:importers
and hyku:validations:entries,
where the former has tasks to validate Importers and the latter
has tasks to validate a single entry. Both namespaces have tasks to launch HTTP, Cookie or CSV validations.
HTTP validations and Cookie validations are used for Hyku 1 to 2 migrations, using HTTP Authentication or a valid cookie on the Blacklight endpoints.
The most versatile validation is ValidateCsvImporterEntryJob,
which takes 3 arguments: an account, an entry and
a string with the validator class name that the job instantiates to run the validation. This allows running any validator
subclass directly from the rake tasks, using an invocation with the format:
rake hyku:validations:importers:csv[tenant_uuid:entry_id:validator_class_name]
For example:
rake hyku:validations:importers:csv[123abc:42:HykuAddons::Validations::RedlandsCsvValidationsService]
This will run a RedlandsCsvValidationsService
against the Bulkrax::Entry
with id 42 included in the account with
tenant uuid 123abc.
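Because the validator class name itself contains colons, the colon-separated argument has to be split with care. The following sketch shows one way to do it; parse_validation_args is a hypothetical helper, and the actual rake task may parse its argument differently.

```ruby
# Hypothetical helper: splits "tenant_uuid:entry_id:validator_class_name".
# The limit of 3 keeps the "::" separators of the class name intact.
def parse_validation_args(raw)
  tenant_uuid, entry_id, validator_class_name = raw.split(":", 3)

  {
    tenant_uuid: tenant_uuid,
    entry_id: Integer(entry_id), # raises ArgumentError on a non-numeric id
    validator_class_name: validator_class_name
  }
end

args = parse_validation_args(
  "123abc:42:HykuAddons::Validations::RedlandsCsvValidationsService"
)
puts args.inspect
```

Without the limit, split(":") would also break the class name apart at every colon, so the third element would no longer be a usable constant name.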