- Fedora is now Repo
- FedoraResource is now RepoResource
In repo-php-util, configuration was stored in a dedicated singleton class which had to be initialized before the Fedora class constructor was called:
use acdhOeaw\util\RepoConfig;
use acdhOeaw\fedora\Fedora;
RepoConfig::init('path/to/config.ini');
$fedora = new Fedora();
Now the Repo class constructor explicitly takes all the required configuration data (see the API documentation).
This makes it possible to instantiate many Repo objects using different configurations, which was impossible with the repo-php-util singleton design.
To allow straightforward Repo object creation, a static method Repo::factory() is provided. It calls the Repo class constructor with a configuration extracted from a given configuration file (which is now a YAML file):
use acdhOeaw\arche\lib\Repo;
$repo = Repo::factory('path/to/config.yaml');
In repo-php-util, FedoraResource objects were created using the Fedora::getResourceByUri() method:
use acdhOeaw\util\RepoConfig;
use acdhOeaw\fedora\Fedora;
RepoConfig::init('path/to/config.ini');
$fedora = new Fedora();
$res = $fedora->getResourceByUri('https://resource.url');
Now you simply call the RepoResource object constructor:
use acdhOeaw\arche\lib\Repo;
use acdhOeaw\arche\lib\RepoResource;
$repo = Repo::factory('path/to/config.yaml');
$res = new RepoResource('https://resource.url', $repo);
There are two important changes in regard to metadata access:
- new metadata getters and setters;
- new metadata fetch modes.
RepoResource::getMetadata() vs RepoResource::getGraph() and RepoResource::setMetadata() vs RepoResource::setGraph()
In repo-php-util, the FedoraResource::getMetadata() and FedoraResource::setMetadata() methods always created a deep copy of the returned/taken metadata objects, e.g.:
use acdhOeaw\arche\lib\Repo;
use acdhOeaw\arche\lib\RepoResource;
$repo = Repo::factory('path/to/config.yaml');
$res = new RepoResource('https://resource.url', $repo);
$meta1 = $res->getMetadata();
$meta1->addLiteral('https://my.property', 'my value');
$meta2 = $res->getMetadata();
echo (int) ($meta1->getGraph()->serialise('ntriples') === $meta2->getGraph()->serialise('ntriples'));
// displays 0 because $meta1 contains the additional triple, $meta2 does not
This approach is safe and protects you from shooting yourself in the foot, but it leads to quite a lot of data copying.
If you know you will use the metadata read-only (or you are aware of what you are doing), you can avoid this overhead by returning/passing references to metadata objects.
This is what the RepoResource::getGraph() and RepoResource::setGraph() methods are meant for, e.g.:
use acdhOeaw\arche\lib\RepoResource;
// initialization code skipped
$res = $repo->getResourceByUrl('https://very.large/collection/url');
$res->loadMetadata();
$meta1 = $res->getGraph();
$meta1->addLiteral('https://my.property', 'my value');
$meta2 = $res->getMetadata();
echo (int) ($meta1->getGraph()->serialise('ntriples') === $meta2->getGraph()->serialise('ntriples'));
// displays 1, also $res->getGraph() is much faster than $res->getMetadata()
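The difference between the copying and the reference-returning getters can be sketched without the library. The MetaHolder class below is purely hypothetical and only mimics the two access patterns with a plain ArrayObject; it is not part of the library and says nothing about how the library stores metadata internally.

```php
<?php
// Hypothetical stand-in for a resource holding metadata; not part of the library.
final class MetaHolder
{
    private ArrayObject $meta;

    public function __construct()
    {
        $this->meta = new ArrayObject();
    }

    // Mimics the getMetadata() pattern: the caller gets an independent copy.
    public function getCopy(): ArrayObject
    {
        return clone $this->meta;
    }

    // Mimics the getGraph() pattern: the caller gets the very same object.
    public function getReference(): ArrayObject
    {
        return $this->meta;
    }
}

$holder = new MetaHolder();

// Changing a copy leaves the stored metadata untouched.
$copy = $holder->getCopy();
$copy['https://my.property'] = 'my value';
$countAfterCopyEdit = count($holder->getReference());

// Changing a reference changes the stored metadata as well.
$ref = $holder->getReference();
$ref['https://my.property'] = 'my value';
$countAfterRefEdit = count($holder->getReference());
```

The copy costs time and memory proportional to the size of the metadata, which is why the reference-based variant pays off for large collections.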
Another important use case for RepoResource::getGraph() and RepoResource::setGraph() is getting/setting metadata broader than the triples having the resource as a subject, e.g. getting all the data fetched in broad metadata fetch modes (see below) or setting search results including metadata of connected resources.
RepoResource::getMetadata() doesn't take the $force parameter any longer. Use the RepoResource::loadMetadata() method to reload the metadata.
The new repository solution offers many metadata fetch modes:
- resource - same as in repo-php-util - only resource metadata are returned;
- neighbors - metadata of the resource and all resources pointed to by the resource metadata are returned (convenient e.g. when you want to display a particular resource view);
- relatives - metadata of the resource and all resources pointed recursively (in any direction) by a given metadata property are returned (convenient e.g. when you want to display a whole collection tree).
To make it possible to select the fetch mode, the RepoResource::loadMetadata(bool $force, string $mode = RepoResource::META_NEIGBOURS, string $parentProperty = null) method has been introduced.
The Repo::getResourcesBy...() methods also take the $mode and $parentProperty parameters, which allow you to specify the desired metadata fetch mode.
You can use the RepoResource::META_RESOURCE, RepoResource::META_NEIGBOURS and RepoResource::META_RELATIVES constants to denote the desired metadata mode, e.g.:
use acdhOeaw\arche\lib\Repo;
use acdhOeaw\arche\lib\RepoResource;
$repo = Repo::factory('path/to/config.yaml');
$res = new RepoResource('https://resource.url', $repo);
$res->loadMetadata(true, RepoResource::META_NEIGBOURS);
$meta = $res->getGraph();
$authorName = $meta->getResource('https://author.property')->getLiteral('https://name.property')->getValue();
// note that the following won't work, because getMetadata() returns only the triples having the resource as a subject:
$meta = $res->getMetadata();
$authorName = $meta->getResource('https://author.property')->getLiteral('https://name.property')->getValue();
If you call RepoResource::getMetadata() or RepoResource::getGraph() on an object without metadata, they will call RepoResource::loadMetadata() with default parameter values (meaning the metadata are obtained in the RepoResource::META_RESOURCE mode).
On the one hand, the search API has been simplified to only two methods:
- Repo::getResourcesBySearchTerms()
- Repo::getResourcesBySqlQuery()
On the other hand many new features have been introduced:
- various metadata fetch modes (see above)
- paging
- full text search highlighting
The Repo::getResourcesBySqlQuery(string $query, array $parameters, SearchConfig $config) method allows you to execute parameterized SQL queries.
Parameters are denoted in the query with the ? sign and substituted in order of appearance.
The SQL query must return an id column with the identifiers of the repository resources matching the search. Fetching other columns is pointless, as they are discarded anyway.
The database structure is as follows:
- resources:
  - id - resource primary identifier
  - transaction_id - id of the transaction holding a lock on a given resource (null if not locked by any transaction)
  - state - state of a given resource: active, tombstone or deleted (resources are kept in the deleted state until the end of the transaction, then they are permanently removed)
- identifiers - stores identifiers:
  - id - resource primary identifier
  - ids - secondary identifier value
- relations - stores RDF triples pointing to other resources:
  - id - resource primary identifier
  - target_id - target resource identifier
  - property - RDF property of the triple
- metadata - stores all literal RDF triples:
  - mid - table row primary identifier
  - id - resource primary identifier
  - property - RDF property of the triple
  - type - RDF triple value type
  - lang - RDF triple value language
  - value_n - triple value cast to a number (to allow proper comparisons)
  - value_t - triple value cast to a timestamp (to allow proper comparisons)
  - value - triple value as a string
- full_text_search - stores full text search indices:
  - ftsid - table row primary identifier
  - id - resource primary identifier (only for binary content)
  - mid - metadata row identifier - foreign key to the metadata table (only for non-binary content)
  - segments - segmentized and indexed content of the metadata triple / resource binary content
  - raw - string value of the metadata triple / resource binary content (required for highlighting)
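Based on the table layout above, a query for Repo::getResourcesBySqlQuery() could be assembled as in the sketch below. The property URL and the value are placeholders, and buildPropertyQuery() is a hypothetical helper, not part of the library. The repository call itself is left in a comment because it needs a configured Repo object.

```php
<?php
// Hypothetical helper: builds a parameterized query returning an "id" column,
// as required by Repo::getResourcesBySqlQuery(). It relies only on the
// metadata table columns described above.
function buildPropertyQuery(string $property, string $value): array
{
    $query = "SELECT id FROM metadata WHERE property = ? AND value = ?";
    // Parameters are substituted for the ? signs in this order.
    $parameters = [$property, $value];
    return [$query, $parameters];
}

[$query, $parameters] = buildPropertyQuery('https://my.property', 'my value');

// With a configured repository the pair would be passed on, e.g.:
// $results = $repo->getResourcesBySqlQuery($query, $parameters, new SearchConfig());
```

Keeping values in the parameters array rather than concatenating them into the query string avoids SQL injection and quoting problems.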
Advanced search options are controlled by the SearchConfig object (see below).
The Repo::getResourcesBySearchTerms() method allows you to perform a search without constructing an SQL query.
Search criteria are described by SearchTerm objects. A resource must match all criteria to be included in the search results.
Every SearchTerm object can describe any combination of an RDF subject, property, value, type and language as well as an operator used to compare the value, e.g.
- To search for resources having a given value of a given property:
  new SearchTerm('https://my.property', 'desired value', '=')
  - To limit the match to a particular language:
    new SearchTerm('https://my.property', 'desired value', '=', null, 'en')
- To search for resources having any value of a given property:
  new SearchTerm('https://my.property')
- To search for resources having any triple with a given value:
  new SearchTerm(null, 'desired value', '=')
- To search for resources having a given property with a value greater than or equal to a given one:
  new SearchTerm('https://my.property', 'desired value', '>=')
  - To make sure numbers and dates are compared properly, it's better to provide a type explicitly:
    new SearchTerm('https://my.property', 10, '>=', \zozlak\RdfConstants::XSD_DECIMAL)
    or
    new SearchTerm('https://my.property', '2019-01-01', '>=', \zozlak\RdfConstants::XSD_DATE)
- To perform a regex search on a given property:
  new SearchTerm('https://my.property', '[a-z]+', '~')
- To perform a full text search on a given property:
  new SearchTerm('https://my.property', 'desired value', '@@')
Advanced search options are controlled by the SearchConfig object (see below).
To perform a full text search, use a search term with @@ as the operator.
The $config parameter controls the advanced search configuration:
- paging
- sorting
- metadata fetch modes
- full text search highlighting
For paging, just set the offset and limit properties of the SearchConfig object.
Remember that paging depends on sorting.
For sorting, the orderBy property can be set to an array of RDF properties to sort by.
If descending sort order is needed, prefix the property URL with ^.
Only literal property values are used for ordering.
You can specify the desired language with the orderByLang property.
It is applied to all orderBy properties, but of course only to the ones having string values.
If a given resource has many values of an indicated property, the minimum value is used for sorting.
For metadata fetch modes, see the description above.
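The paging and sorting properties described above (offset, limit, orderBy) can be combined as in the following sketch. applyPaging() is a hypothetical helper and stdClass stands in for SearchConfig so that the snippet runs without the library; with the library installed, a SearchConfig object would be passed instead.

```php
<?php
// Hypothetical helper: fills the paging and sorting properties on a config object.
// With the library installed, $config would be an acdhOeaw\arche\lib\SearchConfig.
function applyPaging(object $config, int $page, int $pageSize, array $orderBy): object
{
    $config->offset  = $page * $pageSize; // pages are counted from 0 in this sketch
    $config->limit   = $pageSize;
    $config->orderBy = $orderBy;          // a ^ prefix requests descending order
    return $config;
}

// stdClass is used only so that the sketch runs on its own.
$config = applyPaging(new stdClass(), 2, 25, ['^https://date.property']);
```

Setting orderBy together with offset and limit matters because, without a stable order, consecutive pages may overlap or skip resources.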
All the parameters beginning with fts refer to the highlighting of full text search results.
For a detailed description see https://www.postgresql.org/docs/11/textsearch-controls.html#TEXTSEARCH-HEADLINE
The only property that must be set is ftsQuery, which is basically the search string used for the full text search highlighting.
The ftsProperty property can be used to limit highlighting results to a particular metadata property.
A special value BINARY can be used to indicate the resource binary payload.
The highlighting result is returned as a special RDF property of the resource's metadata (see the example below).
Internally the highlighted results are obtained with ts_headline('simple', raw, websearch_to_tsquery('simple', {ftsQuery}), {ftsOptions}), where {ftsQuery} is the corresponding SearchConfig object property value and {ftsOptions} is a concatenation of the ftsStartSel, ftsStopSel, ftsMaxWords, ftsMinWords, ftsShortWord, ftsHighlightAll, ftsMaxFragments and ftsFragmentDelimiter properties.
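To give an idea of what such an options string looks like, the sketch below maps the fts* values to the option names accepted by PostgreSQL's ts_headline() (StartSel, StopSel, MaxWords, MinWords, ShortWord, HighlightAll, MaxFragments, FragmentDelimiter). buildFtsOptions() is a hypothetical helper; how the library assembles the string internally is an assumption.

```php
<?php
// Sketch only: turns SearchConfig-like fts* values into a ts_headline() options string.
// The keys of $map are PostgreSQL option names; the values are the property names
// mentioned above. The mapping itself is an assumption, not the library's code.
function buildFtsOptions(array $fts): string
{
    $map = [
        'StartSel'          => 'ftsStartSel',
        'StopSel'           => 'ftsStopSel',
        'MaxWords'          => 'ftsMaxWords',
        'MinWords'          => 'ftsMinWords',
        'ShortWord'         => 'ftsShortWord',
        'HighlightAll'      => 'ftsHighlightAll',
        'MaxFragments'      => 'ftsMaxFragments',
        'FragmentDelimiter' => 'ftsFragmentDelimiter',
    ];
    $parts = [];
    foreach ($map as $pgName => $cfgName) {
        if (!isset($fts[$cfgName])) {
            continue; // unset options fall back to PostgreSQL defaults
        }
        $value = $fts[$cfgName];
        if (is_bool($value)) {
            $value = $value ? 'true' : 'false';
        }
        $parts[] = $pgName . '=' . $value;
    }
    return implode(', ', $parts);
}

$options = buildFtsOptions([
    'ftsStartSel' => '<b>',
    'ftsStopSel'  => '</b>',
    'ftsMaxWords' => 20,
]);
```

Values containing spaces or commas would additionally have to be double-quoted, which this sketch leaves out.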
Remember that the above-mentioned parameters refer only to the highlighting of full text search results, which is technically independent of the search filters.
Setting ftsQuery doesn't filter for resources matching it. To get this behaviour you should:
- If using Repo::getResourcesBySearchTerms(), use a SearchTerm with property equal to ftsProperty, operator equal to @@ and value equal to ftsQuery.
- If using Repo::getResourcesBySqlQuery(), use something like SELECT id FROM full_text_search WHERE property = {ftsProperty} AND websearch_to_tsquery('simple', {ftsQuery}) @@ segments.
The following example searches for all resources containing a given phrase in their binary content and displays the full text search highlighting results for all matched ones.
use acdhOeaw\arche\lib\Repo;
use acdhOeaw\arche\lib\SearchConfig;
use acdhOeaw\arche\lib\SearchTerm;
$repo = Repo::factory('path/to/config.yaml');
$config = new SearchConfig();
$config->ftsQuery = 'my phrase';
$results = $repo->getResourcesBySearchTerms([new SearchTerm('BINARY', 'my phrase', '@@')], $config);
foreach ($results as $res) {
    echo (string) $res->getGraph()->getLiteral($repo->getSchema()->searchFts) . "\n";
}
To make it easy to remove a given RDF property from resources metadata a special syntax has been introduced:
$repo = \acdhOeaw\arche\lib\Repo::factory('config.yaml');
$res = $repo->getResourceById('https://my.id', '\acdhOeaw\arche\disserv\RepoResource');
$meta = $res->getGraph();
// mark the property for removal
$meta->addResource($repo->getSchema()->delete, 'https://unwanted.property');
$res->updateMetadata();
The repo-php-util contained everything from raw Fedora API wrappers up to highly abstract ACDH concepts like dissemination services.
The acdh-repo-lib is different. It provides only API wrappers for the new repository solution, while ACDH-specific features were moved to separate libraries: [acdh-repo-acdh](https://github.com/zozlak/acdh-repo-acdh) and acdh-repo-ingest.
As objects representing repository resources are instantiated directly in the new solution, it's enough to call the constructor of the desired class to get a specialized object, e.g.:
$repo = \acdhOeaw\arche\lib\Repo::factory('config.yaml');
$res = new \acdhOeaw\arche\disserv\RepoResource('https://my.url', $repo);
$res->getDissServices();
Things are more complex when it comes to search results. To instantiate search result repository objects with a particular class, you should use the $class property, e.g.:
$repo = \acdhOeaw\arche\lib\Repo::factory('config.yaml');
$repo->getResourceById('https://my.id', '\acdhOeaw\arche\disserv\RepoResource');
$term = new \acdhOeaw\arche\lib\SearchTerm('https://my.property', 'my value');
$config = new \acdhOeaw\arche\lib\SearchConfig();
$config->class = '\acdhOeaw\arche\disserv\RepoResource';
$results = $repo->getResourcesBySearchTerms([$term], $config);