This Feature Info Agent (FIA) acts as a single access point for TWA Visualisations to query for both meta and time series data of an individual feature (i.e. a single geographical location) so that it can then be displayed within the side panel of the visualisation.
Please see the CHANGELOG file for details on recent changes; the latest available image of the FIA can be determined by viewing its GitHub package page.
The FIA is a relatively simple HTTP agent built using the TWA agent framework. Its goal is to take in the IRI of a single feature then use it to query the knowledge graphs for metadata, and the relational databases for time series data before formatting and returning it as a JSON object.
At the time of writing, automatic discovery of data is not feasible. The developer deploying an instance of the FIA is therefore responsible for writing SPARQL queries that return both the raw metadata and the data IRIs of any time series (these IRIs are then looked up in the relational databases to retrieve the actual time series values).
These SPARQL queries are written on a class-by-class (TBox) basis; this should mean that, for example, all IRIs that are ABox instances of the `https://theworldavatar.io/ontobuildings/Building` TBox class will reuse the same SPARQL query, as they should have data in the same format.
At the time of writing, the FIA has a few restrictions that all deploying developers should be aware of. These are as follows:
- The FIA can only be run within a TWA Stack.
- The FIA can only report meta and time data that is contained within the same stack as the agent itself.
- The FIA can only return time series data for series that use the Instant class.
In addition to the above restrictions, the FIA uses a hardcoded SPARQL query to ask the KG which classes the received ABox IRI belongs to. In essence, the query asks what `rdf:type` the ABox IRI has, and what the superclass of any returned TBox IRI is, producing a list of the class hierarchy all the way up to `rdfs:Resource`. It has been written to use all of the Blazegraph and Ontop endpoints within the stack, so that it is robust to the ABoxes and TBoxes being stored separately.
If the query fails to return any results, then the FIA will not function; developers may need to update their triples/mapping until at least one of the queries does return something.
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?class WHERE {
        VALUES ?all_endpoints {
            [ENDPOINTS-ALL]
        }
        VALUES ?kg_endpoints {
            [ENDPOINTS-BLAZEGRAPH]
        }
        SERVICE ?all_endpoints {
            [IRI] a ?type.
        }
        SERVICE ?kg_endpoints {
            ?type rdfs:subClassOf* ?class .
        }
        FILTER (!isBlank(?class))
    }
Before diving into the details on how to write the queries to retrieve the aforementioned data, it's worth clarifying some terms used within this documentation. For more details on time series, check out the documentation for the JPS Base Lib project.
- Time Series:
  - As per the time series ontology, a time series is defined as a collection of entities (each of which represents a column of dependent, or Y, values) that are grouped by a single column of independent (or X) time values.
  - Confusingly, the entity representing the independent values is referred to as a "Time Series"; there is no generic name for the entities representing the dependent values (see below).
  - Neither the entity representing the independent column nor the entities representing the dependent columns have values within the knowledge graph; only the IRIs are present, and these are used to link to the actual values stored within the relational database.
- Measurable:
  - As there's no ontological term for the entities representing dependent columns, the FIA (within the code and documentation) refers to them as "Measurables".
  - Each of these "Measurable" instances will normally have some sort of `hasUnit` predicate.
  - A "Measurable" can be a series of values of any type (number, string, boolean etc.).
  - Any existing entity in the KG can become a "Measurable"; as such, one cannot write a generic "show me all measurables" SPARQL query that works for all data sets.
For the FIA to function, a number of configuration steps need to take place before deployment; these are detailed in the subsections below. Users also need a good knowledge of Docker and JSON, and should be familiar with managing the TWA Stack system.
Note: As of version `3.0.0` of the FeatureInfoAgent, the configuration format has changed to support new options. The new format is documented below, but the older format is also supported. To support the newer features, it is recommended that developers write new configurations using the new format, and that existing configurations are manually updated wherever possible.
Follow the configuration steps below within the `fia-queries` subdirectory of the TWA stack manager's data directory. Volumes that are used by containers running with the TWA Stack are populated by named subdirectories within the stack manager's data directory. For more details, read the TWA Stack Manager documentation.
The configuration file should be a JSON file named `fia-config.json`, which should contain:
- `entries`: This is a required array of objects defining a mapping between (TBox) class IRIs and the names of files containing pre-written SPARQL queries. Each object needs to contain the following parameters:
  - Required:
    - `class`: Full IRI of the class.
  - Optional:
    - `meta`: Object containing configurations for meta data retrieval.
    - `time`: Object containing configurations for time series retrieval.
    - `trajectory`: Object containing configurations for trajectory retrieval.
The `meta` object should contain the following parameters:
- Required:
  - `queryFile`: Location of file with SPARQL query used to get meta data (relative to configuration file).
The `time` object should contain the following parameters:
- Required:
  - `queryFile`: Location of file with SPARQL query used to get measurable IRIs (relative to configuration file).
  - `database`: Name of the PostgreSQL database containing values.
- Optional:
  - `limit`: Non-zero integer, defaults to `24`. Limit used when calculating boundaries of time data to query.
  - `unit`: Unit of the above limit, defaults to "hours". One of:
    - "days"
    - "hours"
    - "minutes"
    - "seconds"
    - "milliseconds"
  - `reference`: Reference point for bounds calculation, defaults to "now". Can be a string representing a time in ISO instant format, e.g. `"2011-12-03T10:15:30Z"`, or one of:
    - "all" (limit parameter unused in this case)
    - "now" (server time at request)
    - "latest" (furthest forward time value in RDB)
    - "first" (furthest back time value in RDB)
The `trajectory` object should contain the following parameters; the file contents are described in more detail here:
- Required:
  - `pointIriQuery`: Location of file with SPARQL query used to get point IRIs containing time series (relative to configuration file).
  - `featureIriQuery`: Location of file with SQL/SPARQL query to obtain the intersected feature IRIs (relative to configuration file).
  - `metaQuery`: Location of file with SPARQL query to obtain metadata of the intersected features (relative to configuration file).
For clarification, the `limit` value supports both positive and negative integers. For reference types of `now` and `latest`, it is multiplied by -1 and then added to the reference time during the calculation of retrieval times. For a reference of `first`, it is simply added to the reference time.
For example, a value of `24` with a reference of `now` will provide all values generated within the last real-world day, whereas a value of `24` with a reference of `first` will return all values generated between the first data point and one real-world day afterwards.
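The bounds arithmetic described above can be sketched in Python as follows. This is a minimal illustration rather than the agent's actual code: the function name `compute_bounds` is hypothetical, the `all` and ISO-instant reference types are left out, and treating the earlier of the two instants as the lower bound is an assumption made so that negative limits behave sensibly.

```python
from datetime import datetime, timedelta, timezone

# Units listed for the "unit" parameter; each is also a valid timedelta keyword.
UNITS = {"days", "hours", "minutes", "seconds", "milliseconds"}


def compute_bounds(reference_time, reference, limit=24, unit="hours"):
    """Return (lower, upper) time bounds (hypothetical helper, not the FIA API)."""
    if unit not in UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    if limit == 0:
        raise ValueError("limit must be a non-zero integer")

    offset = timedelta(**{unit: limit})
    if reference in ("now", "latest"):
        # The limit is multiplied by -1, then added to the reference time.
        other = reference_time + (-1 * offset)
    elif reference == "first":
        # The limit is simply added to the reference time.
        other = reference_time + offset
    else:
        raise ValueError(f"reference type not covered by this sketch: {reference}")

    # Assumption: whichever instant is earlier becomes the lower bound.
    return min(reference_time, other), max(reference_time, other)


# Example: a "latest" reference with the default limit of 24 hours.
latest = datetime(2011, 12, 3, 10, 15, 30, tzinfo=timezone.utc)
lower, upper = compute_bounds(latest, "latest")
```

With these inputs the window runs from one day before the latest value up to the latest value itself; switching the reference to `first` moves the window to the day following the reference time.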
Within the samples/fia/fia-config.json file, a mock configuration can be found.
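As a rough illustration of the structure described above, the following Python sketch builds a single-entry configuration, checks the required keys, and serialises it to JSON. The query file names and database name are made up for the example, and `check_config` is a hypothetical helper; the agent's own validation may differ.

```python
import json

# Illustrative configuration; file names and database name are assumptions.
config = {
    "entries": [
        {
            "class": "https://theworldavatar.io/ontobuildings/Building",
            "meta": {"queryFile": "building-meta.sparql"},
            "time": {
                "queryFile": "building-time.sparql",
                "database": "buildings",
                "limit": 24,
                "unit": "hours",
                "reference": "latest",
            },
        }
    ]
}

# Required keys for each optional object, as listed in the documentation above.
REQUIRED_KEYS = {
    "meta": ("queryFile",),
    "time": ("queryFile", "database"),
    "trajectory": ("pointIriQuery", "featureIriQuery", "metaQuery"),
}


def check_config(cfg):
    """Return a list of human-readable problems (empty if none were found)."""
    if "entries" not in cfg:
        return ["missing 'entries' array"]
    problems = []
    for index, entry in enumerate(cfg["entries"]):
        if "class" not in entry:
            problems.append(f"entry {index}: missing 'class'")
        for section, keys in REQUIRED_KEYS.items():
            for key in keys:
                if section in entry and key not in entry[section]:
                    problems.append(f"entry {index}: '{section}' is missing '{key}'")
    return problems


# Text that could be saved as fia-config.json.
config_json = json.dumps(config, indent=2)
```

Running `check_config(config)` on the dictionary above yields an empty list, and `config_json` holds the serialised text.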
To properly parse the meta data and time series queries, the agent requires the results from queries to follow a set format. For each type of query, a number of placeholder tokens can be added that will be populated by the agent just before execution. These are:
- `[IRI]`: The IRI of the feature (ABox) of interest, i.e. the feature selected within the TWA-VF (the IRI will be injected by the agent).
- `[ONTOP]`: The internal URL of the Ontop service within the stack (the URL will be injected by the agent).
- `[ENDPOINTS-ALL]`: Internal URLs of all Blazegraph and Ontop endpoints, good for use with the "SERVICE" keyword.
- `[ENDPOINTS-BLAZEGRAPH]`: Internal URLs of all Blazegraph endpoints, good for use with the "SERVICE" keyword.
- `[LINE_WKT]`: Only used in the trajectory query; placeholder for inserting the WKT literal of the trajectory.
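To make the substitution step concrete, here is a small Python sketch of how such tokens could be filled in before a query is executed. It is not the agent's implementation: `fill_placeholders` is a hypothetical function, the endpoint URLs are invented, and wrapping each IRI or URL in angle brackets is an assumption made so that the resulting text is valid SPARQL.

```python
def fill_placeholders(query, iri, all_endpoints, blazegraph_endpoints):
    """Replace FIA-style tokens in a query template (illustrative only)."""

    def as_iris(urls):
        # Assumption: endpoints are injected as space-separated <url> terms.
        return " ".join(f"<{url}>" for url in urls)

    replacements = {
        "[IRI]": f"<{iri}>",
        "[ENDPOINTS-ALL]": as_iris(all_endpoints),
        "[ENDPOINTS-BLAZEGRAPH]": as_iris(blazegraph_endpoints),
    }
    for token, value in replacements.items():
        query = query.replace(token, value)
    return query


template = (
    "SELECT ?type WHERE { "
    "VALUES ?endpoint { [ENDPOINTS-ALL] } "
    "SERVICE ?endpoint { [IRI] a ?type . } }"
)

# Hypothetical internal URLs, used purely for illustration.
filled = fill_placeholders(
    template,
    iri="https://example.org/feature/1",
    all_endpoints=["http://blazegraph:8080/kb/sparql", "http://ontop:8080/sparql"],
    blazegraph_endpoints=["http://blazegraph:8080/kb/sparql"],
)
```

Plain string replacement keeps the sketch short; the tokens are distinctive enough that they are unlikely to collide with ordinary SPARQL syntax.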
Queries for meta data should not concern themselves with data relating to time series. Queries here need to return a table with two (or optionally three) columns. The first column should be named `Property` and contain the name of the parameter being reported; the second should be named `Value` and contain the value. The optional third column is `Unit`; any other columns are currently ignored.
Queries that generate multiple rows with the same property name are supported; their values will be combined into a single JSON array by the agent.
Property | Value | Unit |
---|---|---|
Elevation | 100 | m |
Station Reference | 0001 | |
Station Reference | 0001A | |
Catchment Name | Cotswolds | |
Up Time | 7 | Days |
An example of a meta data SPARQL query can be seen here; note that this is for a sample data set defined in a simple ontology here.
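The row-combining behaviour described above can be pictured with the short Python sketch below. The exact layout of the JSON returned by the agent is not specified in this document, so the output shape here (a plain mapping from property name to a value or list of values, with units left out) is an assumption, and `combine_rows` is a hypothetical name.

```python
import json


def combine_rows(rows):
    """Group Value entries by Property; repeated properties become lists."""
    combined = {}
    for row in rows:
        name, value = row["Property"], row["Value"]
        if name not in combined:
            combined[name] = value
        elif isinstance(combined[name], list):
            combined[name].append(value)
        else:
            # Second occurrence: promote the single value to a list.
            combined[name] = [combined[name], value]
    return combined


# Rows mirroring the example table above (units omitted for brevity).
rows = [
    {"Property": "Elevation", "Value": "100"},
    {"Property": "Station Reference", "Value": "0001"},
    {"Property": "Station Reference", "Value": "0001A"},
    {"Property": "Catchment Name", "Value": "Cotswolds"},
]

meta_json = json.dumps(combine_rows(rows))
```

Here the two `Station Reference` rows end up as one JSON array, while properties that appear only once stay as single values.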
Queries for measurable entities need to return the IRIs of the entities representing the dependent value columns (i.e. "Measurable" instances), rather than that of the time series instance itself. Those IRIs will be used to grab the actual values from the relational database as well as parameters associated with each measurement/forecast.
Required columns are `Measurable` (`Measurement` is also supported for backwards compatibility) containing the entity IRI, `Name` containing a user-facing name for this entry, and `Unit` containing the unit (which can be blank); any other columns are currently ignored.
Measurement | Name | Unit |
---|---|---|
https://theworldavatar.io/measurement-iri-one/ | Flow Rate | m^3/s |
https://theworldavatar.io/measurement-iri-two/ | Speed | m/s |
https://theworldavatar.io/measurement-iri-three/ | Ownership | |
An example of a meta data SPARQL query can be seen here; note that this is for a sample data set defined in a simple ontology here.
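As a hedged illustration of the column rules for measurable queries, the Python sketch below reads result rows while accepting either the `Measurable` or the legacy `Measurement` column name. The function `parse_measurables` and the shape of its output are hypothetical, not part of the agent.

```python
def parse_measurables(rows):
    """Extract IRI, name and unit from measurable query rows (sketch only)."""
    parsed = []
    for row in rows:
        # "Measurement" is accepted as the backwards-compatible column name.
        iri = row.get("Measurable") or row.get("Measurement")
        if not iri:
            raise ValueError("row has no Measurable/Measurement column")
        parsed.append(
            {
                "iri": iri,
                "name": row["Name"],
                # A blank unit is allowed, so normalise it to None.
                "unit": row.get("Unit") or None,
            }
        )
    return parsed


rows = [
    {
        "Measurement": "https://theworldavatar.io/measurement-iri-one/",
        "Name": "Flow Rate",
        "Unit": "m^3/s",
    },
    {
        "Measurable": "https://theworldavatar.io/measurement-iri-three/",
        "Name": "Ownership",
        "Unit": "",
    },
]

measurables = parse_measurables(rows)
```

Any extra columns in a row are simply ignored, matching the rule stated above.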
In summary, the FIA runs three queries sequentially to generate the final metadata for a selected trajectory. The first query should return IRIs that contain time series data in the form of PostGIS points; the FIA will use these IRIs to obtain the recorded points using the TimeSeriesClient. The FIA will then construct a line from the queried points and use this line in the second query to find the list of features that it intersects. Finally, the FIA will inject the IRIs from the second query (the intersected features) into the final query to obtain the metadata for display.
- `pointIriQuery`: The query must contain one SELECT parameter, which can be named anything; an example is point_query.sparql. The instances returned by this query must contain time series data stored as PostGIS points, and the IRIs should be the measurables, similar to queries for measurables. If the query returns more than one IRI, results will be combined and sorted according to time.
- `featureIriQuery`: Both SQL and SPARQL are allowed; the FIA is able to detect the query type. The query must contain one SELECT parameter, which can be named anything, and should contain the placeholder `[LINE_WKT]` for the FIA to insert the WKT literal of the trajectory. Two examples are given: a SPARQL version and an SQL version. Be sure to handle any SRID transformation if necessary.
- `metaQuery`: The query template must contain a variable `?Feature` in the WHERE clause. The FIA will add a VALUES clause with the feature IRIs from the previous query, e.g. `VALUES ?Feature {<http://feature1> <http://feature2>}`. The SELECT parameters follow the requirements of the standard meta data queries, i.e. the first column should be named `Property` and contain the name of the parameter being reported, and the second should be named `Value` and contain the value. The optional third column is `Unit`; any other columns are currently ignored. Here is an example.
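The two string-building steps in the trajectory workflow (turning time-ordered points into a line, and injecting feature IRIs into the final query) can be sketched in Python as below. Both function names are hypothetical, the WKT is emitted without any SRID handling, and placing the VALUES clause directly after the opening brace of the WHERE clause is an assumption; the document only states that such a clause is added.

```python
import re


def build_line_wkt(points):
    """Build a LINESTRING from (time, x, y) tuples, ordered by time."""
    ordered = sorted(points, key=lambda point: point[0])
    if len(ordered) < 2:
        raise ValueError("a line needs at least two points")
    coords = ", ".join(f"{x} {y}" for _, x, y in ordered)
    return f"LINESTRING ({coords})"


def add_values_clause(query_template, feature_iris):
    """Insert a VALUES clause binding ?Feature to the given IRIs."""
    values = "VALUES ?Feature {" + " ".join(f"<{iri}>" for iri in feature_iris) + "}"
    # Assumption: the clause goes right after the first "WHERE {".
    return re.sub(
        r"(WHERE\s*\{)",
        lambda match: match.group(1) + "\n  " + values,
        query_template,
        count=1,
        flags=re.IGNORECASE,
    )


# Points deliberately out of order to show the sort by time.
line_wkt = build_line_wkt([(2, 1.0, 1.0), (1, 0.0, 0.0), (3, 2.0, 0.5)])

meta_template = "SELECT ?Property ?Value WHERE { ?Feature ?Property ?Value . }"
meta_query = add_values_clause(meta_template, ["http://feature1", "http://feature2"])
```

The resulting `line_wkt` string is the kind of value that would stand in for `[LINE_WKT]` in the second query, and `meta_query` shows the third query after the feature IRIs have been added.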
The following HTTP request routes are available for the agent:
- `/get`
  - Run algorithm to gather metadata and time series.
  - Requires the `iri` parameter.
  - Supports optional `endpoint` parameter to direct KG queries to a specific endpoint rather than federating across all of them.
  - Supports optional `lowerbound` and `upperbound` parameters specifically for trajectories; these are the time limits for the points time series.
- `/status`
  - Reports the agent's current status.
- `/refresh`
  - Forces the agent to re-scan for available Blazegraph endpoints.
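For orientation, a request URL for the `/get` route might be assembled as in the Python sketch below. Several details are assumptions rather than documented behaviour: that parameters are passed as a URL query string, that the agent sits under the `/feature-info-agent` route of the stack, and the host and port shown. `build_get_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_get_url(base_url, iri, endpoint=None):
    """Assemble a /get request URL for the agent (illustrative only)."""
    params = {"iri": iri}
    if endpoint is not None:
        # Optional: direct KG queries to one specific endpoint.
        params["endpoint"] = endpoint
    return f"{base_url.rstrip('/')}/feature-info-agent/get?{urlencode(params)}"


# Hypothetical stack address and feature IRI.
url = build_get_url("http://localhost:3838", "https://example.org/feature/1")
```

`urlencode` percent-encodes the IRI so that its own `:` and `/` characters do not interfere with the request URL.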
The FIA container is an optional built-in service in the stack; to enable it you need to create/modify the configuration file for that stack. An example of the changes required is described in the stack-manager readme file here. After spinning up the stack, the agent should be accessible via the `/feature-info-agent` route.
Note that the version of the FIA run by the stack is determined by the stack manager itself; to use a custom (or newer) version, developers will need to ensure the newer FIA image is built (either locally or uploaded to GitHub), then provide a custom service configuration (ideally a near-copy of the stack's default configuration for the FIA, found here) within the stack manager's `inputs/config/services` directory.
The FIA is currently set up with two automated GitHub actions:
- Test the FeatureInfoAgent:
  - Only runs when files within the agent have changed AND on commits that are part of a non-draft PR to the main branch.
  - Tests the FIA by running its unit tests and compiling a Docker image (which is NOT pushed at this stage).
- Release the FeatureInfoAgent:
  - Only runs when files within the agent are changed AND on commits to the main branch (i.e. after a PR is approved and merged).
  - Builds the FIA's Docker image (inc. running the unit tests again) AND pushes it to the TWA GitHub image registry.
A number of different projects have made use of the FeatureInfoAgent; some good examples to use as starting points are:
- UK Base World: Project showing power plant locations across the UK.
- TWA-VF Tutorial: The Mapbox tutorial from the TWA-VF documents using the FIA in a simple case with NHS data.
Also packaged within this directory are a number of configuration and data files used to spin up a small sample stack for manually testing the FIA. Whilst this has not been put together to act as a shining example of the FIA, one is free to look at the configuration files to determine proper syntax.
It should be noted that no specialist tutorial for the FeatureInfoAgent exists at the time of writing; however, the FeatureInfoAgent is a core component of the aforementioned examples. These examples (along with the documentation on this page) can be used as an introduction/tutorial to the FeatureInfoAgent.
For troubleshooting and FAQs, please see the FIA Troubleshooting document.
The FIA is a simple HTTP agent written using the existing TWA agent framework. The core functionality of the agent is split across four classes: the central `FeatureInfoAgent` class, which acts as the receiver and transmitter for HTTP requests, and the classes that actually run the logic (whose names should be self-explanatory): `ClassHandler`, `MetaHandler`, and `TimeHandler`.
The algorithm used to find, format, and return data after a request is received is detailed in the Mermaid diagram here (although you can also read the in-code documentation for more details).
Building the Docker image for the FIA is automatically triggered under certain conditions (see above), but developers can also build a local copy using the provided `build.sh` script after supplying the required `repo_username.txt` and `repo_password.txt` files within the `credentials` directory.
For support, please file an issue in GitHub using the `FeatureInfoAgent` project, or contact the CMCL technical team.