Architecture
The newton_chymistry web application is an XProc pipeline, which is hosted by a Java web Servlet called XProc-Z, which in turn is hosted in Apache Tomcat. The web application also uses an instance of Apache Solr as a search engine.
When the application receives an HTTP request from a browser, Tomcat invokes the xproc-z Servlet to handle the request. The Servlet in turn invokes an XProc pipeline and passes it the details of the HTTP request. The pipeline is responsible for generating each HTTP response.
- XProc pipelines are stored in files with the extension .xpl, in the xproc folder.
- XSLT transformations are stored in the xslt folder.
- Figure images from the manuscripts are stored (broken down by MS identifier) in the figure folder. These files are transmitted directly to the browser without being transformed.
- Other static resources, such as icons, other images, and JavaScript libraries, are stored in the static folder. These files are transmitted directly to the browser without being transformed.
- The schema folder contains a TEI ODD file derived from the TEI corpus, and a RelaxNG schema derived from the ODD.
- The p4 folder contains TEI P4 files downloaded from the Xubmit P4 repository, along with several external entity files.
- The p5 folder contains only TEI P5 files, either derived from the P4 files in the p4 folder or downloaded directly from the Xubmit P5 repository.
- The root folder also contains a metadata schema definition called search-fields.xml, which defines the Solr schema and the search and browse interface, as well as a menus.json file, which defines the site menus and the "site index" page.
In the chymistry web application the main XProc pipeline, called main, is defined in the file xproc-z.xpl. See the installation page for details on how the pipeline is specified.
The main pipeline examines each HTTP request and delegates it to one of a number of sub-pipelines, each of which handles a particular class of request.
As well as dispatching the requests to the sub-pipelines, the main pipeline is responsible for adding the global navigation and branding banners to HTML responses.
In the case of manuscript HTML pages, the pipeline calls several sub-pipelines and integrates the results: converting the P5 into HTML, performing hit-highlighting using Solr, inserting the image viewer, and converting annotations into popup HTML details elements.
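The dispatching role of the main pipeline can be illustrated with a small Python sketch: an ordered list of URL patterns, each bound to a handler, with a fallback for anything unmatched. The URL patterns below are hypothetical (the real ones are defined in xproc-z.xpl), and each handler simply returns the name of the sub-pipeline it stands in for.

```python
import re
from typing import Callable, List, Tuple

# A handler takes the request path and returns a response body.
Handler = Callable[[str], str]

def make_dispatcher(routes: List[Tuple[str, Handler]], fallback: Handler) -> Handler:
    """Send each path to the first handler whose pattern matches the whole path."""
    compiled = [(re.compile(pattern), handler) for pattern, handler in routes]

    def dispatch(path: str) -> str:
        for regex, handler in compiled:
            if regex.fullmatch(path):
                return handler(path)
        return fallback(path)  # e.g. a plain XHTML page, or a 404

    return dispatch

# Hypothetical URL patterns; order matters, so the more specific route comes first.
routes = [
    (r"/text/[^/]+/xml", lambda p: "p5-as-xml"),
    (r"/text/[^/]+", lambda p: "p5-as-html"),
    (r"/search/.*", lambda p: "search"),
    (r"/lsa/.*", lambda p: "lsa"),
]
dispatch = make_dispatcher(routes, fallback=lambda p: "html-page")

print(dispatch("/text/example-ms"))      # p5-as-html
print(dispatch("/text/example-ms/xml"))  # p5-as-xml
print(dispatch("/about"))                # html-page
```

Keeping the routes in an ordered list mirrors a chain of conditional branches: the first match wins, so more specific patterns must precede more general ones.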
The add-site-navigation pipeline is used as the last step of any pipeline which produces HTML. This pipeline transforms the output HTML by adding a global header and footer, including menus, and finally inserts the IU institutional page header.
The site menus are generated from the menus.json file.
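The structure of menus.json is not described here, so the following Python sketch assumes a hypothetical format: a list of items, each with a "label" and either an "href" or a nested list of "items". It shows how such a file could be rendered recursively as nested HTML lists for a navigation bar; the paths used are illustrative only.

```python
import json
from html import escape

def render_menu(items: list) -> str:
    """Render a (hypothetical) menus.json structure as nested HTML lists."""
    parts = ["<ul>"]
    for item in items:
        label = escape(item["label"])
        if "items" in item:
            # A submenu: show its label, then recurse into its children.
            parts.append(f"<li>{label}{render_menu(item['items'])}</li>")
        else:
            href = escape(item["href"], quote=True)
            parts.append(f'<li><a href="{href}">{label}</a></li>')
    parts.append("</ul>")
    return "".join(parts)

# Hypothetical menu definition with one submenu.
menus = json.loads("""
[
  {"label": "Home", "href": "/"},
  {"label": "Browse", "items": [
    {"label": "Manuscripts", "href": "/text/"},
    {"label": "Bibliography", "href": "/bibliography"}
  ]}
]
""")
nav_html = "<nav>" + render_menu(menus) + "</nav>"
print(nav_html)
```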
This XProc file contains several generic, low-level utility pipelines for serving static files, making HTTP responses, etc.
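As an illustration of what "making an HTTP response" involves, the Python sketch below wraps the bytes of a static file in an XML response document. It assumes that XProc-Z uses the c:request/c:response vocabulary of the XProc 1.0 specification (namespace http://www.w3.org/ns/xproc-step), with binary content carried base64-encoded in c:body; the Cache-Control header is purely illustrative and not taken from the application.

```python
import base64
import mimetypes
import xml.etree.ElementTree as ET

C = "http://www.w3.org/ns/xproc-step"  # XProc step vocabulary namespace
ET.register_namespace("c", C)

def static_response(filename: str, data: bytes) -> ET.Element:
    """Wrap raw file bytes in a c:response document (assumed XProc-Z format)."""
    content_type, _ = mimetypes.guess_type(filename)
    response = ET.Element(f"{{{C}}}response", {"status": "200"})
    # Illustrative header only; not taken from the application.
    ET.SubElement(response, f"{{{C}}}header",
                  {"name": "Cache-Control", "value": "max-age=3600"})
    body = ET.SubElement(response, f"{{{C}}}body", {
        "content-type": content_type or "application/octet-stream",
        "encoding": "base64",
    })
    # Binary content is base64-encoded so it can travel inside XML.
    body.text = base64.b64encode(data).decode("ascii")
    return response

resp = static_response("static/logo.png", b"\x89PNG")
print(ET.tostring(resp, encoding="unicode"))
```

Guessing the content type from the file extension lets a single generic pipeline serve every file in the static and figure folders unchanged.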
This XProc file contains pipelines responsible for converting TEI files from P4 to P5.
- download-p4 downloads the TEI P4 corpus from Xubmit to the p4 folder
- convert-to-p5 converts all the P4 files in the p4 folder into P5 and saves them in the p5 folder
- transform-p4-to-p5 transforms a single P4 file into P5 through a series of XSLT transformations
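The idea behind transform-p4-to-p5, a chain of transformations where each step consumes the previous step's output, can be sketched in Python with ElementTree standing in for XSLT. Only two real P4-to-P5 differences are shown (the root element TEI.2 becomes TEI, and all elements move into the TEI namespace); the application's actual XSLT chain does considerably more, and the step functions here are hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable, List

TEI_NS = "http://www.tei-c.org/ns/1.0"
Step = Callable[[ET.Element], ET.Element]

def rename_root(root: ET.Element) -> ET.Element:
    # P4 documents use <TEI.2> as the root element; P5 uses <TEI>.
    if root.tag == "TEI.2":
        root.tag = "TEI"
    return root

def add_tei_namespace(root: ET.Element) -> ET.Element:
    # P5 elements live in the TEI namespace; P4 elements have no namespace.
    for elem in root.iter():
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = f"{{{TEI_NS}}}{elem.tag}"
    return root

def run_steps(root: ET.Element, steps: List[Step]) -> ET.Element:
    """Apply each step to the output of the previous one, like a chain of XSLTs."""
    return reduce(lambda tree, step: step(tree), steps, root)

p4 = ET.fromstring("<TEI.2><teiHeader/><text><body><p>Sal</p></body></text></TEI.2>")
p5 = run_steps(p4, [rename_root, add_tei_namespace])
print(ET.tostring(p5, encoding="unicode"))
```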
This XProc file contains pipelines for site administration.
- admin-form generates an administrative user interface containing buttons and links for invoking other pipelines to download TEI, perform format conversions, reindex, etc.
- download-p5 downloads the TEI P5 corpus from Xubmit to the p5 folder
- download-bibliography downloads the bibliography file from Xubmit to the p5 folder
This XProc file contains pipelines for analyzing the TEI corpus.
- list-classification-attributes lists the values of the "classification" attributes (rend, type, and place) used in the TEI corpus
- sample-xml-text generates a "representative" sample TEI file by extracting one instance of every distinct piece of markup from the entire corpus
- list-attributes-by-element generates a list of all the attributes used for a given element type
- list-elements generates a list of all the elements used
- list-metadata generates a list of each document's id and title metadata
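Several of these analysis pipelines amount to a single pass over the corpus that tallies markup. The Python sketch below combines the ideas behind list-elements, list-attributes-by-element and list-classification-attributes; it is an illustrative analogue written with ElementTree, not a translation of the XProc pipelines, and the sample documents are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Attributes treated as "classification" attributes.
CLASSIFICATION = {"rend", "type", "place"}

def local_name(name: str) -> str:
    """Strip a "{namespace}" prefix from an element or attribute name."""
    return name.rsplit("}", 1)[-1]

def inventory(documents):
    """Tally elements, attributes per element, and classification values."""
    element_counts = Counter()
    attributes = defaultdict(set)  # element name -> attribute names seen
    values = defaultdict(set)      # classification attribute -> values seen
    for text in documents:
        for elem in ET.fromstring(text).iter():
            name = local_name(elem.tag)
            element_counts[name] += 1
            for attr, value in elem.attrib.items():
                attr_name = local_name(attr)
                attributes[name].add(attr_name)
                if attr_name in CLASSIFICATION:
                    values[attr_name].add(value)
    return element_counts, attributes, values

docs = [
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><hi rend="italic">a</hi></text></TEI>',
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><hi rend="sup">b</hi>'
    '<add place="above">c</add></text></TEI>',
]
counts, attrs, values = inventory(docs)
print(dict(counts), dict(attrs), dict(values))
```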
This XProc file contains the bulk of the application: mostly pipelines responsible for processing TEI P5 files in different ways.
- update-schema pushes a new schema definition (from search-fields.xml) to the Solr search engine
- reindex reindexes the TEI corpus as metadata records in Solr
- generate-indexer converts the search-fields.xml metadata definition file into an XSLT transformation which can then be used to convert a TEI document into a Solr metadata record
- p5-as-solr extracts the search fields defined in search-fields.xml from a single TEI document into a Solr metadata record
- convert-p5-to-solr converts a single TEI document into a Solr metadata record, including the search fields defined in search-fields.xml, the full-text fields introduction, diplomatic, and normalized, and the search result field metadata-summary
- p5-as-iiif converts a single TEI document into a IIIF manifest
- iiif-annotation-list generates a IIIF annotation list for a particular folio in a TEI P5 file
- bibliogaphy-as-html converts the TEI bibliography file to HTML
- p5-as-html converts a TEI P5 manuscript file to HTML
- p5-as-xml serves a TEI P5 file verbatim, as XML
- list-p5 generates a page listing the TEI P5 files
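The core of p5-as-solr, pulling the configured search fields out of a TEI document into a Solr record, can be sketched in Python. The format of search-fields.xml is not described here, so the FIELDS mapping below (Solr field name to an ElementTree path) is a hypothetical stand-in. Note also that the application compiles its field definitions into XSLT (generate-indexer), whereas this sketch interprets them directly.

```python
import json
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical field definitions standing in for search-fields.xml:
# Solr field name -> ElementTree path evaluated against the TEI document.
FIELDS = {
    "title": "./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
    "language": ".//tei:langUsage/tei:language",
}

def to_solr_record(doc_id: str, tei_xml: str) -> dict:
    """Extract each configured field from a TEI document into a Solr record."""
    root = ET.fromstring(tei_xml)
    record = {"id": doc_id}
    for field, path in FIELDS.items():
        # Collect the whitespace-normalized text of every matching element.
        values = [" ".join("".join(e.itertext()).split())
                  for e in root.findall(path, NS)]
        if values:
            record[field] = values
    return record

tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt>
<title>Example   manuscript</title></titleStmt></fileDesc>
<profileDesc><langUsage><language ident="la">Latin</language></langUsage>
</profileDesc></teiHeader></TEI>"""
record = to_solr_record("example-ms", tei)
# Solr's JSON update format accepts a list of such documents.
print(json.dumps([record]))
```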
Several pages in the site are specified as plain XHTML pages, stored in the html folder. The sub-pipeline html-page is used to display the contents of these pages. That pipeline attempts to load the requested page and, if the page is not found, returns a 404 (Not Found) response.
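The load-or-404 behaviour of html-page can be sketched in Python as follows. The ".xhtml" file extension and the body of the 404 page are assumptions; the sketch also refuses page names that would resolve outside the html folder, a safeguard any file-serving step of this kind needs.

```python
import tempfile
from pathlib import Path
from typing import Tuple

def html_page(html_dir: Path, page: str) -> Tuple[int, str]:
    """Return (status, body) for a request for an XHTML page in html_dir."""
    base = html_dir.resolve()
    candidate = (base / f"{page}.xhtml").resolve()  # ".xhtml" is an assumption
    # Refuse paths that escape the html folder (e.g. "../secret") or don't exist.
    if base not in candidate.parents or not candidate.is_file():
        return 404, "<html><body><h1>Not found</h1></body></html>"
    return 200, candidate.read_text(encoding="utf-8")

# Usage against a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "about.xhtml").write_text("<html><body>About</body></html>", encoding="utf-8")
    ok = html_page(root, "about")
    missing = html_page(root, "nope")
    outside = html_page(root, "../about")

print(ok[0], missing[0], outside[0])  # 200 404 404
```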
This XProc file contains pipelines which perform queries against the Solr search engine.
This pipeline is invoked when a user either clicks the "search" button or clicks on a facet value in the search form.
The facet values which appear on the search form are submit buttons, each of which has its own target URL containing the currently selected set of facets; this allows the user to specify a query incrementally by clicking a facet value, which is then added to the set. However, this also means that the form must be submitted using the HTTP POST method, because a GET submission replaces any parameters already present in the target URL with the form's own fields. To keep a bookmarkable, shareable URL at each stage of the browse process, the search pipeline includes a sub-pipeline which redirects these POST requests to equivalent GET requests in which the parameters are encoded in the URL.
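The POST-to-GET redirect can be sketched in Python: merge the parameters already present in the button's target URL with the fields in the POSTed form body, and redirect to a URL that encodes them all. The parameter names are hypothetical, and the choice of 303 See Other (the standard status for redirecting a POST to a GET) is an assumption about the application.

```python
from urllib.parse import parse_qsl, urlencode

def post_to_get_redirect(target_url: str, form_body: str) -> tuple:
    """Turn a POSTed search/facet form into a 303 redirect to an equivalent GET URL."""
    path, _, existing_query = target_url.partition("?")
    # Parameters already in the button's target URL (the selected facets) ...
    params = parse_qsl(existing_query)
    # ... plus the fields submitted in the form body; empty fields are dropped.
    params += parse_qsl(form_body)
    location = path + ("?" + urlencode(params) if params else "")
    return 303, {"Location": location}

status, headers = post_to_get_redirect("/search/?language=Latin", "text=mercury&author=")
print(status, headers["Location"])  # 303 /search/?language=Latin&text=mercury
```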
When the pipeline receives a GET request, it parses the parameters in the request URL and uses the field definitions in search-fields.xml to generate a query to Solr, using Solr's JSON Facet API. The pipeline then formats the result of the Solr query as an HTML page which presents the results alongside the search and browse interface, with the search field and facet values set to those of the current query.
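A Python sketch of the kind of request body this produces: the free-text query, one filter query per selected facet value, and a terms facet for each facet field, in the shape accepted by Solr's JSON Request API and JSON Facet API. The field names, the facet limit of 20, and the fallback to "*:*" for an empty query are assumptions; in the application the fields come from search-fields.xml.

```python
import json
from typing import Dict, List

def quote_value(value: str) -> str:
    # Escape backslashes and double quotes so the value is a valid Solr phrase.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_solr_query(text: str, selected: Dict[str, List[str]],
                     facet_fields: List[str]) -> dict:
    """Build a JSON Request API body with one terms facet per facet field."""
    return {
        "query": text or "*:*",  # match everything when no text is given
        "filter": [f"{field}:{quote_value(value)}"
                   for field, values in selected.items() for value in values],
        "facet": {field: {"type": "terms", "field": field, "limit": 20}
                  for field in facet_fields},
    }

body = build_solr_query("mercury", {"language": ["Latin"]}, ["language", "author"])
# The JSON would be POSTed to the collection's /query endpoint.
print(json.dumps(body, indent=2))
```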
This pipeline is used to add hit highlighting to HTML renditions of the P5 manuscripts. It is invoked from the main pipeline to post-process the HTML renditions of the P5. If the page URL does not include a highlight parameter, the pipeline simply copies the HTML unchanged. If a highlight parameter is present in the URL, it is interpreted as the text to highlight. The pipeline queries Solr to generate a list of "snippets" of the text, in which the highlighted text appears in context. The pipeline then searches the HTML page for each snippet, marking the highlighted text with HTML mark elements and adding hyperlinks from each mark element to the next and previous ones.
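The snippet-matching step can be sketched in Python over a plain string, whereas the real pipeline works on the HTML page. The sketch assumes Solr's default highlight markers (em elements) in the snippets: it strips the markers to locate each snippet in the page text, converts each marked term into a character range, and emits mark elements with previous/next links. The "hit-N" ids are hypothetical, and overlapping hits are not handled.

```python
import html
import re
from typing import List

EM = re.compile(r"<em>(.*?)</em>")  # Solr's default highlight markers

def highlight(text: str, snippets: List[str]) -> str:
    """Wrap each highlighted term from Solr-style snippets in a linked <mark>."""
    spans = set()
    for snippet in snippets:
        plain = EM.sub(r"\1", snippet)  # the snippet as it appears in the page
        start = text.find(plain)
        if start < 0:
            continue  # snippet not found in the page: skip it
        for m in EM.finditer(snippet):
            # Offset of the term within the plain snippet (earlier tags removed).
            offset = len(EM.sub(r"\1", snippet[:m.start()]))
            spans.add((start + offset, start + offset + len(m.group(1))))
    ordered = sorted(spans)
    out, pos = [], 0
    for i, (s, e) in enumerate(ordered):
        prev_link = f'<a href="#hit-{i - 1}">&lt;</a>' if i > 0 else ""
        next_link = f'<a href="#hit-{i + 1}">&gt;</a>' if i + 1 < len(ordered) else ""
        out.append(html.escape(text[pos:s]))
        out.append(f'{prev_link}<mark id="hit-{i}">{html.escape(text[s:e])}</mark>{next_link}')
        pos = e
    out.append(html.escape(text[pos:]))
    return "".join(out)

page = "Take mercury and sulphur, then more mercury."
result = highlight(page, ["Take <em>mercury</em> and sulphur", "then more <em>mercury</em>."])
print(result)
```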
This pipeline does not perform Latent Semantic Analysis itself; it simply delegates all lsa requests to a back-end server and reformats the resulting HTML to include the site's global navigation.