Architecture

Conal Tuohy edited this page Feb 19, 2020 · 1 revision

The newton_chymistry web application is an XProc pipeline, which is hosted by a Java web Servlet called XProc-Z, which in turn is hosted in Apache Tomcat. The web application also uses an instance of Apache Solr as a search engine.

When the application receives an HTTP request from a browser, Tomcat invokes the XProc-Z Servlet to handle the request. In turn, the Servlet invokes an XProc pipeline and passes it the details of the HTTP request. The pipeline is responsible for generating each HTTP response.

File locations

  • XProc pipelines are stored in files with the extension .xpl, in the xproc folder.
  • XSLT transformations are stored in the xslt folder.
  • Figure images from the manuscripts are stored (broken down by MS identifier) in the figure folder. These files are transmitted directly to the browser without being transformed.
  • Other static resources, including icons, other images, and JavaScript libraries, are stored in the static folder. These files are transmitted directly to the browser without being transformed.
  • The schema folder contains a TEI ODD file derived from the TEI corpus, and a RelaxNG schema derived from the ODD.
  • The p4 folder contains TEI P4 files downloaded from the Xubmit P4 repository, along with several external entity files.
  • The p5 folder contains just TEI P5 files, either derived from the P4 files in the p4 folder, or directly downloaded from the Xubmit P5 repository.
  • The root folder also contains a metadata schema definition called search-fields.xml which defines the Solr schema and the search and browse interface, as well as a menus.json file which defines the site menus and the "site index" page.

The main pipeline file xproc-z.xpl

The main pipeline (main)

In the chymistry web application the main XProc pipeline, called main, is defined in the file xproc-z.xpl. See the installation page for details on how the pipeline is specified.

The main pipeline examines each HTTP request and delegates it to one of a number of sub-pipelines, each of which handles a particular class of request.

As well as dispatching the requests to the sub-pipelines, the main pipeline is responsible for adding the global navigation and branding banners to HTML responses.
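The dispatch-then-decorate behaviour described above can be sketched in Python. This is only an illustration of the control flow, not the XProc code in xproc-z.xpl: the URL prefixes, the handler functions and the stand-in add_site_navigation function are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]            # e.g. {"method": "GET", "path": "/search"}
Handler = Callable[[Request], str]  # a sub-pipeline returns a response body


def search_pipeline(request: Request) -> str:
    return "<html><body>search results</body></html>"


def static_pipeline(request: Request) -> str:
    return "raw bytes of " + request["path"]


# Hypothetical routing table: URL prefix -> sub-pipeline.
ROUTES: List[Tuple[str, Handler]] = [
    ("/search", search_pipeline),
    ("/static/", static_pipeline),
]


def add_site_navigation(html: str) -> str:
    """Stand-in for the step that wraps HTML in a global header and footer."""
    html = html.replace("<body>", "<body><header>menus</header>", 1)
    return html.replace("</body>", "<footer>footer</footer></body>", 1)


def main(request: Request) -> str:
    """Delegate the request to the first matching sub-pipeline."""
    for prefix, handler in ROUTES:
        if request["path"].startswith(prefix):
            body = handler(request)
            # Only HTML responses receive the navigation and branding.
            if body.lstrip().startswith("<html"):
                return add_site_navigation(body)
            return body
    return "404 Not Found"
```

Keeping the navigation step in the dispatcher means that individual sub-pipelines need to produce only the page content.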

In the case of manuscript HTML pages, the pipeline calls several sub-pipelines and integrates the results: converting the P5 into HTML, performing hit-highlighting using Solr, inserting the image viewer, and converting annotations into popup HTML details elements.

Global navigation and branding (add-site-navigation)

The add-site-navigation pipeline is used as the last step of any pipeline which produces HTML. This pipeline transforms the output HTML by adding a global header and footer, including menus, and finally inserts the IU institutional page header.

The site menus are generated from the menus.json file.

The XProc-Z library file xproc-z-library.xpl

This XProc file contains several generic and low-level utility pipelines, for serving static files, making HTTP responses, etc.

The P5 conversion file convert-to-p5.xpl

This XProc file contains pipelines responsible for converting TEI files from P4 to P5.

  • download-p4 downloads the TEI P4 corpus from Xubmit to the p4 folder
  • convert-to-p5 converts all the P4 files in the p4 folder into P5 and saves them in the p5 folder
  • transform-p4-to-p5 transforms a single P4 file into P5, through a series of XSLT transformations

The site administration file administration.xpl

This XProc file contains pipelines for site administration.

  • admin-form generates an administrative user interface, containing buttons and links for invoking other pipelines to download TEI, perform format conversions, reindex, etc.
  • download-p5 downloads the TEI P5 corpus from Xubmit to the p5 folder
  • download-bibliography downloads the bibliography file from Xubmit to the p5 folder

The corpus analysis XProc file analyze-corpus.xpl

This XProc file contains pipelines for analyzing the TEI corpus.

  • list-classification-attributes lists the values of "classification" attributes (rend, type, and place) used in the TEI corpus
  • sample-xml-text generates a "representative" sample TEI file by extracting one of every distinct piece of markup from the entire corpus
  • list-attributes-by-element generates a list of all the attributes used for a given element type
  • list-elements generates a list of all the elements used
  • list-metadata generates a list of the document id and title metadata

The P5-processing XProc file p5-processing.xpl

This XProc file contains the bulk of the application: mostly pipelines responsible for processing TEI P5 files in different ways.

  • update-schema pushes a new schema definition (from search-fields.xml) to the Solr search engine
  • reindex reindexes the TEI corpus as metadata records in Solr
  • generate-indexer converts the search-fields.xml metadata definition file into an XSLT transformation which can then be used to convert a TEI document into a Solr metadata record
  • p5-as-solr extracts the search fields defined in search-fields.xml from a single TEI document into a Solr metadata record
  • convert-p5-to-solr converts a single TEI document into a Solr metadata record, including the search fields defined in search-fields.xml as well as the full-text fields introduction, diplomatic, and normalized, and the search result field metadata-summary
  • p5-as-iiif converts a single TEI document into a IIIF manifest
  • iiif-annotation-list generates a IIIF annotation list for a particular folio in a TEI P5 file
  • bibliogaphy-as-html converts the TEI bibliography file to HTML
  • p5-as-html converts a TEI P5 manuscript file to HTML
  • p5-as-xml serves a TEI P5 file verbatim, as XML
  • list-p5 generates a page listing the TEI P5 files
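
The idea behind generate-indexer and p5-as-solr, turning a declarative list of fields into a metadata record for one document, can be illustrated with a short Python sketch. The application itself generates an XSLT transformation from search-fields.xml; here that step is replaced by a plain loop over a hypothetical field list, and the field names and paths are assumptions rather than the contents of the real search-fields.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical field definitions: (Solr field name, path into the TEI document).
FIELDS: List[Tuple[str, str]] = [
    ("title", ".//tei:titleStmt/tei:title"),
    ("author", ".//tei:titleStmt/tei:author"),
]


def p5_as_solr(doc_id: str, tei_xml: str) -> Dict[str, Union[str, List[str]]]:
    """Extract each defined field from one TEI P5 document."""
    root = ET.fromstring(tei_xml)
    record: Dict[str, Union[str, List[str]]] = {"id": doc_id}
    for name, path in FIELDS:
        values = [
            "".join(element.itertext()).strip()
            for element in root.findall(path, TEI_NS)
        ]
        if values:  # omit fields which have no value in this document
            record[name] = values
    return record
```

Because the field list is data rather than code, adding a search field means editing only the definition, which is the same design motive as deriving the indexer from search-fields.xml.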

Serving "static" HTML pages (html.xpl)

Several pages in the site are specified as plain XHTML pages, stored in the html folder. The sub-pipeline html-page is used to display the contents of these pages. That pipeline attempts to load the requested page and, if the page is not found, returns a 404 (Not Found) response.
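
As a rough Python illustration of that load-or-404 logic (the real implementation is the html-page pipeline in html.xpl), the following sketch assumes that a page name maps to a file with an .html extension; that mapping and the function name are assumptions.

```python
from pathlib import Path
from typing import Tuple


def html_page(html_dir: Path, name: str) -> Tuple[int, str]:
    """Return (status, body) for a named page in the html folder."""
    # Refuse names which could point outside the html folder.
    if not name or "/" in name or "\\" in name or name.startswith("."):
        return 404, "Not Found"
    page = html_dir / f"{name}.html"  # assumed file naming convention
    if not page.is_file():
        return 404, "Not Found"
    return 200, page.read_text(encoding="utf-8")
```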

Searching (search.xpl)

This XProc file contains pipelines which perform queries against the Solr search engine.

The search pipeline

This pipeline is invoked when a user either clicks the "search" button or clicks on a facet value in the search form.

The facet values which appear on the search form are submit buttons, each of which has its own target URL containing the currently selected set of facets; this allows the user to build up a query incrementally by clicking a facet value, which is then added to the set. However, this also means that the form must be submitted using the HTTP POST method: when a form is submitted with GET, the browser replaces any parameters already in the target URL with the form's own fields. In order to retain a bookmarkable or shareable URL at each stage of the browse process, the search pipeline includes a sub-pipeline which redirects these POST requests to equivalent GET requests in which the parameters are encoded in the URL.
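
The POST-to-GET redirection can be sketched in Python as a function which computes the URL to redirect to. This is an assumption-laden illustration, not the pipeline's code: the parameter names in the usage example are invented, and the choice of redirect status code is left open because the source does not specify it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redirect_location(target_url: str, form_body: str) -> str:
    """Merge the target URL's parameters with the POSTed form fields."""
    parts = urlsplit(target_url)
    # Facets already selected are carried in the button's target URL.
    params = parse_qsl(parts.query)
    # Fields typed into the form arrive in the POST body; parse_qsl
    # drops fields with blank values by default.
    params += parse_qsl(form_body)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )
```

For example, redirect_location("/search?author=Newton", "text=mercury&genre=") yields "/search?author=Newton&text=mercury", a URL which can be bookmarked or shared.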

When the pipeline receives a GET request, it parses the parameters in the request URL and uses the field definitions in search-fields.xml to generate a query to Solr, using Solr's JSON Facet API. The pipeline then formats the result of the Solr query into an HTML page which shows the results alongside the search and browse interface, with the search fields and facets set to the values in the request.
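
To make the query-generation step more concrete, here is a minimal Python sketch which builds a request body in the shape accepted by Solr's JSON Request API, with a terms facet for each facet field. The field list stands in for search-fields.xml and is hypothetical, as are the parameter names; the real pipeline may construct its query differently.

```python
from typing import Dict, List

# Hypothetical stand-in for the definitions in search-fields.xml.
FIELDS: List[Dict[str, str]] = [
    {"name": "text", "type": "search"},
    {"name": "author", "type": "facet"},
    {"name": "decade", "type": "facet"},
]


def quote(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_solr_request(params: Dict[str, str]) -> dict:
    """Build a JSON request body from the parsed URL parameters."""
    filters: List[str] = []
    facets: Dict[str, dict] = {}
    for field in FIELDS:
        if field["type"] != "facet":
            continue
        name = field["name"]
        # Request value counts for every facet field.
        facets[name] = {"type": "terms", "field": name}
        # A facet value chosen by the user narrows the result set.
        if params.get(name):
            filters.append(f"{name}:{quote(params[name])}")
    return {
        "query": params.get("text") or "*:*",  # match everything by default
        "filter": filters,
        "facet": facets,
    }
```

The resulting dictionary could be serialised with json.dumps and POSTed to a Solr collection's query handler.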

The highlight-hits pipeline

This pipeline is used to add hit highlighting to HTML renditions of the P5 manuscripts. The pipeline is invoked from the main pipeline to post-process the HTML renditions of the P5. If the page URL does not include a highlight parameter, the pipeline simply copies the HTML unchanged. If a highlight parameter is present in the URL, it is interpreted as the text to highlight. The pipeline queries Solr to generate a list of "snippets" of the text, in which the highlighted text appears in context. The pipeline then searches the HTML page to find each snippet, marking the hits with HTML mark elements and adding hyperlinks which link each mark element to the next and previous ones.
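
The final marking-up step can be sketched in Python. Note the simplifications: this sketch works on plain text rather than an HTML page, and it finds hits with a case-insensitive string search instead of matching snippets returned by Solr. The id scheme (hit1, hit2, ...) and the link text are assumptions.

```python
import re
from html import escape


def highlight(text: str, term: str) -> str:
    """Wrap each hit in a mark element linked to its neighbours."""
    if not term:
        return escape(text)  # no highlight parameter: nothing to mark
    matches = list(re.finditer(re.escape(term), text, flags=re.IGNORECASE))
    out = []
    position = 0
    for number, match in enumerate(matches, start=1):
        out.append(escape(text[position:match.start()]))
        previous_link = (
            f'<a href="#hit{number - 1}">prev</a>' if number > 1 else ""
        )
        next_link = (
            f'<a href="#hit{number + 1}">next</a>'
            if number < len(matches)
            else ""
        )
        out.append(
            f'{previous_link}<mark id="hit{number}">'
            f"{escape(match.group())}</mark>{next_link}"
        )
        position = match.end()
    out.append(escape(text[position:]))
    return "".join(out)
```

Chaining the mark elements with links lets a reader step through the hits in a long manuscript without scrolling to find them.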

The Latent Semantic Analysis XProc file lsa.xpl

This pipeline does not perform Latent Semantic Analysis; it simply delegates all lsa requests to a back end server, and reformats the resulting HTML to include the site's global navigation.