feat: Add triggering of preflight script for non-image file types (DEV-1664) (#381)

* preflight for files (…/file)

* …

* Typo fixed

* Added Lua img:gps method to read exif GPS data

* …

* …

* ...

* Better & more complete  EXIF processing

* Imroving tidyness

* Refactoring EXIF tag processing

Simplifying code using templates

---------

Co-authored-by: lrosenth <lukas.rosenthaler@unibas.ch>
subotic and lrosenth authored Feb 9, 2023
1 parent 74980c8 commit b86428a
Showing 20 changed files with 1,601 additions and 1,400 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -53,3 +53,4 @@ icc.zip
/Testing
/test/nginx/logs/error.log
/site
/test/e2e/diff.tif
57 changes: 31 additions & 26 deletions docs/introduction.md
@@ -28,9 +28,9 @@

#### Generic format conversions

- image format conversion are supported between TIFF, JPEG2000, JPG, PNG and PDF (PDF with some limitations). SIPI can
be used either as standalone command line tool or in server mode using [LUA](https://www.lua.org) scripting. SIPI
preserves most embedded metadata (EXIF, IPTC, TIFF, XMP) and is preserving and/or converting ICC color profiles.
- Image format conversions are supported between TIFF, JPEG2000, JPG and PNG. SIPI can
be used either as a standalone command-line tool or in server mode using [LUA](https://www.lua.org) scripting.
- SIPI preserves most embedded metadata (EXIF, IPTC, TIFF, XMP) and preserves and/or converts ICC color profiles.

#### Preservation metadata (SIPI specific)

@@ -40,51 +40,56 @@
- `original mimetype`: The mimetype of the original image before conversion
- `pixel checksum`: A checksum (e.g. SHA-256) of the original pixel values. This checksum can be used to verify that
a format conversion didn't alter the image content.
- `icc profile`: (optional) The raw ICC profile as binary string. This field is added if the fileformat has no
standard way to embed ICC color profiles (e.g. JPEG).
- `icc profile`: (optional) The raw ICC profile as binary string. This field is added if the destination file
format has no standard way to embed ICC color profiles (e.g. JPEG).

### 4. Integrated sqlite3 Database
SIPI has an integrated sqlite3 database that can be used with special LUA extensions. Thus, SIPI can be used as a
standalone media server with extended functionality. The sqlite3 database may be used to store metadata about
images, user data etc.

## Who is behind SIPI?
SIPI is developed and maintained by the "Data and Service Center for the Humanities" [(DaSCH)](https://dasch.swiss),
a Swiss national research infrastructure financed by the Swiss National Science Foundation [(SNSF)](http://www.snf.ch/) with contributions by the
Universities of Basel and Lausanne.
SIPI is developed and maintained by Lukas Rosenthaler, professor for Digital Humanities at the University of Basel, in
collaboration with the "Data and Service Center for the Humanities" ([DaSCH](https://dasch.swiss)).

## How to get SIPI?
- The easiest way is to use the docker image provided on Docker Hub: [daschswiss/sipi](https://hub.docker.com/r/daschswiss/sipi).
The dockerized version has the binary kakadu library compiled in.
- You can compile SIPI from the sources on [GitHub](https://github.com/dasch-swiss/sipi). Since SIPI uses many
third-party open source libraries, compiling it yourself is tedious and may be frustrating (but possible). *You have to
provide the licensed source of kakadu yourself*. See [kakadu software](https://kakadusoftware.com) on how to get a
licensed version of the kakadu code. SIPI should compile on Linux (Ubuntu) and (with some hand-work) OS X.
licensed version of the kakadu code. SIPI should compile on Linux (Ubuntu) and Apple OS X.

## SIPI as IIIF-Server
### Extensions to the IIIF-Standard

#### Access to PDF Pages
SIPI is able to deliver PDFs either as a full file or as images, using an extended IIIF URL to access a specific page
of a multipage PDF as an image with the usual IIIF syntax and small extensions:
- In case of a PDF, `info.json` includes a field `numpages` that indicates the total number of pages
the PDF has
- the image-ID given in the IIIF URL must include a page number specifier
`@pagenum` with an integer between 1 and the maximum number of pages, e.g.
```
https://iiif.dummy.org/images/test.pdf@12/full/,1000/default.jpg
```
The given URL would return page #12 of the PDF `test.pdf` with a height of 1000 pixels.
Thus, all IIIF URL parts will work as expected.
#### Preflight script
- Before executing an IIIF request, a freely configurable LUA script is called. This script must return the
permission to access the resource ("allow", "restrict", "deny") and the final path to the resource. This allows
to handle access rights etc. Within the LUA-script, permission databases etc. may be accessed through RESTful
services or using the internal SQLite database. In addition, the path to the resource may be redirected or other
limitations imposed (size, watermark etc.).
- The preflight script has access to the full HTTP(s) header including cookies and Authorization information. There are
also utility functions to decode JSON Web Tokens ([JWT](https://jwt.io)).


#### Access to non-image files
Sometimes it would be helpful to deliver non-image files such as XML, CSV etc. from the same directory tree as the
IIIF-conformant images:
- The URL to download a file must have the form ```http(s)://{server}/{prefix}/{fileid}/file```. The suffix
_/file_ at the end indicates that the file should bypass any IIIF URL processing and simply be served as a file.
- Also in this case, a *preflight script* may be configured to control access to such file resources.
- if the url has the form ```http(s)://{server}/{prefix}/{fileid}/info.json```, SIPI returns a JSON containing
information about the file. The JSON has the from
- `@context: "http://sipi.io/api/file/3/context.json"`
- `id: "http(s)://{server}/{prefix}/{fileid}"`
- `mimeType: {mimetype}` . Please note that SIPI determines the mimetype using the magic number. Due to the
limitations thereof the mimetype cannot be determined exactly.
information about the file. The JSON has the form:
```json
{
"@context": "http://sipi.io/api/file/3/context.json",
"id": "https://localhost:1025/images/csv_test.csv",
"internalMimeType": "text/csv",
"fileSize": 36
}
```
Please note that SIPI determines the mimetype using the file's magic number. Because of the limitations of this method,
the mimetype may not be determined exactly.
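
The two URL forms described above can be sketched as a small Lua helper. This is only an illustration: `file_urls` is a hypothetical function (not part of SIPI), and the server, prefix and file id are taken from the sample JSON above.

```lua
-- Hypothetical helper (not SIPI API): builds the download URL and the
-- info.json URL for a non-image file from server, prefix and file id.
function file_urls(server, prefix, fileid)
  local base = string.format("https://%s/%s/%s", server, prefix, fileid)
  return base .. "/file", base .. "/info.json"
end

local download_url, info_url = file_urls("localhost:1025", "images", "csv_test.csv")
print(download_url)
print(info_url)
```

The first URL bypasses IIIF processing and serves the file itself; the second one returns the JSON description.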


55 changes: 46 additions & 9 deletions docs/lua-image.md
@@ -64,12 +64,44 @@ use the md5-algorithm for the has of the pixel values.

### SipiImage.dims()

success, dims = img.dims()
if success then
server.print('nx=', dims.nx, ' ny=', dims.ny)
end

Returns the pixel dimensions of the image as dims object.
success, dims = img.dims()
if success then
server.print('nx=', dims.nx, ' ny=', dims.ny, ' ori=', dims.orientation)
end

This method returns basic information about the image. It returns a Lua table with the following items:
- _nx_: Number of pixels in X direction (image width)
- _ny_: Number of pixels in Y direction (image height)
- _orientation_: Orientation of image which is an integer with the following meaning:
    - _1_: (TOPLEFT) The 0th row represents the visual top of the image, and the 0th column represents the visual left-hand side.
    - _2_: (TOPRIGHT) The 0th row represents the visual top of the image, and the 0th column represents the visual right-hand side.
    - _3_: (BOTRIGHT) The 0th row represents the visual bottom of the image, and the 0th column represents the visual right-hand side.
    - _4_: (BOTLEFT) The 0th row represents the visual bottom of the image, and the 0th column represents the visual left-hand side.
    - _5_: (LEFTTOP) The 0th row represents the visual left-hand side of the image, and the 0th column represents the visual top.
    - _6_: (RIGHTTOP) The 0th row represents the visual right-hand side of the image, and the 0th column represents the visual top.
    - _7_: (RIGHTBOT) The 0th row represents the visual right-hand side of the image, and the 0th column represents the visual bottom.
    - _8_: (LEFTBOT) The 0th row represents the visual left-hand side of the image, and the 0th column represents the visual bottom.
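
As a minimal sketch, the orientation codes listed above can be mapped to their symbolic names in plain Lua. The table `ORIENTATION_NAMES` and the function `describe_orientation` are hypothetical helpers for illustration and are not provided by SIPI.

```lua
-- Symbolic names taken from the list above; this table is illustrative only.
ORIENTATION_NAMES = {
  [1] = "TOPLEFT", [2] = "TOPRIGHT", [3] = "BOTRIGHT", [4] = "BOTLEFT",
  [5] = "LEFTTOP", [6] = "RIGHTTOP", [7] = "RIGHTBOT", [8] = "LEFTBOT",
}

-- Returns success, the symbolic name, and whether rows and columns are
-- transposed (codes 5 to 8 store the 0th row along the visual left or right side).
function describe_orientation(code)
  local name = ORIENTATION_NAMES[code]
  if name == nil then
    return false, "unknown orientation: " .. tostring(code)
  end
  return true, name, code >= 5
end

local ok, name, transposed = describe_orientation(6)
print(ok, name, transposed)
```

Inside SIPI, the argument would typically be `dims.orientation` as returned by the `dims` call shown above.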

### SipiImage.exif(&lt;EXIF-parameter-name&gt;)

success, value-or-errormsg = img:exif(<EXIF-parameter-name>)

Returns the value of an EXIF parameter. The following EXIF parameters are supported:
- _"Orientation"_: Orientation (integer)
- _"Compression"_: Compression method (integer)
- _"PhotometricInterpretation"_: The photometric interpretation (integer)
- _"SamplesPerPixel"_: Samples per pixel (integer)
- _"ResolutionUnit"_: 1=none, 2=inches, 3=cm (integer)
- _"PlanarConfiguration"_: Planar configuration, 1=chunky, 2=planar (integer)
- _"DocumentName"_: Document name (string)
- _"Make"_: Make of camera or scanner (string)
- _"Model"_: Model of camera or scanner (string)
- _"Software"_: Software used for capture (string)
- _"Artist"_: Artist that created the image (string)
- _"DateTime"_: Date and time of creation (string)
- _"ImageDescription"_: Image description
- _"Copyright"_: Copyright info
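
The following sketch shows one way to read several of the tags listed above in a single pass. It assumes only that the image object offers an `exif` method returning `success, value-or-errormsg` as described; `read_exif_tags` and the stand-in object `fake_img` are hypothetical and exist only so the example can run outside SIPI.

```lua
-- Hypothetical helper: collects the requested EXIF values and, separately,
-- the error messages for tags that could not be read.
function read_exif_tags(img, names)
  local values, errors = {}, {}
  for _, name in ipairs(names) do
    local success, result = img:exif(name)
    if success then
      values[name] = result
    else
      errors[name] = result
    end
  end
  return values, errors
end

-- Stand-in for a SipiImage (assumption); within SIPI a real image object is used.
local fake_img = {
  tags = { Make = "ExampleCam", Orientation = 1 },
  exif = function(self, name)
    local v = self.tags[name]
    if v == nil then
      return false, "tag not present: " .. name
    end
    return true, v
  end,
}

local values, errors = read_exif_tags(fake_img, { "Make", "Orientation", "Artist" })
print(values.Make, values.Orientation, errors.Artist)
```

Keeping values and errors in separate tables lets the caller decide whether a missing tag is a problem.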

### SipiImage.crop(&lt;iiif-region-string&gt;)

@@ -82,14 +114,19 @@ valid IIIF-region string.

success, errormsg = img.scale(<iiif-size-string>)

Resizes the image to the given size as iiif-conformant size string.
Resizes the image to the given size as IIIF-conformant size string.

### SipiImage.rotate(&lt;iiif-rotation-string&gt;)

success, errormsg = img.rotate(<iiif-rotation-string>)

> Rotates and/or mirrors the image according the given iiif-conformant
> rotation string.
Rotates and/or mirrors the image according to the given IIIF-conformant rotation string.

### SipiImage.topleft()
Rotates an image to the standard TOPLEFT orientation if necessary. Please note
that viewers using tiling (e.g. [openseadragon](https://openseadragon.github.io)) require images in TOPLEFT orientation.
Thus, it is highly recommended that all images served by SIPI IIIF are set to TOPLEFT orientation. This process may
involve rotations of 90, 180 or 270 degrees and possibly mirroring, none of which changes the pixel values through interpolation.

### SipiImage.watermark(wm-file-path)

94 changes: 68 additions & 26 deletions docs/lua.md
@@ -1,12 +1,12 @@
# SIPI Lua Interface
SIPI has an embedded [LUA](http://www.lua.org) interpreter. LUA is a simple script language that was deveopped
specifically to be embedded into applications. For example the game [minecraft](https://www.minecraft.net)
makes extensive use of LUA scripting
SIPI has an embedded [LUA](http://www.lua.org) interpreter. LUA is a simple scripting language that was developed
specifically to be embedded into applications. For example, the games [minecraft](https://www.minecraft.net) and
[World of Warcraft](https://worldofwarcraft.com/de-de/) make extensive use of LUA scripting for customization and programming extensions.

Each HTTP request to SIPI invokes a new, independent
lua-instance. Therefore LUA may be used in the following contexts:
Each HTTP request to SIPI invokes a new, independent
lua-instance (Lua version 5.3.5). Therefore, LUA may be used in the following contexts:

- Preflight function (mandatory)
- Preflight function
- Embedded in HTML pages
- RESTful services using the SIPI routing

@@ -20,7 +20,7 @@ Each lua-instance in SIPI includes additional SIPI-specific information:
- querying and changing the SIPI runtime configuration (e.g. the cache)

In general, the SIPI LUA functions make use of the fact that a Lua function's return value may consist of
more than one element (see [Multiple Results](http://www.lua.org/pil/5.1.html)):
more than one element (see [Multiple Results](http://www.lua.org/pil/5.1.html)):
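
A minimal sketch of this convention in plain Lua: the first result signals success, the second carries either the value or an error message. The function `to_positive_integer` is a hypothetical example and not part of the SIPI API.

```lua
-- Hypothetical function following the `success, value-or-errormsg` convention.
function to_positive_integer(text)
  local n = tonumber(text)
  if n == nil or n < 1 or n ~= math.floor(n) then
    return false, "not a positive integer: " .. tostring(text)
  end
  return true, n
end

-- Both results are received with a multiple assignment.
local success, result = to_positive_integer("12")
if success then
  print("value:", result)
else
  print("error:", result)
end
```

Callers should always check the first result before using the second one.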

Sipi provides the [LuaRocks](https://luarocks.org/)
package manager which must be used in the context of SIPI.
@@ -30,22 +30,33 @@ request runs in its own thread and has its own Lua interpreter.
Therefore, only Lua packages that are known to be thread-safe may be
used!*

## Pre-flight function
The pre-fight function is mandatory and located in the init-script (see
## Preflight function
It is possible to define a LUA preflight function for *IIIF* requests and, independently, one for *file* requests
(indicated by a _/file_ postfix in the URL). Both are optional and are best located in the init-script (see
[configuration options](../sipi/#setup-of-directories-needed) of SIPI). It is executed after the incoming
IIIF HTTP request data has been processed but before an action to respond to the request has been taken. It should
be noted that the pre-flight script is only executed for IIIF-specific requests. All other HTTP requests are being
directed to "normal" HTTP-server part of SIPI. These can utilize the lua functionality by embedding LUA commands
within the HTML.
HTTP request data has been processed but before an action to respond to the request has been taken. It should
be noted that the pre-flight script is only executed for IIIF-specific requests (either using the IIIF URL syntax or the
_/file_ postfix). All other HTTP requests are directed to the "normal" HTTP-server part of SIPI.
These can utilize the Lua functionality by embedding LUA commands within the HTML.

The pre-flight function takes 3 parameter:
### IIIF preflight function
The IIIF preflight function must have the name **pre_flight** with the following signature:

```lua
function pre_flight(prefix,identifier,cookie)

return "allow", filepath
end
```
The preflight function takes 3 parameters:

- `prefix`: This is the prefix that is given on the IIIF url [mandatory]
*http(s)://{server}/__{prefix}__/{id}/{region}/{size}/{rotation}/{quality}.{format}*
Please note that the prefix may contain several "/" that can be used as path to the repository file
- `identifier`: The image identifier (which must not correspond to an actual filename in the media files repositoy)
- `identifier`: The image identifier (which need not correspond to an actual filename in the media files repository
of the SIPI IIIF server)
[mandatory]
- `cookie`: A cookie containing authorization information. Usually the cookie cntains a Json Web Token [optional]
- `cookie`: A cookie containing authorization information. Usually the cookie contains a JSON Web Token [optional]

The pre-flight function must return at least 2 parameters:

@@ -66,27 +77,37 @@ The pre-flight function must return at least 2 parameters:
The simplest working pre-flight function looks as follows, assuming that the `identifier` is the name of the master image
file in the repository and the `prefix` is the path:
```lua
function pre_flight(prefix, identifier, cookie) {
filepath = config.imgroot .. '/' .. prefix .. '/' .. identifier
function pre_flight(prefix, identifier, cookie)
if config.prefix_as_path then
filepath = config.imgroot .. '/' .. prefix .. '/' .. identifier
else
filepath = config.imgroot .. '/' .. identifier
end
return 'allow', filepath
}
end
```
Above function allows all files to be served without restriction.
The above example preflight function allows all files to be served without restriction.

#### More complex example of preflight function

The following example uses some SIPI Lua functions
to access a authorization server to check if the user (identified by a cookie) is allowed to see the specific image. We are
to access an authorization server to check if the user (identified by a cookie) is allowed to see the specific image. We are
using [Json Web Tokens](https://jwt.io) (JWT) which are supported by SIPI-specific LUA functions. Please note that the
SIPI JWT functions support an arbitrary payload that does not have to follow the JWT recommendations. For encoding,
JWT_ALG_HS256 is used together with the key that is defined in the SIPI configuration as
[jwt_secret](../sipi/#jwt-secret).
```lua
function pre_flight(prefix, identifier, cookie) {
function pre_flight(prefix, identifier, cookie)
--
-- make up the file path
--
local filepath = config.imgroot .. '/' .. prefix .. '/' .. identifier
if config.prefix_as_path then
filepath = config.imgroot .. '/' .. prefix .. '/' .. identifier
else
filepath = config.imgroot .. '/' .. identifier
end
--
-- we need a cookie containing the user inforamtion that will be
-- we need a cookie containing the user information that will be
-- sent to the authorization server. In this
-- example, the content does not follow the JWT rules
-- (it is possible to pack any table into a JWT-encoded token)
@@ -132,7 +153,7 @@ function pre_flight(prefix, identifier, cookie) {
else
return 'deny', filepath
end
}
end
```
The above example assumes that the cookie data is a string that contains encrypted user data from a table (key/value pairs),
i.e. a JSON Web Token. This token is decoded and the information about the image to be displayed is added. Then the information
@@ -145,6 +166,27 @@ The pre-flight function uses the following SIPI-specific LUA global variables an
- [server.generate_jwt()](#servergenerate_jwt): (Function) Create a new JWT token from a key/value table.
- [server.json_to_table()](#serverjson_to_table): (function) Convert a JSON into a LUA table.

### File preflight function
A URL of the form ```http(s)://{server}/{prefix}/{identifier}/file``` serves the given file as a binary object (including
the proper mimetype in the header etc.). The file has to reside in the directory tree defined for IIIF requests. In these
cases, a preflight function named `file_pre_flight` is called if defined. Its signature is as follows:
```lua
function file_pre_flight(filepath, cookie)

end
```
A simple example allowing access only to the file _"unit/test.csv"_ would be:
```lua
function file_pre_flight(filepath, cookie)
if filepath == "./images/unit/test.csv" then
return "allow", filepath
else
return "deny", ""
end
end
```
This script denies all other file access, in which case the SIPI IIIF server responds with a `401 Unauthorized` error.
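
When more than one file should be accessible, the same idea can be sketched with a lookup table. This is an illustration under assumptions: the table name `allowed_files` and the second entry are hypothetical, and the paths are assumed to be compared exactly as they are passed to the function, as in the example above.

```lua
-- Set of file paths that may be served (illustrative entries).
local allowed_files = {
  ["./images/unit/test.csv"] = true,
  ["./images/unit/test.xml"] = true, -- hypothetical second file
}

-- Same signature and return values as the file preflight example above.
function file_pre_flight(filepath, cookie)
  if allowed_files[filepath] then
    return "allow", filepath
  end
  return "deny", ""
end

print(file_pre_flight("./images/unit/test.csv", ""))
print(file_pre_flight("./images/other/secret.txt", ""))
```

A table lookup keeps the function short and makes the list of permitted files easy to review. The `cookie` argument is ignored in this sketch.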

## LUA embedded in HTML
The HTTP server that is included in SIPI can serve any type of file; such files are simply transferred as-is to the client.
However, if a file has an extension of `.elua`, it is assumed to be an HTML file with embedded LUA code. ALL SIPI-specific
@@ -252,7 +294,7 @@ client info could like follows:
```

### Embedded LUA and enforcing SSL
The supplied init-file offers a LUA function that enforces the use of a SSL encryption page
The supplied example initialization file offers a LUA function that enforces the use of an SSL-encrypted page
protected by a user name and password. It is used by adding the following code
*before the `<html>` opening tag*:
