Service: add SRT2SRTXML/SRTXML2TTML modules
Also includes:
- add config file mechanism
- add additional REST request to retrieve available TTML templates
- distinguish between separate groups of formats
- introduce the ability that a desired conversion is unavailable
spoeschel committed Apr 7, 2020
1 parent 0e71223 commit 2e6b7b3
Showing 5 changed files with 223 additions and 49 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1 +1,2 @@
.basex
.basex
webapp/scf_service_config.xml
2 changes: 1 addition & 1 deletion Dockerfile
@@ -30,7 +30,7 @@ RUN wget https://sourceforge.net/projects/saxon/files/Saxon-HE/9.9/SaxonHE9-9-1-
&& rm saxon.zip

# copy Webapp
COPY webapp/*.xqm webapp/
COPY webapp/*.xqm webapp/scf_service_config*.xml webapp/
COPY webapp/static/error.xsl webapp/static/
COPY webapp/WEB-INF/*.xml webapp/WEB-INF/
COPY modules modules
80 changes: 68 additions & 12 deletions README-SCF-SERVICE.md
@@ -1,6 +1,6 @@
# SCF service

The SCF service allows to convert subtitles between EBU STL and
The SCF service allows to convert subtitles e.g. between EBU STL and
different EBU-TT profiles (and vice versa). The conversion is done using
IRT's Subtitle Conversion Framework (SCF) which consists of different
conversion modules. These modules allow to convert a subtitle file
@@ -16,6 +16,23 @@ The following subtitle formats/profiles are supported:
- EBU-TT
- EBU-TT-D
- EBU-TT-D-Basic-DE
- SRT
- SRTXML (an SCF internal intermediate format)
- TTML (based on a provided TTML template)


## Configuration

Some aspects of the SCF service can be configured through a config file
named `scf_service_config.xml` which is located in the `webapp`
subfolder. By default this file doesn't exist. In that case the default
configuration in `scf_service_config_default.xml` is used instead. This file is
part of the SCF distribution.

The following settings are available:
- `templates_path`: The location (absolute path) of the TTML templates.
By default the `templates` subfolder of the `SRTXML2TTML` module is
used.


## Building the Docker image
@@ -71,10 +88,38 @@ automatically proposed by the service, based on the source file's name.
A conversion can also be executed using the REST interface. The process
is started using the `convert` POST request and returns the result.

To aid conversions which imply a conversion from SRTXML towards TTML,
the `templates` GET request provides a list of all available TTML
templates.

Note that Cross-Origin Resource Sharing (CORS) is enabled i.e. requests
from any origin are processed. This behaviour can be disabled by
removing the paragraph related to CORS in `/webapp/WEB-INF/web.xml`.

### `templates` GET request

To carry out a conversion which implies a conversion from SRTXML towards
TTML, a TTML template has to be specified.

This request is a helper request that provides an (ordered) list of all
available template files in the configured template folder. Currently
this includes all files that have an `.xml` extension.

No parameters are available for this request.

Upon success the response is in JSON format and simply an array of
strings, for example:

```json
[
"ebu-tt-d-basic-de.xml",
"ttml_custom.xml"
]
```

Each string is the filename of a template available in the configured
template folder.
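
As an illustration of how a client might consume this response, the following minimal Python sketch parses and validates the JSON array. The base URL (including the port) and the exact request path are assumptions that are not stated here; adjust them to your deployment. The helper names `parse_templates` and `fetch_templates` are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: base URL of a locally running SCF service; not taken from the README.
BASE_URL = "http://localhost:8984/"


def parse_templates(body: str) -> list:
    """Parse the response body and check that it is a JSON array of strings."""
    names = json.loads(body)
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("expected a JSON array of strings")
    return names


def fetch_templates(base_url: str = BASE_URL) -> list:
    """Send the `templates` GET request (path assumed relative to base_url)."""
    with urlopen(urljoin(base_url, "templates")) as response:
        return parse_templates(response.read().decode("utf-8"))
```

Any of the returned filenames could then be passed as the `template` parameter of a conversion request.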

### `convert` POST request

This request executes a subtitles conversion and returns the conversion
@@ -99,15 +144,24 @@ The following request parameters are available:
the special value `normal`) for the line height in EBU-TT-D.
- `ignore_manual_offset_for_tcp`: If present, any manual offset (seconds
or frames) will *not* be subtracted from the TCP value.
- `template`: If present, the TTML template to be used (only affects
conversions which imply a conversion from SRTXML towards TTML - in
such a case this field is mandatory!). Only values returned by the
`templates` GET request can be used.
- `language`: If present, the language identifier to override the
general language of the used template (only affects conversions which
imply a conversion from SRTXML towards TTML).
- `indent`: If present, the output will be indented in case of a target
format based on XML.

For the two format fields, the following values are supported:
`stl`, `stlxml`, `ebu-tt`, `ebu-tt-d`, `ebu-tt-d-basic-de`.
`stl`, `stlxml`, `ebu-tt`, `ebu-tt-d`, `ebu-tt-d-basic-de`, `srt`,
`srtxml`, `ttml`.

Note that option fields may not be supported for all possible conversion
chains. If an option is not supported, the conversion will abort with a
descriptive error message then.
chains. Furthermore, not every combination of source/target format may
be supported. In either case the conversion will abort with a
descriptive error message.
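
The constraints above can be checked on the client side before sending a request. The following Python sketch is only an illustration and not part of the service: the function name `check_convert_request` is hypothetical, and it assumes that every conversion with target format `ttml` implies a conversion from SRTXML towards TTML (and therefore requires a template).

```python
# Format values accepted for the two format fields, as listed above.
SUPPORTED_FORMATS = frozenset({
    "stl", "stlxml", "ebu-tt", "ebu-tt-d", "ebu-tt-d-basic-de",
    "srt", "srtxml", "ttml",
})


def check_convert_request(source, target, template=None, available_templates=None):
    """Return a list of problems found in the request settings (empty if none)."""
    problems = []
    for role, value in (("source", source), ("target", target)):
        if value not in SUPPORTED_FORMATS:
            problems.append(f"unsupported {role} format: {value}")
    # Assumption: target `ttml` always implies the SRTXML -> TTML step.
    if target == "ttml":
        if template is None:
            problems.append("target format ttml requires a template")
        elif available_templates is not None and template not in available_templates:
            problems.append(f"unknown template: {template}")
    return problems
```

A request that passes this check may still be rejected by the service, for example if the combination of source and target format is not supported.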

On success, the response is in the related file format (as specified by
the transmitted media type in the response header), depending on the
@@ -164,6 +218,8 @@ The following files are relevant for the actual application:
- `.basexhome`: empty BaseX helper file to indicate home directory.
- `modules`: the SCF modules
- `webapp/scf_service.xqm`: application source code as XQuery module.
- `webapp/scf_service_config.xml`: configuration (if file present).
- `webapp/scf_service_config_default.xml`: default configuration.
- `webapp/static/error.xsl`: XSLT used for rendering by the error result page.
- `webapp/WEB-INF/jetty.xml`: Jetty web server config
- `webapp/WEB-INF/web.xml`: web application config
@@ -173,17 +229,17 @@ step-by-step from the source format to the target format. The FSM's
transitions are determined by a transition function. This function is
provided with the current format and the target format, and returns the
next format to which the subtitles shall (and can) be converted, towards
the target format. After all the necessary transitions/conversions,
finally the subtitles are available in the desired target format.
Depending on success/failure and the target format of the conversion,
the different header fields are set accordingly.
the target format, if possible. After all the necessary transitions and
conversions, finally the subtitles are available in the desired target
format. Depending on success/failure and the target format of the
conversion, the different header fields are set accordingly.
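
The step-by-step behaviour can be sketched in Python as follows. This is a simplified illustration and not the actual XQuery implementation: the edge table only contains the conversions named by the modules `STL2STLXML`, `SRT2SRTXML` and `SRTXML2TTML`, and the function names are hypothetical. Here the transition function is derived from a breadth-first search over the edges and returns `None` when the desired conversion is unavailable.

```python
from collections import deque

# Assumption: a reduced set of single-step conversions, for illustration only.
EDGES = {
    "stl": ["stlxml"],
    "srt": ["srtxml"],
    "srtxml": ["ttml"],
}


def next_format(current, target):
    """Return the next format towards target, or None if target is unreachable."""
    if current == target:
        return None
    seen = {current}
    # Each queue entry pairs a reachable format with the first hop leading to it.
    queue = deque((n, n) for n in EDGES.get(current, []))
    while queue:
        node, first_hop = queue.popleft()
        if node == target:
            return first_hop
        if node in seen:
            continue
        seen.add(node)
        queue.extend((n, first_hop) for n in EDGES.get(node, []))
    return None


def conversion_path(source, target):
    """Follow the transitions from source to target and return all formats visited."""
    path = [source]
    current = source
    while current != target:
        step = next_format(current, target)
        if step is None:
            raise ValueError(f"no conversion available from {source} to {target}")
        path.append(step)
        current = step
    return path
```

With this table, `conversion_path("srt", "ttml")` passes through `srtxml`, while a request such as `stl` to `ttml` raises an error because no chain of transitions exists.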

Most of the SCF modules are implemented in XSLT or XQuery. Such modules
can be natively invoked by the SCF service. Thus no temporary files are
required to perform the actual conversion. The module `STL2STLXML`
however is implemented in Python and requires to execute a Python
interpreter in a separate process. Furthermore the input/output data has
to be stored in (temporary) files, in order to prevent problems
required to perform the actual conversion. The modules `STL2STLXML` and
`SRT2SRTXML`, however, are implemented in Python and require running a
Python interpreter in a separate process. Furthermore, the input/output
data has to be stored in (temporary) files, in order to prevent problems
regarding character encoding.

In case of a conversion option, the option is part of the status that is