..............................................
..... ______....______....___...___.... ___...
.... / /.../ ___/..../ /../ /... / /..
... / // /___/ /.._____/ /../ /____/ /_..
.../ ___/ / /../ / / \/ / / \.
../ / / // / /__/ // / / / / //_/ // /.
./__/ /____/\____/____/________/_____\_____/..
..............................................
Pocoweb is a postcorrection tool for (historical) OCR data for the web. It is based on PoCoTo.
If you encounter any problems, make sure to check the misc notes section at the bottom of this page.
Pocoweb should be deployed using docker-compose. The main configuration can be found in the docker-compose.yml file.
In theory it should be possible to deploy Pocoweb on any platform that supports docker, but there appears to be a problem with the mariadb docker image on Windows.
For the remainder of the section, it is assumed that the Pocoweb source directory is checked out at some location. All examples are meant to be run from the project's source directory.
Pocoweb uses a base directory to store its data (database tables, line images, etc.). It defaults to /srv/pocoweb, but one can change the location of the base directory. In this case change all instances of /srv/pocoweb to the desired location for Pocoweb's base directory.
Create all required directories using mkdir -p /srv/pocoweb/{www,project,db,tmp,language}-data.
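As a minimal sketch, the directory setup can be wrapped in a small shell function so that a custom base directory is handled the same way as the default. The function name create_pocoweb_dirs is hypothetical and not part of Pocoweb; only the directory names are taken from this README.

```shell
# Hypothetical helper (not part of Pocoweb): create the data directories
# below a base directory, which defaults to /srv/pocoweb.
create_pocoweb_dirs() {
    base="${1:-/srv/pocoweb}"
    for name in www project db tmp language; do
        mkdir -p "$base/$name-data" || return 1
    done
}
```

Calling create_pocoweb_dirs /my/custom/path creates the five directories below that path. Writing below /srv may require sudo, depending on the host.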
Copy the default configuration file to the source directory using cp misc/config/env.sh ./ and adjust it accordingly. In the first block you can configure the access parameters to the internal database of Pocoweb. The database is not accessible from outside the host machine, but you should at least customize the database's password.
The second block in the configuration file contains settings for the root administrator account. This account is created at the first start of Pocoweb and must be used to access Pocoweb the first time. You should change all the settings accordingly.
If you really want to use Pocoweb's default settings, you can skip editing the configuration file and just use the default one (in this case you do not even have to copy it).
Pocoweb uses https, so matching certificate files are needed. If you have some, copy the key file to services/nginx/key.pem and the cert file to services/nginx/cert.pem.
You can create a self-signed certificate using e.g. openssl or any other suitable tool. With openssl, execute openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout services/nginx/key.pem -out services/nginx/cert.pem from the root directory to generate it. You can skip this step, since the build process will automatically generate a self-signed certificate using openssl if no certificate was provided.
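The certificate step can be sketched as a function that only generates a certificate when none is present. The function name ensure_cert and the subject CN=localhost are assumptions made for this illustration; the remaining openssl options are the ones used in this README. This is not the logic of Pocoweb's own build process.

```shell
# Hypothetical helper (not part of Pocoweb): create a self-signed
# certificate in the given directory unless key.pem and cert.pem exist.
ensure_cert() {
    dir="${1:-services/nginx}"
    if [ -f "$dir/key.pem" ] && [ -f "$dir/cert.pem" ]; then
        echo "certificate already present in $dir"
        return 0
    fi
    # -subj avoids the interactive prompts; CN=localhost is an assumption.
    openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
        -subj "/CN=localhost" \
        -keyout "$dir/key.pem" -out "$dir/cert.pem"
}
```

Run ensure_cert from the project's source directory. The result can be inspected with openssl x509 -in services/nginx/cert.pem -noout -subject -enddate.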
Last but not least, check out the external dependencies using git submodule update --init.
After completing the steps above, you can build and start Pocoweb using the provided Makefile. If no configuration file is provided, the build process uses the default configuration file. If no certificate is provided, the build process automatically generates one; in this case you will be prompted for some initial configuration.
To build and start Pocoweb, the following dependencies have to be installed:
- docker
- docker-compose
- make
- openssl (if no certificate is provided)
- nodejs
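Before deploying, it can help to verify that the required tools are on the PATH. The following is a minimal sketch; check_tools is a hypothetical helper and not part of Pocoweb.

```shell
# Hypothetical helper (not part of Pocoweb): report required tools that
# cannot be found on PATH. Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be check_tools docker docker-compose make openssl nodejs. On some systems the Node.js binary is installed as node rather than nodejs, so adjust the name if needed.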
To deploy Pocoweb, just execute make docker-start. This command will build all required docker images and start up the docker service. Depending on your configuration, you might need to use an administrator account for your machine or use sudo.
To stop the service, execute make docker-stop. If you want to inspect the log files, execute docker-compose logs -f (you can use CTRL-c to exit the log).
If you did not use the default base directory /srv/pocoweb, you have to set the corresponding path in the call to make: make PCW_BASE_DIR=/my/custom/path docker-start (note: you have to use an absolute path).
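Since the base directory has to be an absolute path, a small wrapper can reject relative paths before make is called. This is only a sketch; is_absolute_path and start_pocoweb are hypothetical names, while PCW_BASE_DIR and the docker-start target are the ones used in this README.

```shell
# Hypothetical helpers (not part of Pocoweb).

# Succeed only if the argument starts with a slash.
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Refuse relative paths before handing PCW_BASE_DIR to make.
start_pocoweb() {
    base="${1:-/srv/pocoweb}"
    if ! is_absolute_path "$base"; then
        echo "base directory must be an absolute path: $base" >&2
        return 1
    fi
    make PCW_BASE_DIR="$base" docker-start
}
```

For example, start_pocoweb /my/custom/path starts the service, whereas start_pocoweb my/custom/path fails with an error message and does not call make.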
If you encounter any issues deploying Pocoweb, feel free to open an issue on github.
It is possible to use Pocoweb without https. You should not do this, since the Pocoweb frontend has to send emails, passwords and authentication tokens in order to do its work. If you still want to disable https for Pocoweb, you have to edit the nginx configuration and Dockerfile manually and then start the server normally (see above).
user nginx;
worker_processes auto;
pid /run/nginx.pid;

events {
    worker_connections 768;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;
    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##
    gzip on;
    gzip_disable "msie6";

    server { # redirect http -> https
        listen 80;
        server_name pocoweb.cis.lmu.de;
        client_max_body_size 2000M;
        root /www-data/public_html;
        index index.html;

        location / {
            try_files $uri $uri/ =404;
        }

        location /rest/ {
            rewrite ^/rest/(.*)$ /$1 break;
            proxy_pass http://pcwauth;
            proxy_redirect off;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Host $server_name;
        }

        location /project-data {
            alias /project-data;
        }
    }
}
FROM nginx:alpine
MAINTAINER Florian Fink <fink@cis.lmu.de>
VOLUME /www-data /project-data
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
EXPOSE 443
CMD ["nginx-debug", "-g", "daemon off;"]
The basic usage and API Documentation is part of Pocoweb. It can be found here.
The back-end of Pocoweb is written in C++ and depends on the following (Ubuntu) libraries and tools (see also the dependency listing):
- libboost-all-dev
- make
- g++
- git
- cmake
- libmysqlclient-dev
- mysql-client
- libleptonica-dev
- libssl-dev
- libcurl4-gnutls-dev
- python-pyparsing
If you are on Ubuntu, you can run sudo apt-get install $(cat misc/docker/pocoweb/dependencies) to automatically install all dependencies.
Pocoweb manages some of its internal dependencies using git submodules. It depends on the following git submodules (found in the modules directory):
- crow
- pugixml
- sqlpp11
- sqlpp11-connector-mysql
- date
- utfcpp
In order to load the internal dependencies, issue the following command in the source directory of Pocoweb: git submodule update --init. This will clone the submodules and download the internal dependencies of Pocoweb. You must download all internal dependencies before you can proceed to build Pocoweb.
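To check whether all submodules have been loaded, the output of git submodule status can be filtered: git prefixes the line of a submodule that is not yet initialized with a minus sign. The following sketch relies on that; uninitialized_submodules is a hypothetical helper and not part of Pocoweb.

```shell
# Hypothetical helper (not part of Pocoweb): read `git submodule status`
# output on stdin and print the paths of submodules that are not yet
# initialized (git marks those lines with a leading "-").
uninitialized_submodules() {
    awk '/^-/ { print $2 }'
}
```

Run it as git submodule status | uninitialized_submodules in the source directory. Empty output means that every submodule has been initialized.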
After all dependencies have been installed (see above), build the back-end using the command make (use -jN to speed up the compilation, where N stands for the number of parallel build processes).
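One common choice for N is the number of available processors. The sketch below derives it with nproc and falls back to getconf or to 1 if neither is available; build_jobs and build_backend are hypothetical names, not targets or scripts shipped with Pocoweb.

```shell
# Hypothetical helpers (not part of Pocoweb).

# Print the number of available processors, falling back to 1.
build_jobs() {
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "${n:-1}"
}

# Build the back-end with one make job per processor.
build_backend() {
    make -j"$(build_jobs)"
}
```

Calling build_backend from the source directory then runs make with the computed job count.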
After the compilation has finished, you should test the back-end. Execute Pocoweb's unit tests with make test.
Pocoweb uses a mysql database to store its data. You need to create a database called pocoweb with a corresponding user account and use the same settings to start Pocoweb. In order to create the tables for Pocoweb, you can issue the command mysql -h dbhost -u dbuser -p < db/tables.sql. You will be asked to enter the password for your mysql user.
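Creating the database and its user could look roughly like the sketch below, which prints the SQL statements so they can be reviewed before being piped into the mysql client. The helper name pocoweb_db_sql, the host pattern '%' and the granted privileges are assumptions for illustration; only the database name pocoweb comes from this README.

```shell
# Hypothetical helper (not part of Pocoweb): print SQL that creates the
# pocoweb database and a user with full access to it.
# The host pattern '%' and the privileges are assumptions.
pocoweb_db_sql() {
    user="$1"
    password="$2"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS pocoweb;
CREATE USER IF NOT EXISTS '$user'@'%' IDENTIFIED BY '$password';
GRANT ALL PRIVILEGES ON pocoweb.* TO '$user'@'%';
EOF
}
```

The output can be fed to the server with an account that is allowed to create databases and users, e.g. pocoweb_db_sql dbuser 'my-password' | mysql -h dbhost -u root -p, where root is an assumption. The sketch does not escape its arguments, so passwords containing single quotes will not work.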
Pocoweb's front-end is written in Javascript. The front-end files reside under /srv/pocoweb/www-data/. To update them, use make -C frontend install. If you use a custom path, use make -C frontend INSTALL_DIR=my/custom/path/to/www-data install.
- rest/src contains the back-end C++ implementation
- db contains the database table definitions
- frontend contains the web frontend files
- frontend/public_html contains images, html, javascript front-end files
- make contains various helper makefiles
- modules contains the git submodules
- services contains helper services (profiler, http-server, ...) needed to run Pocoweb
- Connecting to the database requires the following steps:
  - find the ID of the mariadb container using docker ps, or use the container's default name docker_db_1
  - find the container's IP address using docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' followed by the container's ID or name
  - docker inspect $(docker ps | grep mariadb | awk -e '{print $1}') prints all information about the database docker container
  - from the server, connect to the database container using its IP address, e.g. mysql -h 172.18.0.2 -u pocoweb -p pocoweb
- Get information about the mariadb container: sudo docker inspect $(sudo docker ps | awk -e '/mariadb/{print $1}')
- Connect to a running docker container:
  - get the container ID: docker ps
  - connect to the running container: docker exec -it <Container-ID> bash
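The database lookup steps above can be combined into two small functions. This is a sketch under the assumption that exactly one running container has mariadb in its docker ps line; mariadb_container_id and mariadb_container_ip are hypothetical names, not part of Pocoweb.

```shell
# Hypothetical helpers (not part of Pocoweb).

# Read `docker ps` output on stdin and print the ID (first column) of the
# first container whose line mentions mariadb.
mariadb_container_id() {
    awk '/mariadb/ && !found { print $1; found = 1 }'
}

# Print the IP address of the running mariadb container.
mariadb_container_ip() {
    id=$(docker ps | mariadb_container_id)
    if [ -z "$id" ]; then
        echo "no running mariadb container found" >&2
        return 1
    fi
    docker inspect \
        --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$id"
}
```

With these in place, the connection command becomes mysql -h "$(mariadb_container_ip)" -u pocoweb -p pocoweb. If the container is attached to several networks, the format string prints their addresses without a separator, so the result should be checked by hand in that case.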
Pocoweb is free software, and may be redistributed under the terms specified in the LICENSE file.