This is the Data Catalogue for the eFlows4HPC project.
This work has been supported by the eFlows4HPC project, contract #955558. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Spain, Germany, France, Italy, Poland, Switzerland, Norway.
The project has also received funding from the German Federal Ministry of Education and Research, agreement no. 16GPC016K.
The architecture documentation can be found in the `arch` folder.
This part is the frontend for the Data Catalogue. It is the user interface, so that no one is forced to make HTTP calls to the API manually. Since the content is managed by the API server, the frontend can be deployed as a static website, containing only HTML, CSS and JavaScript. To keep the different pages uniform and avoid duplicated code, the static pages are generated with the Jinja2 template engine.
To compile the static pages to the `./site/` directory (it will be created if required), simply run

```
pip install -r requirements.txt
python frontend/createStatic.py
```
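For illustration, the generation step boils down to rendering Jinja2 templates into plain HTML files; a minimal sketch, where the template directory and file names are hypothetical and not necessarily those used by `createStatic.py`:

```python
# Minimal sketch of Jinja2-based static page generation. The "templates/"
# directory and "index.html" names are hypothetical examples.
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates"))

site = Path("site")
site.mkdir(exist_ok=True)  # create the output directory if required

# Rendering pages from a shared set of templates keeps them uniform and
# avoids duplicated markup across the site.
template = env.get_template("index.html")
site.joinpath("index.html").write_text(template.render(title="Data Catalogue"))
```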
The site can then be deployed to any webserver capable of serving files, as no other server functionality is strictly required. However, in a proper deployment, access control and certificates should be considered.
For development (and only for development), an easy way to run a local server is

```
python -m http.server <localport> --directory site/
```
The Python `http.server` module should not be used for deployment, as it does not ensure that current security standards are met; it is only intended for local testing.
This part is the API server for the Data Catalogue, which provides the backend functionality. It is implemented with FastAPI and provides an API documentation via OpenAPI.
For deployment via Docker, a Docker image is included.
Some server settings can be changed. This is useful during testing, so that a test API server can be launched with testing data, as well as for deployment, if the appdata or the userdb is not in the default location. These settings can either be set via environment variables, changed in the `apiserver/config.env` file, or a different `.env` file can be configured via the `DATACATALOG_API_DOTENV_FILE_PATH` environment variable. The settings are read at launch and cannot be updated while the server is running.
| Variable Name | Default Value | Description |
|---|---|---|
| `DATACATALOG_API_DOTENV_FILE_PATH` | `apiserver/config.env` | Location of the `.env` file read at launch |
| `DATACATALOG_APISERVER_JSON_STORAGE_PATH` | `./app/data` | Directory where the data (i.e. dataset info) is stored |
| `DATACATALOG_APISERVER_USERDB_PATH` | `./app/userdb.json` | Location of the `.json` file containing the accounts |
| `DATACATALOG_APISERVER_CLIENT_ID` | | Client ID for a configured OIDC server |
| `DATACATALOG_APISERVER_CLIENT_SECRET` | | Client Secret for a configured OIDC server |
| `DATACATALOG_APISERVER_SERVER_METADATA_URL` | | Metadata URL for a configured OIDC server |
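For example, to launch a server with a non-default data directory and user database, the variables can simply be set in the environment before starting the server (the paths below are placeholders):

```
export DATACATALOG_APISERVER_JSON_STORAGE_PATH=/srv/datacatalog/data
export DATACATALOG_APISERVER_USERDB_PATH=/srv/datacatalog/userdb.json
uvicorn apiserver:app
```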
There is also the logging configuration to consider: the `apiserver/log_conf.yaml` file contains the settings for the loggers. Information on how to change these settings can be found in the Python `logging` configuration documentation.
Certain operations are only possible if the request is authenticated. The API has an endpoint at `/token` where a username/password login is possible. The endpoint returns a token that is valid for 1 hour. This token has to be provided with every API call that requires authentication. Currently, these calls are:

- `GET /me`
- `PUT /dataset`
- `PUT /dataset/dataset-id`
- `DELETE /dataset/dataset-id`

The passwords are stored as bcrypt hashes and are not visible to anyone.
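For illustration, a token can be obtained and used from Python with the `requests` library; a minimal sketch, assuming the server runs at `localhost:8000` and that `/token` follows the usual FastAPI OAuth2 password flow (form-encoded credentials, an `access_token` field in the response):

```python
import requests

BASE_URL = "http://localhost:8000"  # assumption: local development server

# Obtain a token (valid for 1 hour); these are the default test credentials
# described below, and the form field names follow the standard OAuth2
# password flow commonly used by FastAPI apps.
response = requests.post(
    f"{BASE_URL}/token",
    data={"username": "testuser", "password": "test"},
)
response.raise_for_status()
token = response.json()["access_token"]  # assumption: usual FastAPI token shape

# Provide the token as a bearer token on calls that require authentication.
me = requests.get(f"{BASE_URL}/me", headers={"Authorization": f"Bearer {token}"})
print(me.json())
```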
A CLI is provided for server admins to add new users. It will soon be extended to allow direct hash entry, so that the user does not have to provide their password in clear text.
For testing, a default `userdb.json` is provided, containing a single user `testuser` with the password `test`.
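Until direct hash entry is available, a bcrypt hash can also be generated manually, for example with the `passlib` package; a minimal sketch, assuming the server verifies passwords with passlib's bcrypt scheme (an assumption; the CLI may use different hashing settings):

```python
# Hypothetical helper: generate a bcrypt hash for a new user's password.
# Assumes the server verifies hashes with passlib's bcrypt scheme; the
# actual settings used by the CLI may differ.
from passlib.hash import bcrypt

hashed = bcrypt.hash("my-secret-password")
print(hashed)  # e.g. "$2b$12$..."

# Verification works against the stored hash, never the clear text password.
assert bcrypt.verify("my-secret-password", hashed)
```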
If the API server is running, you can see the documentation at `<server-url>/docs` or `<server-url>/redoc`. These pages can also be used as a clunky frontend, allowing the authentication and execution of all API functions.
First, ensure that your Python version is 3.6 or newer. Then, if they are not yet installed on your machine, install the requirements via pip:

```
pip install -r requirements.txt
```
To start the server, run

```
uvicorn apiserver:app --reload --reload-dir apiserver
```

while in the project root directory. Without any other options, this starts your server on `localhost:8000`.
The `--reload --reload-dir apiserver` options ensure that any changes to files in the `apiserver` directory cause an immediate reload of the server, which is especially useful during development. If this is not required, just leave out the options.
If you want more detailed logs and/or want to store the logs in a file, add the `--log-level (debug|info|...)` and `--log-config=./apiserver/log_conf.yaml` options. The details of the logging behavior can be changed via the `./apiserver/log_conf.yaml` file; most logging entries will be at the `debug` or `info` level.
More information about uvicorn settings (including information about how to bind to other network interfaces or ports) can be found in the uvicorn documentation.
First, ensure that the `pytest` package is installed (it is included in `testing_requirements.txt`). Tests are located in the `apiserver_tests` directory. They can be executed by simply running `pytest` while in the project folder. You can also use nose for testing (also included in `testing_requirements.txt`); for instance, to run the tests with a coverage report in HTML format, run:

```
nosetests --with-coverage --cover-package=apiserver --cover-html
```
If more test files are added, they should be named with a `test_` prefix and put into a similarly named folder, so that they can be auto-detected. The `context.py` file helps with importing the apiserver packages, so that the tests work independently of the local Python path setup.
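As an illustration, a new test file might look like the following; a minimal sketch, assuming that `context.py` exposes the `apiserver` package and that the FastAPI app object is `apiserver.app` (matching the `uvicorn apiserver:app` command above):

```python
# apiserver_tests/test_docs.py -- hypothetical example test file.
from fastapi.testclient import TestClient

from context import apiserver  # assumption: context.py makes this importable

client = TestClient(apiserver.app)


def test_openapi_schema_is_served():
    # The auto-generated OpenAPI schema should be reachable without auth.
    response = client.get("/openapi.json")
    assert response.status_code == 200
```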
To build the Docker image of the current version, simply run

```
docker build -t datacatalog-apiserver -f ./apiserver/Dockerfile .
```

while in the project root directory. `datacatalog-apiserver` is a local tag to identify the built Docker image; you can change it if you want.
To run the Docker image in a local container, run

```
docker run -d --name <container_name> -p <local_port>:8000 datacatalog-apiserver
```

`<container_name>` is the name of your container, which can be used to refer to it in other docker commands. `<local_port>` is the port on your local machine that will be forwarded to the Docker container. For example, if it is set to `8080`, you will be able to reach the API server at http://localhost:8080.
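To quickly check that the container is serving requests, you can, for instance, fetch the interactive docs page (assuming the example port mapping above):

```
curl http://localhost:8080/docs
```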
For more production-ready deployments, consider using the `--restart=always` flag, as well as mounting a volume for the data:

```
docker run -d --name <container_name> --restart=always -v /localvol/:/app/data/ -p <local_port>:8000 datacatalog-apiserver
```
To stop the Docker container, run

```
docker stop <container_name>
```

Note that this only stops the container and does not delete it fully. To do that, run

```
docker rm <container_name>
```
For more information about Docker, please see the Docker docs.
The GitLab repository is set up to automatically build the datacat image and deploy it to the production and testing environments. The pipeline and jobs for this are defined in the `.gitlab-ci.yml` file. In general, pushes to the master branch update the testing deployment, and tags containing "stable" update the production deployment.
To avoid unneeded downtime, the VMs hosting the deployments are usually not re-created; instead, only the updated Docker image and updated config are uploaded to the VM. After this, the Docker containers are restarted.
If a "full-deployment" is required (i.e. the VMs shuld be newly created), the pipeline has to be started with a variable MANUAL_FULL_DEPLOY=true
. This can be done while starting the pipeline via the web interface.
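Alternatively, the same can be achieved via the GitLab pipeline trigger API; a sketch, where the host, project ID and trigger token are placeholders:

```
curl -X POST \
     -F token=<trigger-token> \
     -F ref=master \
     -F "variables[MANUAL_FULL_DEPLOY]=true" \
     https://<gitlab-host>/api/v4/projects/<project-id>/trigger/pipeline
```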