Adds a Dockerfile to pycsw repo #534

Merged

Conversation

ricardogsilva
Member

@ricardogsilva ricardogsilva commented Jun 14, 2017

Overview

This PR adds a Dockerfile to pycsw.
This work may be continued afterwards in order to enable automated builds on docker hub.

Related Issue / Discussion

Contributions and Licensing

(as per https://github.com/geopython/pycsw/blob/master/CONTRIBUTING.rst#contributions-and-licensing)

  • I have already previously agreed to the pycsw Contributions and Licensing Guidelines

@codecov

codecov bot commented Jun 14, 2017

Codecov Report

Merging #534 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #534   +/-   ##
=======================================
  Coverage   55.93%   55.93%           
=======================================
  Files          29       29           
  Lines        6338     6338           
  Branches     1342     1342           
=======================================
  Hits         3545     3545           
- Misses       2412     2413    +1     
+ Partials      381      380    -1
Flag               Coverage Δ
#integrationtests  54.57% <ø> (ø) ⬆️
#unittests         7.63% <ø> (ø) ⬆️

Impacted Files         Coverage Δ
pycsw/ogc/fes/fes2.py  39.13% <0%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 310f2fd...782d4eb.

@ricardogsilva
Member Author

For now this image can be tested by running:

cd pycsw
PYCSW_VERSION=$(cat VERSION.txt)

# build the image (later on it will be available on docker hub)
docker build -t geopython/pycsw:$PYCSW_VERSION .

# launch a container that uses the default config
docker run --rm -ti -p 8000:8000 geopython/pycsw:$PYCSW_VERSION
# pycsw is now running on port 8000, stop the docker container by pressing ctrl+c

A more involved setup, where we create a custom configuration file and also set up a docker volume to hold the database:

# copy and tweak the config file,
# specifying repository.database as sqlite:////home/pycsw/data/pycsw-db.sqlite
cp default-sample.cfg ~/my-custom-config.cfg

# create a docker volume in order to persist the data even when the container is destroyed
docker volume create my-pycsw-data

# run a container with our custom settings and data volume
docker run \
    --name pycsw-test \
    --detach \
    --publish 8000:8000 \
    --volume "$HOME/my-custom-config.cfg":/etc/pycsw/pycsw.cfg \
    --volume my-pycsw-data:/home/pycsw/data \
    geopython/pycsw:$PYCSW_VERSION

# use pycsw-admin to initialize the data repository
docker exec -ti pycsw-test pycsw-admin.py -f /etc/pycsw/pycsw.cfg -c setup_db

# inspect logs
docker logs -f pycsw-test

# pycsw is running on port 8000 of the host, stop it by running
# docker rm -f pycsw-test

Things to notice:

  • The image:
    • Creates and makes use of an unprivileged pycsw user;
    • Is based on alpine linux which makes for small-sized images (currently at about 400MB uncompressed)
    • Uses python 3.5
  • pycsw runs on port 8000 of the container. You can then publish this port to a port on the host (in these examples I'm using port 8000 on the host too, but it could be different)
  • The default configuration uses the tests/functionaltests/suites/cite/data/cite.db sqlite database
  • It uses the PYCSW_CONFIG environment variable in order to get the configuration for pycsw. The default value is /etc/pycsw/pycsw.cfg, but this can be changed with the -e flag of the docker run command. It is also possible to mount a custom configuration file as a docker volume, as shown in the second example
  • If using an sqlite database it is possible to mount it as a docker volume so that it becomes persistent
  • It installs requirements-pg.txt and the external dependencies it needs, which also enables the use of postgres repositories. I plan to also include a docker-compose file that uses postgres in order to demo this feature
  • The image uses gunicorn instead of python's wsgiref.simple_server, since gunicorn is a more robust server for production settings
  • gunicorn logs are redirected to stdout, as is customary in docker.

@ricardogsilva
Member Author

ricardogsilva commented Jun 16, 2017

Running the tests on a container can be done by:

  • running the container as root
  • installing the dev requirements
  • changing back to the pycsw user
  • running the tests

docker run \
    --rm \
    -ti \
    --user root \
    --entrypoint sh \
    geopython/pycsw:$PYCSW_VERSION

# inside the container:
pip3 install -r requirements-dev.txt
su pycsw
py.test -m unit
py.test -m functional -k 'not harvesting'

@kalxas kalxas self-requested a review June 16, 2017 17:03
@ricardogsilva
Member Author

This PR is nearing completion.

I'm still working on ensuring that the image works OK with postgis containers, but other than that it seems to be going well.

I plan to ask for review soon

@ricardogsilva ricardogsilva changed the title [WIP] - Adds a Dockerfile to pycsw repo Adds a Dockerfile to pycsw repo Aug 30, 2017
@ricardogsilva
Member Author

@kalxas

I think this PR is ready. Could you please review?

Notable stuff:

  • The PR features a Dockerfile that builds a pycsw docker image. The image is fronted by an entrypoint.py module that acts as the launcher for pycsw inside the image. This module was necessary in order to ensure that a proper database is created/exists when the docker container is launched. This is a common technique found in other docker images. In our case I chose to implement the entrypoint script directly in Python, rather than using shell scripting.

    The image can be used by running:

    # build the image
    docker build -t geopython/pycsw .
    # run a container
    docker run --rm --name pycsw -p 8000:8000 geopython/pycsw
    

    In the future we'll be able to set up automated builds on docker hub so that a new image will automatically be built whenever we tag a release.

  • It is possible to use a custom pycsw configuration file by mounting it as a volume at /etc/pycsw/pycsw.cfg:

    docker run \
      --rm \
      --name pycsw \
      -p 8000:8000 \
      -v <custom-pycsw-config-file>:/etc/pycsw/pycsw.cfg \
      geopython/pycsw
    
  • The image creates and uses a user named pycsw with a uid of 1000. This user has a proper home directory at /home/pycsw.

  • The image supports both sqlite and postgresql databases. The default configuration will load the sqlite database that has test data for the CITE suite. A new sqlite database should be created inside /home/pycsw.

  • I've included a docker-stack.yml file that can be used by docker-compose or in docker swarms. This file demos using the pycsw image together with a postgis image. I'm hoping that by making it easy to set up a pycsw + postgis combo we'd finally be able to get all functional tests passing with postgis.

  • I've chosen to base this image on alpine linux. This makes for considerably smaller images than using debian or ubuntu as a base.

  • It is using gunicorn to run pycsw, as opposed to the wsgiref.simple_server that is used when running pycsw's pycsw/core/wsgi.py standalone.
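
The "ensure a proper database is created/exists" step of the entrypoint could look roughly like the sketch below. This only illustrates the idea and is not the entrypoint.py from this PR: `sqlite_path` and `ensure_database` are hypothetical names, `create_schema` is a stand-in for whatever actually creates the schema (such as `admin.setup_db`), and only sqlite URLs are handled.

```python
import os


def sqlite_path(database_url):
    """Return the file path of an sqlite URL, or None for other backends."""
    prefix = "sqlite:///"
    if not database_url.startswith(prefix):
        return None
    # "sqlite:////var/lib/x.db" -> "/var/lib/x.db" (four slashes = absolute path)
    return database_url[len(prefix):]


def ensure_database(database_url, create_schema):
    """Call create_schema(database_url) if the sqlite file is missing.

    Returns True when the schema was created, False otherwise.
    """
    path = sqlite_path(database_url)
    if path is None:
        # Non-sqlite backends (e.g. postgresql) need a different existence
        # check, which this sketch does not attempt.
        return False
    if os.path.isfile(path):
        return False
    create_schema(database_url)
    return True
```

Taking the schema-creation step as a callable keeps the existence check independent of pycsw itself, so it can be exercised without a real repository.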

Missing stuff for closing #530:

  • I cannot seem to be able to create the geopython/pycsw repository on docker hub. Perhaps @tomkralidis must do it himself?

  • I have intentionally not updated the docs yet, since we do not have docker hub support done. They'll need to be updated too, later on.

@tomkralidis
Member

@ricardogsilva thanks for the info. I've created (an empty) Docker project at https://hub.docker.com/r/geopython/pycsw/ and assigned a pycsw team (currently yourself, @kalxas and myself) with write access. Let me know if this works for you. We need to update the project short and full description on the Docker page (or does this get picked up from somewhere in the Dockerfile or pycsw GitHub repo?).

@ricardogsilva
Member Author

@tomkralidis Thanks for that :)

I think I'll wait for @kalxas review before proceeding with the setup of the dockerhub repo.

"--error-logfile=-",
"--workers={}".format(gunicorn_workers)
]
pycsw_server_command.append("--workers={}".format(gunicorn_workers))
Member Author

This line is not needed, I forgot to delete it. I'll do it shortly
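
For reference, a sketch of building the gunicorn argument list in one place so that each flag, including `--workers`, appears exactly once. The function name `build_gunicorn_command`, the bind address and the `pycsw.wsgi:application` target are assumptions for illustration; only the `--error-logfile` and `--workers` flags are taken from the snippet under review.

```python
def build_gunicorn_command(workers, bind="0.0.0.0:8000",
                           app="pycsw.wsgi:application"):
    """Build the gunicorn argument list, with every flag added exactly once."""
    return [
        "gunicorn",
        "--bind={}".format(bind),
        "--access-logfile=-",  # "-" sends the access log to stdout
        "--error-logfile=-",   # "-" sends the error log to stderr
        "--workers={}".format(workers),
        app,  # hypothetical WSGI target, adjust to the real module path
    ]


if __name__ == "__main__":
    print(" ".join(build_gunicorn_command(workers=2)))
```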



def _create_pycsw_schema(database_url, table):
    admin.setup_db(database=database_url, table=table, home="/home/pycsw")
Member Author

In the end this function became shorter than intended. I guess I'll remove it and just use the single line above

- db-data:/var/lib/postgresql/data/pgdata

pycsw:
image: ricardogsilva/pycsw:${PYCSW_DOCKER_VERSION}
Member Author

The image identifier is wrong, I should be using geopython/pycsw instead of ricardogsilva/pycsw. I'll update this too.

Member Author

@ricardogsilva ricardogsilva left a comment

Looks like I still have some final touch ups to do.

logger.debug("Reading pycsw config...")
config = SafeConfigParser()
config.read("/etc/pycsw/pycsw.cfg")
db_url = config.get("repository", "database")
Member Author

Should retrieve config from the environment instead of assuming it is still the default value


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true")
Member Author

Pull log level from the pycsw config instead of making it a call flag
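
A possible shape for that change, as a sketch only: read the level name from the config and map it to a `logging` constant, falling back to a default when it is missing or unknown. The `[server]` section and `loglevel` option are assumptions about the pycsw config layout, and `log_level_from_config` is a hypothetical helper. It uses `ConfigParser` rather than `SafeConfigParser`, which is a deprecated alias in Python 3.

```python
import logging
from configparser import ConfigParser


def log_level_from_config(config_text, default=logging.WARNING):
    """Map the configured level name to a logging constant, or use default."""
    # interpolation=None so that "%" characters elsewhere in the file are harmless.
    parser = ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Assumed location of the setting: [server] loglevel=...
    name = parser.get("server", "loglevel", fallback="")
    level = getattr(logging, name.strip().upper(), None)
    return level if isinstance(level, int) else default


if __name__ == "__main__":
    sample = "[server]\nloglevel=DEBUG\n"
    logging.basicConfig(level=log_level_from_config(sample))
    logging.getLogger(__name__).debug("Reading pycsw config...")
```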

@kalxas
Member

kalxas commented Sep 3, 2017

Hi @ricardogsilva
Thanks for working on this!
I just got back from vacation and have added this review to my todo list.

@ricardogsilva ricardogsilva changed the title Adds a Dockerfile to pycsw repo [WIP] Adds a Dockerfile to pycsw repo Sep 7, 2017
@ricardogsilva
Member Author

@kalxas

I changed the status of this PR back to WIP because I am having trouble mounting volumes to persist sqlite databases. I'll take a closer look at it later on.

@kalxas
Member

kalxas commented Sep 7, 2017

Thanks @ricardogsilva

There was a problem with using the python:3.5-alpine image together
with geos and shapely. The `shapely.wkt.loads` function
could crash with a segmentation fault if it received an
invalid input (we have a unit test that checks this)
- Docker image does not keep pycsw source code in a local dir anymore
- entrypoint script is starting to work with postgresql as well
- added docker-stack file in order to use docker-compose and docker swarms
- entrypoint script now makes use of the PYCSW_CONFIG environment
variable
- log level for the entrypoint script is taken from pycsw config
Now providing a custom pycsw.cfg inside the docker directory
@ricardogsilva ricardogsilva force-pushed the 530-provide-a-canonical-docker-image branch from 4088e74 to 782d4eb Compare October 11, 2017 21:45
@ricardogsilva ricardogsilva changed the title [WIP] Adds a Dockerfile to pycsw repo Adds a Dockerfile to pycsw repo Oct 11, 2017
@ricardogsilva
Member Author

ricardogsilva commented Oct 11, 2017

All right, I think this is ready.

Custom sqlite databases can be mounted on /var/lib/pycsw. For example, one can produce a custom pycsw.cfg file that specifies that an sqlite database will be used. This database must be placed under /var/lib/pycsw/ - for example /var/lib/pycsw/records.db. The following works:

# build the docker image
docker build -t geopython/pycsw:$(cat VERSION.txt) .

# create a docker volume
docker volume create pycsw-database

# run a container, mounting a custom pycsw config and the previously created volume too
docker run \
    -v <local-path-to-custom-pycsw.cfg>:/etc/pycsw/pycsw.cfg \
    -v pycsw-database:/var/lib/pycsw \
    -p 8000:8000 \
    geopython/pycsw:$(cat VERSION.txt)

In the example above, a docker volume stores the contents of the pycsw sqlite database. This volume persists after the container is destroyed.
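
Since the sqlite file has to live under /var/lib/pycsw for the volume to capture it, a custom config can be sanity-checked before launching. The helper below is a hypothetical sketch, not part of this PR; `is_under_data_dir` is an invented name and it only understands `sqlite:///` URLs.

```python
import posixpath

DATA_DIR = "/var/lib/pycsw"  # mount point used in the example above


def is_under_data_dir(database_url, data_dir=DATA_DIR):
    """Return True if an sqlite URL points to a file below data_dir."""
    prefix = "sqlite:///"
    if not database_url.startswith(prefix):
        return False
    # Normalise so that "../" segments cannot escape the data directory.
    path = posixpath.normpath(database_url[len(prefix):])
    return path.startswith(data_dir.rstrip("/") + "/")


if __name__ == "__main__":
    print(is_under_data_dir("sqlite:////var/lib/pycsw/records.db"))
```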

@kalxas can you please review, at your convenience?

@kalxas kalxas merged commit c182204 into geopython:master Oct 16, 2017
This was referenced Oct 30, 2017