docs: docker #3909

Closed
wants to merge 15 commits into from
1 change: 0 additions & 1 deletion MANIFEST.in
@@ -6,7 +6,6 @@
# Invenio is free software; you can redistribute it and/or modify it
# under the terms of the MIT License; see LICENSE file for more details.


include *.rst
include *.sh
include *.txt
11 changes: 7 additions & 4 deletions README.rst
@@ -12,11 +12,14 @@

**Open Source framework for large-scale digital repositories.**

.. image:: https://img.shields.io/travis/inveniosoftware/invenio.svg
   :target: https://travis-ci.org/inveniosoftware/invenio
.. image:: https://img.shields.io/pypi/v/invenio.svg
   :target: https://pypi.org/project/invenio/

.. image:: https://img.shields.io/coveralls/inveniosoftware/invenio.svg
   :target: https://coveralls.io/r/inveniosoftware/invenio
.. image:: https://img.shields.io/github/license/inveniosoftware/invenio.svg
   :target: https://github.com/inveniosoftware/invenio/blob/master/LICENSE

.. image:: https://travis-ci.org/inveniosoftware/invenio.svg?branch=master
   :target: https://travis-ci.org/inveniosoftware/invenio

.. image:: https://badges.gitter.im/Join%20Chat.svg
   :target: https://gitter.im/inveniosoftware/invenio
Binary file added docs/architecture/_static/infrastructure.png
1 change: 1 addition & 0 deletions docs/architecture/_static/infrastructure.xml
@@ -0,0 +1 @@
<mxfile modified="2018-12-13T11:04:38.770Z" host="www.draw.io" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/9.3.1 Chrome/66.0.3359.181 Electron/3.0.6 Safari/537.36" etag="PtbgYbnZo9TO3ZzdTmO0" version="9.6.2" type="device"><diagram name="Page-1" id="822b0af5-4adb-64df-f703-e8dfc1f81529">7Vxbc9o4GP01PNZj3aXHJul2H5pOZ+hsZ/elI4wC3hpEfUlIf/3K+AK2TOqAwSaLHwjIulg6R9J3Pn3xCN0u1h9DuZrf66kKRtCdrkfobgQhIJyZP2nKc57iEpqlzEJ/mqdtE8b+L1VkzFMTf6qiSsZY6yD2V9VETy+XyosraTIM9VM124MOqq2u5ExZCWNPBnbqN38az/NU4LrbG38qfzbPm+YkvzGR3o9ZqJNl3t4IoofNld1eyKKuPH80l1P9tJOEPozQbah1nH1brG9VkA5uMWxZuT/23C2fO1TLuE2Bd//81E+fl/d/ffsaivv7z19+/v3xHcxqeZRBko/HJy2nm94FcumpMH/2+LkYr0cVxr4Zvk9yooIvOvJjXy/NrYmOY70YoRsZrTKcHvy1Mm3fzONFYH4C87Uo/D7wZ2mhWK9MahSH+oe61YEOTdpSL1VaTZ7FM90zj4FudBIH/tJkK3jgpkXncpU+2GI9S/npeH4c+msHJN9h8j1S4eOmaN5J07ha7x09UGJiyK70QsXhs8mSF8CYZ0VynpewPm1Jw/K5MN+hC8qzyZyms7LmLVTmS47WK5AjV+TaIUfYwJCjFnLf1MTUJFerFLCs61fwMhAQqICHaM/gsSt47ddMOjDw+BW89sumGBh44gpea/AYHBh4AFjo3clYTmSkRpAGpvmbiUGPztJv9zJKB9FKv0x0p0U/u9kQC92UI4s5tZDF0CE2tuRk2NoiYqxk6M1NmlrOzDjZUHpBcsV4774JUQHggFBGFsr3KopSdQ3dn4lKrii/coNFbHAYYwvjW+nNr8i+DllK3AHOX9thMI51mM3fK8CvM68oHhy8DV4FHf5oRvBtYHp6bwPFfZvNDe6GvaheMWxyOvSPYYPX4Yrhy6aRGBqGDc6HHMMrYk0Oh94Rg7ZcsbBSy+n79AQxHcZARpHvVeHICqipdYBYGxVTqU5CT730OFm+WIYzFb+QjzaP8s4oFhZEk1URqkDG/mP1cZuGNm/hi/ZNR3Z8C1UQGauhk3UzL7UFyK6otgYzWKsoGwerog3SZbePAN/WMZcAPusTfMyRIyiGnDCQftYOzeihVBDQcVFeZ/pZqRbVGXZqYtgi6BKIwXslhmAOFyUvGKpqn4OZQVzuAHdLDEr7pYYtoC6BGqJPahAAHA5LBDnuiBoUYgewknGgZ2bYIuzu89gkBLsBF/5yZvMlCPzVxulQGGJeoJPU6Hua+7Ear+SGA0/GNmsi0pEBK6zmhBK2SQZEAzn4yWwyWwn1OsVYyzlGep1jkDisCuQR86p6eGSZ36eeSbaMuggCwF4JwLDjusIVAplLFGNYxFsy7rgQmBuGKCxdNA+kBsDOzopbs9POTZSiuWOIMpXRPBXYXbOm19WA1o5/D14KcK0iVK/o1AjbAQHDQnjoeo26wGFAlFdHeg3XtohzG1sIXj4vepVrFJxEx9cXnrOvFx348/rmRa9ajcKajO+IF6iq4qsxDWdfPYbl+KMtiQH2SLk35fnDVRFzdmoMy/XXmhq9ag/EiAM5gLVoFvdQSmDeVJ2g52WC7c+5CCagXpnAmcOEoAgiAgUuYj6PZQTGriMAg8xUCRjF9cCac1PD9lDtD1MeB/KxIdni0kUcCXcbIFX6C4YTIIWG5Xsqjlp
/O+1Rr2riTW4AhXNrIFRovQHgPplgMHcQ4C7AzCWcF0N2LCMIrioG4hKHcs6w2RQAJxidlxnDihppzYxefZLGGHA4okhkxKidDx/MDErrUdTpcSNGHDHTiuENKYyF35DDoCWfd7Kt0gzR/v4wWN+8Kv9Yb75kNXbLvBbK9SIMiSRSYdRROK5bxR+4pcmww2yKG+2I2jlIZ3YEHpaOLHjz+zOsXr3Vb9OOGJaQBJfhbnqbVBhWaEN7KsC+qUAAoTXwaF0EtudC6uN26/URfGYyDExqXo6HCSFj8gO6uVjVjjyYFBgDoygIo4hvrmr8xLm5QTo49+6FG72KTySEAyjAhKPNBasRgQdzw1KfiDkQAAFdAoXRuu3U52sFBhFVgUHOITAIPJ54/yaL1TjPnmuF8q1mbl/E7Fn7uo7gGLGOdrAG0YuAA4x5RAVmZm/jp6EkrVNSnIOStrfFcZwWHnUjKeMqM0MV+b/kZJMhZWLeX5Ob3IzIXSpqk9go5s3bB8GOxg3UQ7xfHa9kGtf7Nf1x9w53o2rLFwiWkRS2pkUNjK3/D013b5KzI9//DzDUjigoPhkM5uf2NY/Z5Nm+TBN9+A8=</diagram></mxfile>
File renamed without changes.
211 changes: 211 additions & 0 deletions docs/architecture/infrastructure.rst
@@ -0,0 +1,211 @@
..
    This file is part of Invenio.
    Copyright (C) 2018 CERN.

    Invenio is free software; you can redistribute it and/or modify it
    under the terms of the MIT License; see LICENSE file for more details.

Infrastructure architecture
===========================
This guide provides a general overview of the Invenio infrastructure
architecture. It is not meant to be a comprehensive guide for each subsystem.

Overall, the Invenio infrastructure is a fairly standard web application
infrastructure. It consists of:

- **Load balancers:** HAProxy, Nginx or others.
- **Web servers:** Nginx, Apache or others.
- **Application servers:** uWSGI, Gunicorn or mod_wsgi.
- **Distributed task queue:** Celery.
- **Database:** PostgreSQL, MySQL or SQLite.
- **Search engine:** Elasticsearch (v5 and v6).
- **Message queue:** RabbitMQ, Redis or Amazon SQS.
- **Cache system:** Redis or Memcached.
- **Storage system:** Local, S3, XRootD, WebDAV and more.

.. image:: _static/infrastructure.png
   :align: center

Request handling
----------------
A client making a request to Invenio will usually first hit a load balancer.
For high availability you can have more load balancers and balance traffic
between them with e.g. DNS load balancing.

Load balancer
~~~~~~~~~~~~~

**Request types**

The load balancer usually (if it supports SSL termination) allows you to split
traffic into three categories of requests:

- static file requests: e.g. JavaScript assets
- application requests: e.g. search queries
- record file requests: e.g. downloading very large files

This way you can dimension the number of connection slots between different
types of requests according to available resources. For instance, a static file
request can usually be served extremely efficiently, while an application
request usually takes longer and requires more memory.

Similarly, downloading a very large file depends on the client's available
bandwidth and can thus take up a connection slot for a significant amount of
time. If your storage system supports it, Invenio can completely offload
the serving of large files to your storage system (e.g. S3).

All in all, the primary job of the load balancer is to manage traffic to your
servers according to available resources. For instance, during traffic floods
the load balancer takes care of queuing requests to the web servers.
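
The three-way split of requests can be illustrated with a minimal Python
sketch. The URL prefixes (``/static/``, ``/api/files/``) and the slot counts
are hypothetical values chosen for illustration only; in a real deployment this
logic would live in the load balancer's own configuration, not in Python.

```python
# Illustrative only: prefixes and slot counts are assumptions,
# not Invenio or load balancer defaults.
SLOTS = {"static": 500, "files": 50, "application": 100}


def classify_request(path):
    """Return the request category used to pick a connection pool."""
    if path.startswith("/static/"):
        return "static"       # cheap to serve, so many slots
    if path.startswith("/api/files/"):
        return "files"        # long-lived downloads, so few slots
    return "application"      # everything else, e.g. search queries


for path in ("/static/js/app.js", "/api/files/bucket/big.tar", "/search?q=test"):
    category = classify_request(path)
    print(path, "->", category, "(max connections: %d)" % SLOTS[category])
```

Keeping the categories separate means that a burst of slow file downloads
cannot use up the connection slots reserved for application requests.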

**Backup pages**

A load balancer can also direct traffic to a static backup site in case your
main web servers are down. This is useful for communicating with users
during major incidents.

Web servers
~~~~~~~~~~~
The load balancer proxies traffic to one of several web servers. The web
server's primary job is to manage the connections into your application server.
A web server such as Apache or Nginx is usually much better at managing
connections than an application server. Also, you can use the web server to
configure limits on specific parts of your application so that, for instance,
you can upload a 1 TB file via the Files REST API, but not via the search REST
API.


Application servers
~~~~~~~~~~~~~~~~~~~
The web server usually (but not necessarily) proxies traffic to a single
application server running on the same machine. The application server
is responsible for handling the application requests. Invenio is a Python
application, and thus makes use of the WSGI standard. Several application
servers are capable of running WSGI Python applications, e.g. Gunicorn,
uWSGI and mod_wsgi.

Storing records
---------------
Invenio stores records as JSON documents in an SQL database. Most modern SQL
databases have a JSON type that can efficiently store JSON documents in
a binary format.

**Transactional databases**

The primary reason for using an SQL database is that it provides transactions,
which is very important since data consistency is of utmost importance for a
repository. Also, database servers can handle very large amounts of data
as long as they are scaled and configured properly. Last but not least, they
are usually highly reliable compared to some NoSQL solutions.

**Primary key lookups**

Most access from Invenio to the database is via primary key lookups, which
are usually very efficient in databases. Search queries and the like are all
sent to the search engine cluster, which can provide much better performance
than a database.
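
The pattern of storing a JSON document inside a transaction and reading it
back by primary key can be sketched as follows. This is a minimal illustration
and not Invenio's actual data model: it uses an in-memory SQLite database from
the Python standard library, and the ``records`` table and its columns are
hypothetical names.

```python
import json
import sqlite3
import uuid

# In-memory SQLite stands in for PostgreSQL/MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, json TEXT NOT NULL)")


def create_record(data):
    """Insert a record inside a transaction and return its primary key."""
    record_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO records (id, json) VALUES (?, ?)",
            (record_id, json.dumps(data)),
        )
    return record_id


def get_record(record_id):
    """Fetch a record by primary key; return None if it does not exist."""
    row = conn.execute(
        "SELECT json FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


rid = create_record({"title": "Example record"})
print(get_record(rid))
```

Anything more complex than a lookup by ``id``, such as full-text search, is
left to the search engine rather than expressed as an SQL query.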

Search and indexing
-------------------
Invenio uses Elasticsearch as its underlying search engine since Elasticsearch
is fully JSON-based, and thus fits well with storing records internally
in the database as JSON documents.

Elasticsearch is furthermore highly scalable and provides very powerful search
and aggregation capabilities. You can for instance make geospatial queries with
Elasticsearch.

Direct indexing
~~~~~~~~~~~~~~~
Invenio has the option to directly index a record in Elasticsearch when
handling a request, and thus make the record immediately available for
searches.

**Bulk indexing**

In addition to direct indexing, Invenio can also do bulk indexing, which is
significantly more efficient when indexing a large number of records. Bulk
indexing works by the application sending a message to the message queue. At
regular intervals a background job consumes the queue and indexes the
records. Also, several bulk indexing jobs can run concurrently
on multiple worker nodes, so you can achieve very high indexing rates
during bulk indexing.
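
The enqueue-then-drain pattern behind bulk indexing can be sketched in plain
Python. Everything here is a stand-in chosen for illustration: a local
``queue.Queue`` replaces the message queue, a dict replaces the database and a
list replaces the search index. The function names are hypothetical and are not
Invenio's API.

```python
import queue

# Stand-ins (assumptions): local queue, dict "database", list "search index".
index_queue = queue.Queue()
database = {1: {"title": "A"}, 2: {"title": "B"}, 3: {"title": "C"}}
search_index = []


def request_indexing(record_id):
    """Called by the application: only enqueue the ID, do not index yet."""
    index_queue.put(record_id)


def run_bulk_indexer(batch_size=100):
    """Called periodically by a background job: drain the queue in batches."""
    indexed = 0
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(index_queue.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return indexed
        # One bulk request per batch instead of one request per record.
        search_index.extend(database[record_id] for record_id in batch)
        indexed += len(batch)


for record_id in database:
    request_indexing(record_id)
print(run_bulk_indexer(batch_size=2))
```

The request handler only pays the cost of one small queue message, while the
expensive indexing work is amortized over whole batches.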

Background processing
---------------------
Invenio relies on an application called Celery for distributed background
processing. In order for an application server to reply faster to a request,
it can offload some tasks to asynchronous jobs. It works by the application
sending a message to the message queue (e.g. RabbitMQ), from which several
Celery worker nodes continuously consume tasks.

Examples of background tasks include sending an email or
registering a DOI.

**Multiple queues**

The background processing has support for multiple queues and advanced
workflows. You could for instance have a low-priority queue that constantly
runs a fixed number of file integrity checks per day, and another normal queue
for other tasks like DOI registration.

**Cronjobs and retries**

Celery also supports running jobs at scheduled intervals as well as
retrying tasks in case they fail (e.g. if a remote service is temporarily down).
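
The idea of retrying a failed task can be sketched in plain Python. This is not
the Celery API, which provides its own retry mechanism. The exception class,
the ``run_with_retries`` helper and the ``register_doi`` task are hypothetical
names used only to show the control flow.

```python
import time


class RemoteServiceDown(Exception):
    """Raised by the hypothetical remote call when the service is unavailable."""


def run_with_retries(task, max_retries=3, delay=0.0):
    """Run ``task``; on failure wait ``delay`` seconds and try again."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except RemoteServiceDown:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            time.sleep(delay)


attempts = []


def register_doi():
    """Hypothetical task that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RemoteServiceDown()
    return "registered"


print(run_with_retries(register_doi))
```

In a real deployment the delay between attempts would typically be non-zero,
so that a temporarily unavailable service has time to recover.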

Caching and temporary storage
-----------------------------
Invenio uses an in-memory cache like Redis or Memcached for fast temporary
storage. The cache is for instance used for:

- User session storage
- Results from background jobs
- Caching rendered pages
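
What "fast temporary storage" means in practice can be sketched with a tiny
in-process cache whose entries expire. This is only an illustration of the
concept: the ``TTLCache`` class is hypothetical, and a real installation would
use Redis or Memcached so that the cache is shared between servers.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-key expiry (stand-in for Redis)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl):
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, time.monotonic() + ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return default
        return value


cache = TTLCache()
cache.set("page:/records/1", "<html>rendered</html>", ttl=60)
print(cache.get("page:/records/1"))
print(cache.get("page:/records/2", default="miss"))
```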

Storing files
-------------
Invenio comes with a default object storage REST API to expose files.
Under the hood, however, Invenio can store files in multiple different
storage systems thanks to a simple storage abstraction layer. Also, it is
possible to completely bypass the Invenio object storage and directly use
another storage system like S3. In this case, you just have to be careful to
manage access correctly on the external system.

**Multiple storage systems**

One strength of Invenio is that you can store files on multiple systems at the
same time. This is useful if you for instance need to use multiple systems or
perform a live migration from one system to another.
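
A storage abstraction layer of this kind can be sketched as a small interface
with interchangeable backends. The classes and the ``migrate`` helper are
hypothetical and are not Invenio's storage API. They only show why a common
interface makes it straightforward to use several systems at once or to copy
files between them.

```python
import os
import tempfile


class Storage:
    """Minimal interface every backend implements (hypothetical)."""

    def write(self, key, data):
        raise NotImplementedError

    def read(self, key):
        raise NotImplementedError


class MemoryStorage(Storage):
    """Keeps file contents in a dict; stands in for a remote system."""

    def __init__(self):
        self._blobs = {}

    def write(self, key, data):
        self._blobs[key] = bytes(data)

    def read(self, key):
        return self._blobs[key]


class LocalStorage(Storage):
    """Stores each file under a root directory on the local disk."""

    def __init__(self, root):
        self.root = root

    def write(self, key, data):
        with open(os.path.join(self.root, key), "wb") as fp:
            fp.write(data)

    def read(self, key):
        with open(os.path.join(self.root, key), "rb") as fp:
            return fp.read()


def migrate(keys, source, target):
    """Copy files from one backend to another, e.g. during a migration."""
    for key in keys:
        target.write(key, source.read(key))


old = MemoryStorage()
old.write("report.txt", b"hello")
new = LocalStorage(tempfile.mkdtemp())
migrate(["report.txt"], old, new)
print(new.read("report.txt"))
```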

Running with Docker
-------------------
Start by creating an Invenio instance with `cookiecutter-invenio-instance <https://github.com/inveniosoftware/cookiecutter-invenio-instance>`_.

Among the generated files, there are two Dockerfiles, `Dockerfile.base <https://github.com/inveniosoftware/cookiecutter-invenio-instance/blob/master/%7B%7Bcookiecutter.project_shortname%7D%7D/Dockerfile.base>`_
and `Dockerfile <https://github.com/inveniosoftware/cookiecutter-invenio-instance/blob/master/%7B%7Bcookiecutter.project_shortname%7D%7D/Dockerfile>`_,
which are used in that sequence to build the image for your application.

Is the phrase "in that sequence" needed?


The complete process is depicted in the following diagram:

.. image:: resources/dockerfile-build-process.png
   :align: center

With respect to the diagram above, the commands for each step of the process
are:

1. For the first step you don't have to take any action, since it is built in the `docker-invenio <https://github.com/inveniosoftware/docker-invenio>`_ repository.
2. :code:`docker build -f Dockerfile.base -t my-site-base .`
3. :code:`docker build -f Dockerfile -t my-site .`

The last image is a ready-to-run Invenio instance. If you wish to install
extra dependencies that are not included in the Dockerfiles, you can create
a Dockerfile based on Dockerfile.base to handle the modifications.

Leveraging Docker image layer caching can offer a significant speedup in your
development process. This is the reason for maintaining two Dockerfiles,
Dockerfile.base to create an image with the installed dependencies, and the
final Dockerfile to install the application code and rebuild the static
assets, which tend to change more frequently.

In order to incorporate your latest changes, you can repeat the last step (3)
of the build process.
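
The decision of which images to rebuild can be sketched as a small helper that
only assembles the ``docker build`` commands from steps 2 and 3. The helper
name and the ``dependencies_changed`` flag are hypothetical, and the rule that
a dependency change requires rebuilding the base image is an assumption drawn
from the layering described above.

```python
BUILD_STEPS = [
    # (image tag, Dockerfile) in build order, taken from steps 2 and 3.
    ("my-site-base", "Dockerfile.base"),
    ("my-site", "Dockerfile"),
]


def build_commands(dependencies_changed):
    """Return the ``docker build`` commands needed for the given change."""
    # Code-only changes need just the final image; dependency changes
    # invalidate the base image, so both images are rebuilt in order.
    steps = BUILD_STEPS if dependencies_changed else BUILD_STEPS[-1:]
    return [
        ["docker", "build", "-f", dockerfile, "-t", tag, "."]
        for tag, dockerfile in steps
    ]


for command in build_commands(dependencies_changed=False):
    print(" ".join(command))
```

Each returned command is a list of arguments, so it could be passed to
``subprocess.run`` unchanged if you want to script the build.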

You can find more information on base images and how to incorporate your latest
dependencies in the repository.
35 changes: 35 additions & 0 deletions docs/community/_getting-help.rst
@@ -0,0 +1,35 @@
..
    This file is part of Invenio.
    Copyright (C) 2017-2018 CERN.

    Invenio is free software; you can redistribute it and/or modify it
    under the terms of the MIT License; see LICENSE file for more details.

Getting help
============

Didn't find a solution to your problem in the Invenio documentation? Here's how
you can get in touch with other users and developers:

.. rubric:: Forum/Knowledge base

- https://github.com/inveniosoftware/troubleshooting

Ask questions or browse answers to existing questions.

.. rubric:: Chatroom

- https://gitter.im/inveniosoftware/invenio

Probably the fastest way to get a reply is to join our chatroom. Most
developers and maintainers of Invenio hang out there during their regular
working hours.

.. rubric:: GitHub

- https://github.com/inveniosoftware

If you have feature requests or want to report a potential bug, you can do so by
opening an issue in one of the individual Invenio module repositories. In each
repository there is a ``MAINTAINERS`` file in the root, which lists who
is maintaining the module.
21 changes: 21 additions & 0 deletions docs/community/development-environment.rst
@@ -52,3 +52,24 @@ The key plugins you should look for in your editor of choice are:

.. todo: docker, git, cli tools (hub), git aliases, getting pull-requests,
virtualenv, virtualenv-wrapper, debugging pdb/ipdb, homebrew


Working with Git and GitHub
---------------------------
There are a couple of utilities that allow you to work more efficiently with
Git and GitHub.

Hub
~~~
`Hub <https://hub.github.com>`_ is a command-line wrapper for git that makes it
easier to work with GitHub. See the
`installation instructions <https://hub.github.com>`_ for how to install Hub.

Here is a short overview of possibilities:

```console
# The commands below assume git is aliased to hub (alias git=hub)
# Clone one of your personal repositories from GitHub
$ git clone invenio-app
# Fetch the upstream inveniosoftware/invenio-app
$ git fetch inveniosoftware
```
29 changes: 1 addition & 28 deletions docs/community/getting-help.rst
@@ -7,31 +7,4 @@

.. _getting-help:

.. include:: _getting-help.rst