This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

XP-456: replace the CLI arguments with a config file
Cc: finiteprods <kwok.doc@gmail.com>
Cc: Robert Steiner <robertt.debug@gmail.com>
Cc: janpetschexain <58227040+janpetschexain@users.noreply.github.com>

References
==========

https://xainag.atlassian.net/browse/XP-456
https://xainag.atlassian.net/browse/DO-58

Rationale
=========

The CLI is getting complex so it is worth loading the configuration
from a file instead.

Implementation details
======================

TOML
----

We decided to use TOML for the following reasons:

  - it is human friendly, i.e. easy to read and write
  - our configuration has a fairly flat structure, which makes TOML
    a good fit
  - it is well specified and has many implementations
  - it is well known

The other options we considered:

  - INI: it is common in the Python ecosystem to use INI for config
    files, and the standard library even supports it (`configparser`).
    However, INI is not as powerful as TOML and does not have a
    specification.
  - JSON: it is very popular but is not human friendly. For instance,
    it does not support comments, is very verbose, and breaks
    easily (for instance if a trailing comma is left at the end of a
    list).
  - YAML: another popular choice, but in my opinion it is more complex
    than TOML.

Validation
----------

We use the third-party `schema` library to validate the
configuration. It provides a convenient way to:

- declare a schema to validate our config
- leverage third-party libraries to validate some inputs (we use the
  `idna` library to validate hostnames)
- define our own validators
- transform data after it has been validated: this can be useful to
  turn a relative path into an absolute one for example
- provide user-friendly error messages when the configuration is
  invalid

The `Config` class
------------------

By default, the `schema` library returns a dictionary containing a
valid configuration, but that is not convenient to manipulate in
Python. Therefore, we dynamically create a `Config` class from the
configuration schema, and instantiate a `Config` object from the data
returned by the `schema` validator.
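
A minimal sketch of the idea, assuming the generated classes are
`namedtuple`s (which the "Alias for field number X" docstrings
mentioned below suggest). The helper names `make_config_class` and
`build_config` are hypothetical, and this version derives the fields
from the validated data rather than from the schema itself.

```python
from collections import namedtuple


def make_config_class(name, fields):
    """Build a namedtuple class whose attributes are the given field names."""
    return namedtuple(name, sorted(fields))


def build_config(validated):
    """Turn a validated two-level dict into nested namedtuple instances."""
    sections = {}
    for section, values in validated.items():
        # e.g. "server" -> ServerConfig, "ai" -> AiConfig
        cls = make_config_class(section.capitalize() + "Config", values.keys())
        sections[section] = cls(**values)
    config_cls = make_config_class("Config", sections.keys())
    return config_cls(**sections)


validated = {
    "server": {"host": "localhost", "port": 50051},
    "ai": {"rounds": 2, "epochs": 1},
}
config = build_config(validated)

# Attribute access instead of nested dictionary lookups
print(config.server.port)
print(type(config.ai).__name__)
```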

Package re-organization
-----------------------

We moved the command line and config file logic into its own `config`
sub-package, and moved the former `xain_fl.cli.main` entrypoint into
the `xain_fl.__main__` module.
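
For illustration only, an entrypoint of this shape might look like the
sketch below. The `-f`/`--config` flag pair is inferred from the
`entrypoint.sh` and README changes in this commit; everything else
(function names, help text, the returned value) is an assumption and
not the actual `xain_fl.__main__` code.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; only the config file path is accepted."""
    parser = argparse.ArgumentParser(prog="coordinator")
    parser.add_argument(
        "-f",
        "--config",
        dest="config",
        metavar="PATH",
        help="path to the TOML configuration file",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real entrypoint would load and validate the file here,
    # then start the coordinator with the resulting configuration.
    print(f"loading configuration from {args.config}")
    return args.config


main(["--config", "example-config.toml"])
```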

Docker infrastructure
---------------------

- Cache the xain_fl dependencies. This considerably shortens the
  "edit -> build -> debug" cycle, since installing the dependencies
  takes about 30 minutes.
- Move all the Docker-related files into the `docker/` directory.

Current limitations and future work
-----------------------------------

1. The documentation generated for the `ServerConfig`,
   `AiConfig` and `StorageConfig` classes is wrong. Each attribute is
   documented as "Alias for field number X". This can be fixed by
   having `create_class_from_schema()` set the `__doc__` attribute
   of each attribute. However, we won't be able to automatically
   document the type of each attribute.

2. When the configuration contains an invalid value, the error message
   we generate does not contain the invalid value in question. I think
   it is possible to enable this in the future but haven't really
   looked into it.
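
The fix suggested in the first item can be sketched as follows,
assuming the generated classes are `namedtuple`s. The `FIELD_DOCS`
mapping and its texts are hypothetical; in the real code they would
presumably come from the schema.

```python
from collections import namedtuple

ServerConfig = namedtuple("ServerConfig", ["host", "port"])

# This is the docstring the documentation currently picks up
default_doc = ServerConfig.host.__doc__
print(default_doc)

# Hypothetical per-field help texts
FIELD_DOCS = {
    "host": "Address to listen on for incoming gRPC connections",
    "port": "Port to listen on for incoming gRPC connections",
}

# namedtuple field descriptors have a writable __doc__
for field, doc in FIELD_DOCS.items():
    getattr(ServerConfig, field).__doc__ = doc

print(ServerConfig.port.__doc__)
```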
little-dude committed Jan 28, 2020
1 parent d8cbbf4 commit 326f8e3
Showing 17 changed files with 811 additions and 265 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -129,3 +129,7 @@ dmypy.json
# Editors
.idea
.vscode

# In the README, we document how to use a custom config. Let's ignore
# the files mentioned in that section
custom_config.toml
3 changes: 3 additions & 0 deletions Dockerfile
@@ -4,6 +4,7 @@ ENV USER="xain"
ENV HOST="0.0.0.0"
ENV PORT="50051"
ENV PATH="/home/${USER}/.local/bin:${PATH}"
ENV CONFIG_FILE="/app/xain-fl.toml"

RUN addgroup -S ${USER} && adduser -S ${USER} -G ${USER}
RUN apk update && apk add python3-dev build-base git
@@ -19,6 +20,8 @@ RUN pip install -v .
# Remove everything, including dot files
RUN rm -rf ..?* .[!.]* *

COPY configs/development.toml ${CONFIG_FILE}

# Drop down to a non-root user
USER ${USER}

10 changes: 6 additions & 4 deletions Dockerfile.dev
@@ -3,6 +3,7 @@ FROM python:3.6-alpine
RUN apk update && apk add python3-dev build-base git

WORKDIR /app

# Some dependencies require a very long compilation: protobuf, numpy,
# grpcio. To avoid having to re-install these packages every time, we
# pre-install them so that they are cached.
@@ -18,10 +19,11 @@ RUN mkdir xain_fl && \
python setup.py egg_info && \
cat *.egg-info/requires.txt | grep -v '^\[' | uniq | pip install -r /dev/stdin

RUN rm -rf xain_fl
COPY xain_fl xain_fl/
RUN rm -rf xain_fl README.md
COPY README.md .
COPY xain_fl xain_fl/
RUN pip install -e .

RUN pip install -v -e .
COPY configs/development.toml /app/xain-fl.toml

CMD ["python3", "setup.py", "--fullname"]
CMD ["coordinator", "--config", "/app/xain-fl.toml"]
49 changes: 39 additions & 10 deletions README.md
@@ -68,16 +68,17 @@ $ make show

### Running the Coordinator locally

To run the Coordinator on your local machine, use the command:
To run the Coordinator on your local machine, you can use the
`example-config.toml` file:

```shell
$ python xain_fl/cli.py --storage-endpoint <url_for_storage> --storage-key-id <id_for_storage> --storage-secret-access-key <password_for_storage> --storage-bucket <name_of_storage_bucket>
```

For more information about the CLI and its arguments, run:
# If you have installed the xain_fl package,
# the `coordinator` command should be directly available
coordinator --config example-config.toml

```shell
$ python xain_fl/cli.py --help
# otherwise the coordinator can be started by executing the
# `xain_fl` package:
python xain_fl --config example-config.toml
```

### Run the Coordinator from a Docker image
@@ -92,13 +93,28 @@ To run the coordinator's development image, first build the Docker image:
$ docker build -t xain-fl-dev -f Dockerfile.dev .
```

Then run the image, mounting the directory as a Docker volume, and call the
entrypoint:
Then run the image, mounting the directory as a Docker volume:

```shell
$ docker run -v $(pwd):/app -v '/app/xain_fl.egg-info' xain-fl-dev coordinator
```

The command above uses a default configuration, but you can also use a
custom config file.

For instance, if you have a `./custom_config.toml` file that you'd
like to use, you can mount it in the container and run the coordinator
with:

```shell
docker run \
-v $(pwd)/custom_config.toml:/custom_config.toml \
-v $(pwd):/app \
-v '/app/xain_fl.egg-info' \
xain-fl-dev \
coordinator --config /custom_config.toml
```

#### Release image

To run the coordinator's release image, first build it:
@@ -115,10 +131,23 @@ $ docker run -p 50051:50051 xain-fl

### Docker-compose

The coordinator needs a storage service that provides an AWS S3
API. For development, we use `minio`. We provide `docker-compose`
files that start a coordinator container along with a `minio` container,
and pre-populate the appropriate storage buckets.

#### Development

To start both the coordinator and the `minio` service use:

```shell
docker-compose -f docker-compose-dev.yml up
```

It is also possible to only start the storage service:

```shell
$ docker-compose -f docker-compose-dev.yml up
docker-compose -f docker-compose-dev.yml up minio-dev initial-buckets
```

#### Release
29 changes: 29 additions & 0 deletions configs/development.toml
@@ -0,0 +1,29 @@
[server]
# Address to listen on for incoming gRPC connections. By listening on
# 0.0.0.0, the coordinator will open a port reachable from the
# outside.
host = "0.0.0.0"
# Port to listen on for incoming gRPC connections
port = 50051

[ai]
# Number of global rounds the model is going to be trained for. This
# must be a positive integer.
rounds = 2
# Number of local epochs per round
epochs = 1
# Minimum number of participants to be selected for a round.
min_participants = 1
# Fraction of total clients that participate in a training round. This
# must be a float between 0 and 1.
fraction_participants = 1.0

[storage]
# URL to the storage service to use
endpoint = "http://minio-dev:9000"
# Name of the bucket for storing the aggregated models
bucket = "xain-fl-aggregated-weights"
# AWS access key ID to use to authenticate to the storage service
access_key_id = "minio"
# AWS secret access key to use to authenticate to the storage service
secret_access_key = "minio123"
27 changes: 27 additions & 0 deletions configs/example-config.toml
@@ -0,0 +1,27 @@
[server]
# Address to listen on for incoming gRPC connections
host = "localhost"
# Port to listen on for incoming gRPC connections
port = 50051

[ai]
# Number of global rounds the model is going to be trained for. This
# must be a positive integer.
rounds = 2
# Number of local epochs per round
epochs = 1
# Minimum number of participants to be selected for a round.
min_participants = 1
# Fraction of total clients that participate in a training round. This
# must be a float between 0 and 1.
fraction_participants = 1.0

[storage]
# URL to the storage service to use
endpoint = "http://minio-dev:9000"
# Name of the bucket for storing the aggregated models
bucket = "xain-fl-aggregated-weights"
# AWS access key ID to use to authenticate to the storage service
access_key_id = "minio"
# AWS secret access key to use to authenticate to the storage service
secret_access_key = "minio123"
10 changes: 3 additions & 7 deletions docker-compose-dev.yml
@@ -48,19 +48,15 @@ services:
"
xain-fl-dev:
environment:
MINIO_ACCESS_KEY: minio
MINIO_SECRET_KEY: minio123
build:
context: .
dockerfile: Dockerfile.dev
command: sh -c "coordinator --storage-endpoint http://minio-dev:9000 --storage-key-id $${MINIO_ACCESS_KEY} --storage-secret-access-key $${MINIO_SECRET_KEY} --storage-bucket xain-fl-aggregated-weights"
volumes:
# don't use the local egg-info, if one exists
- /app/xain_fl.egg-info
- ./xain_fl:/app/xain_fl
- /app/xain_fl.egg-info # don't use the local egg-info, if one exists
- ${PWD}/xain_fl:/app/xain_fl
- ${PWD}/setup.py:/app/setup.py
- ${PWD}/README.md:/app/README.md
- ${PWD}/configs/development.toml:/xain-fl.toml
networks:
- xain-fl-dev
ports:
2 changes: 1 addition & 1 deletion docker/entrypoint.sh
@@ -6,7 +6,7 @@ set -o nounset
# set -o xtrace

if [ $# -eq 0 ]; then
exec coordinator --host ${HOST} --port ${PORT}
exec coordinator -f ${CONFIG_FILE}
else
exec coordinator "$@"
fi
5 changes: 4 additions & 1 deletion setup.py
@@ -29,6 +29,9 @@
"structlog==19.2.0", # Apache License 2.0
"xain-proto==0.3.0", # Apache License 2.0
"boto3==1.10.48", # Apache License 2.0
"toml==0.10.0", # MIT
"schema~=0.7", # MIT
"idna==2.8", # BSD
]

dev_require = [
@@ -96,5 +99,5 @@
"docs": docs_require,
"dev": dev_require + tests_require + docs_require,
},
entry_points={"console_scripts": ["coordinator=xain_fl.cli:main"]},
entry_points={"console_scripts": ["coordinator=xain_fl.__main__:main"]},
)
10 changes: 8 additions & 2 deletions tests/store.py
@@ -7,7 +7,8 @@

import numpy as np

from xain_fl.coordinator.store import Store, StoreConfig
from xain_fl.config import StorageConfig
from xain_fl.coordinator.store import Store


class FakeS3Resource:
@@ -71,7 +72,12 @@ class TestStore(Store):
#
# pylint: disable=super-init-not-called
def __init__(self):
self.config = StoreConfig("endpoint_url", "access_key_id", "secret_access_key", "bucket")
self.config = StorageConfig(
endpoint="endpoint",
access_key_id="access_key_id",
secret_access_key="secret_access_key",
bucket="bucket",
)
self.s3 = FakeS3Resource()

def assert_wrote(self, round: int, weights: np.ndarray):