This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

XP-456: replace the CLI arguments with a config file
Cc: finiteprods <kwok.doc@gmail.com>
Cc: Robert Steiner <robertt.debug@gmail.com>
Cc: janpetschexain <58227040+janpetschexain@users.noreply.github.com>

References
==========

https://xainag.atlassian.net/browse/XP-456
https://xainag.atlassian.net/browse/DO-58

Rationale
=========

The CLI is getting complex so it is worth loading the configuration
from a file instead.

Implementation details
======================

TOML
----

We decided to use TOML for the following reasons:

  - it is human friendly, i.e. easy to read and write
  - our configuration has a fairly flat structure, which makes TOML
    a good fit
  - it is well specified and has many implementations
  - it is well known

The other options we considered:

  - INI: it is quite frequent in the Python ecosystem to use INI for
    config files, and the standard library even provides support for
    this. However, INI is not as powerful as TOML and does not have a
    specification
  - JSON: it is very popular but is not human friendly. For instance,
    it does not support comments, is very verbose, and breaks
    easily (if a trailing comma is left at the end of a list for
    instance)
  - YAML: another popular choice, but is in my opinion more complex
    than TOML.
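
As a minimal sketch of what loading such a file looks like, the snippet
below parses a small TOML document into a dictionary. It assumes the
standard library `tomllib` module (Python 3.11+) and falls back to the
third-party `toml` package otherwise; the sample keys are illustrative
and only mirror part of the real configuration.

```python
try:
    import tomllib as toml_parser  # standard library, Python 3.11+
except ModuleNotFoundError:
    import toml as toml_parser  # third-party package, same `loads()` call

# Illustrative sample, not the full configuration
SAMPLE = """
[server]
# Comments are allowed, unlike in JSON
host = "localhost"
port = 50051

[ai]
rounds = 1
fraction_participants = 1.0
"""

# Each [section] becomes a nested dictionary
config = toml_parser.loads(SAMPLE)
server = config["server"]
print(server["host"], server["port"])
```

Integers and floats keep their type after parsing, so `port` comes back
as an `int` and `fraction_participants` as a `float`.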

Validation
----------

We use the third-party `schema` library to validate the
configuration. It provides a convenient way to:

- declare a schema to validate our config
- leverage third-party libraries to validate some inputs (we use the
  `idna` library to validate hostnames)
- define our own validators
- transform data after it has been validated: this can be useful to
  turn a relative path into an absolute one for example
- provide user-friendly error messages when the configuration is
  invalid

The `Config` class
------------------

By default, the `schema` library returns a dictionary containing a
valid configuration, but that is not convenient to manipulate in
Python. Therefore, we dynamically create a `Config` class from the
configuration schema, and instantiate a `Config` object from the data
returned by the `schema` validator.
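
One way to build such a class dynamically is with
`collections.namedtuple`. The "Alias for field number X" docstrings
mentioned in the limitations are what `namedtuple` generates by
default, which hints at this approach, but the helper below is a
hypothetical sketch and not the code from this commit.

```python
from collections import namedtuple


def create_class_from_keys(class_name, keys):
    """Hypothetical helper: build an immutable class with one attribute per key."""
    return namedtuple(class_name, sorted(keys))


# Stand-in for the dictionary returned by the validator
validated = {"server": {"host": "localhost", "port": 50051}}

ServerConfig = create_class_from_keys("ServerConfig", validated["server"])
server = ServerConfig(**validated["server"])

# Attribute access instead of dictionary lookups
print(server.host, server.port)
```

The resulting objects are immutable, so a configuration cannot be
modified by accident after it has been validated.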

Package re-organization
-----------------------

We moved the command line and config file logic into its own `config`
sub-package, and moved the former `xain_fl.cli.main` entrypoint into
the `xain_fl.__main__` module.

Docker infrastructure
---------------------

- Cache the xain_fl dependencies. This considerably shortens the
  "edit -> build -> debug" cycle, since installing the dependencies
  takes about 30 minutes.
- Move all the docker related files into the `docker/` directory

Current limitations and future work
-----------------------------------

1. The documentation generated for the `ServerConfig`,
   `AiConfig` and `StorageConfig` classes is wrong. Each attribute is
   documented as "Alias for field number X". This can be fixed by
   having `create_class_from_schema()` set the `__doc__` attribute
   for each attribute. However, we won't be able to automatically
   document the type of each attribute.

2. When the configuration contains an invalid value, the error message
   we generate does not contain the invalid value in question. I think
   it is possible to enable this in the future but haven't really
   looked into it.
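
A possible shape for the fix described in the first item, assuming the
generated classes are `namedtuple` based: field docstrings are writable
since Python 3.5, so they can be overridden after the class is created.
The helper name and the docstrings below are illustrative only.

```python
from collections import namedtuple


def create_documented_class(class_name, field_docs):
    """Hypothetical helper: create a namedtuple and document each field."""
    cls = namedtuple(class_name, list(field_docs))
    for field, doc in field_docs.items():
        # Replace the default "Alias for field number X" docstring
        getattr(cls, field).__doc__ = doc
    return cls


StorageConfig = create_documented_class(
    "StorageConfig",
    {
        "endpoint": "URL to the storage service to use",
        "bucket": "Name of the bucket for storing the aggregated models",
    },
)
print(StorageConfig.endpoint.__doc__)
```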
little-dude committed Jan 23, 2020
1 parent 8b3e765 commit 5a60096
Showing 16 changed files with 777 additions and 273 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -129,3 +129,8 @@ dmypy.json
# Editors
.idea
.vscode

# In the README, we document how to use custom config. Let's ignore
# the files mentioned in that section
custom_config.toml
custom_initial_weights.npy
24 changes: 21 additions & 3 deletions Dockerfile.dev
@@ -3,10 +3,28 @@ FROM python:3.6-alpine
RUN apk update && apk add python3-dev build-base git

WORKDIR /app

# Some dependencies require a very long compilation: protobuf, numpy,
# grpcio. To avoid having to re-install these packages every time, we
# pre-install them so that they are cached.
#
# However, we cannot just pre-install a few packages, because we don't
# know exactly which version `pip` will pick for them. Instead, we
# give `pip` all the package's dependencies, and let it resolve and
# install them.
COPY setup.py .
COPY xain_fl xain_fl/
RUN mkdir xain_fl && \
printf '__version__ = "0"\n__short_version__ = "0"' > xain_fl/__version__.py && \
touch README.md && \
python setup.py egg_info && \
cat *.egg-info/requires.txt | grep -v '^\[' | uniq | pip install -r /dev/stdin

RUN rm -rf xain_fl README.md
COPY README.md .
COPY xain_fl xain_fl/
RUN pip install -e .

RUN pip install -v -e .
COPY docker/dev/xain-fl.toml /xain-fl.toml
COPY docker/dev/initial_weights.npy /initial_weights.npy

CMD ["python3", "setup.py", "--fullname"]
CMD ["coordinator", "--config", "/xain-fl.toml"]
33 changes: 23 additions & 10 deletions README.md
@@ -68,16 +68,13 @@ $ make show

### Running the Coordinator locally

To run the Coordinator on your local machine, use the command:
To run the Coordinator on your local machine, you can use the
`example-config.toml` file:

```shell
$ python xain_fl/cli.py -f test_array.npy
```

For more information about the CLI and its arguments, run:

```shell
$ python xain_fl/cli.py --help
coordinator --config example-config.toml
# or
python xain_fl --config example-config.toml
```

### Run the Coordinator from a Docker image
@@ -92,13 +89,29 @@ To run the coordinator's development image, first build the Docker image:
$ docker build -t xain-fl-dev -f Dockerfile.dev .
```

Then run the image, mounting the directory as a Docker volume, and call the
entrypoint:
Then run the image, mounting the directory as a Docker volume:

```shell
$ docker run -v $(pwd):/app -v '/app/xain_fl.egg-info' xain-fl-dev coordinator
```

The command above uses a default configuration but you can also use a
custom config file and custom initial weights.

For instance, if you have `./custom_config.toml` and
`./custom_initial_weights.npy` files that you'd like to use, you can
mount them in the container and run the coordinator with them:

```shell
docker run \
-v $(pwd)/custom_config.toml:/custom_config.toml \
-v $(pwd)/custom_initial_weights.npy:/custom_initial_weights.npy \
-v $(pwd):/app \
-v '/app/xain_fl.egg-info' \
xain-fl-dev \
coordinator --config /custom_config.toml
```

#### Release image

To run the coordinator's release image, first build it:
12 changes: 4 additions & 8 deletions docker-compose-dev.yml
@@ -48,20 +48,16 @@ services:
"
xain-fl-dev:
environment:
MINIO_ACCESS_KEY: minio
MINIO_SECRET_KEY: minio123
build:
context: .
dockerfile: Dockerfile.dev
command: sh -c "coordinator -f test_array.npy --storage-endpoint http://minio-dev:9000 --storage-key-id $${MINIO_ACCESS_KEY} --storage-secret-access-key $${MINIO_SECRET_KEY} --storage-bucket xain-fl-aggregated-weights"
volumes:
# don't use the local egg-info, if one exists
- /app/xain_fl.egg-info
- ./xain_fl:/app/xain_fl
- /app/xain_fl.egg-info # don't use the local egg-info, if one exists
- ${PWD}/xain_fl:/app/xain_fl
- ${PWD}/setup.py:/app/setup.py
- ${PWD}/README.md:/app/README.md
- ${PWD}/test_array.npy:/app/test_array.npy
- ${PWD}/docker/dev/xain-fl.toml:/xain-fl.toml
- ${PWD}/docker/dev/initial_weights.npy:/initial_weights.npy
networks:
- xain-fl-dev
ports:
File renamed without changes.
29 changes: 29 additions & 0 deletions docker/dev/xain-fl.toml
@@ -0,0 +1,29 @@
[server]
# Address to listen on for incoming gRPC connections
host = "0.0.0.0"
# Port to listen on for incoming gRPC connections
port = 50051

[ai]
# Path to a file containing a numpy ndarray to use as initial model weights.
initial_weights = "/initial_weights.npy"
# Number of global rounds the model is going to be trained for. This
# must be a positive integer.
rounds = 1
# Number of local epochs per round
epochs = 1
# Minimum number of participants to be selected for a round.
min_participants = 1
# Fraction of total clients that participate in a training round. This
# must be a float between 0 and 1.
fraction_participants = 1.0

[storage]
# URL to the storage service to use
endpoint = "http://minio-dev:9000"
# Name of the bucket for storing the aggregated models
bucket = "xain-fl-aggregated-weights"
# AWS access key ID to use to authenticate to the storage service
access_key_id = "minio"
# AWS secret access key to use to authenticate to the storage service
secret_access_key = "minio123"
29 changes: 29 additions & 0 deletions example-config.toml
@@ -0,0 +1,29 @@
[server]
# Address to listen on for incoming gRPC connections
host = "localhost"
# Port to listen on for incoming gRPC connections
port = 50051

[ai]
# Path to a file containing a numpy ndarray to use as initial model weights.
initial_weights = "./docker/dev/initial_weights.npy"
# Number of global rounds the model is going to be trained for. This
# must be a positive integer.
rounds = 1
# Number of local epochs per round
epochs = 1
# Minimum number of participants to be selected for a round.
min_participants = 1
# Fraction of total clients that participate in a training round. This
# must be a float between 0 and 1.
fraction_participants = 1.0

[storage]
# URL to the storage service to use
endpoint = "http://minio-dev:9000"
# Name of the bucket for storing the aggregated models
bucket = "xain-fl-aggregated-weights"
# AWS access key ID to use to authenticate to the storage service
access_key_id = "minio"
# AWS secret access key to use to authenticate to the storage service
secret_access_key = "minio123"
5 changes: 4 additions & 1 deletion setup.py
@@ -29,6 +29,9 @@
"structlog==19.2.0", # Apache License 2.0
"xain-proto==0.3.0", # Apache License 2.0
"boto3==1.10.48", # Apache License 2.0
"toml==0.10.0", # MIT
"schema~=0.7", # MIT
"idna==2.8", # BSD
]

dev_require = [
@@ -96,5 +99,5 @@
"docs": docs_require,
"dev": dev_require + tests_require + docs_require,
},
entry_points={"console_scripts": ["coordinator=xain_fl.cli:main"]},
entry_points={"console_scripts": ["coordinator=xain_fl.__main__"]},
)
10 changes: 8 additions & 2 deletions tests/store.py
@@ -7,7 +7,8 @@

import numpy as np

from xain_fl.coordinator.store import Store, StoreConfig
from xain_fl.config import StorageConfig
from xain_fl.coordinator.store import Store


class FakeS3Resource:
@@ -71,7 +72,12 @@ class TestStore(Store):
#
# pylint: disable=super-init-not-called
def __init__(self):
self.config = StoreConfig("endpoint_url", "access_key_id", "secret_access_key", "bucket")
self.config = StorageConfig(
endpoint="endpoint",
access_key_id="access_key_id",
secret_access_key="secret_access_key",
bucket="bucket",
)
self.s3 = FakeS3Resource()

def assert_wrote(self, round: int, weights: np.ndarray):