This repository has been archived by the owner on Aug 30, 2022. It is now read-only.
PB-159: remove weights from gRPC messages
References: https://xainag.atlassian.net/browse/PB-159

Needs to be merged along with:
- https://github.com/xainag/xain-proto/pull/25
- https://github.com/xainag/xain-sdk/pull/88
- #298

Summary:

Remove the weights from the gRPC messages. From now on, weights are exchanged via S3 buckets. The sequence diagram below illustrates this new behavior.

At the beginning of a round (1), the selected participants send a `StartTrainingRoundRequest`, and the coordinator responds with the same `StartTrainingRoundResponse`, which no longer contains the global weights. Instead, the participant fetches these weights from the store (2). S3 buckets are key-value stores, and the key for global weights is the round number. Then, the participant trains. Once done, it uploads its local weights to the S3 bucket (3). The key is `<participant_id>/<round_number>`. Finally (4), the participant sends its `EndTrainingRoundRequest`. Before answering, the coordinator retrieves the local weights the participant has uploaded.

_**Important note**: At the moment, the participants don't know their ID, because the coordinator does not send it to them. Thus, they currently generate a random ID when they start, and send it to the coordinator so that it can retrieve the participant's weights. This is why the `EndTrainingRoundRequest` currently has a `participant_id` field._

```
    P                                C                                             Store
1.  | StartTrainingRoundRequest      |                                               |
    | -----------------------------> |                                               |
    | StartTrainingRoundResponse     |                                               |
    | <----------------------------- |                                               |
    |                                |                                               |
2.  | Get global weights (key="round")                                               |
    | -----------------------------------------------------------------------------> |
    | Global weights                                                                 |
    | <----------------------------------------------------------------------------- |
    |                                |                                               |
    | [train...]                     |                                               |
    |                                |                                               |
3.  | Set local weights (key="participant/round")                                    |
    | -----------------------------------------------------------------------------> |
    | Ok                                                                             |
    | <----------------------------------------------------------------------------- |
    |                                |                                               |
4.  | EndTrainingRoundRequest        |                                               |
    | -----------------------------> | Get local weights (key="participant/round")   |
    |                                | --------------------------------------------> |
    |                                | Local weights                                 |
    | EndTrainingRoundResponse       | <-------------------------------------------- |
    | <----------------------------- |                                               |
```

At the end of the round, the coordinator writes the global weights to the S3 bucket, using the upcoming round number as the key (see the sequence diagram below).

```
    P                                C                                             Store
    | EndTrainingRoundRequest        |                                               |
    | -----------------------------> | Get local weights (key="participant/round")   |
    |                                | --------------------------------------------> |
    |                                | Local weights                                 |
    | EndTrainingRoundResponse       | <-------------------------------------------- |
    | <----------------------------- |                                               |
    |                                |                                               |
    |                                | Set global weights (key="round + 1")          |
    |                                | --------------------------------------------> |
    |                                | Ok                                            |
    |                                | <-------------------------------------------- |
```

Implementation notes:

- Initially, we thought we would use different buckets for the local and global weights. For now, we use the same bucket for both.
- We currently store the global weights under different keys. It turns out that this brings unnecessary complexity, so we'll probably simplify this in the future.
- For now, the coordinator doesn't send any storage information to the participants, so the participants need to be configured with the storage information. In the future, the `StartTrainingRoundResponse` could contain the endpoint URL, bucket name, etc.
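
The key scheme described in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from this commit: `InMemoryStore` is a hypothetical dict-backed stand-in for the S3 bucket, every function name is made up for the example, the "training" step is a placeholder, and the element-wise mean is just one possible aggregation (the commit message does not say how the coordinator combines local weights).

```python
from typing import Dict, List

Weights = List[float]


class InMemoryStore:
    """Hypothetical dict-backed stand-in for the S3 bucket (a key-value store)."""

    def __init__(self) -> None:
        self._objects: Dict[str, Weights] = {}

    def put(self, key: str, weights: Weights) -> None:
        self._objects[key] = list(weights)

    def get(self, key: str) -> Weights:
        return list(self._objects[key])


def global_key(round_number: int) -> str:
    # Global weights are keyed by the round number.
    return str(round_number)


def local_key(participant_id: str, round_number: int) -> str:
    # Local weights are keyed by "<participant_id>/<round_number>".
    return f"{participant_id}/{round_number}"


def participant_round(store: InMemoryStore, participant_id: str, round_number: int) -> None:
    # Step 2: fetch the global weights from the store instead of the gRPC response.
    global_weights = store.get(global_key(round_number))
    # Placeholder for real training (assumption for the example).
    local_weights = [w + 1.0 for w in global_weights]
    # Step 3: upload the local weights before sending EndTrainingRoundRequest.
    store.put(local_key(participant_id, round_number), local_weights)


def coordinator_end_round(
    store: InMemoryStore, participant_ids: List[str], round_number: int
) -> Weights:
    # Step 4: retrieve each participant's local weights by its reported ID.
    updates = [store.get(local_key(p, round_number)) for p in participant_ids]
    # Placeholder aggregation (assumption): element-wise mean of the updates.
    new_global = [sum(column) / len(column) for column in zip(*updates)]
    # End of round: write the global weights under the upcoming round number.
    store.put(global_key(round_number + 1), new_global)
    return new_global


store = InMemoryStore()
store.put(global_key(0), [0.0, 2.0])
for pid in ("alice", "bob"):
    participant_round(store, pid, 0)
result = coordinator_end_round(store, ["alice", "bob"], 0)
```

Because both kinds of weights share one bucket, the two key helpers are what keeps them apart: global weights live under a bare round number, local weights under a participant-prefixed key.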