refactor MaskObject into vector and scalar parts #524

finiteprods · 2020-09-18T08:04:25Z

The changes introduced for the generalised scalar extension had the unfortunate side effect of adding unwelcome duplication to the code base: effectively, a pair of MaskObjects was passed through the system, either a (masked model, masked scalar) pair or the corresponding pair of masks. This in turn led to the need for a pair of Aggregations, a pair of MaskDicts, and so on. Worse, one is a vector and the other is a scalar, but the types don't reflect this at all.

This merge request introduces a new, more type-safe MaskObject:

pub struct MaskObject {
    pub vector: MaskMany,
    pub scalar: MaskOne,
}

consisting of a MaskMany (the new name for the old MaskObject) and a MaskOne (containing exactly one data value):

pub struct MaskOne {
    pub data: BigUint,
    pub config: MaskConfig,
}

As a result, the scalar-related code is, for the most part, better encapsulated. We get back the tidier code (roughly similar to before #496) - in particular, the new scaling correction step of the generalised extension becomes an implementation detail of the unmasking, as it probably should be.

A separate MaskOne type also allows a MaskConfig to be associated with a scalar - previously, scalars could only be masked according to the same masking configuration as the weights. Although masking configurations are currently geared very much towards the masking of weights, this is nevertheless a necessary first step towards more flexibility in the masking of scalars.
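To make the composition concrete, here is a minimal, dependency-free sketch of the new shape. The `MaskConfig` stand-in with its `tag` field, the `u128` data type (the PR uses `BigUint`), and the helpers `has_uniform_config` and `example_object` are assumptions made for illustration and are not taken from xaynet-core.

```rust
// Stand-in types for illustration only; the real definitions live in xaynet-core.

/// Placeholder for the real `MaskConfig`; one tag is enough to tell two configs apart.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MaskConfig {
    pub tag: u8, // hypothetical field, not part of the real type
}

/// Vector part: many masked values sharing one configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MaskMany {
    pub data: Vec<u128>, // the PR uses BigUint; u128 keeps this sketch std-only
    pub config: MaskConfig,
}

/// Scalar part: exactly one masked value with its own configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MaskOne {
    pub data: u128,
    pub config: MaskConfig,
}

/// The composite object: one vector part and one scalar part.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MaskObject {
    pub vector: MaskMany,
    pub scalar: MaskOne,
}

impl MaskObject {
    /// Hypothetical helper: true when both parts use the same configuration.
    pub fn has_uniform_config(&self) -> bool {
        self.vector.config == self.scalar.config
    }
}

/// Hypothetical example showing that the two parts take independent configs.
pub fn example_object() -> MaskObject {
    MaskObject {
        vector: MaskMany {
            data: vec![1, 2, 3],
            config: MaskConfig { tag: 0 },
        },
        scalar: MaskOne {
            data: 7,
            config: MaskConfig { tag: 1 },
        },
    }
}
```

Because the scalar carries its own `config`, code that previously assumed one configuration for everything now has to decide explicitly which part it is talking about.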

a few extra details...

1. The above refactoring also allows the existing unit tests to more properly cover the scalar extension code. This in turn revealed an issue with (probably) the masking of scalars, as unmasked models no longer match (within the tolerance we expect) the original model weights. As such, the tolerance checks of the masking tests are temporarily relaxed (commit). These should be reverted once the issue has been resolved.
2. The change in structure of MaskObject necessitates changes in its (de)serialization. As with other types we serialize, a corresponding "buffer" type MaskObjectBuffer is introduced, and similarly MaskManyBuffer. Currently, however, a MaskOne is serialized by converting it into a MaskMany, which was convenient for code reuse but is perhaps sub-optimal in other ways. For example, a MaskOne can be encoded more compactly than a MaskMany - there is always a single data value, so there is no need for a LENGTH field. This is an optimization that can be tackled later.
3. Some comments in reviews (pairing of MaskConfigs into a type, more consistent naming) will also be dealt with in another PR, probably in conjunction with 1. above.
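The size argument about the LENGTH field can be sketched as follows. The byte layout here (4-byte config, 4-byte big-endian length, fixed 8-byte elements) and all function names are assumptions chosen for the sketch; the real buffer types derive the element width from the masking configuration.

```rust
// Hypothetical wire layout, for illustration only:
//   many:        | config (4) | length (4) | element (8) * length |
//   one compact: | config (4) | element (8) |
const CONFIG_LEN: usize = 4;
const LENGTH_LEN: usize = 4;
const ELEMENT_LEN: usize = 8;

/// Encodes a vector of values with an explicit LENGTH field.
pub fn encode_many(config: [u8; CONFIG_LEN], data: &[u64]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(CONFIG_LEN + LENGTH_LEN + data.len() * ELEMENT_LEN);
    buf.extend_from_slice(&config);
    buf.extend_from_slice(&(data.len() as u32).to_be_bytes());
    for value in data {
        buf.extend_from_slice(&value.to_be_bytes());
    }
    buf
}

/// Encodes a single value by reusing the vector encoding (the approach described above).
pub fn encode_one_via_many(config: [u8; CONFIG_LEN], value: u64) -> Vec<u8> {
    encode_many(config, &[value])
}

/// Encodes a single value without a LENGTH field (the possible later optimization).
pub fn encode_one_compact(config: [u8; CONFIG_LEN], value: u64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(CONFIG_LEN + ELEMENT_LEN);
    buf.extend_from_slice(&config);
    buf.extend_from_slice(&value.to_be_bytes());
    buf
}

/// Decodes the compact form; returns `None` if the buffer has the wrong size.
pub fn decode_one_compact(bytes: &[u8]) -> Option<([u8; CONFIG_LEN], u64)> {
    if bytes.len() != CONFIG_LEN + ELEMENT_LEN {
        return None;
    }
    let mut config = [0u8; CONFIG_LEN];
    config.copy_from_slice(&bytes[..CONFIG_LEN]);
    let mut element = [0u8; ELEMENT_LEN];
    element.copy_from_slice(&bytes[CONFIG_LEN..]);
    Some((config, u64::from_be_bytes(element)))
}
```

Under these assumed widths the compact form saves exactly the LENGTH field per scalar, at the cost of a second encoding path to maintain.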

Todos:

tests still to fix
conflicts / rebase
complete the More to follow

little-dude

Very nice. My only concern is the naming, which I find slightly inconsistent sometimes.

little-dude · 2020-09-18T08:16:16Z

rust/xaynet-core/src/mask/masking.rs

@@ -76,10 +94,10 @@ impl Into<MaskObject> for Aggregation {
 #[allow(clippy::len_without_is_empty)]
 impl Aggregation {
    /// Creates a new, empty aggregator for masks or masked models.
-    pub fn new(config: MaskConfig, object_size: usize) -> Self {
+    pub fn new(config_many: MaskConfig, config_one: MaskConfig, object_size: usize) -> Self {


Not sure if it's worth it but maybe we could have a

struct AggregationConfig { model: MaskConfig, scalar: MaskConfig, }

finiteprods · 2020-09-28

I think this would further increase tidiness, thanks. I will add this in a later PR if that's ok.

little-dude · 2020-09-28

So this could be tackled with #524 (comment) right?
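A rough sketch of what the suggested pairing could look like. Only the `AggregationConfig` shape with its `model` and `scalar` fields comes from the comment above; the `MaskConfig` stand-in, the `Aggregation` fields, and the constructor body are assumptions for illustration.

```rust
/// Placeholder for the real `MaskConfig` (hypothetical single field).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MaskConfig {
    pub tag: u8,
}

/// The pair suggested in the review: one config per part.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AggregationConfig {
    pub model: MaskConfig,
    pub scalar: MaskConfig,
}

/// Simplified aggregation state; the real type also holds the aggregated object.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Aggregation {
    pub nb_models: usize,
    pub config: AggregationConfig,
    pub object_size: usize,
}

impl Aggregation {
    /// Takes one paired config instead of two positional `MaskConfig` arguments,
    /// so the two configs cannot be swapped by accident at the call site.
    pub fn new(config: AggregationConfig, object_size: usize) -> Self {
        Self {
            nb_models: 0,
            config,
            object_size,
        }
    }
}
```

With named fields, a call such as `Aggregation::new(AggregationConfig { model, scalar }, size)` documents which config is which, which two adjacent `MaskConfig` parameters do not.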

little-dude · 2020-09-18T08:21:08Z

rust/xaynet-core/src/mask/masking.rs

-        if self.object.config != mask.config || self.object_size != mask.data.len() {
-            return Err(UnmaskingError::MaskMismatch);
+        if self.nb_models > self.object.scalar.config.model_type.max_nb_models() {
+            return Err(UnmaskingError::TooManyScalars);


Can self.object.vector.config.model_type.max_nb_models() and self.object.scalar.config.model_type.max_nb_models() actually differ? Seems like the max is actually capped to the min of these values. More generally, can the scalar masking config and model masking config have incompatibilities?

finiteprods · 2020-09-28

I agree this could probably be done in one check. More generally, since the scalar could be masked with a different config from the weights, we should do the same check to make sure it admits all of them for aggregating. Will address separately.
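The "one check" idea can be sketched like this. The `ModelType` stand-in, its two variants, and the limits returned by `max_nb_models` are placeholders assumed for the sketch; `effective_max_nb_models` and `admits` are hypothetical helper names.

```rust
/// Stand-in for the model type of a masking configuration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelType {
    M3,
    M6,
}

impl ModelType {
    /// Placeholder limits, assumed for this sketch.
    pub fn max_nb_models(&self) -> usize {
        match self {
            ModelType::M3 => 1_000,
            ModelType::M6 => 1_000_000,
        }
    }
}

/// The number of models both configurations can admit is capped by the smaller limit.
pub fn effective_max_nb_models(vector: ModelType, scalar: ModelType) -> usize {
    vector.max_nb_models().min(scalar.max_nb_models())
}

/// Single check replacing two separate comparisons against each config.
pub fn admits(nb_models: usize, vector: ModelType, scalar: ModelType) -> bool {
    nb_models <= effective_max_nb_models(vector, scalar)
}
```

A single check is shorter, but it no longer says which configuration was the limiting one; that trade-off is the subject of the next thread.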

little-dude · 2020-09-18T08:22:33Z

rust/xaynet-core/src/mask/masking.rs


    #[error("too many models were aggregated for the current unmasking configuration")]
    TooManyModels,

-    #[error("the model to aggregate is incompatible with the current aggregated model")]
+    #[error("too many scalars were aggregated for the current unmasking configuration")]
+    TooManyScalars,


Is it worth distinguishing between TooManyModels and TooManyScalars? Could they be unified into an UnmaskingError::TooManyMasks for instance?

finiteprods · 2020-09-28

I guess distinguishing the cases has an advantage when we get errors, e.g. to narrow down which mask config contributed to the error.
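One way to reconcile both positions is a single variant that still records which part exceeded its limit. This is a hypothetical design, not what the PR implements: `MaskPart`, the fields of `TooManyMasks`, and `check_nb_models` are invented for the sketch, and `Display` is written by hand here instead of using the `#[error]` attribute seen in the diff.

```rust
use std::fmt;

/// Which part of the mask object a check refers to (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MaskPart {
    Vector,
    Scalar,
}

/// Unified error that still tells the two cases apart via `part`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnmaskingError {
    TooManyMasks {
        part: MaskPart,
        nb_models: usize,
        max: usize,
    },
}

impl fmt::Display for UnmaskingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UnmaskingError::TooManyMasks { part, nb_models, max } => {
                let name = match part {
                    MaskPart::Vector => "vector",
                    MaskPart::Scalar => "scalar",
                };
                write!(
                    f,
                    "too many masks were aggregated for the {} masking configuration ({} > {})",
                    name, nb_models, max
                )
            }
        }
    }
}

impl std::error::Error for UnmaskingError {}

/// Checks the vector limit first, then the scalar limit.
pub fn check_nb_models(
    nb_models: usize,
    vector_max: usize,
    scalar_max: usize,
) -> Result<(), UnmaskingError> {
    if nb_models > vector_max {
        return Err(UnmaskingError::TooManyMasks {
            part: MaskPart::Vector,
            nb_models,
            max: vector_max,
        });
    }
    if nb_models > scalar_max {
        return Err(UnmaskingError::TooManyMasks {
            part: MaskPart::Scalar,
            nb_models,
            max: scalar_max,
        });
    }
    Ok(())
}
```

Callers that only care whether unmasking failed match on one variant, while logs still show which configuration was the limiting one.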

little-dude · 2020-09-18T08:35:09Z

rust/xaynet-core/src/mask/masking.rs

+        if self.object.vector.config != mask.vector.config
+            || self.object_size != mask.vector.data.len()
+        {
+            return Err(UnmaskingError::MaskManyMismatch);


The naming is a bit inconsistent between model/scalar vs many/one. On one side we have TooManyScalars and TooManyModels, and here MaskManyMismatch and MaskOneMismatch. There are other places where that is the case. Is the motivation for introducing the many/one terminology to be broader than the ML-specific term "model"? I like the idea but I think we should have a clear separation of where we use a specific terminology. For instance we could say that in xaynet-core there shouldn't be any mention of "model", while in xaynet-client and xaynet-server, which are domain specific, we should only use the model/scalar terminology (it would be even better if we could find a more specific term than "scalar" imo).

On the same topic, I have a slight preference for vector/scalar over many/one. It's not domain specific but better fits the nature of the data we're referring to. Wdyt?

finiteprods · 2020-09-28

Yes, I agree the naming isn't the best at the moment. OT1H I had introduced new terminology, OTOH I was also trying not to deviate too much from the existing names for these errors (which were not named the best either!). Let me revise this separately if that's ok and we can continue the discussion there. And yes, the idea of Many vs One was to capture more abstractly the structure of masked models / scalars and model masks / scalar masks. Vector / Scalar would also have worked, but scalar clashes with the more domain-specific use of the term, as you also noted.

Robert-Steiner

Really nice refactoring👍
This will make the extracting of the mask dictionary into Redis much easier

rust/xaynet-client/src/participant.rs

codecov · 2020-09-28T07:07:57Z

Codecov Report

Merging #524 into master will increase coverage by 0.28%.
The diff coverage is 72.83%.

@@            Coverage Diff             @@
##           master     #524      +/-   ##
==========================================
+ Coverage   56.82%   57.10%   +0.28%     
==========================================
  Files          65       65              
  Lines        3233     3299      +66     
==========================================
+ Hits         1837     1884      +47     
- Misses       1396     1415      +19

Impacted Files	Coverage Δ
...aynet-client/src/mobile_client/participant/sum2.rs	`22.22% <0.00%> (+3.47%)`	⬆️
...net-client/src/mobile_client/participant/update.rs	`33.33% <0.00%> (-2.39%)`	⬇️
rust/xaynet-server/src/state_machine/requests.rs	`80.95% <ø> (-1.27%)`	⬇️
rust/xaynet-server/src/storage/impls.rs	`64.28% <ø> (ø)`
rust/xaynet-core/src/mask/object/mod.rs	`50.00% <51.72%> (+5.55%)`	⬆️
rust/xaynet-core/src/message/payload/update.rs	`88.88% <71.42%> (-2.26%)`	⬇️
rust/xaynet-core/src/mask/masking.rs	`71.96% <74.69%> (-8.72%)`	⬇️
rust/xaynet-core/src/message/payload/sum2.rs	`86.20% <80.00%> (-2.48%)`	⬇️
...t/xaynet-server/src/state_machine/phases/update.rs	`64.28% <83.33%> (+2.89%)`	⬆️
...ust/xaynet-server/src/state_machine/phases/sum2.rs	`78.26% <85.71%> (-0.08%)`	⬇️
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 29d23e0...aac5e1f.

Robert-Steiner

The changes look good to me

rust/xaynet-server/src/state_machine/phases/sum2.rs

little-dude

👍 Very nice cleanup. Could we just tidy the git history a bit before we merge?

little-dude reviewed Sep 18, 2020

Robert-Steiner reviewed Sep 21, 2020

rust/xaynet-client/src/participant.rs (outdated)

finiteprods added 6 commits September 28, 2020 09:17

rename MaskObject to MaskMany

6e7d562

MaskOne struct; MaskObject wrapper; masking logic

028b89b

adjust update message; From/ToBytes impl for MaskOne

0c59644

change Aggregation type to wrap MaskObject instead of MaskMany

c607c83

validate_aggregate and aggregate

a385900

validate_unmask and unmask incl correction

8667dcc

finiteprods force-pushed the scalar-composite branch from dcd5883 to 0b8dd67 Compare September 28, 2020 08:58

finiteprods marked this pull request as ready for review September 28, 2020 09:28

Robert-Steiner reviewed Sep 28, 2020

rust/xaynet-server/src/state_machine/phases/sum2.rs (outdated)

little-dude self-requested a review September 28, 2020 12:53

little-dude approved these changes Sep 28, 2020

finiteprods added 7 commits September 28, 2020 15:56

changed MaskDict type

ec15aa1

fmt

adjust tests to compile

3b8a9ff

fix tests: serialize, deserialize, decode_invalid_seed_dict; mask obj…

951cbda

…ect serialization implementation of mask object serialization is completed here the prior incomplete implementation lead to the failing tests above

temp relax masking tests to suppress test fails

3710bc2

note: these tests should be tightened once the source of the masking issue has been resolved

fix tests: update_to_sum2, full_round; aggregation bugfix

2cfcd00

fixes a bug in aggregation for the case where the 1st mask object is added to an empty aggregation the bug was revealed by the failing tests mentioned above

fix doctests; suppress failing masking assert for now

5b45b05

fmt

make MaskObject (De)Serializable for redis tests; rename: MaskObject …

aac5e1f

…-> MaskMany fmt imports remove stale comment remove commented out code

finiteprods force-pushed the scalar-composite branch from 76add1b to aac5e1f Compare September 28, 2020 13:57

finiteprods merged commit dd335b5 into master Sep 28, 2020

finiteprods deleted the scalar-composite branch September 28, 2020 14:38

finiteprods mentioned this pull request Oct 1, 2020

fix extended coverage of masking tests #542

Merged

finiteprods mentioned this pull request Oct 23, 2020

masking configuration pair #575

Merged
