Introduce swap-or-not shuffle #576

vbuterin · 2019-02-06T12:49:08Z

See #563 for discussion.

Here is a more efficient implementation for shuffling an entire set; it can live here until we come up with an explicit "efficient implementation" doc:

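def shuffle(list_size, seed):
    # hash, bytes_to_int and int_to_bytes1 are the helpers defined in the spec
    indices = list(range(list_size))
    powers_of_two = [1, 2, 4, 8, 16, 32, 64, 128]
    for round in range(90):
        # One hash covers 256 positions (32 bytes * 8 bits), so precompute
        # ceil(list_size / 256) hashes for this round
        hash_bytes = b''.join([
            hash(seed + round.to_bytes(1, 'little') + (i).to_bytes(4, 'little'))
            for i in range((list_size + 255) // 256)
        ])
        pivot = bytes_to_int(hash(seed + int_to_bytes1(round))[0:8]) % list_size

        for i, index in enumerate(indices):
            # Candidate partner of `index`, mirrored around the pivot
            flip = (pivot - index) % list_size
            # Both members of a pair read the same bit, so they agree on swapping
            hash_pos = max(index, flip)
            byte = hash_bytes[hash_pos // 8]
            if byte & powers_of_two[hash_pos % 8]:
                indices[i] = flip
    return indices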
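
The snippet above depends on the spec helpers hash, bytes_to_int and int_to_bytes1, so it does not run on its own. A minimal runnable sketch follows, under two assumptions that are not confirmed by this post: SHA-256 stands in for the spec's hash (hash_fn is a hypothetical name), and bytes_to_int is little-endian. The early return for an empty list is also an addition, to avoid a modulo by zero.

```python
import hashlib
from typing import List

SHUFFLE_ROUND_COUNT = 90  # round count used in the snippet above


def hash_fn(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 stands in for the spec's `hash`.
    return hashlib.sha256(data).digest()


def bytes_to_int(data: bytes) -> int:
    # ASSUMPTION: little-endian, matching the rest of this thread.
    return int.from_bytes(data, "little")


def shuffle(list_size: int, seed: bytes) -> List[int]:
    """Return a list where entry i is the permuted index of i."""
    indices = list(range(list_size))
    if list_size == 0:
        return indices  # added guard: avoids `% 0` below
    for rnd in range(SHUFFLE_ROUND_COUNT):
        prefix = seed + rnd.to_bytes(1, "little")
        # One 32-byte hash supplies the decision bits for 256 positions.
        hash_bytes = b"".join(
            hash_fn(prefix + block.to_bytes(4, "little"))
            for block in range((list_size + 255) // 256)
        )
        pivot = bytes_to_int(hash_fn(prefix)[0:8]) % list_size
        for i, index in enumerate(indices):
            flip = (pivot - index) % list_size
            hash_pos = max(index, flip)
            if hash_bytes[hash_pos // 8] & (1 << (hash_pos % 8)):
                indices[i] = flip
    return indices
```

Calling shuffle(10, b"\x00" * 32) returns some permutation of 0..9; the concrete order depends on the stand-in hash, so it should not be read as a test vector.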

hwwhww

Great work 👍

If my suggestions are applied, the code will be runnable.

djrtwo · 2019-02-07T02:38:58Z

should we use little endian for consistency? (sorry @vbuterin)

vbuterin · 2019-02-07T03:55:46Z

Sure, sounds fine to me.

djrtwo · 2019-02-07T04:32:21Z

done

hwwhww · 2019-02-07T11:26:56Z

@vbuterin @djrtwo

Updated to the latest optimized version; would you like to check the docstring again?
For the shuffling tests (not sure if it's necessary): what do you think about parameterizing the round number 90, i.e., def shuffle(list_size: int, seed: Bytes32, rounds: int = 90) -> List[int], so we can test other numbers?

djrtwo · 2019-02-07T18:13:21Z

Did we lose get_permuted_index to the optimizations? We'll likely want to expose get_permuted_index in the spec at some point.

(1) -- docstring looks good
(2) -- rounds as param looks good
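
For reference, a single-index get_permuted_index with the suggested rounds parameter might look like the sketch below. It is an illustration under assumptions, not the spec text: SHA-256 stands in for the spec's hash (hash_fn is a hypothetical name), and the exact signature is a guess based on the two comments above.

```python
import hashlib


def hash_fn(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 stands in for the spec's `hash`.
    return hashlib.sha256(data).digest()


def get_permuted_index(index: int, list_size: int, seed: bytes, rounds: int = 90) -> int:
    """Follow one index through `rounds` swap-or-not rounds."""
    assert 0 <= index < list_size
    for rnd in range(rounds):
        prefix = seed + rnd.to_bytes(1, "little")
        pivot = int.from_bytes(hash_fn(prefix)[0:8], "little") % list_size
        flip = (pivot - index) % list_size
        position = max(index, flip)
        # Only the hash covering `position` is needed for a single index.
        source = hash_fn(prefix + (position // 256).to_bytes(4, "little"))
        byte = source[(position % 256) // 8]
        if (byte >> (position % 8)) & 1:
            index = flip
    return index
```

With rounds exposed, a test can check cheaply that every round count yields a bijection, for example that [get_permuted_index(i, 50, seed, rounds) for i in range(50)] is a permutation of 0..49 for several values of rounds.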

hwwhww · 2019-02-07T21:17:01Z

@djrtwo
Fair enough. 😄
Reverted and refactored it.

hwwhww · 2019-02-08T09:02:09Z

To the implementers: 70e482b is the optimized version for shuffling the whole validator registry.

protolambda · 2019-02-12T23:59:34Z

I'm working on a new experimental implementation repo, written in Go, for the new "swap-or-not" shuffling algorithm. Love optimizing this stuff (and don't have an ETH 2.0 client to work on more general stuff 😞 ).
Although the algorithm for set-shuffling by Vitalik looked promising, it still allocates O(n) hashes every round (n = list size; this is "hash_bytes" in the original).
I optimized it to keep only a single hash (!!!) on the stack, since consecutive indices share the same hash (each hash covers a window of indices), so you can go through them in order.
I probably still have some off-by-1 errors in the shuffling itself, but it's looking good so far.
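
One way to picture the single-hash idea, written in Python rather than Go, is sketched below. It is not the code from the repo: the function name shuffle_windowed is hypothetical, SHA-256 is an assumed stand-in for the spec's hash, and the pair enumeration is one possible reading of the windows described above. Each round walks the mirror pairs on both sides of the pivot and re-hashes only when the larger index of a pair leaves the current 256-position window.

```python
import hashlib
from typing import List


def hash_fn(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 stands in for the spec's `hash`.
    return hashlib.sha256(data).digest()


def shuffle_windowed(list_size: int, seed: bytes, rounds: int = 90) -> List[int]:
    """Hypothetical sketch; entry i of the result is the permuted index of i."""
    values = list(range(list_size))
    if list_size < 2:
        return values
    # Swapping list entries composes the rounds in the opposite order to
    # tracking each index, so the rounds run in reverse to get the same result.
    for rnd in reversed(range(rounds)):
        prefix = seed + rnd.to_bytes(1, "little")
        pivot = int.from_bytes(hash_fn(prefix)[0:8], "little") % list_size
        block, source = -1, b""  # the one window hash currently held
        # Pairs (i, j) satisfy i + j == pivot (left of the pivot) or
        # i + j == pivot + list_size (right of it); j is always the larger one.
        for start, total in ((0, pivot), (pivot + 1, pivot + list_size)):
            for i in range(start, (total + 1) // 2):
                j = total - i
                if j // 256 != block:  # j left the current 256-position window
                    block = j // 256
                    source = hash_fn(prefix + block.to_bytes(4, "little"))
                if (source[(j % 256) // 8] >> (j % 8)) & 1:
                    values[i], values[j] = values[j], values[i]
    return values
```

Because j only decreases within each half, a window hash is never needed again once it has been left, which is why one live hash per round is enough in this sketch.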

Here are the preliminary benchmark results (run locally on a laptop, so they are mostly useful for comparison):

BenchmarkPermuteIndex4M-8               100000        157231 ns/op
BenchmarkPermuteIndex40K-8              100000        157304 ns/op
BenchmarkPermuteIndex400-8              100000        156259 ns/op
BenchmarkPermuteIndexComparison40K-8         2    6247616777 ns/op
BenchmarkPermuteIndexComparison400-8       200      64324372 ns/op
BenchmarkShuffleList4M-8                    10    1667117128 ns/op
BenchmarkShuffleList40K-8                 1000      16748448 ns/op
BenchmarkShuffleList400-8                50000        390820 ns/op


 4M = 4000000
40K =   40000
400 =     400

BenchmarkPermuteIndexN = simply check how many times you can permute the index in a list sized N
BenchmarkPermuteIndexComparisonN = do the index-wise permutations, but for every list item, i.e. N times. For comparison with BenchmarkShuffleListN
BenchmarkShuffleListN = shuffle a complete list of N items.

Note how slow index-wise permutations are if you do a complete list shuffle with it, compared to shuffling the list collectively.
Shuffling 4 million items collectively is more efficient (> 3x) than shuffling a list of 1/100th the size index by index.
Note how shuffling individual indices isn't affected much (if at all) by the list-size.
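
The ratios behind these notes can be recomputed directly from the ns/op column. The short script below only does that arithmetic; the dictionary keys are shortened benchmark names chosen for illustration.

```python
# ns/op figures copied from the benchmark output above
NS_PER_OP = {
    "PermuteIndex40K": 157304,
    "PermuteIndexComparison40K": 6247616777,
    "PermuteIndexComparison400": 64324372,
    "ShuffleList4M": 1667117128,
    "ShuffleList40K": 16748448,
    "ShuffleList400": 390820,
}


def speedup(slow: str, fast: str) -> float:
    """How many times longer `slow` takes than `fast`."""
    return NS_PER_OP[slow] / NS_PER_OP[fast]


# Index-wise shuffle of 40K items vs. collective shuffle of 4M items
ratio_4m_vs_40k = speedup("PermuteIndexComparison40K", "ShuffleList4M")
# Same list size, index-wise vs. collective
ratio_40k = speedup("PermuteIndexComparison40K", "ShuffleList40K")
ratio_400 = speedup("PermuteIndexComparison400", "ShuffleList400")
# Amortized cost per item when shuffling 4M items collectively, in ns
per_item_4m = NS_PER_OP["ShuffleList4M"] / 4_000_000

print(f"{ratio_4m_vs_40k:.2f}x, {ratio_40k:.0f}x, {ratio_400:.0f}x, {per_item_4m:.0f} ns/item")
```

The first ratio is the "> 3x" figure quoted above; the per-item cost can be compared against the roughly 157,000 ns that a single permuted index costs on its own.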

Repo here: https://github.com/protolambda/eth2-shuffle
Warning: still work-in-progress, no guarantees on correctness (likely an off-by-1 somewhere, gonna test later)

protolambda · 2019-02-13T17:38:01Z

Posting some relevant gitter chatter here:

@tersec
regarding the new shuffling algorithm, https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf specifically in/around figure 3 states that the choice from the domain should be uniform, but mod list_size won't be?

@protolambda

Had my doubts at first too, but it helps if you print out the pairs for any pivot + list size: you can see a mirror pattern (e.g. pivot = 5: 0-5, 1-4, 2-3) on each side of the pivot. And for each pivot, there's a different pairing for any given index. (Afaik)

And pivot choice (your mod list_size concern) is indeed not perfectly uniform, but it is close to uniform for large enough random numbers.

2^64 / 4000000 ≈ 4.6e12 complete ranges of list_size fit into the 64-bit space, and only one range is incomplete. So almost all of the time it's uniform.
And even more so if you take a 2^256 int from the hash.

Ok, looks like that's exactly what Python's int.from_bytes does. So, the fraction of time that it's not uniform is ridiculously low. It may even be worth it to consider taking only 64 bits of the hash, for speed. 99.999999...% is still good.
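
The size of that bias can be checked with exact integer arithmetic. The helper below (modulo_bias is a hypothetical name) counts how many complete ranges of list_size fit into the hash space, how many raw values are left over in the incomplete range, and how much more likely the favoured residues are in relative terms.

```python
from fractions import Fraction
from typing import Tuple


def modulo_bias(bits: int, list_size: int) -> Tuple[int, int, Fraction]:
    """Return (complete ranges, leftover values, relative excess probability)."""
    space = 2 ** bits
    full_ranges, leftover = divmod(space, list_size)
    # Residues 0..leftover-1 are hit full_ranges + 1 times, all others
    # full_ranges times, so the favoured ones are 1/full_ranges more likely.
    relative_excess = Fraction(1, full_ranges) if leftover else Fraction(0)
    return full_ranges, leftover, relative_excess


full_ranges, leftover, excess = modulo_bias(64, 4_000_000)
print(full_ranges, leftover, float(excess))
```

For 64 bits and a list of 4,000,000 entries this gives 4,611,686,018,427 complete ranges with 1,551,616 values left over, matching the ~4.6e12 estimate above.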

@tersec
Fair enough, reasonable tradeoff

@protolambda
Interesting, it's both 64 bits and 256 bits 😂 The spec has a [0:8], but this is not reflected in the set-shuffle implementation in the PR. I prefer the 64 bits, it's sufficient and fast

Now, the issue is:
Post above by @vbuterin says:

pivot = int.from_bytes(hash(seed + round.to_bytes(1, 'little')), 'little') % list_size

Spec currently says:

pivot = bytes_to_int(hash(seed + int_to_bytes1(round))[0:8]) % list_size

There are two inconsistency issues:

(1) -- the spec is not explicit about little-endianness here (although it is globally)
(2) -- the version by Vitalik, which every implementer team is copying for efficiency, does not have the [0:8], i.e. it does not restrict the pivot to 64 bits of the hash as the spec does for efficiency (without much loss of uniformity)

Some test-vectors would be really useful.
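
To make the difference between the two pivot formulas concrete, here is a small sketch of both. The function names are hypothetical and SHA-256 is an assumed stand-in for the spec's hash. In little-endian order the first 8 bytes are the low 64 bits of the full integer, so the two variants agree whenever list_size divides 2^64 and can differ otherwise.

```python
import hashlib


def hash_fn(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 stands in for the spec's `hash`.
    return hashlib.sha256(data).digest()


def pivot_full(seed: bytes, rnd: int, list_size: int) -> int:
    """Pivot from the whole 256-bit hash (the variant without [0:8])."""
    digest = hash_fn(seed + rnd.to_bytes(1, "little"))
    return int.from_bytes(digest, "little") % list_size


def pivot_64(seed: bytes, rnd: int, list_size: int) -> int:
    """Pivot from the first 8 bytes only (the variant with [0:8])."""
    digest = hash_fn(seed + rnd.to_bytes(1, "little"))
    return int.from_bytes(digest[0:8], "little") % list_size


seed = b"\x00" * 32
mismatches = sum(
    pivot_full(seed, rnd, 4_000_000) != pivot_64(seed, rnd, 4_000_000)
    for rnd in range(90)
)
print(f"rounds with differing pivots: {mismatches} of 90")
```

Since a differing pivot changes the pairing for the whole round, two clients that disagree on this detail would produce different shuffles, which is what shared test vectors would catch.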

djrtwo · 2019-02-13T17:45:40Z

bytes_to_int uses little-endian in its definition.
The spec avoids big-int arithmetic, so the Vitalik version should use [0:8]. Fixing now.

djrtwo · 2019-02-13T17:52:10Z

Here is an issue for reworking the test vectors to the swap or not -- ethereum/eth2.0-test-generators#10

Introduce swap-or-not shuffle

c58410e

See #563 for discussion.

hwwhww reviewed Feb 6, 2019

djrtwo reviewed Feb 6, 2019

hwwhww and others added 4 commits February 6, 2019 18:33

Update specs/core/0_beacon-chain.md

37b41a2

Co-Authored-By: vbuterin <v@buterin.com>

Update specs/core/0_beacon-chain.md

4ec721f

Co-Authored-By: vbuterin <v@buterin.com>

Update specs/core/0_beacon-chain.md

6a5b754

Co-Authored-By: vbuterin <v@buterin.com>

n -> len(values)

47b00f3

big to little in shuffle

b3db7b0

vbuterin and others added 4 commits February 6, 2019 23:29

shuffle -> get_permuted_index

65255e5

Update 0_beacon-chain.md

9251471

Add vbuterin's optimization and some formatting

70e482b

amend

aa9f9fc

hwwhww added 3 commits February 8, 2019 05:08

Revert and refactor

859bf62

Add bytes_to_int

911e4f1

Fix type hinting

89b9894

hwwhww approved these changes Feb 7, 2019

hwwhww mentioned this pull request Feb 7, 2019

convert int_to_bytes to little endian #564

Merged

djrtwo reviewed Feb 8, 2019

djrtwo and others added 2 commits February 7, 2019 21:51

Update specs/core/0_beacon-chain.md

f797826

Co-Authored-By: vbuterin <v@buterin.com>

SHUFFLE_ROUND_COUNT as global constant

1c6ccac

djrtwo merged commit 87dc8a6 into dev Feb 8, 2019

djrtwo deleted the vbuterin-patch-5 branch February 8, 2019 03:57

djrtwo mentioned this pull request Feb 8, 2019

Swap-or-not shuffle #563

Closed

terencechain mentioned this pull request Feb 8, 2019

Update Shuffle Function to Swap-or-not prysmaticlabs/prysm#1529

Closed

ChihChengLiang mentioned this pull request Feb 9, 2019

swap-or-not shuffle ethereum/eth2.0-test-generators#10

Closed

hwwhww mentioned this pull request Feb 12, 2019

Implement swap-or-not shuffle ethereum/trinity#272

Closed

terencechain mentioned this pull request Feb 13, 2019

Remove note for replacing shuffling algo #613

Closed

cyberbono3 mentioned this pull request Feb 19, 2019

Update Shuffle Function to Swap-or-not prysmaticlabs/prysm#1643

Closed

schroedingerscode mentioned this pull request Feb 26, 2019

Swap-or-Not Shuffle - (Spec PR #576) Consensys/teku#388

Closed