Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce swap-or-not shuffle #576

Merged
merged 15 commits into from
Feb 8, 2019
Merged

Introduce swap-or-not shuffle #576

merged 15 commits into from
Feb 8, 2019

Conversation

vbuterin
Copy link
Contributor

@vbuterin vbuterin commented Feb 6, 2019

See #563 for discussion.

Here is a more efficient implementation for shuffling an entire set; it can live here until we come up with an explicit "efficient implementation" doc:

def shuffle(list_size, seed):
    indices = list(range(list_size))
    for round in range(90):
        hash_bytes = b''.join([
            hash(seed + round.to_bytes(1, 'little') + (i).to_bytes(4, 'little'))
            for i in range((list_size + 255) // 256)
        ])
        pivot = bytes_to_int(hash(seed + int_to_bytes1(round))[0:8]) % list_size

        powers_of_two = [1, 2, 4, 8, 16, 32, 64, 128]
            
        for i, index in enumerate(indices):
            flip = (pivot - index) % list_size
            hash_pos = index if index > flip else flip
            byte = hash_bytes[hash_pos // 8]
            if byte & powers_of_two[hash_pos % 8]:
                indices[i] = flip
    return indices

Copy link
Contributor

@hwwhww hwwhww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 👍

If my suggestions are applied, the code will be runnable.

specs/core/0_beacon-chain.md Outdated Show resolved Hide resolved
specs/core/0_beacon-chain.md Outdated Show resolved Hide resolved
specs/core/0_beacon-chain.md Outdated Show resolved Hide resolved
specs/core/0_beacon-chain.md Outdated Show resolved Hide resolved
hwwhww and others added 4 commits February 6, 2019 18:33
Co-Authored-By: vbuterin <v@buterin.com>
Co-Authored-By: vbuterin <v@buterin.com>
Co-Authored-By: vbuterin <v@buterin.com>
@djrtwo
Copy link
Contributor

djrtwo commented Feb 7, 2019

should we use little endian for consistency? (sorry @vbuterin)

@vbuterin
Copy link
Contributor Author

vbuterin commented Feb 7, 2019

Sure, sounds fine to me.

@djrtwo
Copy link
Contributor

djrtwo commented Feb 7, 2019

done

@hwwhww
Copy link
Contributor

hwwhww commented Feb 7, 2019

@vbuterin @djrtwo

  1. Updated to the latest optimization version, would you like to check the docstring again?
  2. For the shuffling tests, not sure if it's necessary, what do you think about parameterizing the round number 90? i.e.,def shuffle(list_size: int, seed: Bytes32, rounds: int=90) -> List[int], so we can test other numbers?

@djrtwo
Copy link
Contributor

djrtwo commented Feb 7, 2019

we lost the get_permuted_index to optimizations? We'll likely want to make get_permuted_index exposed in the spec at some point.

(1) -- docstring looks good
(2) -- rounds as param looks good

@hwwhww
Copy link
Contributor

hwwhww commented Feb 7, 2019

@djrtwo
Fair enough. 😄
Reverted and refactored it.

specs/core/0_beacon-chain.md Outdated Show resolved Hide resolved
@djrtwo djrtwo merged commit 87dc8a6 into dev Feb 8, 2019
@djrtwo djrtwo deleted the vbuterin-patch-5 branch February 8, 2019 03:57
@hwwhww
Copy link
Contributor

hwwhww commented Feb 8, 2019

To the implementers: 70e482b is the optimized version for shuffling the whole validator registry.

@djrtwo djrtwo mentioned this pull request Feb 8, 2019
@protolambda
Copy link
Collaborator

I'm working on a new implementation experiment-repo, written in Go. For the new "swap-or-not" shuffling algorithm. Love optimizing this stuff (and don't have a ETH 2.0 client to work on more general stuff 😞 ).
Although the algorithm for set-shuffling by Vitalik looked promising, it still allocated O(n) hashes every round (n = list size, talking about "hash_bytes" in original).
I optimized it to only have 1 single hash (!!!) on the stack, since you can go through indices with the same hash (there are windows) consecutively.
I probably still have some off-by-1 errors in the shuffling itself, but it's looking good so far.

Here are the preliminary benchmark results (ran locally on a laptop, more about comparison):

BenchmarkPermuteIndex4M-8                               100000        157231 ns/op
BenchmarkPermuteIndex40K-8                              100000        157304 ns/op
BenchmarkPermuteIndex400-8                              100000        156259 ns/op
BenchmarkPermuteIndexComparison40K-8                         2    6247616777 ns/op
BenchmarkPermuteIndexComparison400-8                       200	    64324372 ns/op
BenchmarkShuffleList4M-8                                      10    1667117128 ns/op
BenchmarkShuffleList40K-8                                   1000      16748448 ns/op
BenchmarkShuffleList400-8                                  50000        390820 ns/op


 4M = 4000000
40K =   40000
400 =     400

BenchmarkPermuteIndexN = simply check how many times you can permute the index in a list sized N
BenchmarkPermuteIndexComparisonN = do the index-wise permutations, but for every list item, i.e. N times. For comparison with BenchmarkShuffleListN
BenchmarkShuffleListN = shuffle a complete list of N items.

  1. Note how slow index-wise permutations are if you do a complete list shuffle with it, compared to shuffling the list collectively.
    Shuffling 4 million items collectively, is more efficient (> 3x) than shuffling a list of 1/100th the size individually.
  2. Note how shuffling individual indices isn't affected much (if at all) by the list-size.

Repo here: https://github.com/protolambda/eth2-shuffle
Warning: still work-in-progress, no guarantees on correctness (likely a off-by-1 somewhere, gonna test later)

@protolambda
Copy link
Collaborator

Posting some relevant gitter chatter here:

@tersec
regarding the new shuffling algorithm, https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf specifically in/around figure 3 states that the choice from the domain should be uniform, but mod list_size won't be?

@protolambda

Had my doubts first too, but it helps if you print out the pairs for any pivot + listsize, you can see a mirror pattern (e.g. pivot = 5: 05, 14, 23) on each side of the pivot. And for each pivot, there's a different pairing for any given index. (Afaik)

And pivot choice (your mod list_size concern) is indeed not perfect, but is close to uniform for large enough random numbers

2^64/4000000=~4.6e12 list_sizes. And only 1 incomplete. So most of the time it's uniform.
And even more if you would take a 2^256 int from the hash

Ok, looks like that's exactly what Python's int.from_bytes does. So, the fraction of time that it's not uniform is ridiculously low. It may even be worth it to consider taking only 64 bits of the hash, for speed. 99.999999...% is still good.

@tersec
Fair enough, reasonable tradeoff

@protolambda
Interesting, it's both 64 bits and 256 bits 😂 The spec has a [0:8], but this is not reflected in the set-shuffle implementation in the PR. I prefer the 64 bits, it's sufficient and fast

Now, the issue is:
Post above by @vbuterin says:

pivot = int.from_bytes(hash(seed + round.to_bytes(1, 'little')), 'little') % list_size

Spec currently says:

pivot = bytes_to_int(hash(seed + int_to_bytes1(round))[0:8]) % list_size

There's two inconsistency issues:

  • spec is not explicit about little endianness (although it is globally)
  • version by Vitalik, that every implementer team is copying for efficiency, does not have the [0:8]: i.e. only using 64 bits of the hash, for efficiency (without much loss in it being uniform)

Some test-vectors would be really useful.

@djrtwo
Copy link
Contributor

djrtwo commented Feb 13, 2019

  • bytes_to_int uses little in the definition.
  • spec avoids big int arithmetic so the vitalik version should be [0:8]. Fixing now

@djrtwo
Copy link
Contributor

djrtwo commented Feb 13, 2019

Here is an issue for reworking the test vectors to the swap or not -- ethereum/eth2.0-test-generators#10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants