Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BooleanBitfield needs to be made sane #22

Closed
paulhauner opened this issue Sep 22, 2018 · 38 comments
Closed

BooleanBitfield needs to be made sane #22

paulhauner opened this issue Sep 22, 2018 · 38 comments
Labels
good first issue Good for newcomers good for bounty Good target for a gitcoin bounty help wanted Extra attention is needed

Comments

@paulhauner
Copy link
Member

paulhauner commented Sep 22, 2018

There is an implementation of a Boolean Bitfield here:

https://github.com/sigp/lighthouse/tree/master/boolean-bitfield

It (kinda) does the job for now, but it really needs some work done. If you spend some time looking at it I think you'll soon find out what I mean. As an example;

  • There is a possibility of overflows: we return the number of bits as a usize, however there can theoretically be usize number of bytes meaning we can have 8 * usize bits.
  • It keeps track of the number of true bits as you flip bits on and off. I don't think this is ideal as most cases where we want to know the number of true bits, we'll be receiving some serialized bytes from somewhere else (e.g., p2p nodes) and will need to calculate it manually.

On top of these two points, there's likely many chances for optimization.

Required Functionality

Get

get(n: usize) -> Result<bool, Error>

Get value at index n.

Error if bit out-of-bounds (OOB) of underlying bytes.

Set

set(n: usize, val: bool) -> Result<(bool, Error>

Set bit at index n. Returns the previous value if successful.

Error if bit is OOB of underlying bytes.

Highest Set Bit

highest_set_bit() -> Option<usize>

Returns the index of the highest set bit. Some(n) if a bit set set, None otherwise.

Note: this is useful because we need to reject messages if an unnecessary bit is set (e.g. if there are 10 voters and the 11th bit is set

Number of Underlying Bytes

num_bytes() -> usize

Returns the length of the underlying bytes.

_Note: useful to reject bitfields that are larger than required (e.g., if there are eight voters and two bytes -- only one byte is necessary). _

Number of Set Bits

num_set_bits() -> usize

Returns the total number of set bits (i.e., how many peeps voted).

Note: I'm not 100% sure we'll use this but I suspect we will.

@paulhauner paulhauner added help wanted Extra attention is needed good first issue Good for newcomers labels Sep 22, 2018
paulhauner added a commit that referenced this issue Sep 22, 2018
@ralexstokes
Copy link
Contributor

What exactly does "unbounded" mean? Is there more info in the spec you could point me to?

Is the idea that there is a bitvector but its size can change (up to) every incoming network message?

@paulhauner
Copy link
Member Author

paulhauner commented Sep 25, 2018

Hey, thanks for jumping on this one :)

For background, this was one of the first things built in the repo before it had aspirations of becoming a client. A lot has changed since then so I'm totally to a re-design it. If you can come up with a cleaner/more-standard API I have no objections to changing it.

Unbounded (I think) was me trying to express that it's dynamically extendable, not based on some fixed array.

Now that I know more about the spec, I don't think dynamically growing it is so important. Here are the primary use-cases for it:

  1. An AttestationRecord comes in off the network and we scan the bitfield to determine which validators have voted on it (or, at least should have voted on it).
  2. A validator takes some AttestationRecord, adds their signature to the aggregate signature and flips their corresponding bit to true.
  3. A validator creates a new AttestationRecord and initializes the bitfield to be some length and sets their bit to true. Note: the AttestationRecord must always be big enough to represent the whole committee, you can't discard trailing 0 bytes.

I've considered making the BooleanBitfield store a slice, instead of a Vec. This would reduce some copy operations when validating blocks off the network; you can just read directly from the serialized bytes. It would however make it a tiny bit harder to instantiate (you'd need the manage the lifetime of the underlying bytes). Food for thought.

For resources:

AFAIK there are no other resources. I've been trying to generate specification documents if they don't exist -- up to you if you're interested in that :)

FYI I am about to merge some changes to the bitfield crate onto master. They're not very significant, the most significant being a change to the order of bytes during SSZ serialization.

I hope this helps, let me know if I can help :)

@paulhauner
Copy link
Member Author

FYI, I have merged those changes onto master. I'll leave the bitfield crate alone now :)

@ralexstokes
Copy link
Contributor

Great -- So it seems like there are two cases to consider before we can specify the nature of this bitfield:

  1. this client gets an attestation record off the wire -- in this case, I'm assuming someone else has already determined the size and the most this client does is to read which bits are set (or not set) and possibly write its own bit (which it knows from some other protocol state orthogonal to this bitfield impl). this case is p straightforward bc we are just using the work someone else did.

  2. the client needs to generate an attestation record. in which case it needs to know the committee size to properly size the bitfield. am I correct to assume that the client knows the committee size before it needs to create the attestation record? in that case, it will just pass in the size as a param and we can loop to the concerns in (1).

from looking at the spec, there is a minimum committee size and there seems to be a maximum (as the total number of validators is capped) but it's not immediately clear how that backs into a maximum committee size. I would suggest we use a BigInteger type like this https://github.com/rust-num/num-bigint so we don't have to think about overflows and stuff. It can always be replaced with a more optimized impl if its a bottleneck. Thoughts on this approach?

@rawfalafel
Copy link

rawfalafel commented Oct 2, 2018

am I correct to assume that the client knows the committee size before it needs to create the attestation record?

yep, the client can derive the committee size by looking at CrystallizedState.shard_and_committee_for_slots.

I would suggest we use a BigInteger type like this https://github.com/rust-num/num-bigint so we don't have to think about overflows and stuff.

with the current spec, the committee size (and therefore the size of the bitfield) is fixed at 128. usize should be sufficient for now. Just looked at the shuffling function again, looks like the max committee size is about 409, so usize is definitely not fine. Not familiar with Rust, but a byte array seems like the simplest implementation.

@paulhauner
Copy link
Member Author

Hey,
Thanks for the input @rawfalafel! :)

I'm not exactly sure what the limit for a committee is, but I believe it's larger than 409. 10% of present ETH supply staking means a committee size of 305, so I would hope that the 409 maximum isn't the case -- how did you determine that please? (For reference, here's where I got those numbers: ethereum/beacon_chain#103)

@ralexstokes I would also be tempted to use a BigInteger type in normal circumstances, however this bitfield is pretty much guaranteed to either be born from, or die as a byte-array. This makes me lean towards backing it up with a byte array.

Another reason that I think a byte-backed impl would be useful is that we need to do some checks on the actual byte-array when processing a block. We need to know:

  • Are the bitfield bytes too long? I.e., are there more bitfield bytes than required to represent the number of attesters for this shard?
  • If attester_count % 8 != 0, are any of the "spare" bits set? If so, block is invalid.

FYI, both of these conditions are required so that the hash of a Block/AttestationRecord continues to provide a unique identity of that object.

If it helps, here an example of a bitfield being using in the wild:

/*
* The bitfield must be no longer than the minimum required to represent each validator in the
* attestation indicies for this slot and shard id.
*/
if a.attester_bitfield.num_bytes() !=
bytes_for_bits(attestation_indices.len())
{
return Err(AttestationValidationError::BadBitfieldLength);
}
/*
* If there are excess bits in the bitfield because the number of a validators in not a
* multiple of 8, reject this attestation record.
*
* Allow extra set bits would permit mutliple different byte layouts (and therefore hashes) to
* refer to the same AttesationRecord.
*/
if a.attester_bitfield.len() > attestation_indices.len() {
return Err(AttestationValidationError::InvalidBitfieldEndBits)
}

Presently I'm leaning towards the bitfield implementation containing two things:

  1. A struct that contains a slice (i.e., pointer to some bytes) that gives us a bitfield-like API.
  2. A wrapper struct that holds some bytes and an instance of (1) - the bitfield-like API.

(1) would allow us to deal with existing byte arrays without performing a memory copy. (2) would allow us to re-use the code from (1) whilst neatly storing the actual bytes somewhere.

Thoughts? Happy to jump on a call if that is an easier method of comms?

Thanks!

@rawfalafel
Copy link

Oh whoops, I missed a digit! I got 4096 validators per committee.

I'm assuming 2²² maximum validators and 2⁶ slots per cycle. In addition, there are 2⁴ (SHARD_COUNT // CYCLE_LENGTH) max committees per slot.

2²² / 2⁶ / 2⁴ = 2¹² = 4096 validators.

@paulhauner
Copy link
Member Author

paulhauner commented Oct 3, 2018 via email

@ralexstokes
Copy link
Contributor

Thanks @paulhauner and @rawfalafel for the input.

I have a pretty straightforward translation of the Python implementation you can see in this playground: https://play.rust-lang.org/?gist=01e011922837f2ca057244bfdeb28558&version=stable&mode=debug&edition=2015

If we like this, then I can add a wrapper type that owns a Vec and uses the BitFieldView to get/set bits.

From there, you can wrap a higher-level type that manages "voting" like in the Python implementation but I'll leave that for a different issue.

@ralexstokes
Copy link
Contributor

@paulhauner looking over the use cases you mentioned above, i'll need to change the linked gist a bit but its straightforward to do

can you highlight which methods from the existing crate we want to keep as the external API?

seems like we want to be able to get and (possibly) set bits, count the number of set bits and record the expected length of the attester set...

i'm imaging a fairly immutable usage where this code is essentially just a nice wrapper over some bytes we either compute or get off the wire -- in this case we wouldn't be setting bits, resizing the backing array or needing to compute the length on demand (it is known at construction time). if there is some use case i'm not aware of plz fill me in

@paulhauner
Copy link
Member Author

paulhauner commented Oct 8, 2018

Cool, looks good!

Sorry if this is obvious, but we'll have to return Result instead of panicking. General ethos of Lighthouse is never panic.

I updated this issue description with a list of functions. I'm not attached to any of the variable names used there -- I just used them because they were short. I like your use of index and generally prefer long descriptive names.

Re: bit_count, I can't think of a case where we need it but it's cheap and might be useful. Totally down to leave it there.

I agree that we don't need to support re-sizing the backing array. Let's assume that number of bits is known at initialization time and that the vec is initialized to the minimum amount of bytes required to store those bits, with all bytes set to 0.

I hope this helps, happy to answer more questions! :)

@ralexstokes
Copy link
Contributor

@paulhauner so I wasn't thinking far enough ahead and the Bitfield struct i was going to make that wraps a BitfieldView is not going to work out as we don't want a Rust struct to have a reference to an internal field. A way I usually see this worked around is that you just keep usize indexes into the backing storage instead of references. It isn't immediately clear to me that this is going to be that ergonomic so I'm inclined to consider other options.

It seems we have two cases: 1. we want to make a zeroed bitfield and allocate storage and 2. we want to use the bitfield logic but refer to an already allocated storage e.g. from bytes off the wire. You have any thoughts on what this looks like in legal Rust? It seems like the straightforward thing is that we have two different structs for each case (and perhaps pull the common logic out into common functions). This seems fine but if you can think of a way to avoid the duplication I'm all for it. Thoughts?

@ralexstokes
Copy link
Contributor

@paulhauner the idea w/ bit_count was that it would act as a mask for any "spare bits" but the more I think about it, you probably want to mix in some of the validation concerns to this struct, rather than have the validation happen upstream and just use an instance of Bitfield for some bit-twiddling over already validated data. Given that we are getting arbitrary inputs off the wire I think your view is more appropriate so in that case I would think we can remove bit_count.

My question now is -- how do you want to handle a panic on out of bounds access? If we don't check the index ahead of time like I have it, then the code will just panic on the invalid array access to backing_bytes. If we do check it and don't know the correct bit_count at construction time then the best we can do what I already have. Thoughts on this??

@paulhauner
Copy link
Member Author

You raised an excellent point about read/write structs. I just went on a rather long and magical journey across the internet and found std::borrow::Cow.

CoW is copy-on-write. In a nutshell, reference whilst reading and then copy once a write is required. A.K.A lazy-copy.

Check out my example:

https://play.rust-lang.org/?gist=91146bacb1a76acc537667d5e9093f9b&version=stable&mode=debug&edition=2015

Great question about panics. I think we should make get() return Option<usize> and set() return Result<(), BitfieldError>.

If you get() an out-of-bounds bit, you get None. If you set() and out-of-bounds bit you get Err(BitfieldError::OutofBounds).

The get() functionality I described mirrors std::vec::Vec whereas set() doesn't.. maybe it's not-so-Rusty but I'm hesitant to introduce any panics. If you can find a good reason not to return Result from set() I'm up for it.

Thanks!!!

@paulhauner
Copy link
Member Author

Also, I forgot to mention: that binary notation with the underscores is cool. I didn't know that.

@ralexstokes
Copy link
Contributor

ralexstokes commented Oct 17, 2018 via email

@paulhauner paulhauner added the good for bounty Good target for a gitcoin bounty label Oct 17, 2018
@gitcoinbot
Copy link

gitcoinbot commented Oct 19, 2018

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work has been started.

These users each claimed they can complete the work by 7 months, 1 week from now.
Please review their action plans below:

1) ralexstokes has been approved to start work.

Happy to discuss alternatives but I have already started work on this issue and would like to collect the bounty if there is one we are adding...

Paul and I have been talking through various options and we feel like we have a good direction to go in -- I was going to author that as a PR in the next few days...

Learn more on the Gitcoin Issue Details page.

@mkosowsk
Copy link

@ralexstokes you have been approved to work on this bounty, please submit the PR at your earliest convenience. Thanks!

@gitcoinbot
Copy link

@ralexstokes Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • warning (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

@ralexstokes
Copy link
Contributor

@gitcoinbot there is an open PR at #62 .

@gitcoinbot
Copy link

@ralexstokes Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • warning (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

@ralexstokes
Copy link
Contributor

@gitcoinbot there is an open PR at #62. we are all busy at devcon ;)

@gitcoinbot
Copy link

@ralexstokes Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • warning (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

@ralexstokes
Copy link
Contributor

@gitcoinbot there is an open PR at #62.

@gitcoinbot
Copy link

@ralexstokes Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • warning (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

@ralexstokes
Copy link
Contributor

@gitcoinbot there is an open PR at #62.

@gitcoinbot
Copy link

@ralexstokes Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • reminder (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

@ralexstokes
Copy link
Contributor

@gitcoinbot there is an open PR at #62

@gitcoinbot
Copy link

@ralexstokes Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • reminder (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

@ralexstokes
Copy link
Contributor

@gitcoinbot after talking through some design decisions, i just need to wrangle a few more tests in #62

@spm32
Copy link

spm32 commented Nov 21, 2018

Great, thanks for the update @ralexstokes!

@paulhauner
Copy link
Member Author

@gitcoinbot this has been completed! Bounty can be paid out.

Thank you :)

@gitcoinbot
Copy link

@ralexstokes Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • reminder (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

1 similar comment
@gitcoinbot
Copy link

@ralexstokes Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • reminder (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

@ralexstokes
Copy link
Contributor

@gitcoinbot this issue has been resolved and the PR merged.

see this comment: #22 (comment)

@gitcoinbot
Copy link

⚡️ A tip worth 480.00000 DAI (480.0 USD @ $1.0/DAI) has been granted to @ralexstokes for this issue from @. ⚡️

Nice work @ralexstokes! Your tip has automatically been deposited in the ETH address we have on file.

@mkosowsk
Copy link

mkosowsk commented Dec 4, 2018

@ralexstokes you have been paid out, sorry about the delay. Thanks!

@gitcoinbot
Copy link

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This Bounty has been completed.

Additional Tips for this Bounty:

  • tipped 480.0000 DAI worth 480.0 USD to ralexstokes.

divagant-martian pushed a commit to divagant-martian/lighthouse that referenced this issue Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers good for bounty Good target for a gitcoin bounty help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

7 participants
@rawfalafel @ralexstokes @paulhauner @mkosowsk @spm32 @gitcoinbot and others